Match a String with Regular Expression in Bash

Arseni Kavalchuk - Oct 20 - Dev Community

In Bash, matching strings against regular expressions (regex) is a common task for parsing and validating data. Bash offers multiple ways to perform regex matching, including using the grep command, and more importantly, the =~ test expression for conditional checks directly in Bash scripts. In this article, we'll explore different approaches to matching strings with regex in Bash, with a focus on the =~ operator.

Ways to Check a String Against a Regex in Bash

Bash provides several tools to check strings against regex patterns. Here are the most common methods:

Using the grep Command

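The grep command searches files or standard input for lines that match a pattern. By default it interprets the pattern as a basic regular expression (BRE); the -E option switches to extended regular expressions (ERE), which is the syntax used throughout this article.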

echo "teststring" | grep -E "test.*"
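In a script, the exit status of grep is usually more useful than its output. The following is a minimal sketch of that idea; the function name matches_with_grep is hypothetical and only serves as an illustration.

```bash
#!/bin/bash

# Hypothetical helper: succeeds (exit status 0) when the input matches the
# extended regex. -q suppresses output, so only the exit status is used.
matches_with_grep() {
  local input=$1 pattern=$2
  printf '%s\n' "$input" | grep -Eq -- "$pattern"
}

if matches_with_grep "teststring" '^test.*'; then
  echo "Matches!"
else
  echo "No match!"
fi
```

Note that this starts an external process for every check, which is the overhead the =~ operator avoids.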

Using =~ Test Expression

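Bash's built-in conditional construct [[ ]] supports regex matching with the =~ operator. The pattern on the right-hand side is a POSIX extended regular expression and must be left unquoted; a quoted pattern is matched as a literal string. This approach is efficient because it matches strings inside the script without starting an external command such as grep.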

if [[ "teststring" =~ ^test ]]; then
  echo "Matches!"
else
  echo "No match!"
fi
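One detail that often causes confusion is quoting: Bash treats an unquoted right-hand side of =~ as a regex and a quoted one as a literal string. The sketch below illustrates the difference; the helper names regex_match and literal_match are made up for this example.

```bash
#!/bin/bash

# Hypothetical helpers that differ only in the quoting of the pattern.
regex_match()   { [[ $1 =~ $2 ]]; }     # unquoted: $2 is used as a regex
literal_match() { [[ $1 =~ "$2" ]]; }   # quoted: $2 is matched literally

input="teststring"
pattern='^test[a-z]+$'

if regex_match "$input" "$pattern"; then
  echo "regex match"
fi

# The string does not contain the literal text ^test[a-z]+$, so this fails.
if literal_match "$input" "$pattern"; then
  echo "literal match"
else
  echo "no literal match"
fi
```

Keeping the pattern in a variable, as done here, is a common way to avoid shell-quoting surprises with characters such as spaces, parentheses, or backslashes.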

Using awk

Another way to match a regex in Bash is the awk command, which has built-in regex support. A bare /pattern/ prints every input line that matches it.

echo "teststring" | awk '/test/'
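As with grep, awk can also report a match through its exit status rather than by printing the line. The following is a rough sketch; the function name matches_with_awk is hypothetical, and the example assumes a pattern without backslashes, because awk processes escape sequences in values passed with -v.

```bash
#!/bin/bash

# Hypothetical helper: exit status 0 if any input line matches the regex.
# Assumption: the pattern contains no backslashes (awk would interpret
# them as string escapes when the value is passed with -v).
matches_with_awk() {
  local input=$1 pattern=$2
  printf '%s\n' "$input" |
    awk -v re="$pattern" '$0 ~ re { found = 1 } END { exit !found }'
}

if matches_with_awk "teststring" '^test'; then
  echo "Matches!"
else
  echo "No match!"
fi
```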

In this article, we will focus on the =~ operator as it is an efficient and versatile tool for regex matching in Bash scripts.

What are Regular Expressions?

Regular Expressions (regex) are sequences of characters that form search patterns, typically used for pattern matching in strings. They allow users to match complex patterns in text, making them a powerful tool for data parsing, validation, and text processing. Regex is used in many programming languages and tools like Perl, Python, JavaScript, grep, and more.

Basic Symbols and Patterns

Regular expressions consist of literals and special characters (metacharacters) that define the search pattern. Some of the most commonly used regex symbols include:

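  • Dot .: Matches any single character. Many regex flavors exclude the newline character from this; with Bash's =~ operator the dot generally matches a newline as well.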
    • Example: a.b matches acb, a1b, etc.
  • Caret ^: Anchors the match at the start of a string.
    • Example: ^test matches strings starting with test.
  • Dollar $: Anchors the match at the end of a string.
    • Example: test$ matches strings ending with test.
  • Asterisk *: Matches zero or more occurrences of the preceding character or group.
    • Example: ab*c matches ac, abc, abbc, etc.
  • Plus +: Matches one or more occurrences of the preceding character or group.
    • Example: ab+c matches abc, abbc, but not ac.
  • Square Brackets []: Matches any one of the enclosed characters.
    • Example: [abc] matches a, b, or c.
  • Escape Sequence \: Escapes special characters, allowing them to be treated as literals.
    • Example: \.com matches .com instead of treating . as a wildcard.
  • Parentheses (): Groups multiple characters or expressions.
    • Example: (ab)+ matches ab, abab, etc.
  • Pipe |: Acts as a logical OR operator to match different alternatives.
    • Example: a|b matches a or b.
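The examples in the list can be tried out directly with =~. The sketch below uses a hypothetical helper, full_match, which wraps the pattern in ^( )$ so that the whole string has to match, not just a part of it.

```bash
#!/bin/bash

# Hypothetical helper: prints "yes" if the entire string matches the
# pattern and "no" otherwise. The pattern is anchored with ^( and )$.
full_match() {
  local string=$1 pattern="^($2)\$"
  if [[ $string =~ $pattern ]]; then echo "yes"; else echo "no"; fi
}

full_match "a1b"  'a.b'     # yes: . matches the single character 1
full_match "ac"   'ab*c'    # yes: * allows zero b characters
full_match "ac"   'ab+c'    # no:  + requires at least one b
full_match "c"    '[abc]'   # yes: c is one of the listed characters
full_match "xcom" '\.com'   # no:  \. matches only a literal dot
full_match "abab" '(ab)+'   # yes: the group ab occurs twice
full_match "b"    'a|b'     # yes: the second alternative matches
```

Without the anchors, =~ succeeds as soon as any substring matches, so ab+c would also match a longer string such as xxabcxx.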

Regex Pattern Examples

Here are a few examples of common regex patterns:

  • Email validation: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$. This pattern matches simple email addresses.

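  • Phone number: ^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$. Matches phone numbers in the format (123) 456-7890. The digit shorthand \d comes from Perl-style regex and is not part of the POSIX syntax used by =~ and grep -E, so [0-9] is used instead.

  • URLs: ^https?://[^[:space:]/$.?#].[^[:space:]]*$. Matches URLs starting with http:// or https://. The POSIX class [[:space:]] takes the place of the Perl-style \s, and the slashes need no escaping.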

Regular expressions are invaluable in scripting for tasks like validation, substitution, and parsing of structured text.
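Here is a minimal sketch of how patterns like these might be used with =~. Because =~ works with POSIX extended regular expressions, the sketch writes digits as [0-9] and whitespace as [[:space:]] instead of the Perl-style \d and \s. The variable names and the validate helper are hypothetical.

```bash
#!/bin/bash

# Patterns written in POSIX ERE syntax and stored in variables.
email_re='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
phone_re='^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$'
url_re='^https?://[^[:space:]/$.?#].[^[:space:]]*$'

# Hypothetical helper: prints "valid" or "invalid" for a value and a pattern.
validate() {
  if [[ $1 =~ $2 ]]; then echo "valid"; else echo "invalid"; fi
}

validate "user@example.com"         "$email_re"   # valid
validate "(123) 456-7890"           "$phone_re"   # valid
validate "123-456-7890"             "$phone_re"   # invalid
validate "https://example.com/path" "$url_re"     # valid
validate "ftp://example.com"        "$url_re"     # invalid
```

These patterns are deliberately simple; they accept some strings that are not real addresses or URLs and reject some that are.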

Using =~ Test Expression in Bash

The =~ operator allows you to perform regex matching directly within Bash scripts, which is especially useful for writing conditional logic based on patterns. Let's explore its usage with an example of matching a host name that follows a canonical domain pattern, such as cdn.mydomain.com.

Example 1: Basic Domain Name Matching

Here is a simple Bash script that uses =~ to check if a string matches a domain pattern:

#!/bin/bash

hostname="cdn.mydomain.com"

if [[ $hostname =~ ^cdn\.[a-zA-Z0-9]+\.[a-z]{2,}$ ]]; then
  echo "Valid CDN domain"
else
  echo "Invalid domain"
fi

Explanation:

  • ^cdn\. ensures the hostname starts with cdn..
  • [a-zA-Z0-9]+ matches the domain name portion (one or more letters or digits; hyphens are not accepted by this pattern).
  • \.[a-z]{2,}$ matches the top-level domain (like .com, .net).
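To see how each part of the pattern affects the result, the check can be wrapped in a function and run against several hostnames. This is only a sketch; the function name is_cdn_domain and the sample hostnames other than cdn.mydomain.com are made up for illustration.

```bash
#!/bin/bash

# Hypothetical helper wrapping the pattern from Example 1.
is_cdn_domain() {
  local pattern='^cdn\.[a-zA-Z0-9]+\.[a-z]{2,}$'
  [[ $1 =~ $pattern ]]
}

for host in cdn.mydomain.com www.mydomain.com cdn.mydomain cdn.my-domain.com; do
  if is_cdn_domain "$host"; then
    echo "$host: Valid CDN domain"
  else
    echo "$host: Invalid domain"
  fi
done
```

Only cdn.mydomain.com passes: www.mydomain.com lacks the cdn. prefix, cdn.mydomain has no top-level domain, and cdn.my-domain.com contains a hyphen, which the character class does not allow.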

Example 2: Matching Subdomains

You can also expand this logic to handle more complex subdomain patterns:


#!/bin/bash

hostname="static.cdn.mydomain.com"

if [[ $hostname =~ ^[a-z]+\.(cdn\.)[a-zA-Z0-9]+\.[a-z]{2,}$ ]]; then
  echo "Valid subdomain"
else
  echo "Invalid subdomain"
fi

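This example lets the first part of the hostname vary: [a-z]+ accepts any run of lowercase letters, so the pattern matches subdomains like static.cdn.mydomain.com or images.cdn.mydomain.com. The parentheses around cdn\. only group that part of the pattern and do not change what is matched.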
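A possible variation, shown here only as a sketch, makes the leading label optional by grouping it and adding ?, so that one pattern accepts hostnames both with and without a subdomain. The function name is_cdn_host is hypothetical, and this pattern is not the one used in the example above.

```bash
#!/bin/bash

# Hypothetical variation: ([a-z]+\.)? makes the leading subdomain optional.
is_cdn_host() {
  local pattern='^([a-z]+\.)?cdn\.[a-zA-Z0-9]+\.[a-z]{2,}$'
  [[ $1 =~ $pattern ]]
}

for host in cdn.mydomain.com static.cdn.mydomain.com static.mydomain.com; do
  if is_cdn_host "$host"; then
    echo "$host: Valid"
  else
    echo "$host: Invalid"
  fi
done
```

The first two hostnames are accepted, while static.mydomain.com is rejected because the cdn. label is missing.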

Example 3: Extracting Parts of a Domain

In some cases, you may want to capture parts of the string using regex groups and BASH_REMATCH to extract relevant information:

#!/bin/bash

hostname="images.cdn.mydomain.com"

if [[ $hostname =~ ^([a-z]+)\.(cdn\.)[a-zA-Z0-9]+\.[a-z]{2,}$ ]]; then
  echo "Subdomain: ${BASH_REMATCH[1]}"
else
  echo "No match"
fi

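In this case, ${BASH_REMATCH[1]} holds the first captured group, i.e., the subdomain (images). ${BASH_REMATCH[0]} holds the entire matched string, and ${BASH_REMATCH[2]} holds the second group, here cdn. including the trailing dot.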
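The same technique extends to several groups. The sketch below is a hypothetical variation of the pattern that captures the subdomain, the domain name, and the top-level domain separately; the function name parse_cdn_host is an assumption made for this example.

```bash
#!/bin/bash

# Hypothetical helper: splits a hostname into subdomain, domain and TLD.
# Each pair of parentheses fills the next index of BASH_REMATCH.
parse_cdn_host() {
  local pattern='^([a-z]+)\.cdn\.([a-zA-Z0-9]+)\.([a-z]{2,})$'
  if [[ $1 =~ $pattern ]]; then
    echo "full=${BASH_REMATCH[0]} subdomain=${BASH_REMATCH[1]} domain=${BASH_REMATCH[2]} tld=${BASH_REMATCH[3]}"
  else
    echo "No match"
    return 1
  fi
}

parse_cdn_host "images.cdn.mydomain.com"
# prints: full=images.cdn.mydomain.com subdomain=images domain=mydomain tld=com
```

BASH_REMATCH is overwritten by every =~ evaluation, so the captured values should be copied to other variables if they are needed after a later match.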

Summary

Matching strings with regular expressions in Bash is a common task for processing and validating input. While grep and other external tools provide regex capabilities, the =~ operator is a native Bash feature for in-line regex matching, making it efficient and easy to use in scripts. By mastering regular expressions and using the =~ test expression, you can handle complex pattern matching, validate data like domain names, and even extract specific parts of strings for further processing in your Bash scripts.
