In Bash, matching strings against regular expressions (regex) is a common task for parsing and validating data. Bash offers multiple ways to perform regex matching, including the grep command and, more importantly, the =~ test expression for conditional checks directly in Bash scripts. In this article, we'll explore different approaches to matching strings with regex in Bash, with a focus on the =~ operator.
Ways to Check a String Against a Regex in Bash
Bash provides several tools to check strings against regex patterns. Here are the most common methods:
Using the grep Command
The grep command is a powerful tool for searching for patterns in files or standard input. It uses basic regular expressions by default; the -E option enables extended regular expressions, as in the example below.
echo "teststring" | grep -E "test.*"
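In a script, grep's exit status is usually more useful than its printed output. The following is a minimal sketch, assuming a grep that supports the POSIX -q (quiet) option; the helper name matches_regex is illustrative and not part of grep:

```bash
#!/bin/bash

# matches_regex STRING PATTERN
# Returns 0 if STRING matches the extended regex PATTERN, non-zero otherwise.
# (matches_regex is a hypothetical helper name chosen for this sketch.)
matches_regex() {
    # -q suppresses output, -E selects extended regex syntax,
    # -e marks the next argument as the pattern even if it starts with a dash
    printf '%s\n' "$1" | grep -q -E -e "$2"
}

if matches_regex "teststring" "^test.*"; then
    echo "Matches!"
else
    echo "No match!"
fi
```

Because the function only returns a status, it can be used directly in if, while, or && chains.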
Using =~ Test Expression
Bash's built-in conditional expressions allow regex matching using the =~ operator within the [[ ]] test construct. This approach is efficient for matching strings in scripts because it does not rely on external commands like grep.
if [[ "teststring" =~ ^test ]]; then
    echo "Matches!"
else
    echo "No match!"
fi
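One detail worth knowing: in Bash 3.2 and later, quoting the right-hand side of =~ makes it match as a literal string rather than as a regex. A common way to avoid surprises is to store the pattern in a variable and expand it unquoted. A small sketch of the difference (the function names are illustrative):

```bash
#!/bin/bash

# Unquoted expansion: the contents of $2 are interpreted as a regex.
is_regex_match() {
    [[ $1 =~ $2 ]]
}

# Quoted expansion: the contents of $2 are matched as literal text.
is_literal_match() {
    [[ $1 =~ "$2" ]]
}

pattern='a.b'   # single quotes keep the pattern intact in the assignment

if is_regex_match "axb" "$pattern"; then
    echo "regex: axb matches a.b (dot matches any character)"
fi

if ! is_literal_match "axb" "$pattern"; then
    echo "literal: axb does not contain the text a.b"
fi

if is_literal_match "a.b" "$pattern"; then
    echo "literal: a.b contains the text a.b"
fi
```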
Using awk
Another way to match a regex in Bash is the awk command, which has built-in regex support.
echo "teststring" | awk '/test/'
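The awk one-liner above prints the line when it matches. For use in a conditional, awk can also report the result through its exit status. This is a sketch under the assumption that a POSIX-compatible awk is installed; awk_starts_with_test is a hypothetical helper name:

```bash
#!/bin/bash

# awk_starts_with_test STRING
# Returns 0 if STRING starts with "test", 1 otherwise.
awk_starts_with_test() {
    # found is 1 when the line matches the regex, 0 otherwise;
    # "exit !found" turns that into a shell-style exit status
    printf '%s\n' "$1" | awk '{ found = ($0 ~ /^test/) } END { exit !found }'
}

if awk_starts_with_test "teststring"; then
    echo "Matches!"
else
    echo "No match!"
fi
```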
In this article, we will focus on the =~ operator, as it is an efficient and versatile tool for regex matching in Bash scripts.
What are Regular Expressions?
Regular Expressions (regex) are sequences of characters that form search patterns, typically used for pattern matching in strings. They allow users to match complex patterns in text, making them a powerful tool for data parsing, validation, and text processing. Regex is used in many programming languages and tools like Perl, Python, JavaScript, grep, and more.
Basic Symbols and Patterns
Regular expressions consist of literals and special characters (metacharacters) that define the search pattern. Some of the most commonly used regex symbols include:
- Dot (.): Matches any single character (in many engines, any character except a newline).
  - Example: a.b matches acb, a1b, etc.
- Caret (^): Anchors the match at the start of a string.
  - Example: ^test matches strings starting with test.
- Dollar ($): Anchors the match at the end of a string.
  - Example: test$ matches strings ending with test.
- Asterisk (*): Matches zero or more occurrences of the preceding character or group.
  - Example: ab*c matches ac, abc, abbc, etc.
- Plus (+): Matches one or more occurrences of the preceding character or group.
  - Example: ab+c matches abc, abbc, but not ac.
- Square Brackets ([]): Matches any one of the enclosed characters.
  - Example: [abc] matches a, b, or c.
- Escape Sequence (\): Escapes special characters, allowing them to be treated as literals.
  - Example: \.com matches .com instead of treating . as a wildcard.
- Parentheses (()): Groups multiple characters or expressions.
  - Example: (ab)+ matches ab, abab, etc.
- Pipe (|): Acts as a logical OR operator to match different alternatives.
  - Example: a|b matches a or b.
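To see these symbols in action, the following sketch checks a few of the examples above with Bash's =~ operator. The helper name check and the test strings are illustrative. The patterns are anchored with ^ and $ where the whole string should match, because =~ otherwise succeeds on any matching substring:

```bash
#!/bin/bash

# check STRING PATTERN: report whether STRING matches the regex PATTERN.
# (check is a hypothetical helper for this sketch.)
check() {
    if [[ $1 =~ $2 ]]; then
        echo "$1 =~ $2: match"
    else
        echo "$1 =~ $2: no match"
    fi
}

check "a1b"   '^a.b$'     # dot: any single character between a and b
check "ac"    '^ab*c$'    # asterisk: zero b characters is allowed
check "ac"    '^ab+c$'    # plus: at least one b is required
check "abab"  '^(ab)+$'   # parentheses: the group ab repeated
check "b"     '^(a|b)$'   # pipe: either a or b
check "x.com" '\.com$'    # escaped dot: requires a literal "."
check "xacom" '\.com$'    # no literal dot before com
```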
Regex Pattern Examples
Here are a few examples of common regex patterns:
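- Email validation: ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ matches simple email addresses.
- Phone number: ^\(\d{3}\) \d{3}-\d{4}$ matches phone numbers in the format (123) 456-7890.
- URLs: ^https?:\/\/[^\s/$.?#].[^\s]*$ matches URLs starting with http or https.
Note that \d and \s are Perl-style shorthands that are not part of the POSIX extended regex syntax used by Bash's =~ operator and grep -E; there, write [0-9] and [[:space:]] instead.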
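As a sketch of how the email and phone patterns might be written for Bash's =~ operator, the version below replaces the Perl-style \d with the bracket expression [0-9], which POSIX extended regex supports. The function and variable names are illustrative:

```bash
#!/bin/bash

# Patterns are stored in variables so that spaces and backslashes
# reach the regex engine unchanged when expanded unquoted in [[ ]].
phone_re='^\([0-9]{3}\) [0-9]{3}-[0-9]{4}$'
email_re='^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

# Hypothetical helpers: return 0 on a match, 1 otherwise.
is_phone() { [[ $1 =~ $phone_re ]]; }
is_email() { [[ $1 =~ $email_re ]]; }

for value in "(123) 456-7890" "123-456-7890"; do
    if is_phone "$value"; then
        echo "$value: valid phone"
    else
        echo "$value: invalid phone"
    fi
done

for value in "user@example.com" "user@example"; do
    if is_email "$value"; then
        echo "$value: valid email"
    else
        echo "$value: invalid email"
    fi
done
```

Like the original pattern, the email check is deliberately simple and accepts only a subset of valid addresses.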
Regular expressions are invaluable in scripting for tasks like validation, substitution, and parsing of structured text.
Using =~ Test Expression in Bash
The =~ operator allows you to perform regex matching directly within Bash scripts, which is especially useful for writing conditional logic based on patterns. Let's explore its usage with an example of matching a host name that follows a canonical domain pattern, such as cdn.mydomain.com.
Example 1: Basic Domain Name Matching
Here is a simple Bash script that uses =~ to check if a string matches a domain pattern:
#!/bin/bash
hostname="cdn.mydomain.com"
if [[ $hostname =~ ^cdn\.[a-zA-Z0-9]+\.[a-z]{2,}$ ]]; then
    echo "Valid CDN domain"
else
    echo "Invalid domain"
fi
Explanation:
- ^cdn\. ensures the hostname starts with cdn followed by a literal dot.
- [a-zA-Z0-9]+ matches the domain name portion (letters and numbers).
- \.[a-z]{2,}$ matches the top-level domain (like .com, .net).
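To see where this pattern draws the line, the sketch below wraps the same regex in a function and runs it against several hostnames. The function name is_cdn_domain and the extra hostnames are assumptions made for illustration:

```bash
#!/bin/bash

# Same pattern as in Example 1, stored in a variable for reuse.
cdn_re='^cdn\.[a-zA-Z0-9]+\.[a-z]{2,}$'

# is_cdn_domain HOSTNAME: return 0 if HOSTNAME matches the CDN pattern.
is_cdn_domain() {
    [[ $1 =~ $cdn_re ]]
}

for host in cdn.mydomain.com cdn.example.net www.mydomain.com \
            cdn.my-domain.com cdnXmydomain.com; do
    if is_cdn_domain "$host"; then
        echo "$host: Valid CDN domain"
    else
        echo "$host: Invalid domain"
    fi
done
```

The last two hostnames are rejected: cdn.my-domain.com because the character class does not include a hyphen, and cdnXmydomain.com because the escaped dot only matches a literal dot.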
Example 2: Matching Subdomains
You can also expand this logic to handle more complex subdomain patterns:
#!/bin/bash
hostname="static.cdn.mydomain.com"
if [[ $hostname =~ ^[a-z]+\.(cdn\.)[a-zA-Z0-9]+\.[a-z]{2,}$ ]]; then
    echo "Valid subdomain"
else
    echo "Invalid subdomain"
fi
This example allows the first part of the hostname to be dynamic, matching subdomains like static.cdn.mydomain.com or images.cdn.mydomain.com.
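If the leading label should be optional, so that both cdn.mydomain.com and static.cdn.mydomain.com are accepted, one possible variation uses the ? quantifier (zero or one occurrence) on a group. This is a sketch and not part of the original examples; the names subdomain_re and is_cdn_host are illustrative:

```bash
#!/bin/bash

# The ( ... )? group makes the leading "label." prefix optional.
subdomain_re='^([a-z]+\.)?cdn\.[a-zA-Z0-9]+\.[a-z]{2,}$'

# is_cdn_host HOSTNAME: return 0 if HOSTNAME is a CDN host,
# with or without a leading subdomain label.
is_cdn_host() {
    [[ $1 =~ $subdomain_re ]]
}

for host in static.cdn.mydomain.com cdn.mydomain.com static.mydomain.com; do
    if is_cdn_host "$host"; then
        echo "$host: Valid"
    else
        echo "$host: Invalid"
    fi
done
```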
Example 3: Extracting Parts of a Domain
In some cases, you may want to capture parts of the string using regex groups and the BASH_REMATCH array to extract relevant information:
#!/bin/bash
hostname="images.cdn.mydomain.com"
if [[ $hostname =~ ^([a-z]+)\.(cdn\.)[a-zA-Z0-9]+\.[a-z]{2,}$ ]]; then
    echo "Subdomain: ${BASH_REMATCH[1]}"
else
    echo "No match"
fi
In this case, ${BASH_REMATCH[1]} holds the first captured group, i.e., the subdomain (images in this case).
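BASH_REMATCH is an array: index 0 holds the text matched by the whole regex, and each following index holds the corresponding parenthesized group. The sketch below extends the example to capture the domain and top-level domain as well. The function name parse_host and the output format are assumptions made for illustration:

```bash
#!/bin/bash

# parse_host HOSTNAME
# Prints the subdomain, domain, and TLD of a CDN hostname,
# or returns 1 if HOSTNAME does not match the pattern.
parse_host() {
    # Three capture groups: subdomain, domain name, top-level domain
    local re='^([a-z]+)\.cdn\.([a-zA-Z0-9]+)\.([a-z]{2,})$'
    if [[ $1 =~ $re ]]; then
        echo "subdomain=${BASH_REMATCH[1]} domain=${BASH_REMATCH[2]} tld=${BASH_REMATCH[3]}"
    else
        return 1
    fi
}

parse_host "images.cdn.mydomain.com"
```

Since BASH_REMATCH is overwritten by every successful =~ match, copy the values you need into your own variables before running another match.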
Summary
Matching strings with regular expressions in Bash is a common task for processing and validating input. While grep and other external tools provide regex capabilities, the =~ operator is a native Bash feature for in-line regex matching, making it efficient and easy to use in scripts. By mastering regular expressions and the =~ test expression, you can handle complex pattern matching, validate data like domain names, and even extract specific parts of strings for further processing in your Bash scripts.