A regular expression, or regex for short, is a string of text that defines a pattern used to match, locate, and manage text. It’s an important tool in a wide variety of computing applications, from programming languages like JS, Java and Perl, to text processing tools like grep, sed, and vim.
Here are a few helpers to refresh your memory when you need some ‘simple’ regex to do the job.
Characters
Characters | Legend | Example | Sample Match |
---|---|---|---|
[abc], [a-c] | Matches any one of the given characters, or any one character in the given range | abc[abc] | abca, abcb, abcc |
[^abc], [^a-c] | Negated set: matches any one character that is not among the given characters or in the given range | abc[^abc] | abcd, abce, abc1 |
. | Any character except line break | bc. | bca, bcd, bc1, bc. |
\d | Any numeric character (equivalent to [0-9]) | c\d | c1, c2, c3 |
\D | Any non-numeric character (equivalent to [^0-9]) | c\D | ca, c., c* |
\w | Any word character: alphanumeric or underscore (equivalent to [A-Za-z0-9_]) | a\w | aa, a1, a_ |
\W | Any non-word character (equivalent to [^A-Za-z0-9_]) | a\W | a), a$, a? |
\s | Any single whitespace character: space, tab, new line, etc. | a\s | a followed by a space |
\S | Any single character that is not whitespace (not a space, tab, new line, etc.) | a\S | aa |
\t | Matches a horizontal tab | T\tab | T ab |
\r | Matches a carriage return | AB\r\nCD | AB and CD on two consecutive lines |
\n | Matches a linefeed | AB\r\nCD | AB and CD on two consecutive lines |
\ | Escapes special characters | \d | 0, 1 |
x|y | Matches either “x” or “y” | a|b | a, b |
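The character classes above can be tried out with a minimal sketch like the following, assuming JavaScript's regex flavor. The sample string and variable names are made up for illustration and are not part of the original cheat sheet.

```javascript
// Hypothetical sample text: contains digits, an underscore, a tab, and a dot.
const sample = "A7 shipped\tto Bob_42.";

// \d matches one digit; with the g flag, match() returns every digit found.
const digitsFound = sample.match(/\d/g);

// \w+ matches runs of word characters (letters, digits, underscore).
const wordsFound = sample.match(/\w+/g);

// \s+ matches runs of whitespace, so it splits on both the spaces and the tab.
const pieces = sample.split(/\s+/);

// [^\w\s] is a negated set: any one character that is neither word nor whitespace.
const punctuationFound = sample.match(/[^\w\s]/g);

// x|y alternation: matches either "Bob" or "Alice", whichever occurs first.
const nameFound = sample.match(/Bob|Alice/);

console.log(digitsFound, wordsFound, pieces, punctuationFound, nameFound[0]);
```

Note that `\w` treats the underscore as a word character, which is why `Bob_42` comes back as a single match.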
Assertions
Characters | Legend | Example | Sample Match |
---|---|---|---|
^ | Start of string, or start of line in multiline mode | ^abc.* | abc, abcd, abc123 |
$ | End of string, or end of line in multiline mode | .*xyz$ | xyz, wxyz, abcdxyz |
\b | Word boundary: matches a position where a word character is not followed or preceded by another word character | My.*\bpie | My apple pie |
\B | Matches a non-word boundary | c.*\Bcat | copycat |
x(?=y) | Lookahead assertion: Matches “x” only if “x” is followed by “y” | \d+(?=€) | $1 = 0.98€ (matches 98) |
x(?!y) | Negative lookahead assertion: Matches “x” only if “x” is not followed by “y” | \d+\b(?!€) | $1 = 0.98€ (matches 1 and 0) |
(?<=y)x | Lookbehind assertion: Matches “x” only if “x” is preceded by “y” | (?<=\d)\d | $1 = 0.98€ (matches 8) |
(?<!y)x | Negative lookbehind assertion: Matches “x” only if “x” is not preceded by “y” | (?<!\d)\d | $1 = 0.98€ (matches 1, 0, and 9) |
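To see which part of the string each assertion actually selects, here is a small sketch in JavaScript (an assumed flavor; lookbehind needs a reasonably modern engine). It reuses the `$1 = 0.98€` test string from the table; the variable names and the three-line `lines` string are illustrative only.

```javascript
const price = "$1 = 0.98€";

// Lookahead: a run of digits that is immediately followed by "€".
const beforeEuro = price.match(/\d+(?=€)/g);

// Negative lookahead: a run of digits ending at a word boundary, not followed by "€".
const notBeforeEuro = price.match(/\d+\b(?!€)/g);

// Lookbehind: a digit that is preceded by another digit.
const afterDigit = price.match(/(?<=\d)\d/g);

// Negative lookbehind: a digit that is not preceded by another digit.
const notAfterDigit = price.match(/(?<!\d)\d/g);

// ^ with the m flag matches at the start of every line, not just the string.
const lines = "abc\nabcd\nxyz";
const perLine = lines.match(/^abc.*/gm);
const wholeString = lines.match(/^abc.*/g);

console.log(beforeEuro, notBeforeEuro, afterDigit, notAfterDigit, perLine, wholeString);
```

Assertions consume no characters: the `€` checked by the lookahead is never part of the returned match.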
Groups
Characters | Legend | Example | Sample Match |
---|---|---|---|
(x) | Capturing group: Matches x and remembers the match | A(nt|pple) | Ant, Apple |
(?<name>x) | Named capturing group: Matches x and stores it under the given name | A(?<name>nt|pple) | Ant, Apple |
(?:x) | Non-capturing group: Matches x and does not remember the match | A(?:nt|pple) | Ant, Apple |
\1, \2, … | Back reference to the substring matched by capturing group number 1, 2, … | (\d)\+(\d)=\2\+\1 | 5+6=6+5 |
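The difference between the three group types shows up in what the match result remembers. The sketch below assumes JavaScript; the group name `rest` and the variable names are hypothetical choices for the example.

```javascript
// Capturing group: the text matched inside (...) is stored as group 1.
const captured = "Apple".match(/A(nt|pple)/);

// Named capturing group: the same text is also available under groups.rest.
// "rest" is an arbitrary name picked for this example.
const named = "Ant".match(/A(?<rest>nt|pple)/);

// Non-capturing group: groups the alternatives but stores nothing,
// so the result array holds only the full match.
const nonCapturing = "Ant".match(/A(?:nt|pple)/);

// Back references: \1 and \2 must repeat what groups 1 and 2 matched.
// The literal plus signs are escaped as \+ so they are not read as quantifiers.
const mirroredSum = /(\d)\+(\d)=\2\+\1/;
const isMirrored = mirroredSum.test("5+6=6+5");
const isNotMirrored = mirroredSum.test("5+6=5+6");

console.log(captured[1], named.groups.rest, nonCapturing.length, isMirrored, isNotMirrored);
```

A non-capturing group is a reasonable default when the parentheses are only there to scope an alternation or a quantifier.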
Quantifiers
Characters | Legend | Example | Sample Match |
---|---|---|---|
x* | Matches the preceding item “x” 0 or more times | a* | a, aa, aaa |
x+ | Matches the preceding item “x” 1 or more times, equivalent to {1,} | a+ | a, aa, aaa |
x? | Matches the preceding item “x” 0 or 1 time | ab? | a, ab |
x{n} | Matches the preceding item “x” exactly n times (n = positive integer) | ab{5}c | abbbbbc |
x{n,} | Matches the preceding item “x” at least n times (n = positive integer) | ab{2,}c | abbc, abbbc, abbbbc |
x{n,m} | Matches the preceding item “x” at least n times and at most m times (n ≤ m) | ab{2,3}c | abbc, abbbc |
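The quantifier rows can be checked with a short sketch like this one, assuming JavaScript. The patterns are anchored with `^` and `$` so that `test()` judges the whole string instead of finding a match somewhere inside it; the constant names are illustrative.

```javascript
// Each pattern is the table's example wrapped in ^...$ to force a full-string match.
const zeroOrMore = /^a*$/;        // x*
const oneOrMore = /^a+$/;         // x+
const optionalB = /^ab?$/;        // x?
const exactlyFive = /^ab{5}c$/;   // x{n}
const atLeastTwo = /^ab{2,}c$/;   // x{n,}
const twoOrThree = /^ab{2,3}c$/;  // x{n,m}

console.log(zeroOrMore.test(""));          // a* also accepts the empty string
console.log(oneOrMore.test(""));           // a+ needs at least one "a"
console.log(optionalB.test("abb"));        // b? allows at most one "b"
console.log(exactlyFive.test("abbbbbc"));  // exactly five "b"s
console.log(atLeastTwo.test("abc"));       // only one "b", too few
console.log(twoOrThree.test("abbbbc"));    // four "b"s, too many
```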
NOTE
By default, quantifiers are greedy (they try to match as much of the string as possible). The ? character after a quantifier makes it non-greedy (it stops as soon as it finds a match).
For example, \d+? on the test string 12345 matches only 1, but \d+ matches the entire string 12345.
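A minimal JavaScript sketch of the greedy versus non-greedy behavior described in the note. The `12345` string comes from the note itself; the tag-like `markup` string is an extra, made-up example of where the difference usually matters.

```javascript
const numberText = "12345";

// Greedy: \d+ takes as many digits as it can.
const greedyDigits = numberText.match(/\d+/)[0];

// Non-greedy: \d+? stops after the first digit, the shortest possible match.
const lazyDigits = numberText.match(/\d+?/)[0];

// Hypothetical tag-like text where greediness changes the result noticeably.
const markup = "<b>bold</b>";

// Greedy .+ runs to the last ">" in the string.
const greedyTag = markup.match(/<.+>/)[0];

// Non-greedy .+? stops at the first ">" it can reach.
const lazyTag = markup.match(/<.+?>/)[0];

console.log(greedyDigits, lazyDigits, greedyTag, lazyTag);
```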
Flags
Flags are put at the end of the regular expression. They are used to modify how the regular expression behaves.
For example, /a/ matches only a, but adding the flag i (/a/i) matches both a and A.
Characters | Legend |
---|---|
d | Generate indices for substring matches |
g | Global search |
i | Case-insensitive search |
m | Multi-line search |
s | Allows . to match newline characters |
u | Treats a pattern as a sequence of Unicode code points |
y | Perform a sticky search that matches starting at the current position in the target string |
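The flags in this table are the ones JavaScript supports, so the sketch below assumes a JavaScript engine recent enough to know the `d` flag (for example Node.js 16 or later). The two-line test string and the variable names are made up for the example.

```javascript
// "Cat" occupies indexes 0-2, the newline is index 3, "cat" occupies 4-6.
const flagText = "Cat\ncat";

// g + i: find every match, ignoring case.
const allCats = flagText.match(/cat/gi);

// m: ^ matches at the start of each line; without it, only at the start of the string.
const lineStarts = flagText.match(/^cat/gim);
const stringStart = flagText.match(/^cat/gi);

// s: lets . match the newline between the two words.
const dotAll = /Cat.cat/s.test(flagText);
const noDotAll = /Cat.cat/.test(flagText);

// y: sticky, the match must start exactly at lastIndex.
const sticky = /cat/y;
sticky.lastIndex = 4;
const stickyHit = sticky.test(flagText);
sticky.lastIndex = 1;
const stickyMiss = sticky.test(flagText);

// d: the result carries [start, end] index pairs for the match.
const withIndices = /cat/d.exec(flagText);

// u: . consumes a whole code point (two UTF-16 units for this emoji).
const withUnicode = "😀".match(/./u)[0].length;
const withoutUnicode = "😀".match(/./)[0].length;

console.log(allCats, lineStarts, stringStart, dotAll, noDotAll);
console.log(stickyHit, stickyMiss, withIndices.indices[0], withUnicode, withoutUnicode);
```

Flags can be combined freely, as in `/^cat/gim`, and their order does not matter.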
If you wish to test your knowledge: