What Is Regex
Regex, or regular expression, is a tough thing to learn. Defining it in simple words is almost as difficult. In one short sentence, Wikipedia currently describes a regular expression as a sequence of characters that specifies a search pattern in a text.
That means regexes are used to find characters in a text. By characters I mean letters, numbers, brackets, newline characters and so on. With a regex, you write an expression that defines which sequence of characters you want to find in the text.
The hard part to learn (and remember) is the syntax of the expression language. We will go through everything you need to know about that in upcoming articles, both for basic and advanced usage. The aim of this first article in the series is to explain why you should learn regex.
Regexes in a Nutshell
Regex on its own is only used to find text occurrences (matches) in a string or text. What we want to do with the matches is up to us. When using regexes, we are most often interested in one of three things.
- find information in a text
- check if a string matches a regex, e.g., email validation
- replace characters in a text
Even if the use cases of regex often boil down to answering a yes-or-no question, finding text occurrences or replacing some characters, it can be used in a lot of scenarios. We will look at a few of them now.
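To make the three use cases a bit more concrete, here is a minimal JavaScript sketch. The sample text and the patterns are made up for illustration; don't worry about the syntax yet.

```javascript
// Hypothetical sample text, made up for this illustration.
const sampleText = "Order 66 shipped to alice@example.com on 2021-05-04";

// 1. Find information: extract the first date-like substring.
const dateMatch = sampleText.match(/\d{4}-\d{2}-\d{2}/);
const foundDate = dateMatch ? dateMatch[0] : null;

// 2. Check if a string matches a regex: a yes-or-no question.
const containsDigits = /\d/.test(sampleText);

// 3. Replace characters: mask every digit with a hash sign.
const maskedText = sampleText.replace(/\d/g, "#");

console.log(foundDate);
console.log(containsDigits);
console.log(maskedText);
```

The same regex engine is used in all three cases; what differs is only what we do with the matches.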
When Is Regex Used?
Most developers can go a lifetime without learning regex. Even when regex is a good idea, it is rarely strictly necessary. Unfortunately, the alternative solutions often take a lot more time to implement and do not scale to use cases where more data needs to be processed.
The short and boring answer to the question "when is regex used?" is "for a lot of things". The long answer follows here.
Searching Through Files, Manuals and Logs
The first time I ever encountered regexes was when learning about Unix. My first experience was finding information in Unix manual pages and log files. It looked pretty much like this.
# Finds all lines that contain the word "error" and later "vehicle" in the file somelogfile.log. Useful to see if we have logged any errors which mention anything about a vehicle.
grep -i 'error.*vehicle' ./somelogfile.log
# Searches through the Unix manual for the scp command and outputs what the v flag does.
man scp | grep -i '\-v'
# Output in Mac terminal:
# -v Verbose mode. Causes scp and ssh(1) to print debugging messages
The examples above may look like textbook examples, and they are. That's why we go to school, to learn useful stuff! Sure, you can always google what the flags of a Unix command do. It may take a little longer, but it is still quick.
Searching through log files is something you will have to do yourself though. That's something I do daily at work to investigate bugs. In big log files I may find hundreds or thousands of matches if I simply search for "warning" or "error". Writing regexes definitely helps me filter those results by specifying a more granular search term.
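The same kind of filtering can be done in code. Here is a rough JavaScript sketch that applies the pattern from the grep example to a few log lines. The log lines are hypothetical and only there for illustration.

```javascript
// Hypothetical log lines, made up for this illustration.
const logLines = [
  "2021-05-04 10:00:01 INFO vehicle 17 started",
  "2021-05-04 10:00:02 ERROR database timeout",
  "2021-05-04 10:00:03 ERROR failed to update vehicle 17",
  "2021-05-04 10:00:04 WARNING vehicle 9 reported an Error code",
];

// Same pattern as in the grep example; the i flag mirrors grep's -i option.
const errorThenVehicle = /error.*vehicle/i;

// Keep only the lines where "error" is followed by "vehicle" later on the line.
const matchingLines = logLines.filter((line) => errorThenVehicle.test(line));

console.log(matchingLines);
```

Only the third line is kept. The last line mentions both words, but "vehicle" comes before "Error", so the pattern does not match it.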
Regex Search in an IDE
Have you ever accidentally activated that regex checkbox in your IDE when searching for text in your editor, and then wondered why you couldn't find what you were searching for? Next time you do that, don't uncheck it. Find a way to search for what you want with it activated!
Sign me up as guilty for that, it has happened a lot.
Maybe you know you are invoking a very common function somewhere, but don't remember where it was or what the function is named. Here are some expressions for that! Don't worry if you don't understand them, regex syntax will be explained in another article I will post here in a few weeks.
// Finds a console log that prints a count variable.
conso.*count
// Example: console.log(count)
// Example: console.log(`Total: ${count}`)
// Example*: console.log(itemCount)
// Finds a function whose name ends with "count" and has an argument which includes the text "todo".
count\(.*todo
// Example: count(todoItems)
// Example*: updateCount(oldCount, newTodoItem)
// * these examples only match if case sensitivity is disabled, which is done either in the IDE or by using the regex flag i
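If you want to check the starred examples outside the IDE, a quick JavaScript sketch like the one below shows the effect of the i flag. The variable names are my own, and the strings are the example lines from above.

```javascript
// The two search expressions, with and without the case-insensitive i flag.
const logSearch = /conso.*count/;
const logSearchIgnoreCase = /conso.*count/i;
const callSearch = /count\(.*todo/;
const callSearchIgnoreCase = /count\(.*todo/i;

// Matches regardless of case sensitivity.
console.log(logSearch.test("console.log(count)"));
console.log(logSearch.test("console.log(`Total: ${count}`)"));
console.log(callSearch.test("count(todoItems)"));

// The starred examples: no match when case sensitive...
console.log(logSearch.test("console.log(itemCount)"));
console.log(callSearch.test("updateCount(oldCount, newTodoItem)"));

// ...but they match once the i flag is set.
console.log(logSearchIgnoreCase.test("console.log(itemCount)"));
console.log(callSearchIgnoreCase.test("updateCount(oldCount, newTodoItem)"));
```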
Classic Example, Form Validation
This article wouldn't be complete if the classic example wasn't mentioned. This is probably what you think of when you read the word regex: validating a form. It's a must-have.
You can validate that an email address is valid or that a phone number has the correct format. You can also forbid curses and vulgar words in user names, or detect whether a credit card number is a MasterCard or a Visa. Maybe you need to ensure that a password contains special characters. For all of these examples, you can use regexes.
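Here is a hedged JavaScript sketch of what a few of those checks could look like. The patterns are deliberately simplified assumptions of mine: the email check only looks for the shape something@something.something, and the card check only looks at the well-known leading digits (4 for Visa, 51 to 55 for MasterCard) and an assumed number length, ignoring newer number ranges. Real form validation usually needs more than this.

```javascript
// Simplified email shape: no spaces, one @, and a dot in the domain part.
const emailShape = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// At least one character that is not a letter or a digit.
const hasSpecialChar = /[^A-Za-z0-9]/;

// Simplified card prefixes; lengths of 13 or 16 digits (Visa) and 16 digits (MasterCard) are assumed.
const visaShape = /^4\d{12}(\d{3})?$/;
const masterCardShape = /^5[1-5]\d{14}$/;

// Hypothetical helper for this illustration.
function detectCardType(cardNumber) {
  if (visaShape.test(cardNumber)) return "Visa";
  if (masterCardShape.test(cardNumber)) return "MasterCard";
  return "unknown";
}

console.log(emailShape.test("jane.doe@example.com"));
console.log(emailShape.test("jane.doe@example"));
console.log(hasSpecialChar.test("hunter2"));
console.log(hasSpecialChar.test("hunter2!"));
console.log(detectCardType("4111111111111111"));
console.log(detectCardType("5555555555554444"));
```

Note that a regex like this only checks the format. It can't tell whether the email address or the card actually exists.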
Find and Replace Like a Pro (or Manipulate Strings)
Once in a while I get a really boring task at work. It could be that an API endpoint has been updated and it requires a lot of changes because it has been hardcoded all over the place. It could also be that data has been dumped from a production database into a JSON file to be used as development data, and to do that, all email addresses must be replaced with dummy addresses.
The hardcoded API example sounds like a quick find-and-replace job, and it is, but you may need to know regex to do it. Look at the following URLs.
- https://regexexampledomain.com/articles/react/some-react-article
- https://regexexampledomain.com/articles/react/react-is-awesome
- https://regexexampledomain.com/articles/javascript/a-javascript-article
The URLs above follow the pattern /articles/<category>/<article-name>. Imagine that we have thousands of links like that, with thousands of categories, and that we have refactored the site so we no longer have categories such as react and javascript in the URL.
We could fairly safely find and replace one category after another, e.g., replacing /react/ and /javascript/ with a single slash character to remove the categories. But remember, the site has thousands of categories, so doing that manually will take time.
With regexes, we can update all of those categories using one single find and replace, regardless of how many categories we have.
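As a sketch of that single find and replace, here is how it could look in JavaScript. The pattern is one possible way to write it, not the only one: it matches /articles/ followed by exactly one path segment and a slash, and replaces it with just /articles/.

```javascript
// The example links from above.
const articleLinks = [
  "https://regexexampledomain.com/articles/react/some-react-article",
  "https://regexexampledomain.com/articles/react/react-is-awesome",
  "https://regexexampledomain.com/articles/javascript/a-javascript-article",
];

// [^\/]+ matches one path segment (the category), whatever it is named.
const categorySegment = /\/articles\/[^\/]+\//;

// Drop the category from every link in one go.
const updatedLinks = articleLinks.map((link) =>
  link.replace(categorySegment, "/articles/")
);

console.log(updatedLinks);
```

A link that has no category, such as /articles/some-article, is left untouched because there is no slash after the first segment.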
What about the email obfuscation example? How can we find and replace any email in a JSON file with john.doe@gmail.com? The only things we need are a regex for an email address and a replace function in either a programming language or an IDE. Then we can just hit a button and search through the JSON data to verify that the emails were actually replaced.
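In a programming language, the email replacement could be sketched like this. The JSON data is hypothetical, and the email regex is a common simplification that I assume is good enough for development data. It will not cover every valid email address.

```javascript
// Hypothetical dump of production data, made up for this illustration.
const jsonDump = JSON.stringify(
  [
    { name: "Alice", email: "alice@company.example" },
    { name: "Bob", email: "bob.smith@mail.example" },
  ],
  null,
  2
);

// Simplified email pattern; the g flag replaces every occurrence, not just the first.
const emailInText = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

const obfuscatedDump = jsonDump.replace(emailInText, "john.doe@gmail.com");

// Parse the result to check that the JSON is still valid.
const obfuscatedUsers = JSON.parse(obfuscatedDump);
console.log(obfuscatedUsers);
```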
Usages in APIs and Web Consoles
Regex is widespread. A lot of services and APIs support it. This means there is a fairly good chance that a SaaS which deals with large amounts of data offers some kind of regex support. It could be a search field in the web console of a logging tool, or it could be an SDK or API that allows you to use regexes. See for example Elasticsearch regex queries.
Interpret and Categorize Data with Regex
So far, most of the regex examples described have involved processing or validating data. However, regexes can also be used to interpret and categorize data.
Let's take an example where each input is a piece of text that represents either a string, an integer, a float, an array or an object. Our goal is to detect which one it is.
Input: 402
Output: integer
Input: 402 is an integer
Output: string
Input: 4.2
Output: float
Input: the number 4.2 is a float, not an integer
Output: string
Input: ['hello', 4.2]
Output: array
Input: {"hello": 4.2}
Output: object
The problem above can of course be solved with a lot of if statements and by looping through the letters of the alphabet to detect whether the input contains any of them. With regex, however, we can solve it much more easily. We can use regexes like the ones below to instantly detect the types.
Regex: /^[0-9]+$/
Detects: integer
Regex: /^[0-9]+\.[0-9]+$/
Detects: float
Regex: /^\[.*\]$/
Detects: array
Regex: /^\{.*\}$/
Detects: object
With the regexes above, we could detect integers, floats, arrays and objects in a few lines of code. To detect strings, we could create a more complex regex, ensuring that the input doesn't start with a bracket and that it contains letters from the alphabet, but it would also be sufficient to use string as a default fallback value when the input isn't of any of the other types.
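Put together, those few lines of code could look roughly like this in JavaScript. The function name detectType is my own, and string is used as the fallback value.

```javascript
// Hypothetical helper that applies the four regexes in turn.
function detectType(input) {
  if (/^[0-9]+$/.test(input)) return "integer";
  if (/^[0-9]+\.[0-9]+$/.test(input)) return "float";
  if (/^\[.*\]$/.test(input)) return "array";
  if (/^\{.*\}$/.test(input)) return "object";
  // Anything that is not recognized falls back to string.
  return "string";
}

console.log(detectType("402"));
console.log(detectType("402 is an integer"));
console.log(detectType("4.2"));
console.log(detectType("['hello', 4.2]"));
console.log(detectType('{"hello": 4.2}'));
```

The ^ and $ anchors are what make "402 is an integer" fall through to string: the whole input has to match, not just a part of it.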
Using regex in this way can save us some lines of code. In other scenarios it may be crucial to use regex. Take clothing sizes as an example. There are many ways to write a size like XL. What if we want to map all possible ways to write it into one of the ways?
Examples of input: l, L, large, Large, lg, LG, L slim, L wide
Output: L
Examples of input: xl, XL, extra large, extra-large, x large, x-large, X-Large
Output: XL
Examples of input: xxl, XXL, 2xl, 2XL
Output: XXL
Starting to see the problem? This issue is fairly easy to solve with regex. Without it, we would more or less have to hardcode a lot of mappings for the sizes. That's a totally valid solution. But what if the sizes could also be numbers?
Say for example that we have a t-shirt in size L which is described with the European size 52, or maybe as a range from 52-54? We would have a lot of things to hardcode there.
Although regex is of great use in a case like this, we cannot solve this problem with a single regex. We need a few of them, but not many, since a single regex can match a lot of different ways of writing a size.
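A minimal JavaScript sketch of that idea, with one regex per output size, could look like this. The rules only cover the example inputs from this article, and the numeric rule simply assumes the 52 and 52-54 example for size L. A real size table would need more rules than this.

```javascript
// One rule per output size. Each regex covers many spellings of the same size.
const sizeRules = [
  { size: "XXL", pattern: /^(xxl|2xl)\b/i },
  { size: "XL", pattern: /^(xl|x[ -]?large|extra[ -]?large)\b/i },
  { size: "L", pattern: /^(l|lg|large)\b/i },
  // Assumption taken from the example above: European size 52 or 52-54 means L.
  { size: "L", pattern: /^52(-54)?$/ },
];

// Hypothetical helper: returns the normalized size, or null when no rule matches.
function normalizeSize(input) {
  const trimmed = input.trim();
  const rule = sizeRules.find(({ pattern }) => pattern.test(trimmed));
  return rule ? rule.size : null;
}

console.log(normalizeSize("L slim"));
console.log(normalizeSize("extra-large"));
console.log(normalizeSize("2XL"));
console.log(normalizeSize("52-54"));
```

Four short regexes replace a long list of hardcoded mappings, and adding a new spelling usually means extending one alternation rather than adding a new rule.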
Learn More About Regex
The use cases of regex we have seen in this article are just examples. These kinds of issues don't always need to be solved with regex. Sometimes regex is simply overkill. Sometimes the problem is just as easy to solve without regex as with it. It is a decision you will have to make when facing a problem. For some problems, you will come to realize that it isn't even feasible to solve them without regex.
If you want to learn more about regex, save this article for later or follow me. I will post another article next month.