Understanding Regular Expressions once and for all [PART 4]

Svenja Schäfer - Aug 17 '20 - Dev Community

Originally published at: Codegram's blog

Here comes part 4 of the "Understanding Regular Expressions once and for all" series. By now, you should be able to match literal characters, numbers and spaces, as well as the beginning and end of a line. You also know the question mark, which makes a character optional. And you can check whether a character appears at least once, or any number of times including zero, thanks to + and *. So far, so good. Today, it's time to meet brackets.

DINNER WITH [FRIENDS]

Imagine some friends are coming over for dinner, and you have to prepare the evening, including - of course - the dinner itself. You look through all your recipes, but this time what matters is not what you cook (the ingredients) but how many people it serves. Our task: match only the recipes for the number of friends coming over. Let's say 5 friends will sit at the table tonight. We could write something like this: servings: 5. While this is a perfectly valid expression, we want to make it a bit more "regex-like". Also, there's always that friend who brings someone along, and another might cancel, so a range would be better suited. A range looks like this: [0-3], which matches a single digit from 0 to 3. So, our expression could be: servings:\s[4-6]. With this, we would match the recipes for 4, 5 and 6 people.
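To see the range in action, here is a minimal Python sketch. The recipe lines are made up for illustration, and Python's built-in re module is just one of many places where you could try the expression.

```python
import re

# Hypothetical recipe lines, invented for this example.
recipes = [
    "Lasagne - servings: 2",
    "Chili sin carne - servings: 4",
    "Paella - servings: 5",
    "Curry - servings: 6",
    "Ramen - servings: 8",
]

# [4-6] matches exactly one character: 4, 5 or 6.
pattern = re.compile(r"servings:\s[4-6]")

# Keep only the lines in which the pattern is found somewhere.
matching = [line for line in recipes if pattern.search(line)]

for line in matching:
    print(line)
```

Only the recipes for 4, 5 and 6 people are printed.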

But what if we throw a nice vegan BBQ party? We expect about 12 to 16 people. How can we match that? Remember that a pair of square brackets matches a single character. Always. If we want to match more, we have to say so explicitly. We could end up with something like this: servings:\s[0-9]+ but that would match any number of servings. Useless. We could, however, write this: servings:\s[1][2-6]. The first pair of brackets matches the digit 1 and the second a digit from 2 to 6, so together they match 12 to 16. Nice, isn't it?
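Here is a small Python sketch comparing the two expressions on some made-up values. One thing it also shows: the pattern only looks at the first two digits, so a value like 126 slips through. Adding the end-of-line anchor $ is one possible way to tighten it, shown here as an assumption rather than as part of the original expression.

```python
import re

# Hypothetical values, invented for this example.
servings = [
    "servings: 6",
    "servings: 11",
    "servings: 12",
    "servings: 16",
    "servings: 17",
    "servings: 126",
]

any_amount = re.compile(r"servings:\s[0-9]+")     # one or more digits
party = re.compile(r"servings:\s[1][2-6]")        # a 1, then one of 2-6
party_strict = re.compile(r"servings:\s[1][2-6]$")  # same, but nothing may follow

matches_any = [s for s in servings if any_amount.search(s)]
matches_party = [s for s in servings if party.search(s)]
matches_strict = [s for s in servings if party_strict.search(s)]

print(len(matches_any))   # 6, every line matches
print(matches_party)      # ['servings: 12', 'servings: 16', 'servings: 126']
print(matches_strict)     # ['servings: 12', 'servings: 16']
```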

The square brackets also work with any other character. For example, if we want to match all recipes starting with a capital B (like the B in BBQ), we could use this expression: ^B, right? But what if the recipes we are looking for are also filed under G (like in Grilling)? Easy-peasy with our square brackets: ^[BG]. You see, it doesn't have to be a range. Any one of the characters inside the square brackets can be a match. Can, not must!
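A quick Python sketch of ^[BG], using invented recipe titles. Note that the match is case-sensitive, so a lowercase b does not count.

```python
import re

# Hypothetical recipe titles, invented for this example.
titles = [
    "BBQ tofu skewers",
    "Grilled corn",
    "Baked beans",
    "Pasta salad",
    "banana bread",
]

# ^ anchors the match at the start; [BG] accepts either a B or a G there.
starts_with_b_or_g = re.compile(r"^[BG]")

selected = [t for t in titles if starts_with_b_or_g.search(t)]
print(selected)  # ['BBQ tofu skewers', 'Grilled corn', 'Baked beans']
```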

EXCEPTIONS, EXCEPTIONS

Did you see in the previous expression that the caret is outside the square brackets? That's not an error; there is a good reason for it. If it were inside, like this: [^BG], we would match any single character that is not a B or a G. The ^ at the start of square brackets means not. Sure, we can use this as well. For example, like this: ^[^AC-FH-Z]. With that, we wouldn't match recipes starting with any uppercase letter other than B and G. To exclude lowercase letters as well, we could change the expression to: ^[^a-zAC-FH-Z]. Keep in mind that a negated set still matches everything that is not listed, such as digits or punctuation.
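The negated sets are easier to trust once you see them run. A minimal Python sketch with made-up titles:

```python
import re

# Hypothetical recipe titles, invented for this example.
titles = [
    "BBQ tofu skewers",
    "Grilled corn",
    "Apple pie",
    "Zucchini fritters",
    "banana bread",
    "5-minute hummus",
]

# [^BG] matches one character that is neither B nor G.
not_b_or_g = re.compile(r"[^BG]")
first_other = not_b_or_g.search("BBQ tofu skewers").group()
print(first_other)  # Q

# Start of line, then anything except A, C-F and H-Z.
no_other_uppercase = re.compile(r"^[^AC-FH-Z]")
# Same, but lowercase letters are excluded too.
no_other_letters = re.compile(r"^[^a-zAC-FH-Z]")

loose = [t for t in titles if no_other_uppercase.search(t)]
strict = [t for t in titles if no_other_letters.search(t)]

print(loose)   # B, G, lowercase and digit starts are all allowed
print(strict)  # only B, G and non-letter starts remain
```

The title starting with a digit survives both expressions, because digits are not listed in either set.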

Yes, square brackets are pretty useful and, like the rest of regular expressions, powerful. But there are more brackets. Curly ones, for example.

Remember when we covered matching multiple digits in Part 3 of this blog series? We used * or + for it. But what if we don't want endless repetitions of a character, but a specific number of them? Say hello to {}. Whatever number of repetitions you want to define, this is the expression for it. So maybe we want to find all four-digit numbers in our recipe, like 1000 ml, just because we want to change it into 1 litre. We could write \d{4}. Even ranges are possible with these neat brackets. To match a minimum of four and a maximum of six digits, the regular expression would look like this: \d{4,6}.
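Here is a short Python sketch of both curly-bracket forms on a made-up recipe sentence. It also shows a detail worth knowing: \d{4} is happy with the first four digits of a longer number.

```python
import re

# Hypothetical recipe text, invented for this example.
recipe = "Mix 1000 ml water with 250 g flour, 75 g sugar and 125000 mg salt."

# Exactly four digits in a row.
four_digits = re.compile(r"\d{4}")
# Between four and six digits in a row (as many as possible).
four_to_six_digits = re.compile(r"\d{4,6}")

found_four = four_digits.findall(recipe)
found_range = four_to_six_digits.findall(recipe)

print(found_four)   # ['1000', '1250'] - '1250' is the start of 125000
print(found_range)  # ['1000', '125000']
```

Neither expression matches 250 or 75, because both are shorter than four digits.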

See, with only two types of brackets, you can be so much more precise. Imagine what you can do with three types. That's exactly what we will learn in the next part. But until then:
^Ke{2}p\s[n-p]+\slearning$.


Photos by Jessica Ruscello on Unsplash
