Using Regex with Python

John Au-Yeung - Feb 7 '20 - - Dev Community

Subscribe to my email list now at http://jauyeung.net/subscribe/

Follow me on Twitter at https://twitter.com/AuMayeung

Many more articles at https://medium.com/@hohanga

Even more articles at http://thewebdev.info/

Python is a convenient language that’s often used for scripting, data science, and web development.

In this article, we’ll look at how to use regex with Python to make finding text easier.

Finding Patterns of Text with Regular Expressions

Regular expressions, or regexes, are descriptions for a pattern of text.

For instance, \d represents a single digit. We can combine characters to create regexes to search text.

To use regexes to search for text, we have to import the re module and then create a regex object with a regex string as follows:

import re  
phone_regex = re.compile('\\d{3}-\d{3}-\d{4}')

The code above has the regex to search for a North American phone number.

Then if we have the following string:

msg = 'Joe\'s phone number is 555-555-1212'

We can look for the phone number inside msg with the regex object’s search method as follows:

import re  
phone_regex = re.compile('\d{3}-\d{3}-\d{4}')  
msg = 'Joe\'s phone number is 555-555-1212'  
match = phone_regex.search(msg)

When we inspect the match object, we see something like:

<re.Match object; span=(22, 34), match='555-555-1212'>

Then we can return a string representation of the match by calling the group method:

phone = match.group()

phone has the value '555-555-1212' .

Grouping with Parentheses

We can use parentheses to group different parts of the result into its own match entry.

To do that with our phone number regex, we can write:

phone_regex = re.compile('(\d{3})-(\d{3})-(\d{4})')

Then when we call search , we can either get the whole search string, or individual match groups.

group takes an integer that lets us get the parts that are matched by the groups.

Therefore, we can rewrite our program to get the whole match and the individual parts of the phone number as follows:

import re  
phone_regex = re.compile('(\d{3})-(\d{3})-(\d{4})')  
msg = 'Joe\'s phone number is 123-456-7890'  
match = phone_regex.search(msg)  
phone = match.group()  
area_code = match.group(1)  
exchange_code = match.group(2)  
station_code = match.group(3)

In the code above, phone should be ‘123–456–7890’ since we passed in nothing to group. Passing in 0 also returns the same thing.

area_code should be '123' since we passed in 1 to group , which returns the first group match.

exchange_code should be '456' since we passed in 2 to group , which returns the 2nd group match.

Finally, station_code should be '7890' since we passed in 3 to group , which returns the 3rd group match.

If we want to pass in parentheses or any other special character as a character of the pattern rather than a symbol for the regex, then we have to put a \ before it.

Matching Multiple Groups with the Pipe

We can use the | symbol, which is called a pipe to match one of many expressions.

For instance, we write the following to get the match:

import re  
name_regex = re.compile('Jane|Joe')  
msg = 'Jane and Joe'  
match = name_regex.search(msg)  
match = match.group()

match should be 'Jane' since this is the first match that’s found according to the regex.

We can combine pipes and parentheses to find a part of a string. For example, we can write the following code:

import re  
snow_regex = re.compile(r'snow(man|mobile|shoe)')  
msg = 'I am walking on a snowshoe'  
snow_match = snow_regex.search(msg)  
match = snow_match.group()  
group_match = snow_match.group(1)

to get the whole match with match , which has the value 'snowshoe' .

group_match should have the partial group match, which is 'shoe' .

Optional Matching with the Question Mark

We can add a question mark to the end of a group, which makes the group optional for matching purposes.

For example, we can write:

import re  
snow_regex = re.compile(r'snow(shoe)?')  
msg = 'I am walking on a snowshoe'  
msg_2 = 'I am walking on snow'  
snow_match = snow_regex.search(msg)  
snow_match_2 = snow_regex.search(msg_2)

Then snow_match.group() returns 'snowshoe' and snow_match.group(1) returns 'shoe' .

Since the (shoe) group is optional, snow_match_2.group() returns 'snow' and snow_match_2.group(1) returns None .

Conclusion

We can use regexes to find patterns in strings. They’re denoted by a set of characters that defines a pattern.

In Python, we can use the re module to create a regex object from a string.

Then we can use it to do searches with the search method.

We can define groups with parentheses. Once we did that, we can call group on the match object returned by search.

The group is returned when we pass in an integer to get it by their position.

We can make groups optional with a question mark appended after the group.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .