One concept that threw me for a loop when I first picked up Python was checking if a string contains a substring. After all, in my first language, Java, the task involved calling a method like indexOf() or contains(). Luckily, Python has an even cleaner syntax, and we’ll cover that today.
To summarize, we can check if a string contains a substring using the in keyword. For example, "Hi" in "Hi, John" returns True. That said, there are several other ways to solve this problem, including using methods like index() and find(). Check out the rest of the article for more details.
Problem Description
A common problem in programming is detecting if a string is a substring of another string. For example, we might have a list of addresses stored as strings, and we want to find all addresses on a certain street (e.g. Elm Street):
addresses = [
    "123 Elm Street",
    "531 Oak Street",
    "678 Maple Street"
]
street = "Elm Street"
In that case, we might check which addresses contain the street name (e.g. 123 Elm Street). How do we do something like this in Python?
In most programming languages, there’s usually some substring method. For instance, in Java, strings have an indexOf() method which returns the index of the first match (zero or greater) if the substring was found and -1 otherwise.
Even without a special method, most languages allow you to index strings like arrays. As a result, it’s possible to manually verify that a string contains a substring by looking for a match directly.
In the following section, we’ll take a look at several possible solutions in Python.
Solutions
As always, I like to share a few possible solutions to this problem. That said, if you want the best solution, I suggest jumping to the last solution.
Checking if String Contains Substring by Brute Force
Whenever I try to solve a problem like this, I like to think about the underlying structure of the problem. In this case, we have a string, which is really a sequence of characters. As a result, what’s stopping us from iterating over those characters to find our substring?
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"
for address in addresses:
    address_length = len(address)
    street_length = len(street)
    for index in range(address_length - street_length + 1):
        substring = address[index:street_length + index]
        if substring == street:
            print(address)
Here, I’ve written a sort of nasty set of loops which iterate over all addresses, compute the lengths of both strings, slice out every substring of the appropriate size, and print the address whenever a slice matches the street name.
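To make the sliding-window idea easier to reuse, the inner loop can be pulled into a helper. The following is a minimal sketch, and the function name contains_substring is my own invention for illustration rather than anything built into Python:

```python
def contains_substring(text, target):
    """Return True if target occurs anywhere in text (brute-force scan)."""
    target_length = len(target)
    # Slide a window of len(target) characters across text, one index at a time.
    for index in range(len(text) - target_length + 1):
        if text[index:index + target_length] == target:
            return True  # Stop at the first match
    return False


addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
matches = [address for address in addresses if contains_substring(address, "Elm Street")]
print(matches)  # ['123 Elm Street']
```

Returning as soon as a window matches also avoids reporting the same address twice if the street name happens to appear more than once.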
Luckily, we don’t have to write our own solution to this. In fact, the entire inner loop is already implemented as a part of strings. In the next section, we’ll look at one of those methods.
Checking if String Contains Substring Using index()
If we want to check if a string contains a substring in Python, we might try borrowing some code from a language like Java. As mentioned previously, Java programmers usually reach for the indexOf() method, which returns the index of the substring. In Python, there’s a similar method called index():
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"
for address in addresses:
    try:
        address.index(street)
        print(address)
    except ValueError:
        pass
Here, we call the index() method without storing the result. After all, we don’t actually care what the index is. If the method doesn’t find a matching substring, it’ll raise a ValueError. Naturally, we can catch that exception and move on. Otherwise, we print out the address.
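To make the two outcomes of index() concrete, here’s a small sketch. The variable names position and found are just placeholders I picked for the example:

```python
address = "123 Elm Street"

# index() returns the position where the first match starts
position = address.index("Elm Street")
print(position)  # 4

# A missing substring raises ValueError instead of returning a sentinel value
try:
    address.index("Oak Street")
    found = True
except ValueError:
    found = False
print(found)  # False
```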
While this solution gets the job done, there’s actually a slightly cleaner solution, and we’ll take a look at it in the next section.
Checking if String Contains Substring Using find()
Interestingly enough, Python has another method similar to index() which functions almost identically to indexOf() from Java. It’s called find(), and it allows us to simplify our code a little bit:
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"
for address in addresses:
    if address.find(street) >= 0:
        print(address)
Now, that’s a solution I can get behind. After all, it’s quite reminiscent of a similar Java solution.
Again, it works like index(). However, instead of raising an exception if the substring doesn’t exist, it returns -1. As a result, we can reduce our try/except block to a single if statement. Note that the comparison is >= 0 rather than > 0 because a match at the very start of the string has index 0.
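One detail worth checking is what find() returns when the match sits at the very beginning of the string. Here’s a quick sketch using a made-up address that starts with the street name:

```python
address = "Elm Street 123"
street = "Elm Street"

position = address.find(street)
print(position)  # 0, because the match starts at the first character

print(position > 0)    # False: this check would wrongly skip the address
print(position >= 0)   # True
print(position != -1)  # True: an equivalent way to write the check

missing = address.find("Oak Street")
print(missing)  # -1
```

In other words, only -1 means the substring is missing, so any test against find() has to treat 0 as a hit.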
That said, Python has an even better solution which we’ll check out in the next section.
Checking if String Contains Substring Using in Keyword
One of the cool things about Python is how clean and readable the code can be. Naturally, this applies when checking if a string contains a substring. Instead of a fancy method, Python has the syntax built in with the in keyword:
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"
for address in addresses:
    if street in address:
        print(address)
Here, we use the in keyword twice: once to iterate over all the addresses in the address list and again to check if the address contains the street name. As you can see, the in keyword has two purposes:
- To check if a value is present in a sequence like lists and strings
- To iterate through a sequence
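The two roles above can be sketched side by side. One thing to watch for is that membership on a list compares whole elements, while membership on a string looks for a substring:

```python
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]

# Membership on a string looks for a substring
print("Elm" in "123 Elm Street")  # True

# Membership on a list compares whole elements, so a partial string doesn't match
print("Elm Street" in addresses)      # False
print("123 Elm Street" in addresses)  # True

# not in is the negated form of the same check
print("Oak" not in "123 Elm Street")  # True

# In a for loop, in drives iteration instead of testing membership
for address in addresses:
    print(address)
```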
Of course, to someone coming from a language like Java, this can be a pretty annoying answer. After all, our intuition is to use a method here, so it takes some getting used to. That said, I really like how this reads. As we’ll see later, this is also the fastest solution.
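Since the check is just an expression, it also drops neatly into a list comprehension. Here’s a small sketch; the names matches, query, and loose_matches are mine, and the lowercase comparison is only one simple way to ignore case:

```python
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"

# Collect every address that contains the street name
matches = [address for address in addresses if street in address]
print(matches)  # ['123 Elm Street']

# The check is case-sensitive, so normalize both sides if case shouldn't matter
query = "elm street"
print(query in addresses[0])  # False
loose_matches = [address for address in addresses if query.lower() in address.lower()]
print(loose_matches)  # ['123 Elm Street']
```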
Performance
With all these solutions ready to go, let’s take a look at how they compare. To start, we’ll need to set the solutions up in strings:
setup = """
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"
"""
brute_force = """
for address in addresses:
    address_length = len(address)
    street_length = len(street)
    for index in range(address_length - street_length + 1):
        substring = address[index:street_length + index]
        if substring == street:
            pass # I don't want to print during testing
"""
index_of = """
for address in addresses:
    try:
        address.index(street) # Again, I don't actually want to print during testing
    except ValueError:
        pass
"""
find = """
for address in addresses:
    if address.find(street) >= 0:
        pass # Likewise, nothing to see here
"""
in_keyword = """
for address in addresses:
    if street in address:
        pass # Same issue as above
"""
With these strings ready to go, we can begin testing:
>>> import timeit
>>> min(timeit.repeat(setup=setup, stmt=brute_force))
4.427290499999998
>>> min(timeit.repeat(setup=setup, stmt=index_of))
1.293616
>>> min(timeit.repeat(setup=setup, stmt=find))
0.693925500000006
>>> min(timeit.repeat(setup=setup, stmt=in_keyword))
0.2180926999999997
Now, those are some convincing results! As it turns out, brute force is quite slow. In addition, it looks like the error handling of the index() solution adds noticeable overhead. Luckily, find() exists to eliminate some of that overhead. That said, in is the fastest solution by far.
As is often the case in Python, you’ll get the best performance out of common idioms. In this case, don’t try to write your own substring method. Instead, use the built-in in keyword.
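If you’d like to rerun the comparison on your own machine, timeit also accepts plain functions instead of code strings. The following is a rough sketch rather than the exact setup used above: the function names with_find, with_in, and best_time are hypothetical, and the numbers you get will vary with your hardware and Python version.

```python
import timeit

addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"


def with_find():
    return [address for address in addresses if address.find(street) >= 0]


def with_in():
    return [address for address in addresses if street in address]


def best_time(function, number=100_000, repeat=5):
    """Return the fastest of several timed runs, in seconds."""
    return min(timeit.repeat(function, number=number, repeat=repeat))


for function in (with_find, with_in):
    print(f"{function.__name__}: {best_time(function):.4f}s")
```

Taking the minimum of the repeats, as in the tests above, filters out runs that were slowed down by other activity on the machine.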
Challenge
Now that you know how to check if a string contains a substring, let’s talk about the challenge. We’re going to write a simple address search engine which filters on two keywords rather than one: street and number. However, we may not get both pieces of information at the time of search. As a result, we need to deal with finding addresses which exactly match whatever keywords are available.
For this challenge, you can write any solution you want as long as it prints out a list of addresses that exactly matches the search terms. For instance, take the following list of addresses:
addresses = ["123 Elm Street", "123 Oak Street", "678 Elm Street"]
If a user searches just “Elm Street”, then I would expect the solution to return “123 Elm Street” and “678 Elm Street”. Likewise, if a user searches “123”, then I would expect the solution to return “123 Elm Street” and “123 Oak Street”. However, if the user provides both “123” and “Elm Street”, I would expect the solution to only return “123 Elm Street”—not all three addresses.
Feel free to have fun with this. For example, you could choose to write an entire front end for collecting the street and number keywords, or you could assume both of those variables already exist.
In terms of input data, feel free to write your own list of addresses or use my simple example. Alternatively, you can use a website which generates random addresses.
Ultimately, the program needs to demonstrate filtering on two keywords. In other words, find a way to modify one of the solutions from this article to match the street, number, or both, depending on what is available at the time of execution.
In the comments below, I’ll share my solution. Feel free to do the same!
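If you’re not sure where to start, here’s one possible sketch, and certainly not the only answer. The search function and its street and number parameters are names I made up for illustration. It relies on the in keyword for both terms, so a number like "123" would also match "1234 Elm Street"; a stricter solution would compare the house number as a whole token.

```python
def search(addresses, street=None, number=None):
    """Return the addresses that contain every keyword that was provided."""
    # Ignore keywords that are missing (None or an empty string)
    keywords = [keyword for keyword in (number, street) if keyword]
    # With no keywords at all, all() is True for every address, so nothing is filtered out
    return [
        address
        for address in addresses
        if all(keyword in address for keyword in keywords)
    ]


addresses = ["123 Elm Street", "123 Oak Street", "678 Elm Street"]
print(search(addresses, street="Elm Street"))                # ['123 Elm Street', '678 Elm Street']
print(search(addresses, number="123"))                       # ['123 Elm Street', '123 Oak Street']
print(search(addresses, street="Elm Street", number="123"))  # ['123 Elm Street']
```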
A Little Recap
And with that, we’re finished. As a final recap, here are all the solutions you saw today:
addresses = ["123 Elm Street", "531 Oak Street", "678 Maple Street"]
street = "Elm Street"
# Brute force (don't do this)
for address in addresses:
    address_length = len(address)
    street_length = len(street)
    for index in range(address_length - street_length + 1):
        substring = address[index:street_length + index]
        if substring == street:
            print(address)
# The index method
for address in addresses:
    try:
        address.index(street)
        print(address)
    except ValueError:
        pass
# The find method
for address in addresses:
    if address.find(street) >= 0:
        print(address)
# The in keyword (fastest/preferred)
for address in addresses:
    if street in address:
        print(address)
The post How to Check if a String Contains a Substring in Python: In, Index, and More appeared first on The Renegade Coder.