Here's a subject I never thought I'd write about. But life is full of surprises.
People who learn web development through coding bootcamp aren't (usually) familiar with time complexity. They might have read the words, but that's it. I know I tried to look up the Wikipedia page only to fall asleep at the end of the first paragraph. When I finally woke up, I thought "Nevermind, I'll never have to deal with it anyway".
And boy, was I wrong. I was soon asked during a coding interview to write a pairing algorithm and to calculate its time complexity 🤔. It made me realize that time complexity is more than just a fancy word thrown by an engineer to make you sweat. Knowing the gist of it can be useful outside the coding-interviews-world too.
So, for all my fellow bootcampers out there, here are the very basics of time complexity - so you can shine like a diamond during your next cocktail party.
Time complexity for noobs
Time complexity is how you measure your code's execution time depending on the size of the input.
That's it. You can stop reading right now and go back to your life.
Still with me? Ok, wannabe-CS-nerd, here's an example:
- If your code handles a 10-integer array and a 10,000-integer array during the same amount of time, well done, your code's time complexity is 👌.
- If your code takes 1,000x longer to handle a 10-integer array and a 10,000-integer array, sorry, your code's time complexity is 💩 1.
A much-needed example
Let's take the coding interview classic I had to tackle:
Say you get an array of 10,000 random integers, positive and negative. You want to get all couples (without duplicates) of elements summing to a given number.
Here's a first solution:
def find_pairs(array, sum)
array.repeated_combination(2).find_all { |a, b| a + b == sum }.uniq
end
Let's break it down:
-
repeated_combination(2)
generates an enumerator containing all repeated pairs of integers from the 10,000 array (roughly 50,000,000 elements 😱). - The
find_all
returns an array with all occurrences checking the block. -
uniq
removes the duplicated pairs
But what's more important is: how much time does this method take to run depending on the size of the input?
We'll benchmark our code with benchmark-ips. This gem provides you with the number of time your code runs per second.
To add benchmark-ips
, first run:
gem install benchmark-ips
Then add the benchmark to your code. Here, I'll run my find_pairs(array, sum)
with four different inputs:
- a 10-integer array
- a 100-integer array
- a 1,000-integer array
- a 10,000-integer array
require 'benchmark/ips'
def find_pairs(array, sum)
array.repeated_combination(2).find_all { |a, b| a + b == sum }.uniq
end
########################
# COMPLEXITY BENCHMARK #
########################
# Set up
FIRST_ARRAY = Array.new(10) { rand(-10...20) }
SECOND_ARRAY = Array.new(100) { rand(-100...200) }
THIRD_ARRAY = Array.new(1000) { rand(-1000...2000) }
FOURTH_ARRAY = Array.new(10_000) { rand(-10_000...20_000) }
# Time complexity benchmark
Benchmark.ips do |x|
x.config(:time => 5, :warmup => 2)
# Typical mode, runs the block as many times as it can
x.report("find pairs in 10-integer array") {
find_pairs(FIRST_ARRAY, 8)
}
x.report("find pairs in 100-integer array") {
find_pairs(SECOND_ARRAY, 87)
}
x.report("find pairs in 1000-integer array") {
find_pairs(THIRD_ARRAY, 876)
}
x.report("find pairs in 10_000-integer array") {
find_pairs(FOURTH_ARRAY, 8765)
}
x.compare!
end
Here's the output:
Warming up --------------------------------------
find pairs in 10-integer array
10.685k i/100ms
find pairs in 100-integer array
137.000 i/100ms
find pairs in 1000-integer array
1.000 i/100ms
find pairs in 10_000-integer array
1.000 i/100ms
Calculating -------------------------------------
find pairs in 10-integer array
112.738k (± 1.5%) i/s - 566.305k in 5.024381s
find pairs in 100-integer array
1.388k (± 2.2%) i/s - 6.987k in 5.036686s
find pairs in 1000-integer array
14.208 (± 0.0%) i/s - 72.000 in 5.069451s
find pairs in 10_000-integer array
0.138 (± 0.0%) i/s - 1.000 in 7.241235s
Comparison:
find pairs in 10-integer array: 112737.7 i/s
find pairs in 100-integer array: 1388.0 i/s - 81.22x slower
find pairs in 1000-integer array: 14.2 i/s - 7934.72x slower
find pairs in 10_000-integer array: 0.1 i/s - 816360.51x slower
😓 As you can see, it takes my code 80x longer to run between a 10-integer array and a 100-integer array. But then, it takes 100x longer between the 100-integer array and the 1,000-integer array. And 100x longer between the 1,000-integer array and the 10,000-integer array.
It's safe to say that its time complexity is 💩.
You can already picture the user looking at a blank screen for 10 seconds while your code runs. And that, friends, is not good at all.
Moving from the "👌 and 💩" notation to the Big-0 notation
Time complexity has its own scale: the Big-0 notation 2.
From best to worst:
-
0(1)
: the size of the input doesn't impact your code's runtime -
0(log n)
: 10x the input and your code takes 2x longer to run -
0(n)
: 10x the input and your code takes 10x longer to run -
0(n log n)
: 10x the input and your code takes 50x longer to run -
0(n^2)
: 10x the input and your code takes 100x longer to run -
0(2n)
: 10x the input and your user has fallen asleep while your code still runs
There's really no need to learn it by heart or to be able to calculate the right Big-0 on top of your head (the first reason is: CS engineers and math nerds don't even agree on the calculus).
It's better to understand some rules of thumb:
- the
n
refers to the input size - if you iterate once on an array, the time to run is linearly proportional on the number of elements in the array (hence
0(n)
) - code with a 2-level nested loop usually fits the
0(n^2)
(code with a 3-level nested loop would be0(n^3)
and so on) - time complexity indicates possible performance issues based on the input size
If you want to get a more in-depth explanation, go and check A Rubyist's Guide to Big-O Notation by Honeybadger. It's neat 👌.
Getting back to our real-life example
Remember that?
def find_pairs(array, sum)
array.repeated_combination(2).find_all { |a, b| a + b == sum }.uniq
end
To calculate the time complexity of a piece of code, you just define the time complexity of each element, and the worst bit wins. So let's break down find_pairs()
and see what's going on under the hood.
☝️ First, let's create a 4-integer array
array = Array.new(4) { rand(-10...20) } # => [-6, 4, 14, 3]
✌️ Then, let's define a method find_pairs()
and call repeated_combination(2)
on the array to create pairs
array = Array.new(4) { rand(-10...20) } # => [-6, 4, 14, 3]
def find_pairs(array, sum)
# Calling repeated_combination without a block returns an enumerator
array.repeatead_combination(2)
end
# I add a to_a to transform my enumerator into an array for readability reasons
find_pairs(array, 8).to_a # => [[-6, -6], [-6, 4], [-6, 14], [-6, 3], [4, 4], [4, 14], [4, 3], [14, 14], [14, 3], [3, 3]]
Let's stop there for a moment. repeated_combination(2)
iterates over each element in array
and pair it with each element of array
. repeated_combination(2)
creates a nested-loop.
The first loop first stops on -6
then a second loop starts and iterates over each element. Once the second loop is done, we get back to the first loop which moves onto 4
where the second loop kicks in again. Rinse and repeat until repeated_combination(2)
has iterated over each element and returned all combinations.
A loop inside a loop? 👉 The time complexity of this method is 0(n^2)
☝️✌️ Now let's find_all
pairs whose integers sum to a given number.
array = Array.new(4) { rand(-10...20) } # => [-6, 4, 14, 3]
def find_pairs(array, sum)
all_pairs = array.repeatead_combination(2).find_all { |a, b| a + b == sum }
end
find_pairs(array, 8) # => [[-6, 14], [4, 4]]
find_all
iterates over the enumerator returned by repeated_combination(2)
and checks for pairs that match the block.
A single loop means 0(n)
. Since 0(n)
is much faster than the O(n^2)
from repeated_combination(2)
, it will be of little consequence in the grand scheme of things.
✌️✌️ Finally, let's remove duplicates with uniq
array = Array.new(4) { rand(-10...20) } # => [-6, 4, 14, 3]
def find_pairs(array, sum)
all_pairs = array.repeatead_combination(2).find_all { |a, b| a + b == sum }.uniq
end
find_pairs(array, 8) # => [[-6, 14], [4, 4]]
In this example, we don't have any duplicates. But when you run the algorithm on a 10,000-integer array, chances you'll have some.
uniq
only iterates once on the array, so it's time complexity is 0(n)
. Once again, it won't affect much the time complexity of the whole method.
🖐 Crunching the numbers
Now that we know the time complexity of each element, let's calculate the complexity of the whole method.
0(n^2)
+ 0(n)
+ 0(n)
= 0(n^2)
As I said before 0(n)
is so much faster than 0(n^2)
that it won't affect the final time complexity substantially.
So now, I know that my code's time complexity is 0(n^2)
. You can stick with my 💩 notation but 0(n^2)
will make you look slightly smarter.
This is where knowing the gist of time complexity comes in handy. If you know your code becomes exponentially slower when the input grows, then you know you'll have performance issues when you scale. So you have to make your code better.
Making the algorithm better
Instead of using some of Ruby's built-in methods, I'll use Set. Set creates a collection of unordered values without duplicates with fast lookup capabilities.
require 'set'
def find_pairs(array, sum)
results = Set.new
input = array.to_set
input.each do |element|
other_element = sum - element
if input.include?(other_element)
pair = element > other_element ? [other_element, element] : [element, other_element]
results << pair
end
end
results
end
Let's break it down:
- I create an empty Set to store
results
. - I transform my input array into a Set and store it into
input
. - I iterate over
input
. - I take the first integer (
element
) and subtract it fromsum
. This gives me a potential pairing integer (other_element
). - If
other_element
is included ininput
(my input array turned to a Set), then I make sure thatelement
andother_element
are paired from smallest to biggest, and pushed inresults
. Why bother to sort them? Because if I have duplicates,results
being a Set will only keep the first occurrence 👌.
Here's the benchmark for this new code:
Warming up --------------------------------------
find pairs in 10-integer array
8.171k i/100ms
find pairs in 100-integer array
1.533k i/100ms
find pairs in 1000-integer array
146.000 i/100ms
find pairs in 10_000-integer array
14.000 i/100ms
Calculating -------------------------------------
find pairs in 10-integer array
80.106k (± 8.6%) i/s - 400.379k in 5.037247s
find pairs in 100-integer array
14.005k (±12.4%) i/s - 68.985k in 5.026938s
find pairs in 1000-integer array
1.540k (± 3.8%) i/s - 7.738k in 5.033787s
find pairs in 10_000-integer array
157.341 (± 2.5%) i/s - 798.000 in 5.074419s
Comparison:
find pairs in 10-integer array: 80106.5 i/s
find pairs in 100-integer array: 14005.1 i/s - 5.72x slower
find pairs in 1000-integer array: 1539.6 i/s - 52.03x slower
find pairs in 10_000-integer array: 157.3 i/s - 509.13x slower
Much better! Now, my code only takes 10x longer to run when the input is multiplied by 10. Its time complexity has become linear (0(n)
) and takes less than 0,006s to handle a 10,000-integer input 🥳.
Well, that was super geeky. I hope it'll help fellow bootcampers to wrap their head around time complexity and how it can be useful.
Did I missed something? Lemme know on Twitter or on dev.to ✌️.
Rémi