I often enjoy trying to solve previous years' Advent of Code puzzles. If you don't know what Advent of Code is make sure you check it out.
TLDR: Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like.
This post is about puzzle 2021/day-6 I was trying to solve that's not very interesting as a puzzle but I found it to be a really good example for exploring how to deep dive into your code's performance, find potential bottlenecks and improve it with tools that Go provides.
Each puzzle on Advent of Code contains two parts. You need to solve the 1st to move to the 2nd one and usually the 2nd is a more complicated or demanding variation of the 1st. This is the approach we are going to follow on this post as well.
Tools
The tools we are going to use are
- Benchmarks for measuring and analyzing the performance of go code with help from go tool pprof.
- Benchstat computes statistical summaries and compares benchmarks.
Part 1
I am not going to repeat the problem here for brevity, but if you read the puzzle carefully you can probably come up with a trivial solution as I did the 1st time.
func parseInput(s string) []int {
data := strings.Split(strings.Split(s, "\n")[0], ",")
fish := make([]int, len(data))
for i, d := range data {
fish[i] = MustAtoi(d)
}
return fish
}
func MustAtoi(s string) int {
n, err := strconv.Atoi(s)
if err != nil {
panic(err)
}
return n
}
func Solve(input string, days int) int {
fish := parseInput(input)
for i := 0; i < days; i++ {
tempFish := []int{}
for j := range fish {
if fish[j] == 0 {
fish[j] = 6
tempFish = append(tempFish, 8)
} else {
fish[j]--
}
}
fish = append(fish, tempFish...)
}
return len(fish)
}
The approach for solving it is:
- parsing the input and turning it into a list of numbers and then iterating for the number of
days
- for each day I decrease each fish's timer, if the timer reaches zero resets to
6
- and on a temporary list add a new fish with timer
8
that at the end of the day is going to be appended on the main list.
That's pretty much it, successfully solves the Part 1 and it's fast enough.
ok advent-of-code/2021/go/day-6 0.226s
The purpose of this post though is not to just solve the puzzle.
Before updating the code we need a benchmark of the current implementation as a base
Note: Always measure before optimizing.
var sink int
func BenchmarkSolution(b *testing.B) {
res := 0
input := util.ReadFile("input.txt")
b.ResetTimer()
b.ReportAllocs()
for i := 0; i < b.N; i++ {
res = Solve(input, 80)
}
sink = res
}
Benchmarks
We can run the benchmark with
go test ./... -benchmem -run=^$ -v -bench ^Bench -memprofile naive.out -count=5 | tee naive.txt
Which will produce output similar to the following one:
pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10 235 5210983 ns/op 31653556 B/op 1025 allocs/op
BenchmarkSolution-10 231 5086091 ns/op 31653547 B/op 1025 allocs/op
BenchmarkSolution-10 225 5146135 ns/op 31653549 B/op 1025 allocs/op
BenchmarkSolution-10 231 5191141 ns/op 31653571 B/op 1025 allocs/op
BenchmarkSolution-10 229 5104518 ns/op 31653558 B/op 1025 allocs/op
PASS
ok advent-of-code/2021/go/day-6 8.847s
Note: the flag
-memprofile
memory profiling is enabled which can help to identify memory-related bottlenecks.
You can use the interactive CLI interface for navigating the profiling data
go tool pprof naive.out
# and then list the data for Solve function with
list Solve
From this view we can dig into profiling data collected from the benchmark and it's easy to spot the 3 big offenders that allocate memory.
Reducing unnecessary memory allocations leads to improved performance (e.g memory allocation overhead, garbage collection cache friendly behaviour).
Normally while investigating performance issues, you run the benchmarks against every code change to verify your changes and that you actually improving your code, but for brevity we are going to discuss all the improvements and then run the benchmark to see the final result.
Improvements
The first offender is parseInput
which is allocating an N sized integer slice, where N is the input of the puzzle. Instead of slice of integers []int
we can use a slice of bytes []byte
so we can store the same information with less memory
(and as a bonus we skip parsing from string to integer). We can do this due to the puzzle's constraints, input numbers are from 0
to 8
.
The second offender can be seen if we do list parseInput
data := strings.Split(strings.Split(s, "\n")[0], ",")
is as well allocating space that we throw right after. Instead, it's better parsing char by char.
Here is the updated parseInput
function:
func parseInput(s string) []byte {
fish := make([]byte, 0, len(s)/2)
for i := 0; i < len(s); i++ {
if s[i] == '\n' {
break
}
if s[i] == ',' {
continue
}
fish = append(fish, s[i])
}
return fish
}
The third offender is the tempFish
slice. On every iteration, we are populating the slice only to throw it on the next step and create a new one and we are keep reallocating memory. We can reuse the same slice across iterations by just clearing the slice instead of creating a new one.
The updated code for the Solve
function
func Solve(input string, days int) int {
fish := parseInput(input)
tempFish := []byte{}
for i := 0; i < days; i++ {
for j := range fish {
if fish[j] == '0' {
fish[j] = '6'
tempFish = append(tempFish, '8')
} else {
fish[j]--
}
}
fish = append(fish, tempFish...)
tempFish = tempFish[:0]
}
return len(fish)
}
Results
Rerunning the benchmarks
go test . -benchmem -run=^$ -v -bench ^Bench -count=5 -memprofile improved.out | tee improved.txt
generates the following, much improved, output
pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10 388 2945361 ns/op 2193282 B/op 45 allocs/op
BenchmarkSolution-10 410 2950991 ns/op 2193280 B/op 45 allocs/op
BenchmarkSolution-10 409 2917432 ns/op 2193278 B/op 45 allocs/op
BenchmarkSolution-10 404 2968872 ns/op 2193277 B/op 45 allocs/op
BenchmarkSolution-10 409 2917535 ns/op 2193281 B/op 45 allocs/op
PASS
ok advent-of-code/2021/go/day-6 7.725s
the benchmark is already looking better but benchstat
can help to answer "by how much"
benchstat naive.txt improved.txt
name old time/op new time/op delta
Solution-10 5.20ms ± 7% 2.94ms ± 1% -43.51% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Solution-10 31.7MB ± 0% 2.2MB ± 0% -93.07% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Solution-10 1.02k ± 0% 0.04k ± 0% -95.61% (p=0.008 n=5+5)
So the new solution is 43.51% faster than the old one and requires 93.07% less memory to do the same job.
Part 2
For the second part, the puzzle is running for 256 days. Even with the previous improvements though, at least on my machine, I can't get a result (after waiting for 5 minutes I gave up).
It's time to think the solution a bit more carefully and use a more efficient algorithm, as the solution is O(days*fish)
and the number of fish increases on every iteration. We need to find a way to keep the number of fish constant.
An example input of the puzzle
3,4,3,1,2,2,1,1 ....
We know that a fish can have a timer from 0
to 8
and notice all fish that have the same timer are "synchronized".This means we can group all the fish that have the same timer and change the O(days*fish)
to O(days*8)
which is equal to O(days)
.
We can achieve this by using a map, where the keys are the timer values, hence 0
-8
, and the values are the number of fish that have this as their timer. That would turn the above input to the following structure
{
0: 0,
1: 3,
2: 2,
3: 2,
4: 1,
5: 0,
6: 0,
7: 0,
8: 0
}
and the size of the map will be constant regardless of the number of fish. Every day we are increasing the timers per group instead of one fish at a time.
We are keeping the findings from Part 1 and the new adjusted code
func Solve(input string, days int) int {
fish := parseInput(input)
tmpFish := make(map[uint8]int, 9)
for i := 0; i < days; i++ {
resetCount := 0
for k, v := range fish {
if k == 0 {
resetCount += v
} else {
tmpFish[k-1] = v
}
}
tmpFish[8] += resetCount
tmpFish[6] += resetCount
fish, tmpFish = tmpFish, fish
clear(tmpFish)
}
counter := 0
for _, v := range fish {
counter += v
}
return counter
}
func parseInput(s string) map[uint8]int {
fish := make(map[uint8]int, 9)
for i := 0; i < len(s); i++ {
if s[i] == '\n' {
break
}
if s[i] == ',' {
continue
}
fish[s[i]-'0']++
}
return fish
}
Results
This solution returns a result - if you recall 5 minutes were not enough to get a result previously - running the benchmark against it produces the following output:
pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10 34027 33479 ns/op 2373 B/op 85 allocs/op
BenchmarkSolution-10 35720 33435 ns/op 2374 B/op 85 allocs/op
BenchmarkSolution-10 35521 33486 ns/op 2374 B/op 85 allocs/op
BenchmarkSolution-10 35900 33392 ns/op 2373 B/op 85 allocs/op
BenchmarkSolution-10 35878 33553 ns/op 2373 B/op 85 allocs/op
PASS
ok advent-of-code/2021/go/day-6 7.960s
Running the benchstat
with the previous solution reveals even more impressive results than before improving the performance of the code up to 98.86% and using 99.89% less memory.
benchstat improved.txt new_algo.txt
name old time/op new time/op delta
Solution-10 2.94ms ± 1% 0.03ms ± 0% -98.86% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Solution-10 2.19MB ± 0% 0.00MB ± 0% -99.89% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Solution-10 45.0 ± 0% 85.0 ± 0% +88.89% (p=0.008 n=5+5)
Bonus point
What if I told you there is one more one-line improvement that could give an increase in the code's performance by another 95.90% and reach zero memory allocations?
benchstat new_algo.txt new_algo_array.txt
name old time/op new time/op delta
Solution-10 33.5µs ± 0% 1.4µs ± 2% -95.90% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Solution-10 2.37kB ± 0% 0.00kB -100.00% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Solution-10 85.0 ± 0% 0.0 -100.00% (p=0.008 n=5+5)
You can try changing the data structure from map[uint8]int
to []int
, the rest of the code will still work. Why is that you might be wondering or you might have already guessed but I leave it as an exercise, use the profiling data and the answer is there.
hint: You can run benchmarks with -cpuprofile
instead of -memprofile
and check the top entries with go tool pprof
Feel free to post the answer in the comments 📝
Conclusion
The code for this puzzle was trivial, but the approach of measuring, finding bottlenecks, improving code and repeating until being satisfied with the results will be the same even in production setups. Learn to leverage the tools that the language and the ecosystem provide and you won't be disappointed.
The gopher art on the image is from MariaLetta.