Advent of Code: Investigating performance improvements in Go

Georgios Kampitakis - Aug 26 '23 - - Dev Community

I often enjoy trying to solve previous years' Advent of Code puzzles. If you don't know what Advent of Code is make sure you check it out.

TLDR: Advent of Code is an Advent calendar of small programming puzzles for a variety of skill sets and skill levels that can be solved in any programming language you like.

This post is about puzzle 2021/day-6 I was trying to solve that's not very interesting as a puzzle but I found it to be a really good example for exploring how to deep dive into your code's performance, find potential bottlenecks and improve it with tools that Go provides.

Each puzzle on Advent of Code contains two parts. You need to solve the 1st to move to the 2nd one and usually the 2nd is a more complicated or demanding variation of the 1st. This is the approach we are going to follow on this post as well.

Tools

The tools we are going to use are

  • Benchmarks for measuring and analyzing the performance of go code with help from go tool pprof.
  • Benchstat computes statistical summaries and compares benchmarks.

Part 1

I am not going to repeat the problem here for brevity, but if you read the puzzle carefully you can probably come up with a trivial solution as I did the 1st time.

func parseInput(s string) []int {
    data := strings.Split(strings.Split(s, "\n")[0], ",")
    fish := make([]int, len(data))

    for i, d := range data {
        fish[i] = MustAtoi(d)
    }

    return fish
}

func MustAtoi(s string) int {
    n, err := strconv.Atoi(s)
    if err != nil {
        panic(err)
    }
    return n
}


func Solve(input string, days int) int {
    fish := parseInput(input)

    for i := 0; i < days; i++ {
        tempFish := []int{}

        for j := range fish {
            if fish[j] == 0 {
                fish[j] = 6
                tempFish = append(tempFish, 8)
            } else {
                fish[j]--
            }
        }

        fish = append(fish, tempFish...)
    }

    return len(fish)
}
Enter fullscreen mode Exit fullscreen mode

The approach for solving it is:

  • parsing the input and turning it into a list of numbers and then iterating for the number of days
  • for each day I decrease each fish's timer, if the timer reaches zero resets to 6
  • and on a temporary list add a new fish with timer 8 that at the end of the day is going to be appended on the main list.

That's pretty much it, successfully solves the Part 1 and it's fast enough.

ok advent-of-code/2021/go/day-6 0.226s
Enter fullscreen mode Exit fullscreen mode

The purpose of this post though is not to just solve the puzzle.

Before updating the code we need a benchmark of the current implementation as a base

Note: Always measure before optimizing.

var sink int

func BenchmarkSolution(b *testing.B) {
    res := 0
    input := util.ReadFile("input.txt")

    b.ResetTimer()
    b.ReportAllocs()

    for i := 0; i < b.N; i++ {
        res = Solve(input, 80)
    }

    sink = res
}
Enter fullscreen mode Exit fullscreen mode

Benchmarks

We can run the benchmark with

go test ./... -benchmem -run=^$ -v -bench ^Bench -memprofile naive.out -count=5 | tee naive.txt
Enter fullscreen mode Exit fullscreen mode

Which will produce output similar to the following one:

pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10                 235           5210983 ns/op        31653556 B/op       1025 allocs/op
BenchmarkSolution-10                 231           5086091 ns/op        31653547 B/op       1025 allocs/op
BenchmarkSolution-10                 225           5146135 ns/op        31653549 B/op       1025 allocs/op
BenchmarkSolution-10                 231           5191141 ns/op        31653571 B/op       1025 allocs/op
BenchmarkSolution-10                 229           5104518 ns/op        31653558 B/op       1025 allocs/op
PASS
ok      advent-of-code/2021/go/day-6    8.847s
Enter fullscreen mode Exit fullscreen mode

Note: the flag -memprofile memory profiling is enabled which can help to identify memory-related bottlenecks.

You can use the interactive CLI interface for navigating the profiling data

go tool pprof naive.out
# and then list the data for Solve function with 
list Solve
Enter fullscreen mode Exit fullscreen mode

go tool pprof output screenshot highlighted

From this view we can dig into profiling data collected from the benchmark and it's easy to spot the 3 big offenders that allocate memory.

Reducing unnecessary memory allocations leads to improved performance (e.g memory allocation overhead, garbage collection cache friendly behaviour).

Normally while investigating performance issues, you run the benchmarks against every code change to verify your changes and that you actually improving your code, but for brevity we are going to discuss all the improvements and then run the benchmark to see the final result.

Improvements

The first offender is parseInput which is allocating an N sized integer slice, where N is the input of the puzzle. Instead of slice of integers []int we can use a slice of bytes []byte so we can store the same information with less memory
(and as a bonus we skip parsing from string to integer). We can do this due to the puzzle's constraints, input numbers are from 0 to 8.

The second offender can be seen if we do list parseInput

screenshot of profiling data

data := strings.Split(strings.Split(s, "\n")[0], ",") is as well allocating space that we throw right after. Instead, it's better parsing char by char.

Here is the updated parseInput function:

func parseInput(s string) []byte {
    fish := make([]byte, 0, len(s)/2)

    for i := 0; i < len(s); i++ {
        if s[i] == '\n' {
            break
        }
        if s[i] == ',' {
            continue
        }
        fish = append(fish, s[i])
    }

    return fish
}
Enter fullscreen mode Exit fullscreen mode

The third offender is the tempFish slice. On every iteration, we are populating the slice only to throw it on the next step and create a new one and we are keep reallocating memory. We can reuse the same slice across iterations by just clearing the slice instead of creating a new one.

The updated code for the Solve function

func Solve(input string, days int) int {
    fish := parseInput(input)
    tempFish := []byte{}

    for i := 0; i < days; i++ {
        for j := range fish {
            if fish[j] == '0' {
                fish[j] = '6'
                tempFish = append(tempFish, '8')
            } else {
                fish[j]--
            }
        }

        fish = append(fish, tempFish...)
        tempFish = tempFish[:0]
    }

    return len(fish)
}
Enter fullscreen mode Exit fullscreen mode

Results

Rerunning the benchmarks

 go test . -benchmem -run=^$ -v -bench ^Bench -count=5 -memprofile improved.out | tee improved.txt
Enter fullscreen mode Exit fullscreen mode

generates the following, much improved, output

pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10                 388           2945361 ns/op         2193282 B/op         45 allocs/op
BenchmarkSolution-10                 410           2950991 ns/op         2193280 B/op         45 allocs/op
BenchmarkSolution-10                 409           2917432 ns/op         2193278 B/op         45 allocs/op
BenchmarkSolution-10                 404           2968872 ns/op         2193277 B/op         45 allocs/op
BenchmarkSolution-10                 409           2917535 ns/op         2193281 B/op         45 allocs/op
PASS
ok      advent-of-code/2021/go/day-6    7.725s
Enter fullscreen mode Exit fullscreen mode

the benchmark is already looking better but benchstat can help to answer "by how much"

benchstat naive.txt improved.txt 
name         old time/op    new time/op    delta
Solution-10    5.20ms ± 7%    2.94ms ± 1%  -43.51%  (p=0.008 n=5+5)

name         old alloc/op   new alloc/op   delta
Solution-10    31.7MB ± 0%     2.2MB ± 0%  -93.07%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
Solution-10     1.02k ± 0%     0.04k ± 0%  -95.61%  (p=0.008 n=5+5)
Enter fullscreen mode Exit fullscreen mode

So the new solution is 43.51% faster than the old one and requires 93.07% less memory to do the same job.

Part 2

For the second part, the puzzle is running for 256 days. Even with the previous improvements though, at least on my machine, I can't get a result (after waiting for 5 minutes I gave up).

It's time to think the solution a bit more carefully and use a more efficient algorithm, as the solution is O(days*fish) and the number of fish increases on every iteration. We need to find a way to keep the number of fish constant.

An example input of the puzzle

3,4,3,1,2,2,1,1 ....
Enter fullscreen mode Exit fullscreen mode

We know that a fish can have a timer from 0 to 8 and notice all fish that have the same timer are "synchronized".This means we can group all the fish that have the same timer and change the O(days*fish) to O(days*8) which is equal to O(days).

We can achieve this by using a map, where the keys are the timer values, hence 0-8, and the values are the number of fish that have this as their timer. That would turn the above input to the following structure

{
  0: 0,
  1: 3,
  2: 2,
  3: 2,
  4: 1,
  5: 0,
  6: 0,
  7: 0,
  8: 0
}
Enter fullscreen mode Exit fullscreen mode

and the size of the map will be constant regardless of the number of fish. Every day we are increasing the timers per group instead of one fish at a time.

Visual representation of map structure

We are keeping the findings from Part 1 and the new adjusted code

func Solve(input string, days int) int {
    fish := parseInput(input)
    tmpFish := make(map[uint8]int, 9)

    for i := 0; i < days; i++ {
        resetCount := 0

        for k, v := range fish {
            if k == 0 {
                resetCount += v
            } else {
                tmpFish[k-1] = v
            }
        }

        tmpFish[8] += resetCount
        tmpFish[6] += resetCount

        fish, tmpFish = tmpFish, fish
        clear(tmpFish)
    }

    counter := 0
    for _, v := range fish {
        counter += v
    }

    return counter
}

func parseInput(s string) map[uint8]int {
    fish := make(map[uint8]int, 9)

    for i := 0; i < len(s); i++ {
        if s[i] == '\n' {
            break
        }
        if s[i] == ',' {
            continue
        }
        fish[s[i]-'0']++
    }

    return fish
}
Enter fullscreen mode Exit fullscreen mode

Results

This solution returns a result - if you recall 5 minutes were not enough to get a result previously - running the benchmark against it produces the following output:

pkg: advent-of-code/2021/go/day-6
BenchmarkSolution
BenchmarkSolution-10               34027             33479 ns/op            2373 B/op         85 allocs/op
BenchmarkSolution-10               35720             33435 ns/op            2374 B/op         85 allocs/op
BenchmarkSolution-10               35521             33486 ns/op            2374 B/op         85 allocs/op
BenchmarkSolution-10               35900             33392 ns/op            2373 B/op         85 allocs/op
BenchmarkSolution-10               35878             33553 ns/op            2373 B/op         85 allocs/op
PASS
ok      advent-of-code/2021/go/day-6    7.960s
Enter fullscreen mode Exit fullscreen mode

Running the benchstat with the previous solution reveals even more impressive results than before improving the performance of the code up to 98.86% and using 99.89% less memory.

benchstat improved.txt new_algo.txt
name         old time/op    new time/op    delta
Solution-10    2.94ms ± 1%    0.03ms ± 0%  -98.86%  (p=0.008 n=5+5)

name         old alloc/op   new alloc/op   delta
Solution-10    2.19MB ± 0%    0.00MB ± 0%  -99.89%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
Solution-10      45.0 ± 0%      85.0 ± 0%  +88.89%  (p=0.008 n=5+5)
Enter fullscreen mode Exit fullscreen mode

Bonus point

What if I told you there is one more one-line improvement that could give an increase in the code's performance by another 95.90% and reach zero memory allocations?

benchstat  new_algo.txt new_algo_array.txt 
name         old time/op    new time/op    delta
Solution-10    33.5µs ± 0%     1.4µs ± 2%   -95.90%  (p=0.008 n=5+5)

name         old alloc/op   new alloc/op   delta
Solution-10    2.37kB ± 0%    0.00kB       -100.00%  (p=0.008 n=5+5)

name         old allocs/op  new allocs/op  delta
Solution-10      85.0 ± 0%       0.0       -100.00%  (p=0.008 n=5+5)
Enter fullscreen mode Exit fullscreen mode

You can try changing the data structure from map[uint8]int to []int, the rest of the code will still work. Why is that you might be wondering or you might have already guessed but I leave it as an exercise, use the profiling data and the answer is there.

hint: You can run benchmarks with -cpuprofile instead of -memprofile and check the top entries with go tool pprof

Feel free to post the answer in the comments 📝

Conclusion

The code for this puzzle was trivial, but the approach of measuring, finding bottlenecks, improving code and repeating until being satisfied with the results will be the same even in production setups. Learn to leverage the tools that the language and the ecosystem provide and you won't be disappointed.


The gopher art on the image is from MariaLetta.

. . . . . . . . . . . . . . .