Illuminating Unnoticed Efficiency Pitfalls with AtomicLongMap

Aemie Jariwala - Apr 13 - - Dev Community

Benchmark testing for AtomicLongMap


Why am I doing this benchmark test?

It's important to verify if a particular change for a package, in this case Google Guava performs in an optimized manner within a multi-threaded environment and doesn't trigger a thread contention state. Well, I came across one such change in the implementation for AtomicLongMap which suffers from thread contention with the new releases.

Which method within AtomicLongMap is suffering from thread contention?

The underlying method - addAndGet(K key, int delta)

When was a new implementation introduced for addAndGet method in Google Guava?

Google Guava - v21.0

Why was this change introduced?

According to the release notes:

AtomicLongMap: added a number of methods such as accumulateAndGet(K, LongBinaryOperator) that take advantage of new Java functional types.

How did I notice it?

I bumped my java package to use GoogleGuava - v32.0.0 from GoogleGuava - v20.0 and my package blew up on load-testing. It took me a bit of digging through profiling and monitoring and was able to pinpoint that the trigger for the issue is none other than AtomicLongMap! This sparked the curiousity into understanding why did a new release start suffering from thread contention and that's the reason for the benchmark testing.

Enough story time, let's dig into the implementation difference

Image description
via Giphy


Google Guava 20.0 Implementation Flow [Psuedo]

addAndGet(K key, int delta) ---> atomic = map.putIfAbsent(key, delta) 
---> if  atomic.get() == 0 : 
       map.replace(atomic.get(), atomic, delta) 
       return delta
---> else: 
        val = atomic.get()
        atomic.compareAndSet(val, val + delta)
        return val + delta
Enter fullscreen mode Exit fullscreen mode

Google Guava 32.0.0 Implementation Flow [Psuedo]

Note: Upon back-tracking, the below implementation flow was introduced in Google Guava v21.0 so this degradation is existing since v21.0 : https://github.com/google/guava/wiki/Release21 however the benchmark testing is carried out using Google Guava v32.0.0

addAndGet(K key, int delta) ---> accumulateAndGet(key, delta, Long::sum as aFunc) ---> 
updateAndGet(key, o -> aFunc.applyAsLong(o, delta) as uFunc) --->
map.compute(key, (k, x) -> uFunc.applyAsLong(x == null ? 0 : x.longValue())) 
Enter fullscreen mode Exit fullscreen mode

What's the possible difference in their behavior within a multi-threaded environment?

It is use of the specific underlying method of ConcurrentMap that is being utilised and the impact in their performance is majorly driven by how well it behaves in highly contended environment.

In case of Google Guava 20.0, putIfAbsent is utilised and when using putIfAbsent, the operation succeeds only if the key is not already present in the map. If the key is present, the operation does not perform any update and returns the existing value associated with the key. This behavior helps reduce contention because it avoids the need for complex coordination between threads when updating values. This is followed by using compareAndSet which is another atomic operation which may offer better performance in a multi-threaded environment compared to the compute method, especially when you're dealing with atomic updates on a single shared variable. This is because compareAndSet operates directly on a single atomic variable and doesn't involve the overhead of interacting with a concurrent data structure like a map.

In case of Google Guava 32.0.0, compute method is utilized and using compute method
with an accumulator function involves a more complex operation. It requires reading the current value associated with the key, applying the accumulatorfunction to compute a new value based on the old value (if any) and the input,
and then updating the map with the new value. In highly contended scenarios, multiple threads attempting to perform such operations concurrently may lead to increased contention and synchronization overhead, potentially affecting performance.

LET'S GET INTO TESTING !!

Image description
via Giphy

Note: Setup of the testing is present in the End of the blog


Case 1 : NumThreads = 2

  1. Who won the battle? Google Guava v20.0
  2. What's the difference in performance? 78.6%

  • Version = 20.0
Trial Time to Execute (s)
1 14
2 14
3 14
4 14
5 14
Average 14
  • Version = 32.0.0
Trial Time to Execute (s)
1 25
2 25
3 25
4 25
5 25
Average 25

Case 2 : NumThreads = 10

  1. Who won the battle? Google Guava v20.0
  2. What's the difference in performance? 73.2%

  • Version = 20.0
Trial Time to Execute (s)
1 73
2 73
3 73
4 74
5 73
Average 73.2
  • Version = 32.0.0
Trial Time to Execute (s)
1 126
2 126
3 127
4 127
5 128
Average 126.8

Case 3 : NumThreads = 32

  1. Who won the battle? Google Guava v20.0
  2. What's the difference in performance? 72.5%

  • Version = 20.0
Trial Time to Execute (s)
1 236
2 235
3 234
4 234
5 235
Average 234.8
  • Version = 32.0.0
Trial Time to Execute (s)
1 403
2 405
3 401
4 410
5 406
Average 405

Case 4 : NumThreads = 64

  1. Who won the battle? Google Guava v20.0
  2. What's the difference in performance? 70.7%

  • Version = 20.0
Trial Time to Execute (s)
1 474
2 474
3 473
4 473
5 472
Average 473.2
  • Version = 32.0.0
Trial Time to Execute (s)
1 804
2 811
3 809
4 809
5 805
Average 807.6

Conclusion

Image description
via Giphy

Google Guava v20.0 has a simplified implementation which performs 70% better than the later releases for AtomicLongMap.java. Will love to hear other people's opinion on the same and why do you think the new implementation of AtomicLongMap is suffering from thread contention!

Setup of the test

It is a simple benchmark test to test the performance of the 2 GoogleGuava releases addAndGet() method defined within the AtomicLongMap class in a multi-threaded environment

  1. Main.java
import java.time.Clock;
import main.java.com.benchmark.Benchmark;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class Main {
    private static Clock clock = Clock.systemDefaultZone();
    private static Benchmark benchmark = new Benchmark();
    private static final long NUM_CALLS = 1000000000;
    private static final int NUM_THREADS = 1;

    public static void main(String[] args) {
        long current = clock.millis();
        System.out.println("Current time : " + current);

        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        IncrementTask task = new IncrementTask(benchmark, NUM_CALLS);

        for (int threadPool = 0; threadPool < NUM_THREADS * 2; ++threadPool) {
            executor.submit(task);
        }

        executor.shutdown();

        try {
            executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        long completed = clock.millis();
        long difference = completed - current;
        System.out.println("Completed time: " + completed);
        System.out.println("Time difference for Google Guava: " + difference/1000 + " seconds.");
    }
}

class IncrementTask implements Runnable {

    private Benchmark benchmark;
    private long numCalls;

    public IncrementTask(Benchmark benchmark, long numCalls) {
        this.benchmark = benchmark;
        this.numCalls = numCalls;
    }

    @Override
    public void run() {
        long threadId = Thread.currentThread().getId();
        String threadName = Thread.currentThread().getName();
        System.out.println("Task executed by Thread " + threadId + " (" + threadName + ")");

        synchronized (benchmark) {
            for (int trial = 0; trial < numCalls ; ++trial) {
                benchmark.increment(1);
            }
        }
    }
}
Enter fullscreen mode Exit fullscreen mode
  1. Benchmark.java
public class Benchmark {

    private final AtomicLongMap<String> counters =  AtomicLongMap.create();
    private final String KEY = "Key";

    public void increment(int delta) {
        counters.addAndGet(KEY, delta);
    }
}
Enter fullscreen mode Exit fullscreen mode
. . . . . . . . . . . .