Benchmark testing for AtomicLongMap
Why am I doing this benchmark test?
It's important to verify if a particular change for a package, in this case Google Guava
performs in an optimized manner within a multi-threaded environment and doesn't trigger a thread contention state. Well, I came across one such change in the implementation for AtomicLongMap
which suffers from thread contention with the new releases.
Which method within AtomicLongMap is suffering from thread contention?
The underlying method - addAndGet(K key, int delta)
When was a new implementation introduced for addAndGet
method in Google Guava?
Google Guava - v21.0
Why was this change introduced?
According to the release notes:
AtomicLongMap: added a number of methods such as accumulateAndGet(K, LongBinaryOperator) that take advantage of new Java functional types.
How did I notice it?
I bumped my java package to use GoogleGuava - v32.0.0
from GoogleGuava - v20.0
and my package blew up on load-testing. It took me a bit of digging through profiling and monitoring and was able to pinpoint that the trigger for the issue is none other than AtomicLongMap
! This sparked the curiousity into understanding why did a new release start suffering from thread contention and that's the reason for the benchmark testing.
Enough story time, let's dig into the implementation difference
Google Guava 20.0 Implementation Flow [Psuedo]
- Link to code: AtomicLongMap.java
addAndGet(K key, int delta) ---> atomic = map.putIfAbsent(key, delta)
---> if atomic.get() == 0 :
map.replace(atomic.get(), atomic, delta)
return delta
---> else:
val = atomic.get()
atomic.compareAndSet(val, val + delta)
return val + delta
Google Guava 32.0.0 Implementation Flow [Psuedo]
Note: Upon back-tracking, the below implementation flow was introduced in Google Guava v21.0 so this degradation is existing since v21.0 : https://github.com/google/guava/wiki/Release21 however the benchmark testing is carried out using Google Guava v32.0.0
- Link to code: AtomicLongMap.java
addAndGet(K key, int delta) ---> accumulateAndGet(key, delta, Long::sum as aFunc) --->
updateAndGet(key, o -> aFunc.applyAsLong(o, delta) as uFunc) --->
map.compute(key, (k, x) -> uFunc.applyAsLong(x == null ? 0 : x.longValue()))
What's the possible difference in their behavior within a multi-threaded environment?
It is use of the specific underlying method of ConcurrentMap that is being utilised and the impact in their performance is majorly driven by how well it behaves in highly contended environment.
In case of Google Guava 20.0, putIfAbsent
is utilised and when using putIfAbsent, the operation succeeds only if the key is not already present in the map. If the key is present, the operation does not perform any update and returns the existing value associated with the key. This behavior helps reduce contention because it avoids the need for complex coordination between threads when updating values. This is followed by using compareAndSet
which is another atomic operation which may offer better performance in a multi-threaded environment compared to the compute method, especially when you're dealing with atomic updates on a single shared variable. This is because compareAndSet
operates directly on a single atomic variable and doesn't involve the overhead of interacting with a concurrent data structure like a map.
In case of Google Guava 32.0.0, compute
method is utilized and using compute
method
with an accumulator function involves a more complex operation. It requires reading the current value associated with the key, applying the accumulatorfunction to compute a new value based on the old value (if any) and the input,
and then updating the map with the new value. In highly contended scenarios, multiple threads attempting to perform such operations concurrently may lead to increased contention and synchronization overhead, potentially affecting performance.
LET'S GET INTO TESTING !!
Note: Setup of the testing is present in the End of the blog
Case 1 : NumThreads = 2
- Who won the battle? Google Guava v20.0
- What's the difference in performance? 78.6%
- Version = 20.0
Trial | Time to Execute (s) |
---|---|
1 | 14 |
2 | 14 |
3 | 14 |
4 | 14 |
5 | 14 |
Average | 14 |
- Version = 32.0.0
Trial | Time to Execute (s) |
---|---|
1 | 25 |
2 | 25 |
3 | 25 |
4 | 25 |
5 | 25 |
Average | 25 |
Case 2 : NumThreads = 10
- Who won the battle? Google Guava v20.0
- What's the difference in performance? 73.2%
- Version = 20.0
Trial | Time to Execute (s) |
---|---|
1 | 73 |
2 | 73 |
3 | 73 |
4 | 74 |
5 | 73 |
Average | 73.2 |
- Version = 32.0.0
Trial | Time to Execute (s) |
---|---|
1 | 126 |
2 | 126 |
3 | 127 |
4 | 127 |
5 | 128 |
Average | 126.8 |
Case 3 : NumThreads = 32
- Who won the battle? Google Guava v20.0
- What's the difference in performance? 72.5%
- Version = 20.0
Trial | Time to Execute (s) |
---|---|
1 | 236 |
2 | 235 |
3 | 234 |
4 | 234 |
5 | 235 |
Average | 234.8 |
- Version = 32.0.0
Trial | Time to Execute (s) |
---|---|
1 | 403 |
2 | 405 |
3 | 401 |
4 | 410 |
5 | 406 |
Average | 405 |
Case 4 : NumThreads = 64
- Who won the battle? Google Guava v20.0
- What's the difference in performance? 70.7%
- Version = 20.0
Trial | Time to Execute (s) |
---|---|
1 | 474 |
2 | 474 |
3 | 473 |
4 | 473 |
5 | 472 |
Average | 473.2 |
- Version = 32.0.0
Trial | Time to Execute (s) |
---|---|
1 | 804 |
2 | 811 |
3 | 809 |
4 | 809 |
5 | 805 |
Average | 807.6 |
Conclusion
Google Guava v20.0
has a simplified implementation which performs 70% better than the later releases for AtomicLongMap.java
. Will love to hear other people's opinion on the same and why do you think the new implementation of AtomicLongMap
is suffering from thread contention!
Setup of the test
It is a simple benchmark test to test the performance of the 2 GoogleGuava releases addAndGet()
method defined within the AtomicLongMap
class in a multi-threaded environment
- Main.java
import java.time.Clock;
import main.java.com.benchmark.Benchmark;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
public class Main {
private static Clock clock = Clock.systemDefaultZone();
private static Benchmark benchmark = new Benchmark();
private static final long NUM_CALLS = 1000000000;
private static final int NUM_THREADS = 1;
public static void main(String[] args) {
long current = clock.millis();
System.out.println("Current time : " + current);
ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
IncrementTask task = new IncrementTask(benchmark, NUM_CALLS);
for (int threadPool = 0; threadPool < NUM_THREADS * 2; ++threadPool) {
executor.submit(task);
}
executor.shutdown();
try {
executor.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
e.printStackTrace();
}
long completed = clock.millis();
long difference = completed - current;
System.out.println("Completed time: " + completed);
System.out.println("Time difference for Google Guava: " + difference/1000 + " seconds.");
}
}
class IncrementTask implements Runnable {
private Benchmark benchmark;
private long numCalls;
public IncrementTask(Benchmark benchmark, long numCalls) {
this.benchmark = benchmark;
this.numCalls = numCalls;
}
@Override
public void run() {
long threadId = Thread.currentThread().getId();
String threadName = Thread.currentThread().getName();
System.out.println("Task executed by Thread " + threadId + " (" + threadName + ")");
synchronized (benchmark) {
for (int trial = 0; trial < numCalls ; ++trial) {
benchmark.increment(1);
}
}
}
}
- Benchmark.java
public class Benchmark {
private final AtomicLongMap<String> counters = AtomicLongMap.create();
private final String KEY = "Key";
public void increment(int delta) {
counters.addAndGet(KEY, delta);
}
}