50 million messages per second on a single machine is mind blowing!
We have measured this for a micro benchmark of Akka 2.0.
As promised in Scalability of Fork Join Pool, I will here describe one of the tuning settings that can be used to achieve even higher throughput than the amazing numbers presented previously. Using the same benchmark and only changing the configuration, we go from 20 to 50 million messages per second.
The micro benchmark uses pairs of actors sending messages to each other, classic ping-pong. All pairs share the same fork join dispatcher.
Hardware and configuration:
- Processor: 48 core AMD Opteron (4 dual-socket with 6 core AMD® Opteron™ 6172 2.1 GHz Processors)
- Memory: 128 GB ECC DDR3 1333 MHz memory, 16 DIMMs
- OS: Ubuntu 11.10
- JVM: OpenJDK 7, version “1.7.0_147-icedtea”, (IcedTea7 2.0) (7~b147-2.0-0ubuntu0.11.10.1)
- JVM settings: -server -XX:+UseNUMA -XX:+UseCondCardMark -XX:-UseBiasedLocking -Xms1024M -Xmx2048M -Xss1M -XX:MaxPermSize=128m -XX:+UseParallelGC
- Akka version: 2.0
- Dispatcher configuration other than default:
  - parallelism 48 of the fork-join-executor
  - throughput as described below
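In concrete terms, the non-default dispatcher settings correspond to a configuration block along these lines. This is a sketch: the dispatcher name `benchmark-dispatcher` is made up for illustration, while the keys follow Akka 2.0's reference configuration.

```
benchmark-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    # pin the pool to 48 threads, one per core
    parallelism-min = 48
    parallelism-max = 48
  }
  # batch size per scheduled actor; this is the value varied in the benchmark
  throughput = 5
}
```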
Here are the results of using different values for the throughput setting of the dispatcher; 5 is the default value. The test was run with 96 actors, and each result is based on at least 15 seconds of execution time (960 million messages), long warmup excluded.
As you can see, the number of processed messages per second increases dramatically with the throughput configuration setting, up to a value of 20.
With even higher throughput values the curve flattens out, but the maximum is above 50 million messages per second.
What is the magic behind the throughput setting?
It configures how many messages an actor should process in a batch. For example, throughput=20 means that once the dispatcher schedules a thread for an actor, the actor keeps processing up to 20 messages, as long as the mailbox isn't empty, before returning the thread to the pool.
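The batching behavior can be illustrated with a simplified model. This is not Akka's actual dispatcher code, just a sketch of the idea: each scheduling pass drains at most `throughput` messages from the mailbox before the thread is handed back to the pool.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

class BatchingSketch {
    // Drain at most `throughput` messages from the mailbox in one
    // scheduling pass; whatever remains waits for the next pass.
    static List<String> runBatch(Queue<String> mailbox, int throughput) {
        List<String> processed = new ArrayList<>();
        for (int i = 0; i < throughput && !mailbox.isEmpty(); i++) {
            processed.add(mailbox.poll());
        }
        return processed;
    }

    public static void main(String[] args) {
        Queue<String> mailbox = new ArrayDeque<>(List.of("m1", "m2", "m3", "m4", "m5"));
        System.out.println(runBatch(mailbox, 3)); // one pass handles 3 messages
        System.out.println(mailbox.size());       // 2 messages left for the next pass
    }
}
```

A higher `throughput` value means fewer scheduling passes per message, which is exactly why the measured msg/s climbs as the setting grows.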
The trade-off is that other actors that use the same dispatcher might have to wait longer before they get a chance to run, i.e. you trade higher throughput for increased latency. It is the classic tradeoff of throughput vs fairness. The optimal value depends on your use case, e.g. how long the message processing time is.
A related configuration setting is throughput-deadline-time, which defines how long the actor is allowed to keep processing messages from the mailbox before the thread is returned to the pool.
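In the configuration file that might look like this (the dispatcher name and the values are illustrative, not taken from the benchmark):

```
benchmark-dispatcher {
  throughput = 200
  # hand the thread back after 200 ms even if the batch isn't finished
  throughput-deadline-time = 200ms
}
```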
Finally, let’s take a look at how the message throughput (msg/s) scales with the number of actors when using a throughput configuration value of 200.
As you can see, we now get more than 50 million messages per second. Not bad at all. Download Akka 2.0 yourself and give it a spin.