A Gentle Introduction to Akka Streams by Michael Hamrah
Nice introductory blog post on akka-streams by Michael Hamrah - A Gentle introduction to Akka Streams
Creating Reactive Streams components on Akka Streams
Piotr Trzepil recently posted a nice tutorial on creating Reactive Streams Publishers and Subscribers using the Akka provided ActorPublisher and ActorSubscriber classes. Read more here: https://liferepo.blogspot.co.uk/2015/01/creating-reactive-streams-components-on.html
Great collection of Akka Notes by Arun Manivannan
Arun Manivannan has been writing up a very comprehensive guide / collection of notes along with code samples. You can read them on his blog: rerun.me – Akka Notes.
“Event sourcing and external service integration” sample ported to Akka Persistence
Last year Martin Krasser published a very nice blog post titled “Event sourcing and external service integration” using the Eventsourced library. Since then Eventsourced has evolved into Akka Persistence and introduced new abstractions to make building event sourced applications even easier.
Recently on the akka-user mailing list Iain Hull has posted that he took upon himself the exercise of translating that sample application into current-day Akka Persistence. We hope this sample code will prove useful if you’re just starting with akka-persistence and would like to see a nice sample application.
Case Study: WhitePages Rebuilds with Scala and Akka to Improve Scaling
A new case study about WhitePages’ rewrite to an Scala and Akka based system just came out. Our favourite quote is “Using Akka for one crucial, high-throughput service […] reduced the number of servers required from 60 to 5 with improvements to latency, stability, and consistency”.
Read the entire study here: Case Study: WhitePages rebuilds with Scala and Akka to improve scaling
Routers and load balancing for building scalable highly available apps revisited
Marek Prochera revisits Akka’s different router configurations and deployment patterns in his latest blog post: Building a scalable and highly available reactive applications with Akka! Load balancing revisited.
Akka 2.0.2 released!
Ladies and gentlemen,
We recently released another incremental release in the Akka 2.0 series called 2.0.2 which features several bugfixes, performance enhancements, documentation improvements and deprecations for the upcoming 2.1 release.
Cheers,
√
Watch the Routees
I answered a question in the Akka mailinglist with example code that might be interesting if you need to do something similar. The use case is to perform a job consisting of a bunch of tasks. Delegate the tasks to worker actors, behind a router. Aggregate the results of workers and reply with one answer. Additionally, a worker should retry failed messages a few times, and then stop. If one worker terminates due to failure the whole job should be aborted.
Here is the example code:
The retry is implemented by forwarding the failing message to itself in preRestart. This places the message last in the mailbox, so ordering of messages is assumed to be unimportant for the processing of the tasks. Note that forward is necessary to retain the master as sender.
Workers are restarted in case of failure, and stopped after 2 restarts. This is handled by the supervisor of the worker routees, i.e. the router. A supervisorStrategy is specified when creating the router.
To be able to abort the whole job when one worker is stopped the master subscribe to termination, death watch, of the workers. References to the workers is obtained by sending a CurrentRoutees message to the router, which replies with RouterRoutees. context.watch(actorRef) initiates the death watch subscription. Terminated is delivered when a worker has been stopped.
When the master stops itself the children are also stopped, i.e. the router and its worker routees are stopped together with the master.
Aggregation is implemented by correlating the replies from the workers with a message id.
50 million messages per second - on a single machine
50 million messages per second on a single machine is mind blowing!
We have measured this for a micro benchmark of Akka 2.0.
As promised in Scalability of Fork Join Pool I will here describe one of the tuning settings that can be used to achieve even higher throughput than the amazing numbers presented previously. Using the same benchmark as in Scalability of Fork Join Pool and only changing the configuration we go from 20 to 50 million messages per second.
The micro benchmark use pairs of actors sending messages to each other, classical ping-pong. All sharing the same fork join dispatcher.
Hardware and configuration:
- Processor: 48 core AMD Opteron (4 dual-socket with 6 core AMD® Opteron™ 6172 2.1 GHz Processors)
- Memory: 128 GB ECC DDR3 1333 MHz memory, 16 DIMMs
- OS: Ubuntu 11.10
- JVM: OpenJDK 7, version “1.7.0_147-icedtea”, (IcedTea7 2.0) (7~b147-2.0-0ubuntu0.11.10.1)
- JVM settings: -server -XX:+UseNUMA -XX:+UseCondCardMark -XX:-UseBiasedLocking -Xms1024M -Xmx2048M -Xss1M -XX:MaxPermSize=128m -XX:+UseParallelGC
- Akka version: 2.0
- Dispatcher configuration other than default:
parallelism 48 of fork-join-exector
throughput as described
Here is the result of using different values for the throughput setting of the dispatcher. 5 is the default value. The test was run with 96 actors and each test result was based on at least 15 seconds of execution time (960 million messages), long warmup excluded.

As you see the number of processed messages per second increase dramatically with increased throughput configuration setting up to 20.
When using even higher throughput values the curve becomes more flat, but with a maximum above 50 million messages per second.

What is the magic behind the throughput setting?
It configures how many messages an actor should process in a batch. For example throughput=20 means that once the dispatcher schedules a thread for the actor it will continue to process 20 messages, if the mailbox isn’t empty, before returning the thread to the pool.
The trade-off is that other actors that use the same dispatcher might have to wait longer before they get a chance to run, i.e. you trade higher throughput for increased latency. It is the classic tradeoff of throughput vs fairness. The optimal value depends on your use case, e.g. how long the message processing time is.
A related configuration setting is throughput-deadline-time, which defines how long time the actor is allowed to continue to process messages from the mailbox before the thread is returned to the pool.
Finally, let’s take a look at how the message throughput (msg/s) scales with number of actors when using throughput configuration value 200.

As you can see, we now get more than 50 million messages per second. Not bad at all. Download Akka 2.0 yourself and give her a spin.
Happy hAkking.
Why no mailboxSize in Akka 2 ?
Akka 1.x exposed a method to query the mailbox size, which was available both from inside an actor and from outside. We removed this method in 2.0 for a number of reasons, and since this topic came up on the mailing list multiple times already, here is my attempt at making the rationale accessible to a broader audience and disburden the akka-user group.
What are the problems with querying the mailbox size?
- it takes O(n) time to get some size answer from a concurrent queue, i.e. querying hurts when you will feel the pain the most (it might even take several seconds in case of durable mailboxes at the “wrong moment”)
- the answer is not correct, i.e. it does not need to match the real size at either the beginning or the end of processing this request
- making it “more correct” involves book-keeping which severely limits scalability
- and even then the number might have changed completely by the time you receive it in your code (e.g. your thread is scheduled out for 100ms and you get 100.000 new messages during that time).
So, no mailboxSize in the default implementation. (There are even more reasons as soon as you consider remote actor references and the asynchronous nature of everything within Akka, just to name a few.)
Why would you want to use it anyway?
Assuming there would be a method to query the mailbox size from within an actor (doing it from without is really impossible in a distributed setting), and assuming that the actor detects that messages are piling up faster than it can process them, what would its plan of action be? Slowing down the senders—by way of a bounded queue—is basically the only thing it can do itself. Other than that only the supervisor can do something, but your application is likely not prepared to handle that, since all the supervisor can do is terminate the poor guy, invalidating all the references the senders have.
But I want to check periodically and interrupt my long-running task when new messages arrive …
That is not a good idea, because this way you block the processing of internal messages (system messages) which are used in the implementation of supervision, actor selections and other things. It is a much better approach to break up long-running tasks into smaller packages and have the actor send these to itself continuously until the job is done; that way it stays fully reactive and does not hog resources uncontrollably. Or you hand off the big pieces of work to Futures, compose those with the awesome Future API and feed the result back to the target actor using pipeTo (or callbacks as per onSuccess, etc.).
So, what is the recommended way?
When designing an application, you will immediately spot (most of) the hot spots and create these actors using Routers. And if you forget one, don’t worry, it is quite painless to insert “.withRouterConfig(RoundRobinRouter(10))” when testing reveals a bottle-neck. This gets even better when using Resizers, which add elasticity to the scaling.
Okay, I considered all this and I still want mailbox metrics.
In case you cannot do without, it is quite easy to write your own mailbox implementation, building on the traits in the akka.dispatch package and inserting book-keeping code into enqueue() and dequeue(). Then you could either use down-casting (evil) or keep track of your mailboxes in an akka.actor.Extension (recommended) to access the stats from within your actor and do whatever is necessary.
But wait: did I mention that it might even be easier to tag latency-critical (but not too high frequency) messages with timestamps and react on the age of a message when processing it?
So, in summary: while there still is a way to get the mailbox size, you will probably never actually need it.
PS: In case you need mailbox metrics for monitoring and in general operating a deployed system, you might want to have a look at the Typesafe Console.