In Akka 2, there is a nifty little thing called the BalancingDispatcher, which will magically distribute work to a collection of Actors in the most efficient way possible (i.e. it’s a work stealing dispatcher). The SmallestMailboxRouter has this kind of feel as well. However, the
BalancingDispatcher and the
SmallestMailboxRouter differ in how the choice is made about who, and when to deliver the incoming message. The
BalancingDispatcher dispatches a message to an Actor only when that Actor would otherwise be idle. The
SmallestMailboxRouter dispatches the message to the Actor with the least number of messages in its Mailbox even while the Actor is currently working on other things.
Often times, people want the functionality of the
BalancingDispatcher with the stipulation that the Actors doing the work have distinct Mailboxes on remote nodes. In this post we’ll explore the implementation of such a concept.
I’m going to assume you understand Akka to a reasonably competent level. Most people don’t get to the point of asking how to do something like this without understanding a fair bit about Akka in the first place, so I’m going to pretend that my assumption is reasonable. :)
First, take a quick look at the
BalancingDispatcher. Essentially, it works by sharing a single Mailbox amongst all of its Actors. It doesn’t really “steal” work from its Actors; it just intelligently distributes that work by ensuring that the next message gets sent to an idle Actor, and if one isn’t available, then the message doesn’t get sent until that changes.
BalancingDispatcherdistributes work to Actors evenly by having them all share the same Mailbox.
SmallestMailboxRouter differs because the Actors to which it routes all have distinct Mailboxes. So, while the
BalancingDispatcher makes its decision on which Actor gets the message at message processing time, the
SmallestMailboxRouter makes its decision on which Actor gets the message at message delivery time.
SmallestMailboxRoutersends messages to Actors immediately, regardless as to how much time they may send working on their messages in the future. The key is that it sends the message to the Actor with the smallest Mailbox.
We’re going to rely on two key components: the Master and Worker. The Master will help coordinate the work and the Worker will execute on that work. The qualities of the pattern are as follows:
- Implement the fundamental concept of the
BalancingDispatcher- that which distributes work only to those that can work on it immediately - while still having distinct Mailboxes per Worker.
- We must have distinct Mailboxes per Worker since the Workers are on remote nodes and if they didn’t have Mailboxes then there would be nowhere to deliver their messages.
- Preserve the original sender of the work. This allows the Worker to respond to the original sender when the work is complete.
- Provide a mechanism for the Master to detect death of a Worker and reassign the work if need be.
- Allow the capacity of the system to increase by the addition of Workers dynamically.
The Pattern Overview
In order to deliver on our goal we are going to invert the work-sending paradigm of the
BalancingDispatcher. The Master will not push work to the Workers; instead, the Workers will pull that work from the Master on-demand. There are a number of advantages to inverting this message flow, as we’ll see.
One of the key points to remember about this pattern is that it’s entirely event driven with no polling. Message number 2 might make it look like there is polling, but you can rest assured that it isn’t - message number 2 is sent as the result of a previous event, not a timer.
The Master and its Workers
In order for the pattern to be valid we must have concrete instances of the Master (on some node) and the Workers (on the same node, or other nodes). The relationship between the two is key: Workers identify themselves to the Master - the Master doesn’t spawn Workers. A variant on this pattern would be to have the Master do exactly that, but this definition does not include that notion.
The caveat, therefore, is that the Master must be running before the Workers try to register with it, and the Workers must be aware of the Master’s location. Another variant could use failure / retry handling in the Workers to ease the need for this startup ordering, but that’s a detail we’re not concerned with here.
This constraint opens up the flexibility of the system:
- There’s no clunky configuration of the Master required. We don’t need to “reconfigure it” every time we want to add a new Worker.
- You can increase capacity dynamically. If you need more Workers, just instantiate them and they’ll get work assigned to them as soon as it becomes available.
This is essentially how Hadoop's MapReduce system works, and it's the same way many systems before it have worked as well.
Now that we know what the pattern looks like we can describe the operation of the Master. For the most part, our Master will take on the role of the
BalancingDispatcher's single Mailbox. It will collect up work to be done, and distribute it as required amongst the Workers, but only at the moment when those Workers can actually work on it.
Above we see that “Worker 3” is in a position to ask for work, while the others are busy. The Master will give that work out to “Worker 3” and hold on to the rest, waiting until another Worker (or it may actually end up being “Worker 3” again) asks for work.
We start with the message protocol that passes between the Master and its Workers. The following defines both messages sent to and messages sent from the Master.
The Master Code
Next we see the master code. Its job is to queue work sent to it from the outside world, and handle the three messages that can be sent from the Workers. It also puts Deathwatch on the Workers in order to take action should any of those Workers croak.
The Worker’s job is to switch between two states: idle and working. When it’s idle, it’s looking for work and when it’s working it’s trying to go idle. The Worker responds to messages differently depending on what state it’s currently in.
We implement the Worker as an abstract base class, from which you would derive and implement the
doWork() method. It’s intended that this
doWork() method do its job asynchronously but that’s not a requirement. However, given how simple it is to spawn of work in a Future with Akka, why wouldn’t you?
Implementing a Worker
In order to use the Worker, we need to create a derivation. For the purposes of this example, and testing, we’re going to create a Worker that sends a string to the original work requester.
Note how we’ve spawned the work off into its own Future that will execute asynchronously. When it’s complete it will inform the Worker of that fact using the WorkComplete message.
doWork()method. Send the work off to the future, which will allow your Worker to remain responsive to the Master.
And we now verify that everything works by implementing a test.
Possible Extensions / Modifications
What’s been presented is merely a pattern as instruction on the mechanism for achieving a
BalancingDispatcher-style work distribution system across multiple nodes. There are a number of variants you could make of this pattern.
- Forget the whole “idle” thing. When a Worker sees that there’s no more work to be done, it can stop itself (i.e.
context.stop(self)). Given the current implementation, the Master would detect this and remove it from the list of possible candidates.
- Remove the side-effects. Have a tighter relationship between the Master and its Workers; the Worker pulls work from the Master and actually returns results back to the Master. Personally I can’t see a reason to do this.
- Have the Master spawn the children as needed. This would create them as direct children, which is certainly different than what we have here but if that’s what you want to do then do it. The pull semantics don’t change, however. You still pull - you just create the Worker as part of the Master logic.
- Any retry-logic that you might want would have to live in the Master, but idempotency and retry are way beyond the scope of this article. You can implement retry in the Master and / or dovetail this concept into the Clustering features that are coming up in the 2.x stream of Akka.
- If you’re smart, and implement
doWork()in the future, then you can have the Worker be responsive to the Master’s requests for status (for example).
Let’s take a different look at how you might use this, just to illustrate that it’s still a very general implementation. If you want to download a ton of pages in parallel across a huge farm of nodes, collect them up, and then dump them to third party (say, a page renderer) when they’re all in… piece of cake.
The key to this pattern isn’t the implementation or anything fancy. What’s important is the inversion of what, for some, is their natural inclination. When people think of an on-demand, event-based system, they think push, but push doesn’t work here. You can’t push to an Actor unless you know it will be available to do your work, and since you can’t peek into its Mailbox, you’re not going to be able to find out.
Alternatively you might think of implementing a model where the Master can ask the Worker what it’s doing and push work it’s way when it says it’s idle. This is a bad idea for a couple of reasons.
- What if the Worker says “I’m busy, leave me alone”? When can the Master ask again?
- It’s starting to look like polling, and polling is going to slow things down a lot.
- What if the Worker says “I’m idle”, but one millisecond later it becomes busy?
- The Master will send it work to do, and it won’t do that work for (possibly) a very long time.
You can try to work around those issues by setting up some sort of transactional nature between the Master and the Worker - something like, “If you’re not busy, tell me you won’t do anything else until I send you something”. But why? Invert from push to pull, and none of these problems exist. With pull, you could even have a single Worker service many Masters without conflict, should you choose to do so.
The bones of this pattern should give you highly reactive, fast, and efficient use of your Workers across multiple nodes. Implement and alter to suit your needs, and then collect your profit!
About the Author
Derek Wyatt is a Software Developer and Architect at Primal, helping to create Interest Networks using graphs and other cool stuff. He is also the author of an upcoming book on Akka to be published by Artima Publishing. You should go get a copy when it comes out… really. It’ll have pictures! Nothing dirty, though…