A typical multi-threaded application in Java contains numerous synchronized methods and statements. It might also contain calls to the wait() and notify() methods introduced in Java 1.0, but these methods provide very primitive functionality and are easily misused. Java 5 introduced the java.util.concurrent package, which provides higher-level alternatives to wait() and notify(). However, it can still be a challenge to use the synchronized and volatile keywords appropriately. Even when they are used correctly, using them efficiently can require complicated orchestration of locks.
The biggest criticism of Java’s synchronization is its performance cost. Synchronized blocks too easily become overly encompassing. A synchronized block on its own is far from slow, but when it covers too much work, other threads start competing for its lock and it becomes a contended synchronized block. Contended synchronized blocks, like other blocking operations, are slow: the OS must put the waiting threads to sleep and wake them again later. This puts pressure on the scheduler and can significantly degrade performance.
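To make “overly encompassing” concrete, here is a small sketch built around a hypothetical WordCodes cache. The first method holds the object’s lock for the whole (potentially slow) computation, so every other caller is blocked meanwhile; the second holds the lock only around the map accesses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache of per-word hex codes, used only to contrast lock scope.
class WordCodes {
    private final Map<String, String> codes = new HashMap<String, String>();

    // Overly encompassing: the computation runs while the monitor is held,
    // so concurrent callers contend for the lock for the whole duration.
    synchronized String codeBroad(String word) {
        String code = codes.get(word);
        if (code == null) {
            code = slowHexCode(word);
            codes.put(word, code);
        }
        return code;
    }

    // Narrower: only the map accesses are synchronized. Two threads may
    // occasionally compute the same code twice, which is harmless here
    // because the result is deterministic.
    String codeNarrow(String word) {
        synchronized (this) {
            String cached = codes.get(word);
            if (cached != null) {
                return cached;
            }
        }
        String code = slowHexCode(word); // runs without holding the lock
        synchronized (this) {
            codes.put(word, code);
        }
        return code;
    }

    // Stand-in for an expensive calculation: hex value of each UTF-16 char.
    static String slowHexCode(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            sb.append(Integer.toHexString(word.charAt(i)));
        }
        return sb.toString();
    }
}
```

Both methods return the same codes; they differ only in how long other threads can be kept waiting for the lock.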
The actor model (natively supported by some programming languages, such as Scala) is a pattern for concurrent computation that enables applications to take full advantage of multi-core and multi-processor computing. The fundamental idea behind the actor model is that the application is broken up into “actors” that each perform a particular role. Each actor has its own thread, and every method call (or message) sent to an actor is executed on that thread, one message at a time, so you avoid the contended-locking issues typically found in concurrent applications. This allows for more efficient concurrent processing while keeping actor implementations simple, because there is no need to consider concurrent execution within each actor.
The class in Listing 1 shows what an actor class might look like. It takes a string of words, saves them to an XML file, and includes a calculated code for every character stored. The code might be used later as an index or to find similar text blocks. Notice that this class is not thread safe, so each instance can be used from only a single thread. This is normal, because each actor is used from only one thread. Actor classes commonly contain no synchronized or volatile keywords at all, because they are not needed.
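The following is a hypothetical sketch of an actor class in the same spirit; it is not the code from Listing 1. To stay self-contained, it writes XML elements to any java.io.Writer instead of opening a file, and the element layout and the hex-per-character code are assumptions. Note the plain fields and the absence of synchronized and volatile.

```java
import java.io.IOException;
import java.io.Writer;

// Hypothetical storage actor. No synchronized or volatile members: an actor
// instance is only ever used from one thread, so plain fields are safe.
class StorageActor {
    private final Writer out; // e.g. a FileWriter; a StringWriter works for testing
    private int stored;       // number of words written so far

    StorageActor(Writer out) {
        this.out = out;
    }

    // Writes each whitespace-separated word as an XML element whose code
    // attribute holds the hex value of every character in the word.
    void store(String text) throws IOException {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // text was empty or only whitespace
            }
            out.write("<word code=\"" + hexCode(word) + "\">" + escape(word) + "</word>\n");
            stored++;
        }
    }

    int stored() {
        return stored;
    }

    // Concatenates the hex value of each UTF-16 char, e.g. "ab" -> "6162".
    static String hexCode(String word) {
        StringBuilder sb = new StringBuilder();
        for (char c : word.toCharArray()) {
            sb.append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    // Minimal escaping for element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```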
Long-lived objects that would normally be synchronized because several threads use them are better off with a dedicated thread, which frees them from synchronization issues. Each method call is placed in the actor’s queue (the order within the queue is not important) and waits until the actor is available to process it. Think of this queue like your email in-box: messages are received at any time and are acted on when time permits. Typically, calls are asynchronous and do not block, so the calling thread continues execution and avoids any need to rely on thread interrupts. When callers need a result, they can pass a callback object as one of the parameters so that the actor can notify the caller. In some cases, it is desirable to block the caller until the actor has processed the message.
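These mailbox mechanics can be sketched with a single-thread ExecutorService, whose internal queue plays the role of the in-box and whose one worker thread is the actor’s dedicated thread. CounterActor and Callback are hypothetical names; the sketch shows an asynchronous message, a request answered through a callback, and a blocking request. It uses Java 8 lambdas for brevity.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Callback through which an actor hands a result back to its caller.
interface Callback<T> {
    void done(T result);
}

// Hypothetical counter actor: one thread, one queue of pending messages.
class CounterActor {
    private final ExecutorService inbox = Executors.newSingleThreadExecutor();
    private int total; // touched only by the inbox thread: no locks, no volatile

    // Asynchronous message: queued, and the caller continues immediately.
    void add(final int amount) {
        inbox.execute(() -> total += amount);
    }

    // Asynchronous request: the result is delivered through the callback,
    // which runs on the actor's thread.
    void total(final Callback<Integer> callback) {
        inbox.execute(() -> callback.done(total));
    }

    // Blocking request: the caller waits until the message has been processed.
    int totalNow() throws InterruptedException, ExecutionException {
        return inbox.submit(() -> total).get();
    }

    // Stops accepting messages; already-queued messages are still processed.
    void stop() {
        inbox.shutdown();
    }
}
```

Because the queue is FIFO and served by one thread, totalNow() observes every add() submitted before it, without any explicit locking in the actor.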
You can move the hex-code calculation out of the storage actor in Listing 1 and into a second actor, as shown in Listing 2. The storage actor then calls an instance of HexCoderActor, passing itself as the callback. It does not wait for the HexCoder to generate the hex code; instead, it continues with other items in its queue. This lets the storage actor’s thread specialize in writing the resulting XML file while the text code is calculated asynchronously in another thread. Notice how these classes take advantage of concurrent threads without any special keywords or deep knowledge of concurrent programming.
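A hypothetical sketch of that split, wired by hand without an actor manager (it is not Listing 2). Each actor owns a single-thread executor. Because the coder invokes the callback on its own thread, StorageActor.coded() re-queues the reply onto the storage actor’s thread before touching any state; with a proxy-based manager, passing the storage actor’s proxy as the callback would presumably have the same effect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Reply interface the storage actor implements (hypothetical name).
interface HexCallback {
    void coded(String word, String hex);
}

// Computes hex codes on its own thread and answers through the callback.
class HexCoderActor {
    private final ExecutorService inbox = Executors.newSingleThreadExecutor();

    void encode(final String word, final HexCallback replyTo) {
        inbox.execute(() -> {
            StringBuilder hex = new StringBuilder();
            for (char c : word.toCharArray()) {
                hex.append(Integer.toHexString(c));
            }
            replyTo.coded(word, hex.toString());
        });
    }

    // Lets queued work finish, so every reply has been sent on return.
    void stop() throws InterruptedException {
        inbox.shutdown();
        inbox.awaitTermination(10, TimeUnit.SECONDS);
    }
}

// Storage actor: delegates the coding and never waits for the answer.
class StorageActor implements HexCallback {
    private final ExecutorService inbox = Executors.newSingleThreadExecutor();
    private final HexCoderActor coder;
    private final List<String> lines = new ArrayList<String>(); // inbox thread only

    StorageActor(HexCoderActor coder) {
        this.coder = coder;
    }

    // Message handled on this actor's thread: ask the coder, passing this
    // actor as the callback, then move on to the next queued message.
    void store(final String word) {
        inbox.execute(() -> coder.encode(word, this));
    }

    // Invoked on the coder's thread, so hop back onto this actor's thread.
    @Override
    public void coded(final String word, final String hex) {
        inbox.execute(() -> lines.add("<word code=\"" + hex + "\">" + word + "</word>"));
    }

    // Blocking snapshot of the stored lines (for demonstration).
    List<String> snapshot() throws Exception {
        Callable<List<String>> copy = () -> new ArrayList<String>(lines);
        return inbox.submit(copy).get();
    }

    void stop() {
        inbox.shutdown();
    }
}
```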
Every actor needs a manager to allocate and manage its thread, and a proxy to send messages to its queue. Implementing a basic actor manager is straightforward; Listing 3 shows one written in Java 5. It uses Java’s java.lang.reflect.Proxy class to dynamically wrap an actor in a proxy that implements all of the actor’s interfaces. Every method call on the proxy is then queued in an ExecutorService: void methods are asynchronous, while other method calls block until the executor has finished executing them and the result is available.
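A minimal sketch of such a manager follows; it is not the code from Listing 3. It assumes one single-thread executor per actor, proxies only the interfaces the actor’s class declares directly, and uses Java 8 lambdas for brevity, so it will not compile on Java 5 as written. The typed manage(Object, Class) overload is a hypothetical convenience.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Each managed actor gets its own single-thread executor; a dynamic proxy
// turns every interface call into a task queued on that executor.
class ActorManager {
    private final List<ExecutorService> threads = new CopyOnWriteArrayList<ExecutorService>();

    // Proxies the interfaces declared directly by the actor's class
    // (interfaces inherited from superclasses are not included here).
    Object manage(final Object actor) {
        final ExecutorService inbox = Executors.newSingleThreadExecutor();
        threads.add(inbox);
        InvocationHandler handler = (proxy, method, args) -> {
            Callable<Object> call = () -> method.invoke(actor, args);
            Future<Object> result = inbox.submit(call);
            if (method.getReturnType() == void.class) {
                return null; // asynchronous: any exception stays inside the Future
            }
            try {
                return result.get(); // block until the actor has processed the call
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                if (cause instanceof InvocationTargetException) {
                    cause = cause.getCause(); // unwrap the actor's own exception
                }
                throw cause;
            }
        };
        // Assumes the actor's interfaces are visible from this class's loader.
        return Proxy.newProxyInstance(ActorManager.class.getClassLoader(),
                actor.getClass().getInterfaces(), handler);
    }

    // Typed convenience wrapper around manage(Object).
    <T> T manage(Object actor, Class<T> type) {
        return type.cast(manage(actor));
    }

    // Stops all actor threads after their queued messages are processed.
    void shutdown() {
        for (ExecutorService inbox : threads) {
            inbox.shutdown();
        }
    }
}
```

One consequence of this design: because non-void calls block until the actor replies, an actor that makes a non-void call on its own proxy would deadlock, since the call waits for a task queued behind the one currently running.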
Exception Handling and Worker Services
Thorough testing and proper exception handling matter in every program, but they become even more important with multi-threaded programming, because asynchronous execution quickly becomes difficult to debug. Because execution is not sequential, a conventional step-through debugger is less useful. Similarly, stack traces from an actor’s thread are shorter and do not show the original caller. In these situations, it is best either to have the actor handle exceptions itself or to provide callbacks that handle both successful results and exceptions.
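One way to apply the callback approach is a callback interface with separate success and failure methods. ParserActor and ResultCallback are hypothetical names; the sketch catches failures on the actor’s thread and hands them to the caller’s callback instead of letting them surface only as a caller-less stack trace on that thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Callback that receives either the result or the failure.
interface ResultCallback<T> {
    void onSuccess(T value);
    void onFailure(Throwable error);
}

// Hypothetical actor that parses numbers on its own thread.
class ParserActor {
    private final ExecutorService inbox = Executors.newSingleThreadExecutor();

    void parse(final String text, final ResultCallback<Integer> callback) {
        inbox.execute(() -> {
            int value;
            try {
                value = Integer.parseInt(text.trim());
            } catch (RuntimeException e) {
                // Report the failure to the caller instead of letting it
                // vanish on the actor's thread.
                callback.onFailure(e);
                return;
            }
            // Outside the try block, so an exception thrown by onSuccess
            // itself is not misreported as a parse failure.
            callback.onSuccess(value);
        });
    }

    void stop() {
        inbox.shutdown();
    }
}
```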
You should also consider that calls to an actor carry some overhead compared with sequential calls. Each message must be queued and handed off to a separate thread, and the compiler cannot optimize such calls (for example, by inlining them) the way it optimizes sequential calls. This makes the actor model less applicable to small, fast objects, which are better implemented as immutable or stateless objects. However, running actors in a dedicated thread also has advantages. Because the actor’s thread is the only thread that accesses its variables, the actor needs no synchronized or volatile keywords, so the processor caches do not need to be synchronized with main memory as often. A modern JVM may also be able to reduce the cost of the queue’s head lock, which only the actor’s own thread takes, making it possible for an actor to run for long stretches without interruption or forced memory flushes. For these reasons, actors are best used for specialized worker services.
One example of a worker service is an importing and indexing service. Consider the task of retrieving remote data, processing it locally, and storing it in a local database. You might break this up into three steps:
- Retrieve data.
- Process data.
- Store result.
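The three steps map naturally onto a chain of actors. In this sketch (all names hypothetical), each Stage owns one thread, applies its work to each message, and forwards the result to the next stage. The retrieval function is supplied by the caller so that the example needs no network access, and a thread-safe list stands in for the local database.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Function;

// One pipeline step as an actor: a dedicated thread that applies its work
// to each message and forwards the result to the next step.
class Stage<I, O> implements Consumer<I> {
    private final ExecutorService inbox = Executors.newSingleThreadExecutor();
    private final Function<I, O> work;
    private final Consumer<O> next;

    Stage(Function<I, O> work, Consumer<O> next) {
        this.work = work;
        this.next = next;
    }

    @Override
    public void accept(final I message) {
        inbox.execute(() -> next.accept(work.apply(message)));
    }

    // Finishes queued messages, then stops the thread.
    void drain() throws InterruptedException {
        inbox.shutdown();
        inbox.awaitTermination(10, TimeUnit.SECONDS);
    }
}

// Retrieve -> process -> store, each step on its own thread.
class ImportPipeline {
    final List<String> database = new CopyOnWriteArrayList<String>(); // stand-in database
    private final Stage<String, String> store = new Stage<String, String>(row -> row, database::add);
    private final Stage<String, String> process =
            new Stage<String, String>(raw -> raw.trim().toLowerCase(Locale.ROOT), store);
    private final Stage<String, String> retrieve;

    // fetch maps a URL to its raw content (hypothetical, caller-supplied).
    ImportPipeline(Function<String, String> fetch) {
        this.retrieve = new Stage<String, String>(fetch, process);
    }

    void submit(String url) {
        retrieve.accept(url);
    }

    // Drains front to back, so nothing is still in flight downstream.
    void finish() throws InterruptedException {
        retrieve.drain();
        process.drain();
        store.drain();
    }
}
```

Draining in pipeline order matters: each stage enqueues its output before its own task completes, so once a stage has terminated, everything it produced is already waiting in the next stage’s queue.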
In this example, the remote data is not retrieved over a single connection; instead, it is spread across multiple files, with index files that list the data files mixed in among the data files themselves. The remote data is also in a format that you cannot process directly, so you need to pre-process (format) it first. Furthermore, you need to convert the data because it uses a different vocabulary. This expands the process to six steps:
1. Retrieve index or data file.
2. Format the file for parsing.
3. Convert data.
4. If it is an index file, list its data files and go to step 1.
5. Process data files.
6. Insert data.
These six steps fit well into the actor model. Think of each of these steps as a job that one or more individuals (actors) need to perform.
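The loop back to the first step can be expressed as an actor sending messages to itself. In this hypothetical sketch, the fetch function collapses retrieval, formatting, and conversion; bodies beginning with the made-up marker "index:" are treated as index files listing comma-separated URLs; and a pending-message counter detects when the whole crawl has finished. A Retriever instance is meant for a single crawl.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical retrieval actor: index files loop back to retrieval by
// sending the actor new messages; data files continue downstream.
class Retriever {
    private static final String INDEX = "index:"; // hypothetical index marker

    private final ExecutorService inbox = Executors.newSingleThreadExecutor();
    private final Function<String, String> fetch;            // retrieve + format + convert
    private final Consumer<String> downstream;               // process + insert
    private final Set<String> seen = new HashSet<String>();  // inbox thread only
    // Messages not yet finished, plus 1 held by the caller until awaitIdle().
    private final AtomicInteger pending = new AtomicInteger(1);
    private final CountDownLatch idle = new CountDownLatch(1);

    Retriever(Function<String, String> fetch, Consumer<String> downstream) {
        this.fetch = fetch;
        this.downstream = downstream;
    }

    void retrieve(final String url) {
        pending.incrementAndGet(); // counted before queuing, so the total never dips to 0 early
        inbox.execute(() -> {
            try {
                if (seen.add(url)) { // skip URLs already retrieved (avoids cycles)
                    String body = fetch.apply(url);
                    if (body.startsWith(INDEX)) {
                        for (String link : body.substring(INDEX.length()).split(",")) {
                            retrieve(link.trim()); // message to itself: back to step 1
                        }
                    } else {
                        downstream.accept(body);
                    }
                }
            } finally {
                finished();
            }
        });
    }

    // Waits until every message, including those generated from index
    // files, has been handled, then stops the actor's thread.
    void awaitIdle() throws InterruptedException {
        finished(); // release the caller's hold
        idle.await();
        inbox.shutdown();
    }

    private void finished() {
        if (pending.decrementAndGet() == 0) {
            idle.countDown();
        }
    }
}
```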
Included with this article is an implementation of this actor model that retrieves recipes from multiple remote sites in multiple formats. Each recipe is listed in one or more index files on the web, and the recipes themselves are in HTML. The program retrieves these silos of information, harvests meaningful data, indexes it, and makes it available in a graphical user interface.
Author’s Note: Read the terms and conditions of any web site before harvesting its contents.
With the stage set, let’s introduce the actors:
| Actor | Interface | Role |
|---|---|---|
| RoundRobin | UrlConsumer | Distributes URLs to other actors |
| UrlResolver | UrlConsumer | Retrieves data streams for another actor |
| XhtmlTransformer | StreamTransformer | Formats HTML into XHTML for parsing |
| StyleSheetTransformer | StreamTransformer | Converts remote XML format into local data format |
| RdfParser | StreamConsumer | Parses data stream into data structure |
| SeeAlsoExtractor | RdfConsumer | Extracts URLs from index data |
| IngredientProcessor | RdfConsumer | Applies local processing rules on data |
| RDFInserter | RdfConsumer | Inserts data into a database |
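A sketch of one of these roles, assuming that the second column names the Java interface each actor implements and that UrlConsumer has a single void consume(URI) method; the real signatures in the download may differ. RoundRobin keeps a plain, unsynchronized index, which is safe only because an actor is confined to one thread. When it is called through a proxy, the void method is queued asynchronously.

```java
import java.net.URI;
import java.util.List;

// Hypothetical signature for the UrlConsumer role.
interface UrlConsumer {
    void consume(URI url);
}

// Spreads incoming URLs across several downstream actors in turn.
class RoundRobin implements UrlConsumer {
    private final List<? extends UrlConsumer> targets;
    private int next; // plain field: only the actor's thread touches it

    RoundRobin(List<? extends UrlConsumer> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("no targets");
        }
        this.targets = targets;
    }

    @Override
    public void consume(URI url) {
        targets.get(next).consume(url);
        next = (next + 1) % targets.size(); // wrap around to the first target
    }
}
```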
Listing 4 shows how these actors are connected to one another. The manage() methods are typed versions of the ActorManager#manage(Object) method in Listing 3.
A ClusterMap class and a Main class are also provided in the download archive. To run the example, execute the Main class with the following two arguments: http://www.kraftcanada.com/en/search/SearchResults.aspx?gcatid=86 and http://www.cookingnook.com/free-online-recipes.html
Figure 1. ClusterMap: The tortilla soup recipe is revealed after clicking certain ingredients.
The Main class then opens the ClusterMap and begins harvesting the recipes. After a few recipes have been harvested, select the check box on the left to see how many recipes have been harvested, and click the clear button at the top to update the list of words extracted from the ingredients sections. In this way, you can index and search multiple distinct recipe sites. For example, to find recipes that include lemon, cheddar, and garlic (yum), click these ingredients; among the harvested recipes, the Tortilla Soup recipe is revealed to include all three (see Figure 1).
On a multi-core system, the program uses over 30 threads to orchestrate the retrieval and processing of the data, downloading and processing it as quickly as the remote hosts provide it. Despite this degree of concurrency, none of the actors has to deal with the typical multi-threading challenges, which frees the developer to concentrate on what each actor should do.
The actor model is a powerful metaphor for creating multi-threaded applications, and by assigning remote addresses to actors and enabling remote communication between them, you can extend the model to distributed problems as well. By adding life-cycle and dependency management and making actors aware of their environment, you can turn them into agents that participate in a self-organizing system. This architecture has worked well for many distributed problems, such as online trading, disaster response, and modelling social structures. It has also been the source of inspiration for many service-oriented architectures.
In essence, the actor model abstracts the nitty-gritty of multi-processor programming away from the developer. This reduces concurrency issues and improves the flexibility of the system. The model is simple and has a low learning curve, so new developers can quickly see how actors are implemented and understand how they fit together. By managing the actors properly, you can reuse the same actor implementations as you move from multi-processor systems to distributed networked systems, gradually and at a pace that scales with the development demands.