Writing Parallel Programs with Erlang

For years, heating problems have prevented CPU manufacturers from building viable CPUs with clock rates higher than 4.0GHz. This single-core speed limitation means that the only viable way to make programs run faster is to run them on more than one processor. Consequently, manufacturers started making processors with multiple cores (i.e., multicore CPUs). In fact, you can expect to see processors with more than 100 cores in the near future. In order to fully exploit the potential of these new processors, software developers will be forced to write parallel programs. As Herb Sutter wrote regarding Moore's law: "The Free Lunch Is Over." The transition won't be easy, though, because writing concurrent programs is hard work.

The main framework for parallel computing in the past 30 years has been processes, threads, or programs that communicate with one another by reading and writing a shared memory area, coordinated with locks and semaphores. This framework requires you to pay a lot of attention to mutual-exclusion algorithms to avoid the simultaneous use of common resources, and to side effects such as deadlocks caused by misused semaphores. An alternative framework, the one Erlang adopts, is for the individual processes to talk to each other only by sending asynchronous messages.

This article introduces the functional language Erlang, a good choice for writing parallel programs, and explains how you can use it to fully exploit current and future multicore CPUs.

Meet Erlang

Designed at Ericsson laboratories in 1986, Erlang (named for the Danish mathematician Agner Krarup Erlang) is used largely in the telecommunications industry. Nevertheless, it is a general-purpose functional programming language that belongs to the declarative, rather than the imperative, class of languages. One of its most important features is its implementation of concurrent computation according to the Actor model, a mathematical schema in which everything is an actor (much like the object-oriented philosophy that everything is an object). Because an actor sends and receives messages and accordingly acts and creates other actors, the Actor model is inherently concurrent, whereas the object-oriented paradigm is essentially sequential.

The Erlang concurrency paradigm leaves no room for mutable state. You can assign a variable only once and must perform iteration through recursion. No mutable state means you do not need memory protection, which minimizes side effects as well as the amount of data shared between processes.

Erlang functions are similar to mathematical functions: given the same input, they return the same value, regardless of the context of the call. This property is called referential transparency. Erlang functions also pass each parameter by value, so anything that you can reference is a value.

Although variable assignments are subject to strict rules, Erlang partially makes up for that restriction using a powerful mechanism called pattern matching. This feature is similar to the way human beings interpret reality: by determining similarities between real objects and the conceptual schemas in their minds.

A Mathematical Breakdown of Erlang

Number theory contains numerous basic calculations that can benefit from parallelism. For example, consider the calculation of the factorial of a number and the associative property of multiplication. If the factorial of n is:

 
[1] n! = 1 * 2 * 3 *...* n

And if q is an integer such that:

q < n,   t = n / q

You can write [1] as:

n! = a * b * c * ...* j

Where:

a = 1 * 2 * 3 * ... * q
b = (q+1) * (q+2) * ... * 2q
c = (2q+1) * (2q+2) * ... * 3q
...
j = ((t-1)q+1) * ((t-1)q+2) * ... * tq

If n/q results in a remainder:

n! = a * b * c * ...* j * k

Where:

k = (tq+1) * (tq+2) * ... * n

Figure 1. Simple Flowchart for the Code: You can see the interactions among the main functions in the code here.

That is, you can divide the whole interval of numbers into t+1 smaller intervals, calculate the partial products, and then multiply those together to obtain the final result.
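To make the decomposition concrete, here is a minimal sequential sketch of the partial-products idea (the module pfact_sketch and its functions are my own illustration, not part of the article's fact module):

```erlang
-module(pfact_sketch).
-export([pfact/2]).

%% Multiply all the numbers in a list.
product(Numbers) ->
    lists:foldl(fun(X, Acc) -> X * Acc end, 1, Numbers).

%% Split 1..N into chunks of at most Q numbers, compute each
%% partial product, then multiply the partial results together.
pfact(N, Q) ->
    Chunks = chunk(lists:seq(1, N), Q),
    product([product(C) || C <- Chunks]).

chunk(L, Q) when length(L) =< Q ->
    [L];
chunk(L, Q) ->
    {Head, Tail} = lists:split(Q, L),
    [Head | chunk(Tail, Q)].
```

For example, pfact_sketch:pfact(5, 2) splits [1..5] into [1,2], [3,4], and [5], then multiplies 2 * 12 * 5 to get 120.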

With a multiprocessor machine at hand, I wondered whether calculating a, b, ..., k simultaneously and then multiplying the results together could shorten the total elapsed time. In an attempt to answer this question, I wrote a small piece of Erlang code. While not a usable implementation of the factorial function, the code does serve to explore Erlang syntax and its potential for concurrent computation (see Figure 1).

The remainder of this article shows you how you can use Erlang's message-passing method to enable parallel processes to work together concurrently. During the analysis of the code, I explain some syntax features and show you how recursion, pattern matching, and concurrency work in Erlang. Covering all Erlang syntax and features is beyond the scope of this article, but you can find a lot of documentation on the Internet.

Anatomy of an Erlang Program

Erlang programs start with declarations: lines that begin with the dash character (-) in column 0 and that make assertions about the naming, availability, and definitions of functions. Take a look at a couple of declarations:

-module(fact). 

Modules

Erlang uses modules. The module fact resides in a file named fact.erl, and it requires the module declaration at the top. As you likely noticed, all statements have to be terminated with a dot:

-export([start/2]).

In each module, you can write a lot of functions, but only the ones exported can be seen and called outside the module itself. The notation start/2 means that the exported function named start accepts two arguments:

start(ProcNum, Number) ->
    ComputationStart = erlang:now(),
    ...

This is the way you define a function. Again, notice that the whole definition is a statement and that the single instructions that compose it have to be separated by a comma.

Variables

Variables start with a capital letter. To avoid hidden side effects, you can bind them to a value only once. A variable is either bound or unbound, and the compiler will issue a warning if you use an unbound variable.

Erlang provides a few data types. The most used are numbers, atoms, lists, and tuples. Atoms are constant values whose names start with lowercase letters; otherwise, you have to enclose them in single quotes. Lists store a variable number of values (numbers, atoms, lists, or tuples), while tuples store a fixed number of them. You enclose lists in square brackets and tuples in curly brackets. Erlang is a dynamically typed language, so it performs type checking at run time.
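As a minimal illustration of these types (the module types_sketch is hypothetical):

```erlang
-module(types_sketch).
-export([demo/0]).

%% A tuple groups a fixed number of values; a list holds a variable
%% number of them. Atoms such as circle simply name themselves.
demo() ->
    Shape = {circle, 4.5},      % tuple: an atom tag plus a radius
    Primes = [2, 3, 5, 7],      % list of numbers
    {Shape, length(Primes)}.
```

Calling types_sketch:demo() returns {{circle,4.5},4}.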

The following line is the Erlang syntax to call a function that is exported by a module. In this case, the function seq of the module lists returns a list containing all integers from 1 to Number. The basic installation of Erlang provides a set of utility modules such as lists:

compute(ProcNum, lists:seq(1,Number)),

Note the call to the function compute. Here is the definition of that function:

compute(_, Numbers) when length(Numbers) == 1 ->
    ...
compute(ProcNum, Numbers) when ProcNum == 1 ->
    ...
compute(ProcNum, Numbers) ->
    ...

You can define a function in more than one way. In this case, the three variants of the function differ by the presence of what is called a guard clause. As you will see further on in the code, the same function can also have different definitions based on the number and kind of input parameters.
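A minimal sketch of clause selection with guards (the module guard_sketch and the function describe/1 are my own, not part of fact):

```erlang
-module(guard_sketch).
-export([describe/1]).

%% Clauses are tried top to bottom; the first clause whose pattern
%% matches and whose guard succeeds is the one that runs.
describe(N) when is_integer(N), N < 0 -> negative;
describe(0)                           -> zero;
describe(N) when is_integer(N)        -> positive;
describe(_)                           -> not_an_integer.
```

guard_sketch:describe(-3) returns negative, describe(0) returns zero, and describe(foo) falls through to not_an_integer.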

Recursion and Pattern Matching

The following code contains the definitions of the function create_intervals, which divides the total interval of numbers between 1 and n into a number of smaller intervals that can be further treated by parallel processes.

create_intervals(ListLen, Delta) when ListLen =< Delta ->
    ...
create_intervals(ListLen, Delta) ->
    create_intervals(Delta + 1, ListLen, Delta, [[1, Delta]]).
create_intervals(StartIndex, ListLen, Delta, L) when ... ->
    ...
create_intervals(StartIndex, ListLen, Delta, L) ->
    create_intervals(...).

A for loop would be a natural choice here, but Erlang replaces iteration with recursion. Therefore, create_intervals calls itself and recursively builds a list of lists, each inner list holding the two extreme indexes of one of the smaller intervals.
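The elided clauses might look like the following; this is a reconstruction from the description above, not the article's exact code (the module name interval_sketch and the guard conditions are my assumptions):

```erlang
-module(interval_sketch).
-export([create_intervals/2]).

%% Build a list of [Start, End] pairs covering 1..ListLen in steps
%% of Delta, accumulating the pairs and reversing at the end.
create_intervals(ListLen, Delta) when ListLen =< Delta ->
    [[1, ListLen]];
create_intervals(ListLen, Delta) ->
    create_intervals(Delta + 1, ListLen, Delta, [[1, Delta]]).

create_intervals(StartIndex, ListLen, Delta, L)
        when StartIndex + Delta - 1 >= ListLen ->
    lists:reverse([[StartIndex, ListLen] | L]);
create_intervals(StartIndex, ListLen, Delta, L) ->
    create_intervals(StartIndex + Delta, ListLen, Delta,
                     [[StartIndex, StartIndex + Delta - 1] | L]).
```

For example, interval_sketch:create_intervals(10, 3) yields [[1,3],[4,6],[7,9],[10,10]].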

Now look again at the compute function, which divides a list of numbers into groups, calculates the single product for each group, and returns the results in a list shorter than the initial one. Applying it recursively eventually yields a single number that is the product of all the numbers in the initial list. For this reason, the last line in the compute function is:

compute(ProcNum, Products).

Coming back to the function create_intervals, note that it has multiple definitions that differ in their guard clauses as well as in their input patterns.

Erlang uses pattern matching intensively both in choosing the right function to call and in assigning variables. In fact, you can read the operator = not as "equal to" but as "match to," an assertion that two items are equal also with regard to their shape. It is a subtle distinction but an important concept to keep in mind when reading or writing Erlang code. For example, when you enter the following code:

A = 1.

The system tries to match the variable A with 1. If this is the first use of A, the match succeeds and binds A to 1; from then on, A will match only 1, and any other match/assignment will fail.

You can foresee the potential of pattern matching with this example:

L = [a, b, c, d].
[A | B] = L.

The notation | separates the head of a list (first item) from the tail (the remaining items). So, in this statement, pattern matching binds the variable A to atom a and B to the list [b, c, d].

Also, if you write:

[C, D | E] = L.

C will be bound to a, D to b, and E to the list [c,d].
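Head/tail matching pairs naturally with recursion. For instance, a list product can be written like this (a sketch; the module list_sketch and its product/1 are my own helpers, distinct from the article's multiplication function):

```erlang
-module(list_sketch).
-export([product/1]).

%% Multiply the numbers in a list: peel off the head, recurse on
%% the tail, and stop when the list is empty.
product([]) -> 1;
product([Head | Tail]) -> Head * product(Tail).
```

list_sketch:product([123,234,456,345]) returns 4527984240, the same value the article's fact module computes for that list.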

Creating Parallel Processes

Erlang's support for concurrency enables you to exploit symmetric multiprocessing (SMP) architectures and multicore CPUs. In fact, processes should be the main design driver for an Erlang application. The parallelism is provided by Erlang itself, not by the host operating system. An Erlang process is halfway between a native thread and a green thread: it is scheduled entirely within the Erlang Run-Time System (ERTS) and doesn't share any state with other processes.

The following line in the compute function shows how you can create parallel processes in Erlang.

ProductsCollector_PID = spawn(fact, products_collector, [[], self()]),

The function spawn/3 creates another process that runs in parallel with the one calling it. The first argument is the module containing the function to spawn, the second is the function to run in parallel, and the third is that function's list of arguments. The call returns a PID variable bound to the process identifier (PID) of the newly spawned process. This value is precious because it is essential for talking to the new process.

Using the flowchart in Figure 1, you can see that the function compute spawns the function products_collector and multiple instances of the function partial_compute in parallel. Each instance of partial_compute calculates the product of one of the smaller intervals into which the initial interval is divided.

Obviously, you cannot predict how long each instance of partial_compute will take to multiply the numbers in its interval. For this reason, I needed an additional process, products_collector, which receives the results from the various instances of partial_compute as soon as they finish and can determine when all the partial_compute processes have exited.

For more detail, here is how multiple instances of partial_compute are spawned:

lists:map(fun(X) ->
              spawn(fact, partial_compute, [X, Numbers, ProductsCollector_PID])
          end,
          Intervals),

The function lists:map takes two arguments. The second is a list, while the first is the inline definition of a function that has to be applied to the elements of the list. The lists:map function returns another list with the results. This is an example of how in Erlang you can pass names of functions as arguments to other functions.
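A minimal illustration of passing a fun to lists:map (the module map_sketch is hypothetical):

```erlang
-module(map_sketch).
-export([squares/1]).

%% lists:map applies the anonymous fun to every element, just as the
%% article's code applies a spawning fun to every interval.
squares(Numbers) ->
    lists:map(fun(X) -> X * X end, Numbers).
```

map_sketch:squares([1,2,3,4]) returns [1,4,9,16].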

Talking Between Processes

In Erlang, processes share nothing and are completely separate from each other. Their only means of communication is messages. As previously stated, the products_collector process receives messages (1) from partial_compute processes when they finish their calculations and (2) from the function compute when all the partial_compute processes have exited and it asks for the resulting list. Here is the syntax for sending messages in both cases:

Scenario 1:
ProductsCollector_PID ! {get_products, self()},

Scenario 2:
ProductsCollector_PID ! {add_product, multiplication(FactorsInTheInterval)}.

The schema in both lines is the same: the PID of the destination process, the operator !, and the message itself in the form of a tuple containing the information. In the first scenario, for example, the message contains the atom get_products and the PID of the sending process. Including the sender's PID is necessary if you want the contacted process to answer the message.

All message sends are asynchronous, which means that the program doesn't wait for an answer (if one is due); it immediately executes the next instruction.
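A minimal round trip between two processes, assuming a hypothetical module echo_sketch: the caller includes self() in the message so the spawned process knows whom to answer.

```erlang
-module(echo_sketch).
-export([start/0, echo/0]).

%% Spawn an echo process, send it a tagged message, and wait
%% for the reply in our own mailbox.
start() ->
    Echo = spawn(echo_sketch, echo, []),
    Echo ! {self(), hello},
    receive
        {reply, Msg} -> Msg
    end.

%% The echo process waits for one message and answers the sender.
echo() ->
    receive
        {From, Msg} -> From ! {reply, Msg}
    end.
```

echo_sketch:start() returns hello once the reply arrives; the send itself (!) returns immediately.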

Now take a closer look at the function products_collector and the way processes wait for messages to arrive and act on them.

products_collector(L, Loop_PID) ->
    receive
        {add_product, Product} ->
            Loop_PID ! {done},
            products_collector([Product | L], Loop_PID);
        {get_products, Client_PID} ->
            Client_PID ! {products, L},
            products_collector(L, Loop_PID);
        {done} ->
            noop
    end.

Using email as a metaphor, the processes have an inbox where all the arriving messages are queued and dealt with using the FIFO schema and the pattern-matching mechanism.

To truly grasp how all this works, it is very important to understand how the receive ... end construct works. Say you have a queue of messages that have arrived and a set of tuples representing different patterns. This is what happens when, and only when, a new message arrives:

  • The first message in the queue (i.e., the first arrived) is matched against the first pattern in the set:
    • For a match:
      • The statements that follow are executed.
      • The message is removed from the queue.
    • For no match:
      • The message is matched against the next pattern in the set.
  • If the first message doesn't match any pattern in the list, it is set aside in a so-called save queue and the second message is handled. Within the current receive, the messages in the save queue are not matched again, even after the arrival of a new message.

One additional feature of the receive ... end construct is a timeout: after a specified time spent waiting for a new message, it stops waiting, performs some actions, and then takes the messages from the save queue and puts them back into the mailbox in their original order.
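The timeout is written with an after clause. A minimal sketch (the module timeout_sketch is my own):

```erlang
-module(timeout_sketch).
-export([wait_for_message/0]).

%% Wait up to 500 milliseconds for any message; if none arrives,
%% give up and return the atom timed_out instead.
wait_for_message() ->
    receive
        Msg -> {got, Msg}
    after 500 ->
        timed_out
    end.
```

Called with an empty mailbox, timeout_sketch:wait_for_message() returns timed_out after half a second.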

How to Run the Code in Debian Linux

In 1998, Ericsson released Erlang as open source and created the web site Erlang.org, which provides the source code as well as binaries of the Erlang interpreter for Linux, Mac, and Windows environments. The download also includes a set of library modules and a real-time distributed database called Mnesia. The entire framework is called the Open Telecom Platform (OTP).

In Debian Linux, you simply run the following command:

apt-get install erlang 

The complete Erlang OTP platform, with all its tools and utility functions, will be installed on your machine.

You have to compile Erlang code to obtain the bytecode for the Erlang virtual machine, which ends up in a file with a .beam extension:

erlc fact.erl

Although quite verbose, the compiler is clear and helpful in understanding the philosophy of Erlang. For example, it is able to recognize when a variable is unbound and yet is going to be used.

After compilation, you can run any function exported by the module fact. This is very useful for debugging, when you need to find out which function doesn't work properly. Run the Erlang emulator erl, and you will see its banner and prompt:

Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1>

You can recompile the code inside the emulator as follows:

c("fact.erl").

Typically, the function start acts as the main entry point for running the program:

2> fact:start(2,12).
Time elapsed for factorial of 12 using 2 parallel processes: 1.45e-4 seconds
ok
2>

Try to calculate the factorial of a large integer with a number of parallel processes greater than 1 and you will see the differences in elapsed time between a traditional CPU and a multicore machine.

As previously stated, you can also test single functions if you export them and call them with the right input:

3> fact:compute(2,[123,234,456,345]).
4527984240
4> fact:multiplication([123,234,456,345]).
4527984240
5>

Here is how you launch the debugger:

Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> c("fact.erl",[debug_info]).
{ok,fact}
2> im().
<0.46.0>
3> ii(fact).
{module,fact}
4> iaa([init]).
true
5> fact:start(2,12).

The statements starting with i launch a graphical process monitor and a typical debugging window showing the source code (see Figure 2 and Figure 3). The system opens one of these windows for each process spawned during execution.


Figure 2. Processes Monitor: You can track all the processes managed by the Erlang emulator, which is useful.
 
Figure 3. Debugging Window: Here is a typical debugging window. You can launch one for each process spawned by the Erlang emulator.

The Next Step: LYME

You now hopefully have enough Erlang knowledge to make parallel processes work together concurrently. Nowadays, you will find many production applications developed in Erlang, including Yahoo!'s del.icio.us, the chat backend of Facebook, Twitterfall (a Twitter search engine for trends), and Amazon SimpleDB, many of which take advantage of multicore machines. Why not add your applications to the list? If you want to really dive into the language, the LYME (Linux, Yaws, Mnesia, Erlang) solution stack offers a good environment for writing fully functional solutions for the parallel world.
