Six Ways to Say “Hello” in Chapel | Part 3

This article concludes the introduction to Chapel via simple “Hello world” programs that I started in parts one and two of this series. In the previous articles, we’ve looked at serial and data-parallel approaches to saying “hello” in Chapel. This time around, we’ll look at task-parallel ways to do so.

Concurrent Hello World

The following program uses concurrent tasks to print out its “hello” messages:

Image 1
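The listing itself is an image and is not reproduced in this text. As a stand-in, here is a minimal sketch of what such a program could look like, based on the description in this section. The name numTasks, the loop variable tid, and the exact message wording are assumptions; only coforall and here.maxTaskPar come from the article.

```chapel
// Number of tasks to create. By default this is the amount of
// parallelism the current locale can support; as a config const it
// can be overridden at execution time, e.g. --numTasks=8.
// (The name "numTasks" is an assumption made for this sketch.)
config const numTasks = here.maxTaskPar;

// A coforall-loop creates one distinct task per iteration.
coforall tid in 1..numTasks do
  writeln("Hello, world! (from task ", tid, " of ", numTasks, ")");
```

Because every iteration runs as its own task, the order of the lines in the output can change from run to run.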

This program replaces the data-parallel forall-loop that we’ve used in previous versions with a coforall-loop. Mnemonically, “coforall” can be thought of as meaning “concurrent forall.” Coforall-loops differ from forall-loops in that they create a distinct task for each iteration of the loop. Because of this, the body of a coforall-loop can use arbitrary synchronization to coordinate between its iterations.
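To illustrate why the one-task-per-iteration guarantee matters, here is a small sketch (not taken from the original programs) in which one iteration waits for a signal from another by way of a sync variable. The variable name ready and the message text are made up for this example.

```chapel
// A sync variable carries a full/empty state along with its value.
// Declared without an initializer, it starts out empty.
var ready: sync bool;

coforall tid in 1..2 {
  if tid == 1 {
    // readFE blocks until the variable is full, then leaves it empty.
    ready.readFE();
    writeln("task 1: received the signal from task 2");
  } else {
    // writeEF waits for the variable to be empty, then fills it,
    // which releases the task blocked in readFE above.
    ready.writeEF(true);
  }
}
```

This pattern relies on both iterations being live at the same time. With a forall-loop, a single task may execute several iterations one after another, so an iteration that blocks waiting on a later one could hang.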

Note that the default number of iterations for this loop is defined by referring to here, the locale (or compute node) on which the current task is running. Specifically, we query the maximum number of parallel tasks that it can support (maxTaskPar). In practice, this will typically be the number of processor cores. Thus, running this program on my four-core laptop results in four messages:

Image 2

As with our other parallel versions, the messages are printed in an unpredictable order due to the use of a parallel loop. However, unlike the versions that used forall-loops, each message here is guaranteed to be printed by its own unique task.

By default, task-parallel constructs like this coforall-loop create tasks that will run on the same locale as their parent task. Thus, the program above is a shared-memory computation suitable for laptops and desktops. Next, let’s extend this example to support distributed-memory task parallelism.

Distributed Concurrent Hello World

Here’s a program that says “hello” in Chapel using a task-parallel style across all the nodes and cores of a distributed-memory system:

Image 3
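As with the first listing, the image is not reproduced in this text. A sketch consistent with the walkthrough in this section might look like the following. The loop variable tid and the message wording are assumptions; Locales, loc, the on-clause, and here.maxTaskPar come from the article, while here.id and numLocales are standard Chapel built-ins used here to fill in the message.

```chapel
// One task per locale (compute node) ...
coforall loc in Locales do
  // ... each of which moves to its own locale ...
  on loc do
    // ... and then creates one task per unit of local parallelism.
    coforall tid in 1..here.maxTaskPar do
      writeln("Hello, world! (from task ", tid, " of ", here.maxTaskPar,
              " on locale ", here.id, " of ", numLocales, ")");
```

Inside the on-clause, here refers to the locale on which the task is now running, so here.maxTaskPar and here.id are evaluated separately for each node.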

This program contains only two new concepts. The first is a built-in array, Locales, that drives the first coforall-loop. Its elements represent the set of locales on which the Chapel program is running. Thus, the first line says “create a task for every compute node, referring to it as loc.”

The second new concept is the on-clause seen in the second line. An on-clause tells the current task to execute on the locale specified by an expression. In this case, the expression loc indicates that each task should execute on its corresponding locale. Thus, at this point in the program’s execution, each compute node will be executing a distinct task.

Next, each task reaches the second coforall-loop, which creates a task per processor core, as in the first program above. At this point, a task will be running on every core available to the program. Each task then prints out a “hello” message indicating its locale ID along with a task ID that is unique within that locale. Here’s the output running on a four-node system with two cores per node:

Image 4

Once again, just a few lines of code permit us to make use of all the processors on a distributed memory system!

This program illustrates an important characteristic of Chapel: parallelism and locality are specified using independent features (the coforall-loops and on-clauses in this example). From this perspective, the forall-loops of the previous article can be viewed as abstractions that keep a user’s algorithmic code unencumbered by complex implementation details related to tasking and locality. This permits a user to simply say “use a cyclic distribution” rather than implementing one explicitly.

This series has given a brief illustration of Chapel’s serial, data-parallel, task-parallel, and locality-oriented features through a series of “Hello world” programs. Such programs are, by nature, trivial compared to real computations that actual parallel programmers will want and need to write; yet they can be very useful for quickly introducing and illustrating new languages. Chapel strives to make such simple cases trivial while also making more complex cases far easier than in current parallel programming models.

For more questions about Chapel, join our live Reddit AMA taking place this Wednesday, October 14th, beginning at 8:30 a.m. PT.
