This article continues the introduction to Chapel via simple “Hello world” programs that I started in part one of this series. Continuing where we left off:
Distributed Parallel Hello World
My last post ended with the following parallel, distributed-memory Chapel program, sans explanation:
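Since the original listing isn't reproduced here, the program can be reconstructed from the description that follows; this sketch infers the identifier names and message text from the prose, so details may differ slightly from the original:

```chapel
config const n = 100;

use CyclicDist;

const MessageSpace = {1..n} dmapped Cyclic(startIdx=1);

forall msg in MessageSpace do
  writeln("Hello from iteration ", msg, " of ", n, " running on node ", here.id);
```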
Here’s how this program works: As in previous examples, the first line declares a configuration constant, n, indicating how many messages to print. The next line is a use statement, which makes a module’s contents available to the current scope. In this case, we are “use”-ing a standard library module, CyclicDist, which supports the cyclic distribution of rectangular index sets to compute nodes (or locales in Chapel terminology).
The key to this program is the following line, which declares a domain named MessageSpace. In Chapel, domains represent sets of indices. The expression in curly brackets defines the domain’s index set — in this case the integers 1 through n. This domain’s value also includes a dmapped clause, which specifies how it should be implemented on the system. In this case, I’m using the Cyclic distribution (defined by the CyclicDist module), requesting that it start by mapping index 1 to the first locale. From there, it deals out the remaining indices to the locales in a round-robin, or cyclic, manner. Chapel’s domains are used to specify iteration spaces, as in this program’s forall-loop. Though not shown here, domains are also used to declare and operate on arrays, and they can be multidimensional or unstructured.
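To illustrate those last points, here is a small sketch (of my own devising, not from the original article) showing a multidimensional domain and an array declared over it:

```chapel
// A 2D domain: the 16 (row, col) index pairs (1,1)..(4,4)
const Grid = {1..4, 1..4};

// A 4x4 array of reals declared over the domain
var A: [Grid] real;

// The domain drives the parallel iteration, just as in the 1D case
forall (i, j) in Grid do
  A[i, j] = i + j/10.0;
```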
In the previous article, we learned that a forall-loop’s iterand controls: (1) the tasks that implement the parallel loop and (2) the mapping of iterations to tasks. In addition, a forall-loop’s iterand specifies where its tasks should execute on the system. In this program, MessageSpace is the loop’s iterand, so it controls these policy decisions. By default, cyclically distributed domains like MessageSpace create a task per available core on each of the locales to which the domain is distributed. As we’ll see below, the domain’s indices are dealt out to the locales in a cyclic manner.
The only other new concept in this program is the reference to here when writing the messages. This is a built-in identifier that refers to the locale on which the current task is running. In this case, we are querying the locale for its unique ID so that we can print out which compute node owns each iteration. Here’s the output when running on four nodes of my Cray® XC40™ supercomputer:
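The original output listing isn't reproduced here; assuming the program were run with a small problem size (say, n=8) on four locales, the cyclic dealing of indices would produce messages along these lines, where the exact text and ordering are illustrative (forall iterations complete in a nondeterministic order):

```
Hello from iteration 1 of 8 running on node 0
Hello from iteration 2 of 8 running on node 1
Hello from iteration 3 of 8 running on node 2
Hello from iteration 4 of 8 running on node 3
Hello from iteration 5 of 8 running on node 0
Hello from iteration 6 of 8 running on node 1
Hello from iteration 7 of 8 running on node 2
Hello from iteration 8 of 8 running on node 3
```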
As you can see from the output, locale i owns iterations i+1, i+5, i+9, … due to the use of the Cyclic distribution. Though not evident from the output, each locale also chunks the iterations it owns across its processor cores, resulting in two levels of parallelism being used in this forall-loop: multicore and multinode. Five lines of code to express a multilevel parallel computation across all the processors in a distributed system! How do the tools you use today compare?
Of course, the Cyclic distribution is just one strategy for distributing indices to locales and implementing parallel loops over those indices. A user can also specify other distributions in the domain’s dmapped clause, resulting in completely different implementations without touching the computation itself. Programmers also have the option of writing their own parallel iterators and distributions within Chapel. This permits them to create custom parallel abstractions, should the standard library be insufficient for a given computation or architecture. These features provide an excellent separation of concerns, permitting parallel algorithms to be developed independently of their mapping to a parallel system. As a result, computational scientists and parallel programming experts can each focus on their individual areas of expertise within a single Chapel program without stepping on each other’s toes.
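As one concrete example of swapping strategies, switching from a cyclic to a block distribution only requires changing the use statement and the dmapped clause; the loop body is untouched. This sketch uses the BlockDist standard module, which maps contiguous chunks of the index set to each locale:

```chapel
config const n = 100;

use BlockDist;

// Same domain, different distribution: each locale now owns a
// contiguous block of ~n/numLocales indices rather than every
// numLocales-th index.
const MessageSpace = {1..n} dmapped Block(boundingBox={1..n});

forall msg in MessageSpace do
  writeln("Hello from iteration ", msg, " of ", n, " running on node ", here.id);
```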
The features we’ve used so far in this series are data-parallel in nature, focusing on parallelism that’s driven by collections of indices or data. In the next article, we’ll switch to looking at ways of saying “hello” using Chapel’s features for concurrent, task-parallel programming. In the meantime, if you want to try running or modifying the example above, download Chapel and give it a try.