next up previous
Next: Synchronization Graphs Up: Compilation of Data Previous: Compilation of Data

Introduction

Data parallel programming has emerged as the preferred paradigm for coding a large class of scientific applications. The popularity of data parallel languages arises in a large part from their sequential semantics. Similar to sequential languages, most data parallel languages provide a single locus of control and a global name space that can considerably simplify the design of parallel programs in numerous application areas. The primary constructs provided by data parallel languages include parallel operations (e.g. parallel assignments, parallel function calls and parallel loops), parallel data combinations (e.g. reductions and parallel prefix operations), and data placement pragmas. The pragma or implementation hints are used to specify the distribution of the program data structures among the local processor memories of a distributed memory multicomputer. In the absence of such pragmas, the compilers use various heuristics to generate efficient data distributions e.g. [Kno90,Who91,GB92].

Parallel assignments are perhaps the simplest and most common form of a parallel data operator. For instance, languages like HPF provide a FORALL statement that can be used to specify parallel execution of an assignment operation on sections of an array. When compiled for execution on a distributed memory multicomputer, the compiler typically generates an SPMD (single program multiple data) program with one process assigned to each processor. Each process executes the corresponding parallel operation on a part of the array section using various heuristics like the owner computes rule[pin89,CK88,ZBG86]. As the globally addressed data structure is decomposed among the local memory of the multiple processors, communication may be necessary among the processors to implement the corresponding data parallel operation. Parallel operations may specify either a static or a dynamic communication pattern. In the former case, given a data distribution, the communication needed in the target SPMD program are known at compile time; in the latter case, the communication pattern is known only at run-time.

In addition to communication, the processes in the SPMD program also need to be synchronized. The parallel operations supported by a specific data parallel language assume an implicit synchronization granularity. These barrier synchronizations are what allows a data parallel language to have a single locus of control and essentially sequential semantics. For instance, the HPF FORALL statement assumes implicit synchronization at the level of expression evaluation. This requires that, at least logically, the right-hand side of parallel operations must be evaluated for all elements, before the values can be assigned to any corresponding left-hand side. Other languages like Kali[MR90] assume that the parallel operations are synchronized only at the beginning of successive iterations of its for loop. The target code generated for these constructs on distributed memory computers must implement the semantics of the corresponding barrier synchronizations.

Existing approaches for the compilation of parallel assignments on distributed memory architectures can be briefly summarized as follows: for each parallel operation, the compiler attempts to deduce the communication pattern at compile time. If this is possible, the corresponding send, receive and possibly other communication commands can be generated at compile time and additional synchronizations are unnecessary. If the communication pattern for a statement cannot be deduced at compile time, an additional global synchronization step is included in the compiled code at the specified synchronization granularity[BCF94]. For parallel operations that appear within loops, the inspector-executor method [SCM90] is used to eliminate the global synchronization step for subsequent repetitions of a repetitive communication pattern.

The work described in this chapter extends the compilation of parallel assignments for multicomputers in the following directions:

In the following section, we show how synchronization graphs can be constructed for the parallel assignment statement using dependence analysis tools that have been developed for sequential programs. In section 2.3 we describe multicomputer implementation of these statements using synchronization graphs and barriers. In section 2.4, we show how barrier synchronizations used to implement static communication patterns can be replaced by cheaper pairwise and clustered communication operations. In section 2.5, we present the adaptive synchronization method for dynamic communication patterns and compare its performance with the standard techniques for their compilation. Section 2.6 shows a comparison of our work with existing work. Section 2.7 contains a brief summary.



next up previous
Next: Synchronization Graphs Up: Compilation of Data Previous: Compilation of Data



Andy Kahn
Wed Jun 25 20:28:02 PDT 1997