next up previous
Next: Definitions and Assumptions Up: Simulation of Data Previous: Introduction

Compilation of Data parallel Programs

  Data parallel programming is defined as single threaded, global name space, loosely synchronous parallel computation[Hig93]. A data parallel program, in order to be executed on a multicomputer, is first translated into a message passing SPMD program. The SPMD program is then compiled and run on each of (a subset of) the nodes of the multicomputer. The operations in the data parallel program that lead to communication (and synchronization) in the corresponding SPMD program may be classified into the following categories:

From the point of view of communication optimization, the communication patterns resulting from these operations may be classified as predictable or unpredictable [Pra]. Predictable communication patterns are those in which every processor knows which other processor needs its data. Unpredictable communication patterns are those in which a processor does not know who needs its data and hence generally need costly global synchronizations to implement them. For the purposes of this chapter, we introduce another classification of communication patterns. Communication patterns may also be classified as either deterministic or non-deterministic. Just as the (un)predictable classification identifies the patterns that are easy to implement, the (non)deterministic classification identifies the patterns that are easy to simulate. A deterministic communication pattern is one in which every processor receives a deterministic set of messages (unchanged over executions of the program). A non-deterministic communication pattern is one in which the sequence of messages that is accepted at a processor cannot be predetermined. We next present some examples of data parallel code and their equivalent SPMD code, in order to explain the communication pattern classification. We assume the following Maisie[Bag94] like syntax for message passing calls in the SPMD code:

We assume HPF syntax for the data parallel code. PROCESSORS tells us the rectilinear processors arrangement, which is mapped onto the actual physical processors. For convenience, we assume that this is the same as the number of physical processors. The DISTRIBUTE directive tells us how the data is mapped onto the processor arrangement. In order to simplify presentation, we assume simple block mappings for all the array data. Changing the data mapping will not affect the type of communication pattern generated, although the communication pattern itself will change. The FORALL statement describes a synchronous parallel assignment. Using this terminology, the following data parallel code shows a predictable communication pattern implemented as a deterministic communication pattern (henceforth referred to as Example 1):


Source Code:

INTEGER a(0:7)

!HPF$ PROCESSORS procs(0:7)

!HPF$ DISTRIBUTE (BLOCK) ONTO procs:: a

FORALL (i=0:7) a(i) = a(MOD(i+1,8));

SPMD code for FORALL statement for processor i (C syntax):

int i,a;

send to (i-1)%8 message datamsg{i, a}; /* s1 */

brecv mtype(datamsg) st /* s2 */

( msg.datamsg.sender == (i+1)%8){

a = msg.datamsg.value;

}

The following is an unpredictable communication pattern (henceforth referred to as Example 2) implemented in two ways: as a deterministic and as a non-deterministic communication pattern:

Source Code:

INTEGER a(0:7), b(0:7)

!HPF$ PROCESSORS procs(0:7)

!HPF$ DISTRIBUTE (BLOCK) ONTO procs:: a,b

FORALL (i=0:7) a(i) = a(MOD(b(i),8));

Deterministic Implementation:

int a, b, i,j, source;

source = b%8;

for (j=0;j<8;j++){

if (j != i) send to j message datamsg{i,a};

}

for (j=0;j<7;j++){

brecv mtype (datamsg) {

if ( msg.datamsg.sender == source)

a = msg.datamsg.value;

}

}

Non-deterministic Implementation:

int a, b, i, source, recvd;

source = b%8;

send to source message rqstmsg{i,&a};

recvd = FALSE;

while (!recvd) {

brecv mtype (datamsg){ ;

a = msg.datamsg.value;

recvd = TRUE;

}

or mtype (rqstmsg){

send to msg.rqstmsg.sender

datamsg{i, *( msg.rqstmsg.wantaddr)};

}

}

barrier;

In the deterministic implementation each processor broadcasts to all other processors and selects the required value from the incoming messages. Clearly, a lot of unnecessary messages are sent. In the non-deterministic implementation each processor places a request, and then waits for a reply; while waiting, it services incoming requestsgif. Since the number of incoming requests is unknown the pattern is non-deterministic.

In general, we expect data distribution operations to be predictable, as well as data combination operations. Data assignment operations may be unpredictable and hence possibly produce non-deterministic communication patterns as shown in the example above. However, to the best of our knowledge most data parallel compilers exclusively produce deterministic communication patterns. These can be simulated with relatively little overhead as shown in the next section.



next up previous
Next: Definitions and Assumptions Up: Simulation of Data Previous: Introduction



Andy Kahn
Wed Jun 25 20:28:02 PDT 1997