Data parallel programming is defined as single threaded, global name space, loosely synchronous parallel computation[Hig93]. A data parallel program, in order to be executed on a multicomputer, is first translated into a message passing SPMD program. The SPMD program is then compiled and run on each of (a subset of) the nodes of the multicomputer. The operations in the data parallel program that lead to communication (and synchronization) in the corresponding SPMD program may be classified into the following categories:
;):
This statement describes the
send of a message <msg> to a process <dest>.
<msgtype> is a user defined message type, and in the programs we
consider, messages can be of type
data or request. <msg> is the body of the message.
In a data message we define it to contain fields for the sender id and
the data value. In a
request message it contains fields for
the sender id and the address of the requested
value
or
mtype(<msgtype2>) st <guard2>
... ;): This is a
a blocking selective receive statement. Upon execution, a process waits
until a message of any of the listed message types (<msgtype1>,
<msgtype2> etc) is deposited in its receive buffer. If the message
satisfies the corresponding guard (<guard1>, <guard2>..), then it
is an enabling message, and the process may resume execution.
The guard is a side-effect free boolean expression
that may reference local variables or message fieldsWe assume HPF syntax for the data parallel code. PROCESSORS tells us the rectilinear processors arrangement, which is mapped onto the actual physical processors. For convenience, we assume that this is the same as the number of physical processors. The DISTRIBUTE directive tells us how the data is mapped onto the processor arrangement. In order to simplify presentation, we assume simple block mappings for all the array data. Changing the data mapping will not affect the type of communication pattern generated, although the communication pattern itself will change. The FORALL statement describes a synchronous parallel assignment. Using this terminology, the following data parallel code shows a predictable communication pattern implemented as a deterministic communication pattern (henceforth referred to as Example 1):
The following is an unpredictable communication pattern (henceforth referred to as Example 2) implemented in two ways: as a deterministic and as a non-deterministic communication pattern:Source Code:
INTEGER a(0:7)
!HPF$ PROCESSORS procs(0:7)
!HPF$ DISTRIBUTE (BLOCK) ONTO procs:: a
FORALL (i=0:7) a(i) = a(MOD(i+1,8));
SPMD code for FORALL statement for processor i (C syntax):
int i,a;
send to (i-1)%8 message datamsg{i, a}; /* s1 */
brecv mtype(datamsg) st /* s2 */
( msg.datamsg.sender == (i+1)%8){
a = msg.datamsg.value;
}
In the deterministic implementation each processor broadcasts to all other processors and selects the required value from the incoming messages. Clearly, a lot of unnecessary messages are sent. In the non-deterministic implementation each processor places a request, and then waits for a reply; while waiting, it services incoming requestsSource Code:
INTEGER a(0:7), b(0:7)
!HPF$ PROCESSORS procs(0:7)
!HPF$ DISTRIBUTE (BLOCK) ONTO procs:: a,b
FORALL (i=0:7) a(i) = a(MOD(b(i),8));
Deterministic Implementation:
int a, b, i,j, source;
source = b%8;
for (j=0;j<8;j++){
if (j != i) send to j message datamsg{i,a};
}
for (j=0;j<7;j++){
brecv mtype (datamsg) {
if ( msg.datamsg.sender == source)
a = msg.datamsg.value;
}
}
Non-deterministic Implementation:
int a, b, i, source, recvd;
source = b%8;
send to source message rqstmsg{i,&a};
recvd = FALSE;
while (!recvd) {
brecv mtype (datamsg){ ;
a = msg.datamsg.value;
recvd = TRUE;
}
or mtype (rqstmsg){
send to msg.rqstmsg.sender
datamsg{i, *( msg.rqstmsg.wantaddr)};
}
}
barrier;
In general, we expect data distribution operations to be predictable, as well as data combination operations. Data assignment operations may be unpredictable and hence possibly produce non-deterministic communication patterns as shown in the example above. However, to the best of our knowledge most data parallel compilers exclusively produce deterministic communication patterns. These can be simulated with relatively little overhead as shown in the next section.