Flow dependence arcs of the synchronization graph represent all and only the set of remote data accesses (reads or writes) of a thread. Hence, a simple implementation of the synchronization graph involves embedding a remote data read or remote data write call at the sink statement of each flow dependence arc. Figure 2.7 illustrates this idea.
Figure 2.7: Synchronization Graph Implementation using Barriers
The synchronization graph for the code in Figure 2.7(a)
is shown in Figure 2.7(b). There is one flow dependence
due to the reference
in statement s2. The
code shown in Figure 2.7(c) shows the SPMD program for
each thread. Each thread sends a request to its right neighbor
for its value of a. While the thread is waiting for a reply, it must
reply to incoming requests, otherwise deadlock may occur. Hence
in the while loop it accepts messages of type rply and
rqst. Not all requests may get serviced at this
point in the code, since a thread might receive its reply before
getting a request. Ideally, an interrupt driven service
mechanism (e.g. using active messages [Eic92])
is required, to provide immediate response to incoming
requests, irrespective of what point in the program a thread
is at. Such a mechanism can be approximated by polling for
arrived requests at regular intervals in the program. The function
shown in Figure 2.8 can be used for this purpose.
Clearly, if this function is inserted at regular intervals in the SPMD code shown in Figure 2.7(c), all requests would get serviced. However, if requests get serviced before statement s1 or after statement s3 the replies would be incorrect, since a would not have the desired value. Requests arriving before s1 are called early requests and those after s3, late requests. In order to prevent early requests a barrier must be placed at some point after statement s1. To prevent late requests a barrier must be placed before statement s3. Hence the barriers in Figure 2.7(c).
In essence, the barriers enforce the dependences shown in the
synchronization graph. The first barrier enforces the flow dependence
and the second barrier the anti dependence. Notice too that the
barriers must be responsive, otherwise deadlock
may occur. A responsive barrier is one
in which a thread can respond to requests while waiting at the
barrier
.
Figure 2.7(d) shows an implementation of a responsive
barrier. The underlying barrier algorithm is that of a
simple centralized barrier,
in which every thread sends a barrier message to a central thread, which
waits until all barrier messages have arrived, and then sends a barrier release
message to each waiting thread. This algorithm is shown only for
simplicity in presentation; in practice more balanced algorithms
are used.
Figure 2.9 shows another example of synchronization
graph implementation using barriers. The
flow dependence arc from statement s2 to itself in
Figure 2.9(b) leads to write requests
rather than read requests. Only in programs
in which the owner computes rule cannot be applied, will such flow
dependence arcs occur. In the corresponding
SPMD program in Figure 2.9(c),
thread i, after calculating the LHS of statement
s2 and assigning to a temporary variable, sends a write request to
thread
. While waiting for the acknowledgement, it services incoming
requests. Only one barrier is required to enforce the two flow dependences
of the synchronization graph, since
the dependence arcs overlap.
Figure 2.9: Synchronization Graph with Write Flow Dependences