next up previous
Next: Barrier Weakening for Up: Compilation of Data Previous: Other Optimizations

Barrier Weakening for Static Communication Patterns

  The method described in the previous section works in exactly the same way for all types of communication patterns. Consider the flow dependence from s1 to s2 with label in Figure 2.7(a) and the flow dependence from s1 to s3 with label in Figure 2.10. In the first case, the communication pattern is static, since in every execution of the code, thread i will always request data from thread . In the second case, the communication pattern is dynamic, since thread i requests data from thread , which may be different in each execution of the code. Some static communication patterns may be revealed by dependence analysis. In other words, it may be possible to calculate at compile time all pairs of communicating threads. In such a case, it is possible to dramatically reduce the cost of implementing such a dependence arc. An improved implementation of the example in Figure 2.7(a) is shown in Figure 2.11. Thread i anticipates that thread needs its data and sends it without waiting for a request.

 
Figure 2.11:  Improved Implementation of Static Communication Pattern

The following improvements are seen over the barrier implementation:

  1. Fewer messages: In the barrier implementation two messages were required, a request and a reply message. In the improved implementation only one message is required.
  2. Reduced waiting time: In the barrier implementation, a thread waits at a receive for at least two message delays minus the computation overlap time achieved by pulling requests away from receives. In the improved implementation, the waiting time may be zero.
  3. No barriers: A barrier is not needed to enforce the flow dependence since the owner of data automatically sends the correct copy of the data. A barrier is not needed to enforce the following anti dependence for the same reason.

To apply this optimization, the label of every flow dependence arc of the synchronization graph needs to be examined by the dependence solver. The ideal case is an exact analysis for each arc. In such a case the dependence arc can be implemented with maximum efficiency. Dependence solvers generally return a superset of the actual solutions. In the implementation, this results in useless data messages being sent. Consequently, if the analysis yields a large superset of the actual solutions, which will happen mostly with dynamic communication patterns, it is better to use the barrier implementation.

These optimizations can be incorporated in the barrier method as follows: first all flow dependence arcs are examined by the solver. All those that can be optimized are removed from the synchronization graph. If there are antidependences originating from the same reference, they are removed as well. The remaining synchronization graph is handled as described in the previous section. Notice that all communication must now use tags as described in the previous section. As mentioned earlier, this is to ensure that a message meant for a specific receive is not picked up by an earlier receive. Also, at all blocking points in the program, a thread must service incoming requests, in order to prevent deadlock (hence the code in Figure 2.11).



next up previous
Next: Barrier Weakening for Up: Compilation of Data Previous: Other Optimizations



Andy Kahn
Wed Jun 25 20:28:02 PDT 1997