The method described in the previous section works in exactly the same
way for all types of communication patterns. Consider the flow dependence
from s1 to s2 with label
in Figure 2.7(a) and the
flow dependence from s1 to s3 with
label
in Figure 2.10.
In the first case, the communication pattern is static,
since in every execution of the code, thread i will always request
data from thread
. In the second case, the communication pattern
is dynamic, since thread i requests data from thread
, which
may be different in each execution of the code. Some static communication
patterns may be revealed by dependence analysis. In other words,
it may be possible to calculate at compile time all pairs of communicating
threads. In such a case, it is possible to dramatically
reduce the cost of implementing such a dependence arc. An improved
implementation of the example in Figure 2.7(a) is shown
in Figure 2.11. Thread i anticipates that thread
needs its data and sends it without waiting for a request.
Figure 2.11: Improved Implementation of Static Communication Pattern
The following improvements are seen over the barrier implementation:
To apply this optimization, the label of every flow dependence arc of the synchronization graph needs to be examined by the dependence solver. The ideal case is an exact analysis for each arc. In such a case the dependence arc can be implemented with maximum efficiency. Dependence solvers generally return a superset of the actual solutions. In the implementation, this results in useless data messages being sent. Consequently, if the analysis yields a large superset of the actual solutions, which will happen mostly with dynamic communication patterns, it is better to use the barrier implementation.
These optimizations can be incorporated in the barrier method as follows: first all flow dependence arcs are examined by the solver. All those that can be optimized are removed from the synchronization graph. If there are antidependences originating from the same reference, they are removed as well. The remaining synchronization graph is handled as described in the previous section. Notice that all communication must now use tags as described in the previous section. As mentioned earlier, this is to ensure that a message meant for a specific receive is not picked up by an earlier receive. Also, at all blocking points in the program, a thread must service incoming requests, in order to prevent deadlock (hence the code in Figure 2.11).