next up previous
Next: Conditionals Up: Synchronization Graphs Previous: Sequential loops enclosing

Influence of owner computes rule on synchronization graph

  We assume that the threads of a par statement are partitioned across the processors using the owner computes rule [pin89,CK88,ZBG86]. In cases where the owner computes rule is not applicable, some additional transformations are needed on the synchronization graph. Consider the following example:

 
 range ={1..N-1}

int a[N],c[N];

par (I)

{

a[f(i)] = c[i]; /* s1 */

.. = a[g(i)]; /* s2 */

a[i] = ; /* s2 */

}

As mentioned earlier, we assume all arrays are partitioned across the processors using a simple block mapping. If we partition the threads of the par statement across the processors in the exactly the same way, the owner computes rule is automatically enforced for some assignment statements but not all. In the example shown above, the owner computes rule is enforced for statement s2, but not s1. However, if can be calculated at compile time, the owner computes rule can be enforced by transforming s1 to , in a simple source to source transformation. We assume this transformation is performed whenever possible. If the function is not invertible, the owner computes rule is not obeyed for statement s1: and the thread that assigns to it may be on different processors. Hence, for such a statement, an additional step would need to be taken at runtime to send data to the processor that owns it. The dependence graph does not naturally reveal this additional synchronization requirement. In the case shown above, the dependence graph shows a flow dependence between statements s1 and s2. If exact dependence analysis were possible, this means that all pairs of threads can be identified such that . Clearly it is not sufficient for thread i to communicate only with thread j. We also want to force thread i to communicate with thread , which is the ``rightful" owner of the data, in the sense that it is on the same processor as . To remedy this situation, a dummy statement is added immediately after s1. This ensures a flow dependence with the label from s1 to the dummy statement, and hence makes sure that the owner receives the new valuegif. In practice, we simply add the flow dependence from s1 to itself, and delete the dummy statement. Notice that this has two additional useful consequences:
  1. The output dependence from s1 to s3 becomes intra thread and is automatically enforced, and may be deleted from the synchronization graph.
  2. The label of the flow dependence from s1 to s2 gets simplified to to .
The initial and transformed synchronization graphs are shown in Figure 2.6.

 
Figure 2.6:   Initial and Transformed Synchronization graphs



next up previous
Next: Conditionals Up: Synchronization Graphs Previous: Sequential loops enclosing



Andy Kahn
Wed Jun 25 20:28:02 PDT 1997