We assume that the threads of a par statement are partitioned across the processors using the owner computes rule [pin89,CK88,ZBG86]. In cases where the owner computes rule is not applicable, some additional transformations are needed on the synchronization graph. Consider the following example:
As mentioned earlier, we assume all arrays are partitioned across the processors using a simple block mapping. If we partition the threads of the par statement across the processors in the exactly the same way, the owner computes rule is automatically enforced for some assignment statements but not all. In the example shown above, the owner computes rule is enforced for statement s2, but not s1. However, ifrange={1..N-1}
int a[N],c[N];
par (I)
{
a[f(i)] = c[i]; /* s1 */
.. = a[g(i)]; /* s2 */
a[i] = ; /* s2 */
}
can be calculated at compile time, the owner
computes rule can be enforced by transforming s1
to
, in a simple source to source transformation.
We assume this transformation is performed whenever possible.
If the function
is not invertible, the owner computes
rule is not obeyed for statement s1:
and the thread that assigns to it may be
on different processors. Hence, for such a statement, an additional step would
need to be taken at runtime to send data to the processor
that owns it. The dependence graph does not naturally reveal
this additional synchronization requirement.
In the case shown above,
the dependence graph shows a flow dependence between
statements s1 and s2. If exact dependence analysis were possible,
this means that all pairs of threads
can be identified such that
. Clearly it is not sufficient for thread i to communicate
only with thread j. We also want to force thread i to communicate with
thread
, which is the ``rightful" owner of the data,
in the sense that it is on the same processor as
.
To remedy this situation, a dummy
statement
is added immediately after s1.
This ensures a flow dependence with the label
from
s1 to the dummy statement, and hence makes sure that the owner
receives the new value
.
Figure 2.6: Initial and Transformed Synchronization graphs