next up previous
Next: Evaluation of definition Up: Barrier Weakening for Previous: Classification of dynamic

Barrier elimination for reader unknown patterns

As described in section 2.3.2, the barrier implementation of a flow dependence arc that leads to read requests requires two barriers: one to enforce the flow dependence itself (i.e. prevent early requests), and one to enforce the subsequent anti dependence (i.e. prevent late requests). Using the methods presented until this point, the implementation of a reader unknown communication pattern cannot be optimized further.

Consider the data parallel code in Figure 2.12(a) and its corresponding synchronization graph in Figure 2.12(b). The flow dependence between s1 and s2 would be implemented by a barrier between the two statements, and the anti dependence between s2 and s1 would be implemented by a barrier before s1, since the anti dependence is carried across the iterations of the sequential loop. The barrier for enforcing the anti dependence can be avoided by using the standard technique for eliminating anti-dependence: make a new copy of variable for every iteration. The barrier for enforcing the flow dependence can be avoided if each request can be associated with the iteration in which it is generated; subsequently if it arrives early at the destination, it can be stored in a temporary buffer and serviced only after the corresponding copy is defined. It follows that a temporary array of K elements may be used to avoid any barrier synchronizations for K iterations. So, within these K iterations, threads adaptively synchronize only with those threads from which they need data. After K iterations have been executed, the copies can be reused as discussed subsequently.

The concept of variables which queue requests until definition is essentially that of I-structures [ANP89] or definition variables [Cha91]. Figure 2.12(c) shows the use of definition variables in implementing the code of Figure 2.12(a). Each definition variable consists of three fields: a value, a defined/undefined bit, and a pointer to a list of early requests. In the implementation shown, a barrier is executed only once in every K iterations. Hence, each thread needs to have an array of K definition variables. At statement s1, after a is defined, the corresponding definition variable is defined as well. All requests to it must be replied to as well, and that function is performed by the routine service. Subsequently each thread places a request to the thread from which it needs data. We assume a third value for the read/write bit (no longer just a bit) which identifies the request as being a definition variable request. The function essentially does just that, while servicing incoming requests. Note that this function, as well as the function poll, must be modified to enqueue early requests for definition variables. After K iterations, the definition variables need to be reused, so a barrier is executed to synchronize all threads, and subsequently all definition variables are initialized to the undefined state by the routine . The code for this and other routines is straightforward and is omitted.

 
Figure 2.12:  Eliminating barriers for reader unknown patterns



next up previous
Next: Evaluation of definition Up: Barrier Weakening for Previous: Classification of dynamic



Andy Kahn
Wed Jun 25 20:28:02 PDT 1997