Two optimizations can be made to the barrier implementation in the previous section:
Figure 2.10: Example in which a request cannot be pulled up to the source
An important consequence of separating requests from receives is that several replies may already be waiting when a receive executes. Message tags disambiguate replies in this situation. At compile time, each flow dependence arc in the synchronization graph is assigned a unique integer tag, and all communication due to that arc carries this tag. Hence a receive statement for a particular dependence arc accepts only the reply whose tag matches that arc.
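The tag-matching rule can be illustrated with a small reply buffer. This is a minimal sketch, not the runtime described here. It assumes replies that arrive early are queued per tag, and that replies for the same arc are consumed in arrival order. The names `Reply` and `ReplyBuffer` are hypothetical. A real receive would block until a matching reply arrives; this sketch raises `LookupError` instead.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    tag: int        # compile-time tag of the flow dependence arc
    payload: object  # data carried by the reply


class ReplyBuffer:
    """Hypothetical buffer for replies that arrive before their receive executes."""

    def __init__(self):
        # tag -> FIFO of payloads that arrived for that dependence arc
        self._pending = defaultdict(deque)

    def deliver(self, reply: Reply) -> None:
        # Called when a reply message arrives, in any order.
        self._pending[reply.tag].append(reply.payload)

    def receive(self, arc_tag: int):
        # Accept only a reply whose tag matches this receive's arc.
        # (Assumption: a real runtime would block here instead of raising.)
        queue = self._pending.get(arc_tag)
        if not queue:
            raise LookupError(f"no reply yet for arc {arc_tag}")
        return queue.popleft()
```

For example, if replies for arcs 7 and 2 arrive in that order, `receive(2)` still returns the arc-2 reply first. The arc-7 reply stays buffered until its own receive executes.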
threads per processor.
If the threads on a processor execute asynchronously, many messages will
be exchanged between threads on different processors. Because the startup
cost of each message is very high, coalescing messages pays off substantially.
This is done by having all the threads on a processor execute synchronously
within a single process. When requests are sent, the requests of all
threads can then be collected and sent as one message per remote
processor. Each message therefore carries the requests, replies, or
acknowledgements of many threads.
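Coalescing can be sketched as grouping the pending requests of all threads on one processor by destination, then sending one message per remote processor. This is an illustrative sketch under stated assumptions. The `Request` fields, the `coalesce` and `flush` helpers, and the `send` callback are all hypothetical, and every request is assumed to target a remote processor.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Request:
    thread_id: int  # thread on this processor that issued the request
    dest_proc: int  # remote processor that owns the requested data
    arc_tag: int    # tag of the dependence arc this request serves
    address: int    # hypothetical: index of the remote element wanted


def coalesce(requests: Iterable[Request]) -> Dict[int, List[Request]]:
    """Group the requests of all threads into one batch per destination."""
    batches: Dict[int, List[Request]] = defaultdict(list)
    for req in requests:
        batches[req.dest_proc].append(req)
    return dict(batches)


def flush(requests: Iterable[Request],
          send: Callable[[int, List[Request]], None]) -> int:
    """Send one message per remote processor; return the message count."""
    batches = coalesce(requests)
    for dest, batch in batches.items():
        send(dest, batch)  # hypothetical transport call
    return len(batches)
```

With four threads that each request data from processors 1 and 2, the eight requests become two messages of four requests each. Asynchronous threads would instead pay the message startup cost eight times.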