We present only the subset of collective communication functions implemented by MPISIM. To guarantee that a collective communication function completes at any process, all processes in the communicator must call the function. The programmer is required to write programs assuming such synchronous semantics, although the implementation is free to provide only the amount of synchronization required by the actual call. An important feature of the collective communication calls presented here is that they have no non-blocking counterparts. Consequently, a process cannot have more than one collective communication call pending at any time.
The simplest collective communication function is the barrier, which has the prototype:
int MPI_Barrier(MPI_Comm comm)
MPI_Barrier returns when all processes in communicator comm have called it. Clearly, the semantics of MPI_Barrier require it to be a globally synchronizing call.
The broadcast function has the following prototype:
int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
In MPI_Bcast, the process ranked root in communicator comm broadcasts count elements of datatype datatype contained in buffer to all other processes in the communicator. Every other process receives the data in the area pointed to by its own buffer argument. MPI_Bcast is not generally implemented as a globally synchronizing call.
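The effect of a broadcast on the participating buffers can be illustrated with a small sequential model in plain C. This is only a sketch of the semantics described above, not MPI code and not MPISIM's implementation: the function model_bcast, the array-of-buffers representation, and the restriction to int elements are all assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not an MPI API: bufs[r] stands for the `buffer`
 * argument that the process ranked r would pass to MPI_Bcast.
 * Every buffer is assumed to hold `count` ints. */
void model_bcast(int *bufs[], int nprocs, int count, int root)
{
    assert(nprocs > 0 && root >= 0 && root < nprocs);
    for (int r = 0; r < nprocs; r++) {
        if (r != root) {
            /* Each non-root buffer is overwritten with the root's data;
             * the root's own buffer is left unchanged. */
            memcpy(bufs[r], bufs[root], (size_t)count * sizeof(int));
        }
    }
}
```

In this model, calling model_bcast with root 0 leaves every rank's buffer equal to rank 0's. The model says nothing about synchronization: a real implementation may let the root return before the other processes have received the data.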
There are two reduce functions with the following prototypes:
int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
int MPI_Allreduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
MPI_Allreduce combines the corresponding elements of the input buffers of all processes in the communicator comm, using the operation specified by op, and returns the result in each process's output buffer. The input buffer is pointed to by sendbuf and contains count elements of datatype datatype. The output buffer is pointed to by recvbuf and has the same size. MPI_Reduce does the same, but returns the result only to the process ranked root. MPI_Reduce is not implemented as a globally synchronizing call. MPI_Allreduce is, because its semantics require it: no process can obtain the result until every process has contributed its input.
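The difference between the two reduce functions can be sketched with a sequential model in plain C. As before, this is an illustration of the semantics only, under stated assumptions: the names model_op, model_sum, model_max, model_reduce, and model_allreduce are hypothetical, elements are restricted to int, and the per-rank buffers are represented as arrays of pointers rather than as separate processes.

```c
#include <assert.h>

/* Hypothetical stand-in for an MPI_Op, restricted to int operands. */
typedef int (*model_op)(int a, int b);

int model_sum(int a, int b) { return a + b; }
int model_max(int a, int b) { return a > b ? a : b; }

/* Combine element i of every rank's send buffer into out[i]. */
static void combine(int *sendbufs[], int nprocs, int count,
                    model_op op, int *out)
{
    for (int i = 0; i < count; i++) {
        int acc = sendbufs[0][i];
        for (int r = 1; r < nprocs; r++)
            acc = op(acc, sendbufs[r][i]);
        out[i] = acc;
    }
}

/* Model of MPI_Reduce: only the root's receive buffer is written. */
void model_reduce(int *sendbufs[], int *recvbufs[], int nprocs,
                  int count, model_op op, int root)
{
    assert(nprocs > 0 && root >= 0 && root < nprocs);
    combine(sendbufs, nprocs, count, op, recvbufs[root]);
}

/* Model of MPI_Allreduce: every rank's receive buffer gets the result. */
void model_allreduce(int *sendbufs[], int *recvbufs[], int nprocs,
                     int count, model_op op)
{
    assert(nprocs > 0);
    for (int r = 0; r < nprocs; r++)
        combine(sendbufs, nprocs, count, op, recvbufs[r]);
}
```

With three ranks holding {1, 5}, {2, 3}, and {4, 0}, model_reduce with model_sum and root 1 writes {7, 8} into rank 1's receive buffer and leaves the others untouched, whereas model_allreduce with model_max writes {4, 5} into all three. Since each output element depends on every rank's input, the model also shows why no process can complete an all-reduce before all processes have contributed.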