One application of the simulator is predicting
how program performance is affected by factors such as communication
latency and processor speed. Figure 3.2 shows the results
for a matrix multiplication program. The X axis of the graph
shows the scaled message latency on a base-2 logarithmic scale: at
x=0 the communication latency is that of the IBM-SP2, at x=1 the
latency is twice that of the SP2, and so on.
The Y axis
shows the scaled processor speed on a similar logarithmic scale: at y=0
the processor speed is that of an SP2 processor, and at y=1 the processor
is half as fast.
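Both axes are base-2 logarithmic, since one unit on the X axis doubles the latency. The following minimal Python sketch converts graph coordinates back into multiples of the SP2 values; the function names `latency_factor` and `speed_factor` are hypothetical and only encode the axis definitions given above.

```python
def latency_factor(x: float) -> float:
    """Message latency relative to the IBM-SP2 at point x on the X axis.

    Each unit step doubles the latency: x=0 is the SP2, x=1 is twice
    the SP2 latency, x=4 is 16 times the SP2 latency.
    """
    return 2.0 ** x


def speed_factor(y: float) -> float:
    """Processor speed relative to an SP2 processor at point y on the Y axis.

    Each unit step halves the speed: y=0 is an SP2 processor,
    y=1 is a processor half as fast.
    """
    return 2.0 ** (-y)


if __name__ == "__main__":
    # Print the latency multiple for a few X-axis positions.
    for x in (-4, 0, 1, 2, 4):
        print(f"x={x:+d}: latency = {latency_factor(x):g} x SP2")
    # Print the processor-speed multiple for a few Y-axis positions.
    for y in (-2, 0, 1, 2):
        print(f"y={y:+d}: speed = {speed_factor(y):g} x SP2")
```

For example, x=4 corresponds to 16 times the SP2 latency, and moving from y=2 to y=-2 multiplies processor speed by 16.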
Consider the 8-processor implementation.
When the latency is about 16 times that of the SP2 (x=4, roughly the
latency of TCP/IP over Ethernet), program
performance becomes less sensitive to processor speed: at this point,
increasing
the processor speed by a factor of 16 (from y=2 to y=-2)
improves program performance by only a factor of 4.
As message latency approaches zero
(x=-4 on the graph, 1/16 of the SP2 latency),
the sensitivity increases, and program performance improves in direct
proportion to processor speed. The compiler also does a fairly good job
of overlapping communication and computation as long as the message
latency remains below 4 times that of the SP2: program execution time
is insensitive to increases in latency up to x=2. Beyond x=2, the
32-processor implementation is more sensitive to latency than the
8-processor implementation, since synchronization overheads grow with
the number of processors. Very similar results are obtained for the
Gauss-Jordan elimination program (Figure 3.3).
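One way to make "sensitivity to processor speed" concrete is as the exponent s in perf_gain = speed_gain^s, so that s = log 4 / log 16 = 0.5 for the 8-processor run at x=4, and s = 1 when performance tracks processor speed exactly. This measure and the helper name `speed_sensitivity` are assumptions introduced for illustration, not quantities reported by the simulator.

```python
import math


def speed_sensitivity(perf_gain: float, speed_gain: float) -> float:
    """Hypothetical sensitivity measure: the exponent s with perf_gain = speed_gain ** s.

    s = 1: performance improves in direct proportion to processor speed.
    s = 0: processor speed has no effect (e.g. communication dominates).
    """
    if perf_gain <= 0 or speed_gain <= 0 or speed_gain == 1:
        raise ValueError("gains must be positive and speed_gain must differ from 1")
    # The ratio of logarithms is independent of the base; log2 is exact
    # for the powers of two used on the graph's axes.
    return math.log2(perf_gain) / math.log2(speed_gain)


if __name__ == "__main__":
    # 8 processors at x=4: 16x faster processors give only a 4x improvement.
    print(speed_sensitivity(4.0, 16.0))   # 0.5
    # Latency near zero: performance improves as much as processor speed.
    print(speed_sensitivity(16.0, 16.0))  # 1.0
```

Under this measure, the flattening of the curves at high latency shows up as s falling from 1 toward 0.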
In this way, the simulator can be used to predict the environments in
which a program will perform well and to locate its performance bottlenecks.
Because the simulator can itself be run in parallel, these results
can be obtained quickly. Figure 3.4, for example, shows the simulator
execution times for Gauss-Jordan elimination.
For the 128-processor problem, simulating on 16 processors is about
11 times faster than simulating sequentially.
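The simulator's own speedup can also be expressed as a parallel efficiency, the fraction of ideal linear speedup achieved. In this minimal sketch the helpers `speedup` and `parallel_efficiency` are hypothetical; the only inputs taken from the text are the approximate speedup of 11 and the 16 host processors, which give an efficiency of 11/16, or about 0.69.

```python
def speedup(sequential_time: float, parallel_time: float) -> float:
    """Ratio of sequential simulation time to parallel simulation time."""
    if sequential_time <= 0 or parallel_time <= 0:
        raise ValueError("times must be positive")
    return sequential_time / parallel_time


def parallel_efficiency(observed_speedup: float, host_processors: int) -> float:
    """Fraction of ideal (linear) speedup achieved on `host_processors` hosts."""
    if host_processors < 1:
        raise ValueError("host_processors must be at least 1")
    return observed_speedup / host_processors


if __name__ == "__main__":
    # Approximate figure from the text: the 128-processor Gauss-Jordan problem
    # simulates about 11 times faster on 16 host processors than sequentially.
    efficiency = parallel_efficiency(11.0, 16)
    print(efficiency)  # 0.6875
```

An efficiency of 1.0 would mean that each added host processor reduces simulation time proportionally.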
Figure 3.2: Matrix multiplication
Figure 3.3: Gauss-Jordan Elimination
Figure 3.4: Simulator Characteristics for Gauss-Jordan Elimination