
Simulator Predictions

One application of the simulator is predicting how program performance is affected by factors such as communication latency and processor speed. Figure 3.2 shows these predictions for a matrix multiplication program. The X axis of the graph shows the scaled message latency on a base-2 logarithmic scale: at x=0 the communication latency is that of the IBM-SP2, at x=1 it is twice that of the SP2, and so on. The Y axis shows the scaled processor speed on a similar logarithmic scale: at y=0 the processor speed is that of an SP2 processor, and at y=1 it is that of a processor with half the speed.

Consider the 8 processor implementation. When the latency is about 16 times that of the SP2 (x=4, about the same as on an Ethernet with TCP/IP), program performance becomes less sensitive to processor speed: a factor of 16 increase in processor speed (across the range between y=-2 and y=2) yields only a factor of 4 improvement in program performance. As message latency drops toward zero (x=-4 on the graph), the sensitivity increases, and program performance improves in direct proportion to processor speed. The compiler also does a fairly good job of overlapping communication and computation as long as the message latency remains below 4 times that of the SP2, as evidenced by the fact that program execution time is insensitive to latency increases until x=2. Beyond x=2, the 32 processor implementation is more sensitive to latency increases, since synchronization overheads are higher for a greater number of processors.

Very similar results are obtained for the Gauss-Jordan elimination program (Figure 3.3). In this way, the simulator can be used to predict in which environments a program would perform well and where performance bottlenecks exist. Because the simulator can itself be run in parallel, these results can be obtained quickly. The simulator execution times for Gauss-Jordan elimination, for example, are shown in Figure 3.4: the 128 processor problem can be simulated about 11 times faster using 16 processors than sequentially.
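
The arithmetic behind these readings can be checked with a short sketch. It is not part of the simulator. The function names are hypothetical, and the log-log "sensitivity" ratio and the parallel efficiency figure are illustrative measures chosen here, not quantities defined in the text. The sketch assumes base-2 axes, as implied by x=1 meaning twice the latency and y=1 meaning half the processor speed.

```python
import math


def latency_factor(x: float) -> float:
    """Latency relative to the SP2 at X-axis coordinate x (x=1 is twice the latency)."""
    return 2.0 ** x


def speed_factor(y: float) -> float:
    """Processor speed relative to the SP2 at Y-axis coordinate y (y=1 is half the speed)."""
    return 2.0 ** (-y)


def sensitivity(perf_gain: float, speed_gain: float) -> float:
    """Assumed measure: ratio of log gains; 1.0 means performance tracks processor speed."""
    return math.log2(perf_gain) / math.log2(speed_gain)


def parallel_efficiency(speedup: float, processors: int) -> float:
    """Assumed measure: observed speedup divided by the number of processors used."""
    return speedup / processors


print(latency_factor(4))            # 16.0   (x=4 is 16 times the SP2 latency)
print(speed_factor(1))              # 0.5    (y=1 is half the SP2 processor speed)
print(sensitivity(4, 16))           # 0.5    (16x processor speed gives 4x performance)
print(parallel_efficiency(11, 16))  # 0.6875 (11x faster simulation on 16 processors)
```

Under these assumptions, a sensitivity of 0.5 at x=4 compares with 1.0 in the low-latency regime, where performance improves in proportion to processor speed, and the 11-fold simulation speedup on 16 processors corresponds to an efficiency of roughly 69 percent.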

  
Figure 3.2: Matrix multiplication

  
Figure 3.3: Gauss-Jordan Elimination

  
Figure 3.4: Simulator Characteristics for Gauss-Jordan Elimination






Andy Kahn
Wed Jun 25 20:28:02 PDT 1997