
Simulation Model

We assume the target program executes as one process per processor. In the simulation model, there is a logical process (LP) corresponding to each process of the target program execution. The simulation model may be executed on any number of host processors, from one up to the total number of LPs in the simulation model. LPs must be distributed among the host processors using some partitioning scheme; we assume a block partitioning scheme. The simulation technique we use is called execution-driven simulation (or direct execution): an LP simulates its allotted process by executing the target program with the inputs of that process. Each LP has an associated message queue, which stores incoming messages until they are accepted. At any point in the simulation, the simulation time of an LP is the time the LP assigns to the corresponding point in the target program execution.
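As an illustration of block partitioning, the following minimal sketch maps contiguous ranges of LP ids onto host processors. The function name `block_partition` and the choice to give any leftover LPs to the lowest-numbered hosts are assumptions made for this example; the text only states that a block scheme is used.

```python
def block_partition(num_lps: int, num_hosts: int) -> list[range]:
    """Assign contiguous blocks of LP ids to hosts (hypothetical helper).

    Block sizes differ by at most one; the first (num_lps % num_hosts)
    hosts receive one extra LP. This balancing rule is an assumption.
    """
    if not 1 <= num_hosts <= num_lps:
        raise ValueError("need 1 <= num_hosts <= num_lps")
    base, extra = divmod(num_lps, num_hosts)
    blocks = []
    start = 0
    for host in range(num_hosts):
        size = base + (1 if host < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Ten LPs on three host processors.
    for host, block in enumerate(block_partition(10, 3)):
        print(host, list(block))
```

With ten LPs on three hosts, host 0 holds LPs 0 to 3, host 1 holds LPs 4 to 6, and host 2 holds LPs 7 to 9.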

To reproduce an execution trace of the target program, an LP in the simulation model starts by setting its simulation time to zero and takes the following actions while executing the target program:

  1. (When executing a local code block) Predicts the duration of execution of each local statement (of the local code block) on the target machine, so that it can assign a correct simulation timestamp to the following statement (local or other). In most cases, we are not interested in the simulation timestamp of each individual statement in the local code block. Consequently, the LP can just as well predict the execution time of the whole local code block, without affecting the rest of the trace.
  2. (When executing a send statement) Predicts the receive timestamp of each message that it sends. The receive timestamp is the time at which the message would have been received in the target program execution. An LP calculates the receive timestamp as the simulation timestamp of the send statement plus the predicted communication latency of the message.
  3. (When executing a receive statement) Accepts messages from its queue in receive timestamp order, rather than in the order in which they are physically deposited in its queue. Upon accepting a message, the simulation time of the corresponding target process is set to the maximum of the simulation timestamp of the receive statement and the receive timestamp of the accepted message. If more than one matching message has the same receive timestamp, the tie must be broken using the same rule that would be used in the target program execution. An example of such a rule is: select the message sent by the LP with the lowest id.
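The three actions above can be sketched as a small sequential model. This is an illustration under stated assumptions, not the simulator itself: the names `Message`, `LogicalProcess`, `run_local_block`, `send`, and `receive` are hypothetical, every queued message is treated as matching the receive, ties are broken by lowest sender id (the example rule from the text), and all relevant messages are assumed to be in the queue already when `receive` is called. Guaranteeing that last condition in a parallel run is the job of the simulation protocol.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    # Ordering key is (recv_ts, sender_id): receive timestamp first,
    # then lowest sender id as the tie-breaking rule.
    recv_ts: float
    sender_id: int
    payload: object = field(compare=False, default=None)


class LogicalProcess:
    def __init__(self, lp_id: int):
        self.lp_id = lp_id
        self.clock = 0.0   # simulation time starts at zero
        self.queue = []    # min-heap of pending messages

    def run_local_block(self, predicted_duration: float) -> None:
        # Action 1: advance by the predicted time of the whole block.
        self.clock += predicted_duration

    def send(self, dest: "LogicalProcess", payload, predicted_latency: float) -> None:
        # Action 2: receive timestamp = send timestamp + predicted latency.
        msg = Message(self.clock + predicted_latency, self.lp_id, payload)
        heapq.heappush(dest.queue, msg)

    def receive(self) -> Message:
        # Action 3: accept in receive timestamp order, not arrival order.
        msg = heapq.heappop(self.queue)
        self.clock = max(self.clock, msg.recv_ts)
        return msg


if __name__ == "__main__":
    a, b, c = LogicalProcess(0), LogicalProcess(1), LogicalProcess(2)
    b.run_local_block(5.0)
    b.send(a, "from b", predicted_latency=2.0)   # receive timestamp 7.0
    c.run_local_block(1.0)
    c.send(a, "from c", predicted_latency=3.0)   # receive timestamp 4.0
    a.run_local_block(6.0)
    print(a.receive().payload, a.clock)
    print(a.receive().payload, a.clock)
```

Although the message from LP 1 is deposited first, LP 0 accepts the message from LP 2 first because its receive timestamp is smaller. The first accept leaves the clock at 6.0 (the receive statement is later than the message), and the second advances it to 7.0.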

We focus on modeling the standalone execution of the target program. Consequently, when predicting the execution duration of local code blocks and the communication latencies of messages, we do not model delays that arise from sharing the target processor and the interconnection network with other programs. Notice that the standalone execution of the target program defines the ideal target trace of the simulation model. However, this definition is not precise, because a target program may have several traces even when executing in standalone mode. A single trace can be obtained only if the state of the target machine at the start of the standalone execution is fixed. In practice, (a) fixing the state of the target machine to collect the ideal target trace is unrealistic, and more importantly (b) we have found that for all the programs we have come across, the variation in execution time among the different traces obtained in standalone mode is well within the resolution of our simulation model. Consequently, we do not need to isolate a single ideal target trace.

Section 4.4 contains a detailed discussion on the prediction of the execution time of local code blocks.

The communication latency of a message is most accurately predicted using a simulator for the communication protocol and the interconnection network. Such a simulator captures the queuing delays that each message encounters, both in software (i.e., the communication protocol) and in hardware (i.e., the interconnection network), due to other messages of the target program. However, we have found that these queuing delays are not significant in many cases, and hence we treat the communication latency of a message as a function only of its size. Consequently, given a message size, communication latency is predicted using a simple analytic model.
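The text does not give the form of the analytic model. As a minimal sketch, assuming the common linear form (a fixed startup cost plus a per-byte cost), a size-only latency predictor could look as follows. The function name `predict_latency` and the default parameter values are hypothetical placeholders, not measurements from any target machine.

```python
def predict_latency(size_bytes: int,
                    startup: float = 50e-6,
                    per_byte: float = 0.02e-6) -> float:
    """Predicted communication latency in seconds for one message.

    Assumed model: latency = startup + per_byte * size_bytes.
    Both parameters are illustrative and would have to be fitted
    to measurements on the target machine.
    """
    if size_bytes < 0:
        raise ValueError("message size must be non-negative")
    return startup + per_byte * size_bytes


if __name__ == "__main__":
    for size in (0, 1024, 65536):
        print(size, predict_latency(size))
```

Because the model depends only on message size, it ignores contention from other messages, which matches the simplification described above.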

Accepting messages in receive timestamp order is the job of the simulation protocol. In Section 4.5 we motivate and describe the simulation protocols we use for executing the simulation model in parallel. Section 4.6 contains a brief summary of the work most closely related to ours.






Andy Kahn
Wed Jun 25 20:28:02 PDT 1997