Appendix A
Benchmark Programs
Several MPI performance measurement programs are installed from the mpi-benchmark RPM. This Appendix describes these useful benchmarks and how to run them. These programs are based on code from the group of Dr. Dhabaleswar K. Panda at the Network-Based Computing Laboratory at the Ohio State University. For more information, see:
http://nowlab.cis.ohio-state.edu/
These programs allow you to measure the MPI latency and bandwidth between two or more nodes in your cluster. Both the executables and the source for those executables are shipped. The executables are shipped in the mpi-benchmark RPM and installed under /usr/bin. The source is shipped in the mpi-devel RPM and installed under /usr/share/mpich/examples/performance.
The examples given below are intended only to show the syntax for invoking these 
programs and the meaning of the output. They are NOT representations of actual 
InfiniPath performance characteristics.
A.1 Benchmark 1: Measuring MPI Latency Between Two Nodes
In the MPI community, latency for a message of a given size is defined to be the time difference between a node program's calling MPI_Send and the time that the corresponding MPI_Recv in the receiving node program returns. By latency alone, without a qualifying message size, we mean the latency for a message of size zero. This latency represents the minimum overhead for sending messages, due both to software overhead and to delays in the electronics of the fabric. To simplify the timing measurement, latencies are usually measured with a ping-pong method, timing a round trip and dividing by two.
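For illustration only (this sketch is not part of the manual or the shipped benchmarks), the following minimal C program shows the definition above in code: rank 0 times a single blocking zero-byte exchange with rank 1 and reports half of the round trip as the latency. Real benchmarks average many such exchanges, as described next.

/* Minimal ping-pong sketch (hypothetical example):
 * one zero-byte exchange; half the round trip approximates the latency. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double t0 = MPI_Wtime();
    if (rank == 0) {
        /* Send a zero-byte message and wait for the echo. */
        MPI_Send(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(NULL, 0, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
        printf("approx. one-way latency: %.2f microseconds\n",
               (MPI_Wtime() - t0) / 2.0 * 1.0e6);
    } else if (rank == 1) {
        /* Echo the message back to rank 0. */
        MPI_Recv(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Send(NULL, 0, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}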
The program osu_latency, from Ohio State University, measures the latency for a range of message sizes from 0 to 4 megabytes. It uses a ping-pong method, in which the rank 0 process initiates a series of sends and the rank 1 process echoes them back, using the blocking MPI send and receive calls for all operations. Half the time interval observed by the rank 0 process for each such exchange is a measure of the latency for messages of that size, as defined above. The program uses a loop, executing many such exchanges for each message size, in order to get an average. It defers the timing until the message has been sent and received a number of times, in order to be sure that all the caches in the pipeline have been filled.
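The sketch below, again purely illustrative and not the shipped osu_latency source, shows the structure just described: for each message size a number of untimed warm-up exchanges runs first, then a timed loop averages many round trips and reports half of the average as the one-way latency. The WARMUP and ITERATIONS counts and the doubling size progression are assumptions chosen for the example.

/* Hypothetical osu_latency-style loop: warm-up exchanges, then a timed
 * ping-pong loop per message size; one-way latency = avg round trip / 2. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE   (4 * 1024 * 1024)   /* largest message: 4 MB        */
#define WARMUP     100                 /* untimed exchanges per size   */
#define ITERATIONS 1000                /* timed exchanges per size     */

int main(int argc, char **argv)
{
    int rank;
    char *buf;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(MAX_SIZE);
    memset(buf, 0, MAX_SIZE);

    for (int size = 0; size <= MAX_SIZE; size = (size == 0) ? 1 : size * 2) {
        double t_start = 0.0;

        for (int i = 0; i < WARMUP + ITERATIONS; i++) {
            /* Start timing only after the warm-up exchanges have filled
             * the caches and set up any connection state. */
            if (i == WARMUP)
                t_start = MPI_Wtime();

            if (rank == 0) {
                MPI_Send(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, size, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }

        if (rank == 0) {
            double avg_round_trip = (MPI_Wtime() - t_start) / ITERATIONS;
            printf("%8d bytes  %10.2f microseconds\n",
                   size, avg_round_trip / 2.0 * 1.0e6);
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

A program of this kind would be launched with two MPI processes, one on each of the two nodes being measured, in the same way as the shipped benchmarks described in this Appendix.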