Chapter 4 
Profiling with Scali MPI Connect
The Scali MPI communication library has a number of built-in timing and trace facilities. These 
features are built into the run-time version of the library, so no extra recompiling or linking of 
libraries is needed. All MPI calls can be timed and/or traced, and a number of environment 
variables control this functionality. In addition, an implied barrier call can be automatically 
inserted before every collective MPI call. Together, these facilities can give detailed insight into 
application performance.
The trace and timing facilities are enabled by environment variables that can either be set and 
exported in the shell, or set on the command line just before running mpimon.
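For example, tracing could be enabled in either of the two ways sketched below. SCAMPI_TRACE is 
used here as an assumed variable name, the trace specification is left as a placeholder, and the 
mpimon node list is only schematic:
   # Set and export the variable in the shell, then run mpimon as usual:
   export SCAMPI_TRACE="<trace specification>"
   mpimon ./myprogram -- node1 2 node2 2
   # Or set the variable for a single run only, on the mpimon command line:
   SCAMPI_TRACE="<trace specification>" mpimon ./myprogram -- node1 2 node2 2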
Different tools are available to help detect and analyze the cause of performance bottlenecks:
• Built-in proprietary trace and profiling tools provided with SMC
• Commercial tools that collect information during the run and post-process and present the 
results afterwards, such as Vampir from Pallas GmbH. See http://www.pallas.de for 
more information.
The main difference between these tools is that the SMC built-in tools can be used with an 
existing binary, while the other tools require relinking the application with extra libraries.
The powerful run-time facilities Scali MPI Connect trace and Scali MPI Connect timing can be 
used to monitor and keep track of MPI calls and their characteristics. The various trace and 
timing options can yield many different views of an application's usage of MPI. Common to most 
of these logs is the massive amount of data they produce, which can be overwhelming, 
especially when running with many processes and using both trace and timing concurrently.
The second part of such a log shows the timing of the different MPI calls. The timing is a sum 
over all MPI calls in all MPI processes, so with many processes the numbers can look 
unrealistically high; they do, however, reflect the total time spent in all MPI calls. For example, 
if each of eight processes spends two seconds in MPI calls, the reported total is sixteen seconds 
even though the wall-clock time is only about two seconds. When benchmarking focuses primarily 
on timing rather than tracing MPI calls, the timing functionality is the more appropriate choice: 
the trace functionality introduces some overhead and increases the total wall-clock run time of 
the application, whereas the timing functionality is relatively lightweight and can be used to 
time the application for performance benchmarking.
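As a minimal sketch of such a benchmarking run, assuming an environment variable named 
SCAMPI_TIMING controls the timing facility (the variable name, value syntax, and node list are 
placeholders, as above):
   # Enable only the lightweight timing facility for a benchmark run:
   SCAMPI_TIMING="<timing specification>" mpimon ./myprogram -- node1 2 node2 2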
4.1 Example
To illustrate the potential of tracing and timing with Scali MPI Connect, consider the code 
fragment below (full source reproduced in A-2).
#include <mpi.h>

int main( int argc, char** argv )
{
   int rank, size, my_count;           /* declarations shown here for clarity; see A-2 */
   unsigned char *pixels, *recvbuf;    /* image buffers; allocation omitted in this fragment */

   MPI_Init( &argc, &argv );
   MPI_Comm_rank( MPI_COMM_WORLD, &rank );
   MPI_Comm_size( MPI_COMM_WORLD, &size );
   /* read image from file */
   /* broadcast to all nodes */
   MPI_Bcast( &my_count, 1, MPI_INT, 0, MPI_COMM_WORLD );
   /* scatter the image */
   MPI_Scatter( pixels, my_count, MPI_UNSIGNED_CHAR, recvbuf,
                my_count, MPI_UNSIGNED_CHAR, 0, MPI_COMM_WORLD );
   /* sum the squares of the pixels in the sub-image */