Escali 4.4 User Manual

Scali MPI Connect Release 4.4 Users Guide 
Chapter 5 
Tuning SMC to your application
Scali MPI Connect allows the user to exercise control over the communication mechanisms 
through adjustment of the thresholds that steer which mechanism to use for a particular 
message. This is one technique that can be used to improve performance of parallel 
applications on a cluster.
Forcing size parameters to mpimon is usually not necessary. This is only a means of 
optimising SMC to a particular application, based on knowledge of communication patterns. For 
unsafe MPI programs it may be necessary to adjust buffering to allow the program to complete.
5.1 Tuning communication resources
The communication resources allocated by Scali MPI Connect are shared among the MPI 
processes in the node. 
• Communication buffer adaptation: If the communication behaviour of the application is 
known, explicitly providing buffer size settings to mpimon that match the requirements of 
the application will in most cases improve performance. 

Example: An application sending only 900-byte messages. 
Set channel_inline_threshold to 964 (64 added for alignment) and increase the channel-size 
significantly (32-128 k). 
Set eager_size to 1k and eager_count to a high value (16 or more). 
If all messages can be buffered, the transporter-{size, count} can be set to low values to 
reduce shared memory consumption. 
• How do I control shared memory usage?  
Adjusting SMC buffer sizes
• How do I calculate shared memory usage?  
The buffer space required by a communication channel is approximately:  
 
chunk-size = (2 * channel-entry-size * channel-entry-count) 
           + (transporter-size * transporter-count) 
           + (eager-size * eager-count) 
           + 4096 (give or take a few bytes) 
Total-usage = chunk-size * no-of-processes
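As a rough illustration of the formula above, the following Python sketch evaluates the approximate chunk size and total usage. The function names and the numeric values are illustrative assumptions only; they are not SMC defaults and not part of mpimon.

```python
def chunk_size(channel_entry_size, channel_entry_count,
               transporter_size, transporter_count,
               eager_size, eager_count, overhead=4096):
    """Approximate buffer space in bytes for one communication channel."""
    return (2 * channel_entry_size * channel_entry_count
            + transporter_size * transporter_count
            + eager_size * eager_count
            + overhead)  # the "give or take a few bytes" term


def total_usage(chunk, no_of_processes):
    """Total usage = chunk size times the number of processes."""
    return chunk * no_of_processes


# Hypothetical settings, chosen only to show the arithmetic.
example_chunk = chunk_size(
    channel_entry_size=64, channel_entry_count=512,
    transporter_size=65536, transporter_count=4,
    eager_size=1024, eager_count=16,
)
example_total = total_usage(example_chunk, no_of_processes=8)
print(example_chunk, example_total)
```

With these assumed values the transporter buffers dominate the estimate, which is consistent with the advice to keep transporter-{size, count} low when all messages can be buffered.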
5.1.1 Automatic buffer management 
The pool-size is a limit for the total amount of shared memory. The automatic buffer size 
computation is based on full connectivity, i.e. every process communicating with all others. 
Given a total pool of memory dedicated to communication, each communication channel is 
restricted to a partition of only (P = number of processes):
chunk = inter_pool_size / P
The automatic approach is to downsize all buffers associated with a communication channel 
until it fits in its part of the pool. The automatic chunk size is calculated to wrap a complete 
communication channel.
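The downsizing idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the guide does not specify how SMC shrinks the buffers, so the halving strategy, the choice of which settings to shrink, and the name fit_to_pool are all hypothetical.

```python
def chunk_size(s, overhead=4096):
    """Approximate bytes needed by one channel (formula from section 5.1)."""
    return (2 * s["channel_entry_size"] * s["channel_entry_count"]
            + s["transporter_size"] * s["transporter_count"]
            + s["eager_size"] * s["eager_count"]
            + overhead)


def fit_to_pool(settings, inter_pool_size, processes):
    """Shrink buffers until one channel fits in inter_pool_size / P.

    Assumption: every buffer is halved in each round. The real SMC
    policy may differ.
    """
    limit = inter_pool_size // processes  # chunk = inter_pool_size / P
    s = dict(settings)                    # leave the caller's dict untouched
    shrinkable = ("channel_entry_count", "transporter_size", "eager_size")
    while chunk_size(s) > limit:
        if all(s[key] <= 1 for key in shrinkable):
            raise ValueError("pool too small for even minimal buffers")
        for key in shrinkable:
            s[key] = max(1, s[key] // 2)
    return s


# Hypothetical starting point: a 1 MiB pool shared by 8 processes.
requested = {
    "channel_entry_size": 64, "channel_entry_count": 512,
    "transporter_size": 65536, "transporter_count": 4,
    "eager_size": 1024, "eager_count": 16,
}
fitted = fit_to_pool(requested, inter_pool_size=1024 * 1024, processes=8)
print(fitted, chunk_size(fitted))
```

Because the partition shrinks as P grows, the same pool yields smaller per-channel buffers on larger runs, which is why explicit settings tuned on a small run may not carry over unchanged.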