Scali MPI Connect Release 4.4 Users Guide 
B-1.2 Why can I not start mpid?
mpid opens a socket and binds it to a predefined port, the mpid port (see /etc/services for the port number). If mpid terminates abnormally, this port cannot be reused until a system-defined timer has expired. To resolve this:
• Use netstat -a | grep mpid to observe when the socket is released, then restart mpid.
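As an alternative to watching netstat, you can test whether the port can be bound again. This is a minimal sketch, not part of SMC: it assumes the mpid entry in /etc/services is registered for tcp, and the helper names (lookup_mpid_port, port_released, wait_for_release) are hypothetical. For TCP the blocking timer is typically the TIME_WAIT interval, and a bind attempt without SO_REUSEADDR fails with EADDRINUSE until it expires (or while a live mpid still holds the port). If the mpid port is below 1024, run the check as root.

```python
import errno
import socket
import time


def lookup_mpid_port(service: str = "mpid", proto: str = "tcp") -> int:
    """Return the port registered for `service` in /etc/services.

    Assumes the entry uses protocol "tcp"; raises OSError if no entry exists.
    """
    return socket.getservbyname(service, proto)


def port_released(port: int, host: str = "") -> bool:
    """True if a fresh TCP socket can bind to `port`.

    SO_REUSEADDR is deliberately left unset, so a socket still in TIME_WAIT
    (or a running mpid) makes bind() fail with EADDRINUSE.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise  # e.g. EACCES for a privileged port when not root
    return True


def wait_for_release(port: int, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll until `port` can be bound, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if port_released(port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, wait_for_release(lookup_mpid_port()) returns True once mpid can be restarted, or False after five minutes. The probe briefly binds the port itself, so run it only while mpid is down.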
B-1.2.1 Bad cleanup
A previous SMC run has not terminated properly.
• Check the nodes for leftover MPI processes using /opt/scali/bin/scaps.
• Use /opt/scali/sbin/scidle.
• Use /opt/scali/bin/scash to check all nodes for leftover shared-memory segments (with ipcs on Solaris and Linux).
Note: core dumping takes time, so processes from a crashed run may remain for a while.
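On Linux, the shared-memory check can be scripted by parsing the output of ipcs -m. This sketch assumes the util-linux column layout (key, shmid, owner, perms, bytes, nattch, status); Solaris ipcs prints a different layout and is not handled. Treating a segment with no attached processes (nattch 0) as a leftover is a heuristic, and the function names are hypothetical.

```python
import getpass
import subprocess
from typing import List, NamedTuple, Optional


class ShmSegment(NamedTuple):
    key: str
    shmid: int
    owner: str
    nbytes: int
    nattch: int


def parse_ipcs_m(text: str) -> List[ShmSegment]:
    """Parse Linux `ipcs -m` output (util-linux layout assumed)."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x00000000;
        # banner and header lines are skipped.
        if len(fields) >= 6 and fields[0].startswith("0x"):
            segments.append(ShmSegment(
                key=fields[0],
                shmid=int(fields[1]),
                owner=fields[2],
                nbytes=int(fields[4]),
                nattch=int(fields[5]),
            ))
    return segments


def leftover_segments(text: str, owner: str) -> List[ShmSegment]:
    """Segments owned by `owner` with no attached process (heuristic)."""
    return [s for s in parse_ipcs_m(text) if s.owner == owner and s.nattch == 0]


def current_leftovers(owner: Optional[str] = None) -> List[ShmSegment]:
    """Run `ipcs -m` on this node and return unattached segments for `owner`."""
    out = subprocess.run(["ipcs", "-m"], capture_output=True, text=True, check=True).stdout
    return leftover_segments(out, owner or getpass.getuser())
```

Run current_leftovers() on each node (for example through scash). A segment it reports can be removed with ipcrm -m <shmid> once you are sure it belongs to the failed run.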
B-1.2.2 Space overflow
The application requires more SCI or shared-memory resources than are available.
• The pool sizes specified to mpimon are too large and must be reduced.
B-1.3 Why does my program terminate abnormally?
B-1.3.1 Core dump
The application core dumps.
• Use a debugger to locate where the access violation occurs. The application may need to be recompiled to include symbolic debug information (-g for most compilers).
• Set the environment variable SCAMPI_INSTALL_SIGSEGV_HANDLER=1, then attach the debugger to the failing process.
B-1.4 General problems
Are you reasonably certain that your algorithms are MPI safe?
• Check that every send has a matching receive.
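Counting sends against receives is a necessary, but not sufficient, part of this check: it catches messages that are never received, not ordering problems. The sketch assumes you have logged each operation as a hypothetical (source, destination, tag) triple; wildcards such as MPI_ANY_SOURCE and MPI_ANY_TAG are not modelled.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical trace record: (source rank, destination rank, tag).
Msg = Tuple[int, int, int]


def unmatched(sends: Iterable[Msg], recvs: Iterable[Msg]) -> Tuple[Counter, Counter]:
    """Return (sends with no matching receive, receives with no matching send).

    Operations are matched as multisets, so two identical sends need two
    identical receives. Counter subtraction keeps only positive counts.
    """
    s, r = Counter(sends), Counter(recvs)
    return s - r, r - s
```

In a correct program both returned counters are empty.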
The program just hangs.
• If the application has a high degree of asynchrony, try increasing the channel-size.
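A toy model (plain Python, not SMC internals) of why a larger channel can help: two ranks each send n messages to the other before receiving any. Each direction buffers at most capacity messages and a send on a full buffer blocks, so the exchange deadlocks whenever n exceeds capacity. Treating channel-size as a message count is a simplifying assumption, and the function name is hypothetical.

```python
def exchange_completes(n_msgs: int, capacity: int) -> bool:
    """Simulate two ranks that each send n_msgs, then receive n_msgs.

    Each directed channel buffers at most `capacity` messages; a send on a
    full channel blocks. Returns False if both ranks are stuck (deadlock).
    """
    inbound = {0: 0, 1: 0}   # messages buffered in the channel towards each rank
    sent = {0: 0, 1: 0}
    received = {0: 0, 1: 0}
    while True:
        progress = False
        for rank in (0, 1):
            peer = 1 - rank
            if sent[rank] < n_msgs:
                # Send phase: proceeds only if the peer's inbound buffer has room.
                if inbound[peer] < capacity:
                    inbound[peer] += 1
                    sent[rank] += 1
                    progress = True
            elif received[rank] < n_msgs and inbound[rank] > 0:
                # Receive phase: drain one message from our inbound buffer.
                inbound[rank] -= 1
                received[rank] += 1
                progress = True
        if all(received[r] == n_msgs for r in (0, 1)):
            return True
        if not progress:
            return False
```

Raising the capacity only hides the problem. Reordering so that one rank receives first, or using MPI_Sendrecv, makes such an exchange safe regardless of buffering.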
The program terminates without an error message.
• Examine the core file, or rerun the program in a debugger.
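If no core file appears, the core-file size limit may be zero. The sketch below uses Python's standard resource module (Unix only; the function name is hypothetical) to raise the soft RLIMIT_CORE limit to the hard limit. It affects only the calling process and children it starts afterwards; the limit for processes started on remote nodes depends on how those processes are launched.

```python
import resource
from typing import Tuple


def enable_core_dumps() -> Tuple[int, int]:
    """Raise the soft core-file size limit to the hard limit.

    A soft limit of 0 means no core file is written on a crash. Raising the
    soft limit up to the hard limit never requires privileges.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft != hard:
        resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

On Linux, the name and location of the core file are further controlled by /proc/sys/kernel/core_pattern.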