Scali MPI Connect Release 4.4 Users Guide
B-1.2 Why can I not start mpid?
mpid opens a socket and assigns a predefined mpid port number to the end point (see
/etc/services for more information). If mpid is terminated abnormally, the mpid port number
cannot be reused until a system-defined timer has expired. To resolve:
Use netstat -a | grep mpid to observe when the socket is released. When the socket is
released, restart mpid.
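As an alternative to watching the netstat output by hand, the wait can be scripted. The following is a minimal Python sketch and not part of SMC. It swaps the netstat check for a plain bind attempt, and it assumes that the mpid port is a TCP port and that a bind fails for as long as the port is still held. The names port_is_free and wait_for_port are hypothetical.

```python
import socket
import time


def port_is_free(port, host=""):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # SO_REUSEADDR is deliberately not set: the bind should fail while
            # the port is in use or still waiting for the system timer.
            probe.bind((host, port))
        except OSError:
            return False
    return True


def wait_for_port(port, host="", timeout=300.0, interval=5.0):
    """Poll until the port is free; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while not port_is_free(port, host):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

If the service is registered in /etc/services under the name mpid (an assumption), the port number could be looked up with socket.getservbyname("mpid", "tcp") and passed to wait_for_port before mpid is restarted.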
B-1.2.1 Bad clean up
- A previous SMC run has not terminated properly.
  Check for MPI processes on the nodes using /opt/scali/bin/scaps.
  Use /opt/scali/sbin/scidle.
  Use /opt/scali/bin/scash to check for leftover shared memory segments on all nodes
  (ipcs for Solaris and Linux).
  Note: core dumping takes time.
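Once ipcs output has been collected from a node, picking out leftover segments can be automated. The sketch below is illustrative only: it assumes the column layout printed by ipcs -m on Linux (key, shmid, owner, perms, bytes, nattch, status), and it assumes that a leftover segment is one that belongs to the user who ran the job and has no attached processes. The function names, the user names and the SAMPLE text are hypothetical.

```python
def parse_ipcs_segments(text):
    """Parse Linux-style `ipcs -m` output into a list of dicts."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x00000000;
        # banner and column-header lines are skipped.
        if len(fields) < 6 or not fields[0].startswith("0x"):
            continue
        segments.append({
            "key": fields[0],
            "shmid": int(fields[1]),
            "owner": fields[2],
            "bytes": int(fields[4]),
            "nattch": int(fields[5]),
        })
    return segments


def leftover_segments(text, owner):
    """Return segments owned by `owner` that no process is attached to."""
    return [s for s in parse_ipcs_segments(text)
            if s["owner"] == owner and s["nattch"] == 0]


# Hypothetical sample output, made up for illustration.
SAMPLE = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 32768      alice      600        524288     2          dest
0x0052e2c1 65537      alice      600        1048576    0
0x00000000 98306      bob        644        4096       0
"""

if __name__ == "__main__":
    for seg in leftover_segments(SAMPLE, "alice"):
        print(seg["shmid"], seg["bytes"])
```

The parser only reports candidates. Whether a segment is really left over from an SMC run still has to be judged by the operator before it is removed.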
B-1.2.2 Space overflow
- The application requires too many SCI or shared memory resources.
  The mpimon pool-size specifications are too large and must be reduced.
B-1.3 Why does my program terminate abnormally?
B-1.3.1 Core dump
- The application core dumps.
  Use a debugger to locate the point of violation. The application may need to be recompiled
  to include symbolic debug information (-g for most compilers).
  Define SCAMPI_INSTALL_SIGSEGV_HANDLER=1 and attach to the failing process with the
  debugger.
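One way to make sure the variable is defined for a run is to set it in the environment of the launching process. The sketch below is a minimal Python example. The helper name run_with_sigsegv_handler is hypothetical, and it is an assumption that the launcher passes its environment on to the MPI processes.

```python
import os
import subprocess


def run_with_sigsegv_handler(command):
    """Run `command` with SCAMPI_INSTALL_SIGSEGV_HANDLER=1 in its environment."""
    # Copy the current environment so the calling process is left unchanged.
    env = dict(os.environ)
    env["SCAMPI_INSTALL_SIGSEGV_HANDLER"] = "1"
    return subprocess.run(command, env=env)
```

The command argument is the usual launch command given as a list of strings. The return value is a subprocess.CompletedProcess whose returncode can be inspected afterwards.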
B-1.4 General problems
- Are you reasonably certain that your algorithms are MPI safe?
  Check that every send has a matching receive.
- The program just hangs.
  If the application has a large degree of asynchronicity, try increasing the channel-size.
- The program terminates without an error message.
  Investigate the core file, or rerun the program in a debugger.
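The check that every send has a matching receive can be approximated offline if the application logs its point-to-point calls. The following Python sketch assumes a hypothetical trace format, a list of (kind, src, dst, tag) tuples, which is not something SMC produces. It ignores communicators, collectives and wildcard receives (MPI_ANY_SOURCE, MPI_ANY_TAG), so it is a first-pass check only.

```python
from collections import Counter


def unmatched_messages(events):
    """Return (unmatched_sends, unmatched_recvs) as Counters keyed by (src, dst, tag).

    `events` is an iterable of (kind, src, dst, tag) tuples, where kind is
    "send" or "recv", src is the sending rank and dst is the receiving rank.
    """
    sends = Counter()
    recvs = Counter()
    for kind, src, dst, tag in events:
        if kind == "send":
            sends[(src, dst, tag)] += 1
        elif kind == "recv":
            recvs[(src, dst, tag)] += 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    # Counter subtraction keeps only the entries with a positive count.
    return sends - recvs, recvs - sends


# Hypothetical trace, made up for illustration.
TRACE = [
    ("send", 0, 1, 7), ("recv", 0, 1, 7),
    ("send", 1, 0, 7),                     # no receive posted for this one
    ("send", 0, 1, 9), ("recv", 0, 1, 9),
    ("recv", 2, 0, 3),                     # no send posted for this one
]

if __name__ == "__main__":
    extra_sends, extra_recvs = unmatched_messages(TRACE)
    print("sends without a receive:", dict(extra_sends))
    print("receives without a send:", dict(extra_recvs))
```

An unmatched receive is a likely cause of a hang. An unmatched send may complete or block depending on the message size and the buffering available.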