Q-Logic IB6054601-00 D User Manual
C – Troubleshooting
InfiniPath MPI Troubleshooting
If this file is not present, or the node has not been rebooted after the infinipath RPM was installed, a failure message similar to this will be generated:
$ mpirun -m ~/tmp/sm -np 2 -mpi_latency 1000 1000000
node-00:1.ipath_update_tid_err: failed: Cannot allocate memory
mpi_latency:
/fs2/scratch/infinipath-build-2.0/mpi-2.0/mpich/psm/src
mq_ips.c:691:
mq_ipath_sendcts: Assertion ‘rc == 0’ failed. MPIRUN: Node program
unexpectedly quit. Exiting.
You can check the ulimit -l setting on all the nodes by running ipath_checkout. A warning will be given if ulimit -l is less than 4096.
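As a rough illustration of the threshold described above, the following sketch compares a locked-memory limit against 4096 on a single node. The function name check_memlock is hypothetical and is not part of InfiniPath. Treating "unlimited" as acceptable is also an assumption. It does not replace ipath_checkout, which checks all the nodes.

```shell
# Hypothetical helper (not part of InfiniPath): compare a locked-memory
# limit, as printed by "ulimit -l", against the 4096 threshold.
check_memlock() {
    limit=$1
    if [ "$limit" = "unlimited" ]; then
        # Assumption: an unlimited setting satisfies the requirement.
        echo "ok"
    elif [ "$limit" -lt 4096 ]; then
        echo "warning: ulimit -l is $limit, less than 4096"
    else
        echo "ok"
    fi
}
```

To check the current shell on one node, call it as: check_memlock "$(ulimit -l)"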
There are two possible solutions to this. If InfiniPath is not installed on the node
where you start the job, set this value in the following way (as root).
# ulimit -l 65536
Or, if you have installed InfiniPath on the node, reboot it to ensure that /etc/initscript is run.
C.8.12 Error Messages Generated by mpirun
In the sections below, types of mpirun error messages are described. They fall into
these categories:
■ Messages from the InfiniPath Library
■ MPI messages
■ Messages relating to the InfiniPath driver and InfiniBand links
Messages generated by mpirun follow a general format:
program_name: message
function_name: message
Messages may also have different prefixes, such as ipath_ or psm_, which indicate the part of the software in which the errors are occurring.
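The general format above can be used to sort messages by prefix when scanning mpirun output. The sketch below is a minimal example under stated assumptions: the function name classify_msg is hypothetical, and it inspects only the text before the first colon, so a message carrying a node or rank prefix ahead of the function name would be reported as "other".

```shell
# Hypothetical helper (not part of InfiniPath): report the prefix family
# of one message line of the form "name: message".
classify_msg() {
    line=$1
    # Keep only the text before the first colon (program or function name).
    name=${line%%:*}
    case $name in
        ipath_*) echo "ipath" ;;
        psm_*)   echo "psm" ;;
        *)       echo "other" ;;
    esac
}
```

For example, classify_msg "mpi_latency: message" reports "other", because the name before the colon carries neither prefix.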
C.8.12.1 Messages from the InfiniPath Library
These messages may appear in the mpirun output.
The first set are error messages, which indicate internal problems and should be
reported to Support.
Trying to cancel invalid timer (EOC)
sender rank rank is out of range (notification)
sender rank rank is out of range (ack)
Reached TIMER_TYPE_EOC while processing timers