Q-Logic IB6054601-00 D User Manual

B – Integration with a Batch Queuing System
The following command will terminate all processes using the InfiniPath 
interconnect:
/sbin/fuser -k /dev/ipath 
For more information, see the man pages for fuser(1) and lsof(8).
NOTE: Run these commands as root to ensure that all processes are reported.
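Before terminating anything, it can help to see which processes hold the device open. The following is a minimal sketch, not part of the InfiniPath software: the function name list_device_users and the FUSER variable are hypothetical, and it assumes the standard fuser -v (verbose) option is available on the node.

```shell
#!/bin/sh
# Sketch: list the processes holding a device open before terminating them.
# list_device_users and FUSER are hypothetical names used for illustration.
FUSER=${FUSER:-/sbin/fuser}

list_device_users() {
    dev=${1:-/dev/ipath}
    if [ ! -e "$dev" ]; then
        # Nothing to report if the device node is absent.
        echo "list_device_users: $dev does not exist" >&2
        return 2
    fi
    # -v reports the user, PID, access type, and command of each process.
    "$FUSER" -v "$dev"
}
```

Run list_device_users as root to review the list, then run /sbin/fuser -k /dev/ipath once you are sure those processes can be terminated.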
B.2 Lock Enough Memory on Nodes When Using SLURM
This information is identical to that provided elsewhere in this guide. It is repeated here for your convenience.
InfiniPath MPI requires the ability to lock (pin) memory during data transfers on each compute node. This is normally done via /etc/initscript, which is created or modified during the installation of the infinipath RPM (setting a limit of 64 MB, with the command "ulimit -l 65536").
Some batch systems, such as SLURM, propagate the user’s environment from the 
node where you start the job to all the other nodes. For these batch systems, you 
may need to make the same change on the node from which you start your batch 
jobs.
If /etc/initscript is not present, or the node has not been rebooted since the infinipath RPM was installed, a failure message similar to this will be generated:
mpirun -m ~/tmp/sm -np 2 -mpi_latency 1000 1000000
node-00:1.ipath_update_tid_err: failed: Cannot allocate memory 
mpi_latency: /fs2/scratch/infinipath-build-1.3/mpi-1.3/mpich/psm/src mq_ips.c:691: mq_ipath_sendcts: Assertion ‘rc == 0’ failed.
MPIRUN: Node program unexpectedly quit. Exiting.
You can check the ulimit -l value on all the nodes by running ipath_checkout. A warning will be given if ulimit -l is less than 4096.
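The same threshold can be checked by hand on a single node, for example on the node from which you submit batch jobs. This is a minimal sketch and does not reproduce what ipath_checkout does internally. The function name memlock_ok is hypothetical, and the sketch assumes a shell whose ulimit builtin supports -l and reports the limit in kilobytes, as bash does.

```shell
#!/bin/bash
# Sketch: compare the locked-memory limit of the current shell with a minimum.
# memlock_ok is a hypothetical helper, not an InfiniPath command.

# memlock_ok LIMIT_KB [MIN_KB]
# Succeeds if LIMIT_KB is "unlimited" or is at least MIN_KB (default 4096).
memlock_ok() {
    limit=$1
    min=${2:-4096}
    [ "$limit" = "unlimited" ] && return 0
    [ "$limit" -ge "$min" ]
}

current=$(ulimit -l)
if memlock_ok "$current" 4096; then
    echo "locked-memory limit OK: $current"
else
    echo "WARNING: locked-memory limit $current is below 4096" >&2
fi
```

Passing 65536 as the second argument instead of 4096 checks against the 64 MB value that /etc/initscript is expected to set.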
There are two possible solutions to this. If infinipath is not installed on the node 
where you start the job, set this value in the following way. You must be root to set it:
ulimit -l 65536 
Or, if you have installed infinipath on the node, reboot it to ensure that /etc/initscript is run.
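To decide which of the two solutions applies, you can check whether /etc/initscript exists and contains a ulimit -l line. The following is a minimal sketch under stated assumptions: the function name initscript_sets_memlock is hypothetical, and it assumes the limit appears in the file as a line beginning with "ulimit -l", which may not match the exact layout the RPM writes.

```shell
#!/bin/sh
# Sketch: report whether an initscript file sets the locked-memory limit.
# initscript_sets_memlock is a hypothetical helper, not an InfiniPath command.

# initscript_sets_memlock [FILE]
# Returns 0 if FILE has a line starting with "ulimit -l",
#         1 if FILE is readable but has no such line,
#         2 if FILE is missing or unreadable.
initscript_sets_memlock() {
    file=${1:-/etc/initscript}
    [ -r "$file" ] || return 2
    grep -q '^[[:space:]]*ulimit[[:space:]]\{1,\}-l' "$file"
}

rc=0
initscript_sets_memlock /etc/initscript || rc=$?
case $rc in
    0) echo "/etc/initscript sets ulimit -l; reboot if the limit is still low" ;;
    1) echo "/etc/initscript has no ulimit -l line; set the limit as root" ;;
    *) echo "/etc/initscript is not readable; set the limit as root" ;;
esac
```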