IBM SG24-5131-00 User Manual

  • Verify that all sharedvg file systems and paging spaces are accessible 
(df -k and lsps -a).
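The check above can be scripted rather than read by eye. The following is a minimal sketch, not part of the original procedure: the function names are hypothetical, the mount point is assumed to be the last column of the df -k output, and the Active flag is assumed to be the sixth column of the lsps -a output. Confirm both layouts on your AIX level before relying on it.

```shell
# Print every expected mount point that is absent from df -k output.
# Reads df -k output on stdin; expected mount points are the arguments.
# Assumption: the mount point is the last whitespace-separated column.
missing_mounts() {
    df_output=$(cat)
    for mp in "$@"; do
        if ! printf '%s\n' "$df_output" |
            awk -v mp="$mp" '$NF == mp { found = 1 } END { exit !found }'
        then
            printf '%s\n' "$mp"
        fi
    done
}

# Print the name of every paging space that is not active.
# Reads lsps -a output on stdin and skips the header line.
# Assumption: column 1 is the paging space name, column 6 is Active.
inactive_paging_spaces() {
    awk 'NR > 1 && $6 != "yes" { print $1 }'
}
```

On a node, the functions might be used as df -k | missing_mounts /sharedfs1 /sharedfs2 and lsps -a | inactive_paging_spaces, where /sharedfs1 and /sharedfs2 stand in for the real sharedvg mount points. Empty output from both means nothing was found missing or inactive.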
6.2.2  Node Failure / Reintegration
The following sections deal with issues of node failure and reintegration. 
6.2.2.1  AIX Crash
Perform the following steps in the event of an AIX crash:
  • Check, by way of the verification commands, that all the Nodes in the 
cluster are up and running.
  • Optional: Prune the error log on NodeF (errclear 0).
  • If NodeF is an SMP, you may want to set the fast reboot switch 
(mpcfg -cf 11 1).
  • Monitor cluster logfiles on NodeT.
  • Crash NodeF by entering cat /etc/hosts > /dev/kmem.  (The LED on NodeF 
will display 888.)
  • The OS failure on NodeF will cause a node failover to NodeT.
  • Verify that failover has occurred (netstat -i and ping for networks, 
lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for 
application processes).
  • Power cycle NodeF.  If HACMP is not configured to start from /etc/inittab, 
start HACMP manually on NodeF after it restarts (smit clstart).  NodeF will 
take back its cascading Resource Groups.
  • Verify that reintegration has occurred (netstat -i and ping for networks, 
lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for 
application processes).
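The failover and reintegration verifications in the steps above use the same three checks, so they can be wrapped in one helper and run on whichever node is expected to hold the resources (NodeT after the crash, NodeF after reintegration). This is only a sketch under stated assumptions: all function names are hypothetical, and the volume group, application user, and service address passed to it are placeholders for your own configuration. It does not replace writing to a test file on the shared file systems.

```shell
# Each function returns 0 on success and non-zero otherwise.

# True if volume group $1 is listed by lsvg -o (varied-on volume groups).
vg_is_online() {
    lsvg -o | grep -qx "$1"
}

# True if at least one process is owned by user $1.
app_is_running() {
    [ -n "$(ps -U "$1" -o pid= 2>/dev/null)" ]
}

# True if address $1 answers a single ping.
addr_answers() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# Usage: verify_takeover <volume_group> <appuid> <service_address>
# Prints one line per failed check and returns 1 if any check failed.
verify_takeover() {
    rc=0
    vg_is_online "$1"   || { echo "volume group $1 is not varied on"; rc=1; }
    app_is_running "$2" || { echo "no processes found for user $2"; rc=1; }
    addr_answers "$3"   || { echo "address $3 does not answer ping"; rc=1; }
    return $rc
}
```

For example, verify_takeover sharedvg appuser 192.0.2.10 would check the sharedvg volume group, a hypothetical application user appuser, and a placeholder service address. A silent return with status 0 suggests that the resources are present on that node.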
6.2.2.2  CPU Failure
Perform the following steps in the event of CPU failure:
  • Check, by way of the verification commands, that all the Nodes in the 
cluster are up and running.
  • Optional: Prune the error log on NodeF (errclear 0).
  • If NodeF is an SMP, you may want to set the fast reboot switch 
(mpcfg -cf 11 1).
  • Monitor cluster logfiles on NodeT.
  • Power off NodeF.  This will cause a node failover to NodeT.