IBM SG24-5131-00 User Manual

IBM Certification Study Guide  AIX HACMP
  • Verify that failover has occurred (netstat -i and ping for networks, 
lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for 
application processes).
  • Power cycle NodeF.  If HACMP is not configured to start from /etc/inittab 
(on restart), start HACMP on NodeF (smit clstart).  NodeF will take back 
its cascading Resource Groups.
  • Verify that re-integration has occurred (netstat -i and ping for networks, 
lsvg -o and vi of a test file for volume groups, and ps -U <appuid> for 
application processes).
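The verification commands named in the steps above can be wrapped in a small script so that the same checks run after both failover and re-integration. The following is a minimal sketch, not part of the guide's procedure: the function names and the example address, volume group, and user are hypothetical, and the interactive vi test of a file on the shared volume group is left out because it does not script well.

```shell
#!/bin/sh
# Sketch only: all function names and example arguments are hypothetical.

# Succeeds if the named volume group appears in the varied-on list.
vg_is_online() {
    lsvg -o | grep -qx "$1"
}

# Succeeds if the given user owns at least one process.
app_is_running() {
    [ -n "$(ps -U "$1" -o pid= 2>/dev/null)" ]
}

# Succeeds if the address answers a single ping.
addr_answers() {
    ping -c 1 "$1" >/dev/null 2>&1
}

# usage: verify_resources <service_addr> <volume_group> <appuid>
verify_resources() {
    rc=0
    addr_answers "$1"   || { echo "FAIL: no ping reply from $1"; rc=1; }
    vg_is_online "$2"   || { echo "FAIL: $2 is not varied on"; rc=1; }
    app_is_running "$3" || { echo "FAIL: no processes owned by $3"; rc=1; }
    [ "$rc" -eq 0 ] && echo "OK: resources are active"
    return "$rc"
}

# Example call (placeholder values):
#   verify_resources 192.0.2.10 sharedvg appuser
```

Run it on the node that is expected to hold the resources at that point in the test: the takeover node after failover, and NodeF after re-integration.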
6.2.2.3  TCP/IP Subsystem Failure
  • Check, by way of the verification commands, that all the Nodes in the 
cluster are up and running.
  • Optional: Prune the error log on NodeF (errclear 0).
  • Monitor the cluster log files on NodeT.
  • On NodeF, stop the TCP/IP subsystem (sh /etc/tcp.clean) or crash the 
subsystem by increasing the size of the sb_max and thewall parameters to 
large values (no -o sb_max=10000; no -o thewall=10000) and ping NodeT.  
Note that you should record the values for sb_max and thewall prior to 
modifying them, and, as an extra check, you may want to add the original 
values to the end of /etc/rc.net.
  • The TCP/IP subsystem failure on NodeF will cause a network failure of all 
the TCP/IP networks on NodeF. Unless there has been some 
customization done to promote this type of failure to a node failure, only 
the network failure will occur. The presence of a non-TCP/IP network 
(RS232, target mode SCSI or target mode SSA) should prevent the cluster 
from triggering a node down in this situation.
  • Verify that the network_down event has been run by checking the 
/tmp/hacmp.out file on either node. By default, the network_down script 
does nothing, but it can be customized to do whatever is appropriate for 
that situation in your environment.
  • On NodeF, issue the command startsrc -g tcpip. This should restart the 
TCP/IP daemons, and should cause a network_up event to be triggered in 
the cluster for each of your TCP/IP networks.
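Two of the steps above lend themselves to small helpers: recording sb_max and thewall before changing them, and checking the log for the network_down and network_up events. The sketch below is illustrative only. The function names are hypothetical, it assumes that no -o <option> prints a line of the form "option = value", and it assumes that hacmp.out marks events with the string "EVENT START: <event name>"; check both against your system before relying on it.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical; output formats are assumptions.

# Print the current value of one network option.
# Assumes `no -o name` prints "name = value".
get_no_value() {
    no -o "$1" | awk '{ print $NF }'
}

# usage: save_no_values <file> <option>...
# Writes one "no -o option=value" line per option, so the file can be
# run later to restore the recorded values.
save_no_values() {
    out=$1
    shift
    : > "$out"
    for opt in "$@"; do
        echo "no -o $opt=$(get_no_value "$opt")" >> "$out"
    done
}

# usage: event_logged <logfile> <event>
# Succeeds if the log contains a start record for the event.
# Assumes the "EVENT START: <event>" marker.
event_logged() {
    grep -q "EVENT START: $2" "$1"
}

# Example calls (placeholder file name):
#   save_no_values /tmp/no.saved sb_max thewall
#   event_logged /tmp/hacmp.out network_down && echo "network_down was run"
#   sh /tmp/no.saved        # restore the recorded values
```

Saving the values as ready-to-run no -o commands keeps the restore step to a single command, and the same lines could be appended to /etc/rc.net if you want the extra check mentioned above.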
6.2.3  Network Failure
  • Check, by way of the verification commands, that all the Nodes in the 
cluster are up and running.
  • Optional: Prune the error log on NodeF (errclear 0).