IBM SG24-5131-00 User Manual

Cluster Troubleshooting 
149
7.6  User ID Problems
Within an HACMP cluster, there is always more than one node that can
potentially offer the same service to a given user or user ID.
Because the node providing the service can change, the system administrator
must ensure that the same users and groups are defined on all nodes that may
run an application. Then, if one node fails and the standby node takes over
the application, users can continue working, because the takeover node knows
each user under exactly the same user ID (UID) and group ID (GID).
Because access to files in an NFS-mounted file system is granted based on
numeric user IDs rather than user names, the same requirement applies to
NFS-mounted file systems.
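A consistency check like the one described above can be sketched as follows. This is a minimal illustration, not an HACMP tool: it assumes you have already copied the /etc/passwd (or /etc/group) contents from each node into local strings, and the helper names `parse_ids` and `find_mismatches`, the node names, and the sample entries are all hypothetical. It relies only on the standard colon-separated layout, in which the third field is the UID in /etc/passwd and the GID in /etc/group.

```python
from typing import Dict, List, Optional, Tuple


def parse_ids(text: str) -> Dict[str, int]:
    """Map name -> numeric ID from /etc/passwd or /etc/group text.

    In both files the third colon-separated field (index 2) holds the
    numeric ID: the UID in /etc/passwd, the GID in /etc/group.
    """
    ids: Dict[str, int] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) > 2:
            ids[fields[0]] = int(fields[2])
    return ids


def find_mismatches(
    files_by_node: Dict[str, str]
) -> List[Tuple[str, Dict[str, Optional[int]]]]:
    """Return (name, {node: id}) for every user or group whose numeric ID
    differs between nodes, or that is missing on some node (shown as None)."""
    per_node = {node: parse_ids(text) for node, text in files_by_node.items()}
    all_names = sorted(set().union(*per_node.values()))
    problems = []
    for name in all_names:
        ids = {node: table.get(name) for node, table in per_node.items()}
        if len(set(ids.values())) > 1:  # more than one distinct ID (or None)
            problems.append((name, ids))
    return problems


# Hypothetical /etc/passwd contents gathered from two cluster nodes.
passwd_a = "root:!:0:0::/:/usr/bin/ksh\nappusr:!:210:200::/home/appusr:/usr/bin/ksh\n"
passwd_b = "root:!:0:0::/:/usr/bin/ksh\nappusr:!:211:200::/home/appusr:/usr/bin/ksh\n"

for name, ids in find_mismatches({"nodeA": passwd_a, "nodeB": passwd_b}):
    print(f"{name}: {ids}")  # prints: appusr: {'nodeA': 210, 'nodeB': 211}
```

Running the same check against each node's /etc/group contents covers GIDs. Any name reported here would, after a takeover, cause the application's files (including those on NFS-mounted file systems) to appear to belong to a different user or group.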
For more information on managing user and group accounts within a cluster,
refer to Chapter 2.7, “User ID Planning” on page 48, or to Chapter 12,
“Managing User and Groups in a Cluster” of the HACMP for AIX, Version 4.3:
Administration Guide, SC23-4279.
7.7  Troubleshooting Strategy
To find the cause of a cluster problem quickly, it helps to follow a
consistent strategy. The following guidelines should make the
troubleshooting process more productive:
  • Save the log files associated with the problem before they become 
unavailable. Make sure you save the /tmp/hacmp.out and /tmp/cm.log files 
before you do anything else to determine the cause of the problem.
  • Try to reproduce the problem. Do not rely too heavily on the user’s 
problem report. The user has seen the problem only at the application 
level. If necessary, obtain the user’s data files to recreate the problem.
  • Approach the problem methodically. Allow the information gathered from 
each test to guide your next test. Do not jump back and forth between 
tests based on hunches.
  • Keep an open mind. Do not assume too much about the source of the 
problem. Test each possibility and base your conclusions on the evidence 
of the tests.
  • Isolate the problem. When tracking down a problem within an HACMP 
cluster, isolate each component of the system that can fail and determine 
whether it is working. Work from top to bottom, following the progression 
described in the next section.
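The first guideline, saving /tmp/hacmp.out and /tmp/cm.log before doing anything else, can be sketched as a small helper. This is a minimal example assuming Python 3 is available where you run it; the function name `save_logs` and the destination directory /var/adm/hacmp_saved are hypothetical choices, not part of HACMP. Each run copies the logs into a new timestamped subdirectory so that later cluster events cannot overwrite the evidence.

```python
import shutil
import time
from pathlib import Path
from typing import List

# Log files named in the troubleshooting guideline.
LOG_FILES = ["/tmp/hacmp.out", "/tmp/cm.log"]


def save_logs(log_files: List[str], dest_root: str) -> List[Path]:
    """Copy each existing log file into dest_root/<timestamp>/ and return
    the paths of the copies. Files that do not exist are skipped."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest_dir = Path(dest_root) / stamp
    dest_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for name in log_files:
        src = Path(name)
        if not src.is_file():
            print(f"warning: {src} not found, skipped")
            continue
        target = dest_dir / src.name
        shutil.copy2(src, target)  # copy2 also preserves modification times
        saved.append(target)
    return saved


# Example (hypothetical destination; requires write permission there):
#   save_logs(LOG_FILES, "/var/adm/hacmp_saved")
```

Copying rather than moving the files leaves the originals in place, so the cluster software can keep writing to them while you work from the saved copies.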