IBM SG24-5131-00 User Manual

IBM Certification Study Guide  AIX HACMP
and control messages so that the Cluster Manager has accurate information 
about the status of its partner.
When a cluster becomes partitioned and the network problem is cleared after
takeover processing has begun, keepalive packets start flowing between the
partitioned nodes again, and something must be done to restore order in the
cluster. The DGSP message restores this order.
7.5  The DGSP Message
A DGSP message (short for Diagnostic Group Shutdown Partition) is sent
when a node loses communication with the cluster and then tries to
re-establish communication.
For example, if a cluster node becomes unable to communicate with other
nodes, yet continues to work through its process table, the other nodes
conclude that the “missing” node has failed because they are no longer
receiving keepalive messages from it. The remaining nodes then process the
necessary events to acquire the disks, IP addresses, and other resources
from the “missing” node. This takeover attempt sends resets to the
dual-attached disks to release them from the “missing” node and starts the
IP address takeover scripts.
While the takeover node is acquiring the disks (or after it has acquired
them and the applications are running), the “missing” node completes its
process table (or clears an application problem), resumes sending keepalive
messages, and attempts to rejoin the cluster. Because the disks and IP
addresses are already being taken over, a duplicate IP address can appear
on the network, and the disks may start to see extraneous traffic on the
data bus.
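The keepalive behavior described above can be sketched as a simple timeout-based failure detector. This is a minimal illustration, not HACMP's implementation: the class name `KeepaliveMonitor`, the `interval` and `missed_limit` parameters, and the use of plain floating-point timestamps are all hypothetical. A peer that stays silent for longer than `interval * missed_limit` is declared failed (the point at which takeover would begin), and a later keepalive from a peer already declared failed flags the rejoin situation that calls for a DGSP message.

```python
from dataclasses import dataclass, field


@dataclass
class KeepaliveMonitor:
    """Hypothetical timeout-based failure detector (not HACMP's code)."""

    interval: float      # assumed seconds between expected keepalives
    missed_limit: int    # assumed number of missed keepalives tolerated
    last_seen: dict = field(default_factory=dict)  # peer -> last keepalive time
    failed: set = field(default_factory=set)       # peers already declared failed

    def on_keepalive(self, peer: str, now: float) -> bool:
        """Record a keepalive from `peer`.

        Returns True if the peer was already declared failed, i.e. a
        "missing" node is trying to rejoin after takeover has begun.
        """
        self.last_seen[peer] = now
        return peer in self.failed

    def check(self, now: float) -> list:
        """Declare failed every peer silent for longer than the deadline.

        Returns only the peers newly declared failed by this call; for each
        one, the surviving nodes would start taking over its resources.
        """
        deadline = self.interval * self.missed_limit
        newly_failed = []
        for peer, seen in self.last_seen.items():
            if peer not in self.failed and now - seen > deadline:
                self.failed.add(peer)
                newly_failed.append(peer)
        return newly_failed


mon = KeepaliveMonitor(interval=1.0, missed_limit=3)
mon.on_keepalive("NodeB", now=0.0)
print(mon.check(now=2.0))                   # []
print(mon.check(now=3.5))                   # ['NodeB']: takeover begins
print(mon.on_keepalive("NodeB", now=5.0))   # True: rejoin needs a DGSP decision
```

Keeping `check` separate from `on_keepalive` makes the sketch deterministic and easy to test; a real cluster manager would drive both from its network and timer events.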
Because the cause of the “missing” node remains undetermined, you should
assume that the problem may recur, causing additional downtime not only for
the node but also for the cluster and its applications. Therefore, to ensure
the highest cluster availability, a DGSP message is sent to all nodes in one
of the partitions. Any node that receives a DGSP message halts immediately so
that it cannot damage data on the shared disks or cause confusion on the
networks.
In a partitioned cluster, the smaller partition (the one with fewer nodes)
is shut down, and each of its nodes receives a DGSP message. If the
partitions are of equal size, the partition whose node name comes later in
alphabetical order is shut down. For example, in a cluster where one
partition has NodeA and the other has NodeB, NodeB is shut down.
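The partition rule above can be expressed as a small decision function. This is a sketch under stated assumptions, not the Cluster Manager's code: `partition_to_shut_down` and `send_dgsp` are hypothetical names, a partition is modeled as a set of node names, and for multi-node partitions the tie-break compares each partition's alphabetically first node name (the text only gives a one-node-per-partition example). Python compares strings by character code, which matches alphabetical order only for consistently cased names.

```python
def partition_to_shut_down(p1: set, p2: set) -> set:
    """Pick the partition whose nodes receive a DGSP message (hypothetical sketch).

    The smaller partition loses. On a tie, the partition whose alphabetically
    first node name sorts later loses, matching the NodeA/NodeB example.
    Partitions are assumed to be disjoint and non-empty.
    """
    if len(p1) != len(p2):
        return p1 if len(p1) < len(p2) else p2
    return p1 if min(p1) > min(p2) else p2


def send_dgsp(partition: set, halt) -> None:
    """Deliver a DGSP to every node in `partition`; each one halts immediately.

    `halt` is a hypothetical callback standing in for the node halting itself.
    """
    for node in sorted(partition):
        halt(node)


print(partition_to_shut_down({"NodeA"}, {"NodeB"}))           # {'NodeB'}
print(partition_to_shut_down({"NodeA"}, {"NodeB", "NodeC"}))  # {'NodeA'}

halted = []
send_dgsp(partition_to_shut_down({"NodeA", "NodeC"}, {"NodeB", "NodeD"}), halted.append)
print(halted)                                                 # ['NodeB', 'NodeD']
```

Passing the halt action in as a callback keeps the decision logic separate from the side effect, so the rule can be checked without stopping any real node.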