IBM SG24-5131-00 User Manual

Cluster Planning 
event to inform system administrators that traffic may have to be rerouted. 
Afterwards, you can use a network_up notification event to inform system 
administrators that traffic can again be serviced through the restored network.
2.6.1.3  Predictive Event Error Correction 
You can specify a command that attempts to recover from an event script 
failure. If the recovery command succeeds and the retry count for the event 
script is greater than zero, the event script is rerun. You can also specify the 
number of times to attempt to execute the recovery command. 
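The control flow described above can be modeled in a few lines. The following Python sketch is a simplified model, not the cluster software's actual implementation. The names run_event_with_recovery, event_script, recovery_command and flaky are hypothetical. It also assumes that a single retry count bounds both the recovery-command attempts and the event-script reruns, and that a failed recovery attempt still uses up one retry.

```python
from typing import Callable


def run_event_with_recovery(
    event_script: Callable[[], bool],
    recovery_command: Callable[[], bool],
    retry_count: int,
) -> bool:
    """Hypothetical model: return True if the event script eventually succeeds."""
    if event_script():
        return True  # no failure, so the recovery command is never invoked
    for _ in range(retry_count):
        # The event script failed: attempt the recovery command.
        if not recovery_command():
            continue  # assumption: a failed recovery still uses up one retry
        # Recovery succeeded and retries remained, so rerun the event script.
        if event_script():
            return True
    return False  # retry count exhausted; the event is treated as failed


def flaky(failures: int) -> Callable[[], bool]:
    """Hypothetical stand-in for a script that fails `failures` times, then succeeds."""
    calls = 0

    def run() -> bool:
        nonlocal calls
        calls += 1
        return calls > failures

    return run


if __name__ == "__main__":
    always_ok = lambda: True
    print(run_event_with_recovery(flaky(2), always_ok, retry_count=2))  # True
    print(run_event_with_recovery(flaky(2), always_ok, retry_count=1))  # False
```

Under this model, an event that keeps failing until a transient condition clears after k failed runs needs a retry count of at least k.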
For example, a recovery command can retry unmounting a file system after
logging a user off and making sure that no one is still accessing the file
system.
If you identify a condition that affects the processing of a given event on
the cluster, such as a timing issue, you can add a recovery command with a
retry count high enough to cover the problem.
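One way to choose a count that is "high enough" is to estimate how long the condition can last and how long each recovery attempt takes. The sketch below assumes a recovery command that waits a roughly fixed time per attempt (for example, by sleeping before it retries an unmount). min_retry_count and its parameters are hypothetical, and the timings have to come from observing your own cluster.

```python
import math


def min_retry_count(max_condition_s: float, seconds_per_attempt: float) -> int:
    """Smallest retry count whose attempts together span the worst-case condition.

    Hypothetical sizing rule: assumes each recovery attempt takes about
    seconds_per_attempt seconds and the condition clears within
    max_condition_s seconds.
    """
    if seconds_per_attempt <= 0:
        raise ValueError("seconds_per_attempt must be positive")
    if max_condition_s <= 0:
        return 0  # nothing to wait out
    return math.ceil(max_condition_s / seconds_per_attempt)


if __name__ == "__main__":
    # A condition observed to last up to 45 s, with about 10 s per attempt:
    print(min_retry_count(45, 10))  # 5
```

Four attempts would cover only 40 seconds, so five are needed; adding an extra attempt or two gives some margin for timing variation.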
2.6.2  Error Notification
The AIX Error Notification facility detects errors that are logged to the AIX 
error log, such as network and disk adapter failures, and triggers a predefined 
response to the failure. It can even act on application failures, as long as they 
are logged in the error log.
To implement error notification, you add an object to the Error Notification
(errnotify) object class in the ODM. This object identifies which errors you
want to react to, and how to respond to them.
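An errnotify object is usually written as an ODM stanza, the text form that odmadd loads into an object class: the class name followed by a colon, then one indented attribute = value line per field, with string values in double quotes and numeric values bare. The Python sketch below renders such a stanza. errnotify_stanza is a hypothetical helper, and it deliberately rejects values that contain double quotes rather than guessing at ODM escaping rules.

```python
from __future__ import annotations


def errnotify_stanza(fields: dict[str, str | int]) -> str:
    """Render one errnotify object as an ODM stanza for odmadd (sketch)."""
    lines = ["errnotify:"]
    for name, value in fields.items():
        if isinstance(value, int):
            lines.append(f"\t{name} = {value}")  # numeric value, unquoted
        elif '"' in value:
            # ODM escaping rules are not modeled; refuse instead of guessing.
            raise ValueError(f"{name}: embedded double quote not supported")
        else:
            lines.append(f'\t{name} = "{value}"')  # string value, double-quoted
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    disk_notify = {
        "en_name": "Failuresample",
        "en_persistenceflg": 0,
        "en_class": "H",
        "en_type": "PERM",
        "en_rclass": "disk",
        "en_method": "errpt -a -l $1 | mail -s 'Disk Error' root",
    }
    # Save this output to a file, then load it with: odmadd <filename>
    print(errnotify_stanza(disk_notify), end="")
```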
Put the following stanza in a file:
    errnotify:
        en_name = "Failuresample"
        en_persistenceflg = 0
        en_class = "H"
        en_type = "PERM"
        en_rclass = "disk"
        en_method = "errpt -a -l $1 | mail -s 'Disk Error' root"
Then add it to the errnotify class with the odmadd <filename> command. From
then on, the specified en_method is executed every time the error notification
daemon finds a matching entry in the error log, with $1 expanded to the
sequence number of that entry. In the example above, the root user receives
an e-mail containing the detailed report of the exact error log entry.
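To make the matching step concrete, here is a simplified Python model, not the real daemon logic. ErrorLogEntry, matches and commands_to_run are hypothetical names; only en_class, en_type and en_rclass are modeled as criteria (criteria that are absent match any entry), and only $1 is expanded. The real daemon passes the command to a shell and also expands further keywords, such as the error ID and error label.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ErrorLogEntry:
    """Hypothetical, simplified view of one error log entry."""
    sequence: int        # sequence number, what $1 expands to
    err_class: str       # e.g. "H" for hardware
    err_type: str        # e.g. "PERM" or "TEMP"
    resource_class: str  # e.g. "disk"


def matches(criteria: dict[str, str], entry: ErrorLogEntry) -> bool:
    """True if every criterion present matches; absent criteria match anything."""
    actual = {
        "en_class": entry.err_class,
        "en_type": entry.err_type,
        "en_rclass": entry.resource_class,
    }
    # Criteria outside the three modeled fields never match (actual.get -> None).
    return all(actual.get(name) == wanted for name, wanted in criteria.items())


def commands_to_run(criteria: dict[str, str], method: str,
                    entries: list[ErrorLogEntry]) -> list[str]:
    """Build the notify command for each matching entry, expanding only $1."""
    return [method.replace("$1", str(e.sequence))
            for e in entries if matches(criteria, e)]


if __name__ == "__main__":
    criteria = {"en_class": "H", "en_type": "PERM", "en_rclass": "disk"}
    method = "errpt -a -l $1 | mail -s 'Disk Error' root"
    log = [
        ErrorLogEntry(101, "H", "PERM", "disk"),  # matches
        ErrorLogEntry(102, "S", "PERM", "disk"),  # wrong class
        ErrorLogEntry(103, "H", "TEMP", "disk"),  # wrong type
    ]
    for cmd in commands_to_run(criteria, method, log):
        print(cmd)  # errpt -a -l 101 | mail -s 'Disk Error' root
```

Only the permanent hardware error on a disk resource produces a command, which is why the example notification stays quiet for software or temporary errors.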