IBM SG24-5131-00 User Manual

Cluster Planning

event to inform system administrators that traffic may have to be rerouted.
Afterwards, you can use a network_up notification event to inform system
administrators that traffic can again be serviced through the restored network.

2.6.1.3 Predictive Event Error Correction
You can specify a command that attempts to recover from an event script
failure. If the recovery command succeeds and the retry count for the event
script is greater than zero, the event script is rerun. You can also specify the
number of times to attempt to execute the recovery command.

For example, a recovery command can log a user off, make sure no one is still
accessing the file system, and then retry unmounting it.

If you identify a condition that affects the processing of a given event on a
cluster, such as a timing issue, you can insert a recovery command with a
retry count high enough to cover the problem.
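
The retry behavior described above can be sketched in a few lines of shell.
This is an illustration only, not the HACMP implementation: the function name
run_event_with_recovery and its three arguments are hypothetical, and a single
counter stands in for both the event retry count and the number of recovery
attempts.

```shell
#!/bin/sh
# Hypothetical sketch of "recover, then rerun the event script".
# Arguments (all names are assumptions made for this example):
#   $1  command line of the event script
#   $2  command line of the recovery command
#   $3  retry count: how many times the event may be rerun
run_event_with_recovery() {
    event_cmd=$1
    recovery_cmd=$2
    retry_count=$3

    while :; do
        # Run the event script; stop as soon as it succeeds.
        if sh -c "$event_cmd"; then
            return 0
        fi
        # The event failed: give up if no retries remain.
        [ "$retry_count" -gt 0 ] || return 1
        # Rerun the event only if the recovery command succeeds.
        sh -c "$recovery_cmd" || return 1
        retry_count=$((retry_count - 1))
    done
}
```

A call such as run_event_with_recovery "umount /sharedfs" "fuser -k /sharedfs" 3
would model the unmount example, where /sharedfs is a placeholder mount point.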

2.6.2 Error Notification

The AIX Error Notification facility detects errors that are logged to the AIX
error log, such as network and disk adapter failures, and triggers a predefined
response to the failure. It can even act on application failures, as long as they
are logged in the error log.

To implement error notification, you have to add an object to the Error
Notification object class in the ODM. This object identifies which errors you
want to react to, and how.

By specifying the following in a file:

    errnotify:
            en_name = "Failuresample"
            en_persistenceflg = 0
            en_class = "H"
            en_type = "PERM"
            en_rclass = "disk"
            en_method = "errpt -a -l $1 | mail -s 'Disk Error' root"

and adding this to the errnotify class through the odmadd <filename> command,
the specified en_method is executed every time the error notification daemon
finds a matching entry in the error report. In the example above, the root
user will get e-mail identifying the exact error report entry.
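
As a rough sketch, the stanza file for this example could be generated from a
script as shown here. The file location and the variable name stanza_file are
assumptions made for illustration. The script only writes the file and does not
modify the ODM.

```shell
#!/bin/sh
# Sketch only: the path below is an assumption, not a required location.
stanza_file="${TMPDIR:-/tmp}/failuresample.add"

# The quoted 'EOF' delimiter keeps $1 literal in the file. When the method
# runs, $1 is the sequence number of the matching error log entry.
cat > "$stanza_file" <<'EOF'
errnotify:
        en_name = "Failuresample"
        en_persistenceflg = 0
        en_class = "H"
        en_type = "PERM"
        en_rclass = "disk"
        en_method = "errpt -a -l $1 | mail -s 'Disk Error' root"
EOF

echo "Stanza written to $stanza_file"
```

On an AIX node, running odmadd on this file as root adds the object. Running
odmget -q "en_name=Failuresample" errnotify should then display it, and
odmdelete -o errnotify -q "en_name=Failuresample" removes it again.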