Extreme 3804 Supplementary Manual

Advanced System Diagnostics and Troubleshooting Guide
Diagnostics
Health Check Functionality
The system health check feature can be configured to operate in one of two mutually exclusive modes:
alarm-level response action
auto-recovery response action
In the first mode, alarm-level, the switch responds to a failed health check at a user-configurable
alarm level. In the second mode, auto-recovery, the switch automatically attempts to diagnose the
suspect module and restore it to operation. The choice between these two modes normally depends on
the network topology, the recovery mechanisms implemented, and the acceptable service outage windows.
These modes are configured by two separate CLI commands, described below.
Alarm-Level Response Action
To configure the switch to respond to a failed health check based on alarm-level, use this command:
config sys-health-check alarm-level [card-down | default | log | system-down | traps]
where:
card-down        (BlackDiamond only.) Posts a CRIT message to the log, sends an SNMP trap, and
                 turns off the BlackDiamond module.
default          Resets the alarm level to log.
log              Posts a CRIT message to the local system log, NVRAM, and to a remote syslog (if
                 configured).
system-down      Posts a CRIT message to the log, sends an SNMP trap, and powers the system down
                 (reboots in limited function mode).
traps            Posts a CRIT message to the log and sends an SNMP trap to the configured trap
                 receivers.
Auto-Recovery Response Action
The method for configuring auto-recovery response action depends on the switch platform
(BlackDiamond vs. Alpine or Summit).
BlackDiamond Switches. To configure the switch to respond to a failed health check by attempting to
perform auto-recovery (packet memory scanning and mapping), use this command:
config sys-health-check auto-recovery <number of tries> [offline | online]
where:
number of tries  Specifies the number of times that the health checker attempts to auto-recover a
                 faulty module. The range is from 0 to 255 times. The default is 3 times.
offline          Specifies that a faulty module is to be taken offline and kept offline if one of
                 the following conditions is true:
                   More than eight defects are detected.
                   No new defects were found by the memory scanning and mapping process.
                   The same checksum errors are again detected by the system health checker.
online           Specifies that a faulty module is to be kept online, regardless of memory
                 scanning or memory mapping errors.
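As an illustration only, the two command forms described in this section might be entered as shown
below. The argument values are assumptions chosen for the example: traps is one of the documented
alarm levels, and 3 is the documented default number of tries. Because the two modes are mutually
exclusive, the lines are alternatives and are not meant to be entered together.

```
config sys-health-check alarm-level traps
config sys-health-check auto-recovery 3 offline
```

The first line selects the alarm-level response with log and SNMP trap reporting. The second line
selects the auto-recovery response on a BlackDiamond switch, with up to three recovery attempts and
with the faulty module kept offline when one of the offline conditions listed above is met.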