Extreme 3804 Supplementary Manual

Advanced System Diagnostics and Troubleshooting Guide
Diagnostics
Background transceiver scanning
Tied to the system health check configuration
Runs in background to detect potential control path faults
Tests internal transceiver data paths
Tests all ASICs for proper read/write operations
The Role of Memory Scanning and Memory Mapping
The memory scanning and memory mapping functions identify and attempt to correct switch fabric 
checksum errors. When you implement the ExtremeWare diagnostics, keep in mind that these functions 
underlie much of what takes place in the diagnostic tests that make up the system health check 
diagnostic suite. For more information, see Chapter 3, “Packet Errors and Packet Error Detection.”
NOTE
Memory scanning addresses switch fabric checksum errors detected in the packet memory area of the 
switching fabric.
The ExtremeWare memory scanning and memory mapping diagnostics are analogous to hard disk 
scanning tools, which detect and map out bad sectors so that the drive can remain 
operational with no adverse effects on performance, capacity, or reliability. In the same way, the 
ExtremeWare diagnostics identify the faulted memory behind switch fabric checksum errors and map 
it out of use.
Memory scanning and memory mapping are two separate functions: scanning detects the faulted 
portion of the memory; mapping re-maps the memory to remove the faulted memory section.
Memory scanning is designed to help isolate one of the major root causes of fabric checksum errors: 
single-bit permanent (hard) failures. Memory scanning detects—with a high probability—all current 
single-bit permanent (hard) failures in the switch memory that would result in fabric checksum errors.
Memory mapping can correct up to eight of these detected permanent (hard) single-bit errors by 
reconfiguring the memory maps around the problem areas.
The packet memory scan examines every node of packet memory to detect packet errors by writing data 
to packet memory, then reading the data back and comparing the results. The test is invasive: it takes 
the switch fabric offline while it runs.
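The write, read, and compare cycle described above can be sketched in a few lines of Python. This is an illustration only, not the ExtremeWare implementation: the SimulatedMemory class, the 8-bit cell width, and the four test patterns are assumptions chosen so that a stuck bit at either level is exposed by at least one pattern.

```python
# Illustrative sketch only; names, cell width, and patterns are hypothetical.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)  # assumed test patterns for 8-bit cells


class SimulatedMemory:
    """Toy packet memory with optional stuck-at (hard single-bit) faults."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        # stuck_at maps address -> (bit index, stuck level 0 or 1)
        self.stuck_at = stuck_at or {}

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr in self.stuck_at:
            bit, level = self.stuck_at[addr]
            if level:
                value |= 1 << bit            # bit is forced to 1
            else:
                value &= ~(1 << bit) & 0xFF  # bit is forced to 0
        return value


def scan_memory(mem):
    """Write each pattern to every cell, read it back, and compare.

    Returns the addresses of cells whose read-back value differs from
    the value written for at least one pattern.
    """
    bad = []
    for addr in range(len(mem.cells)):
        for pattern in PATTERNS:
            mem.write(addr, pattern)
            if mem.read(addr) != pattern:
                bad.append(addr)
                break  # one mismatch is enough to flag the cell
    return bad


# Cell 3 has bit 0 stuck at 1; cell 9 has bit 7 stuck at 0.
mem = SimulatedMemory(16, stuck_at={3: (0, 1), 9: (7, 0)})
bad_cells = scan_memory(mem)
print(bad_cells)
```

Because the scan overwrites every cell, any data held in memory is destroyed, which is consistent with the test having to take the switch fabric offline.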
Errored cell correction:
If the test detects eight or fewer errored cells, those cells are mapped out and excluded from use, 
and the module continues to operate.
If the test detects more than eight errored cells, the module is identified as “failed” and must be 
replaced.
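The correction rule above amounts to a single threshold check. The sketch below is a hypothetical illustration, not ExtremeWare code: the function name, the returned dictionary, and the representation of the memory map as a list of usable addresses are all assumptions. Only the limit of eight cells comes from the text.

```python
# Illustrative sketch only; the function and return shape are hypothetical.
MAX_MAPPABLE_CELLS = 8  # limit stated in the text


def correct_errored_cells(bad_cells, total_cells):
    """Apply the eight-cell rule to the result of a packet memory scan.

    Returns a dict with the module status and the addresses that
    remain in use after the errored cells are mapped out.
    """
    if len(bad_cells) > MAX_MAPPABLE_CELLS:
        # Too many hard failures to map around: the module must be replaced.
        return {"status": "failed", "usable": []}
    excluded = set(bad_cells)
    usable = [addr for addr in range(total_cells) if addr not in excluded]
    return {"status": "operational", "usable": usable}


result = correct_errored_cells([3, 9], total_cells=16)
print(result["status"], len(result["usable"]))
```

With two errored cells out of sixteen, the module stays operational with fourteen usable cells; a ninth errored cell would change the status to failed.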
Use this test when the system log displays intermittent or sporadic error messages that 
might indicate a problem but do not provide enough information to confirm the problem or isolate 
the fault.