Cisco UCS B420 M3 Blade Server Troubleshooting Guide
UCS Enhanced Memory Error Management
Background on Memory Errors
A memory error occurs when a read from a memory location returns a value that does not match the value that is
supposed to be stored there.
Classification of Memory Errors
Detected vs. Undetected Errors
In a system without ECC (error-correcting code) memory, there is no hardware error detection, so memory errors can
lead to silent data corruption, incorrect execution of the operating system or applications, and eventually system
crashes. Cisco UCS servers use ECC memory: the error-correcting codes provided by the Intel Xeon processors in UCS
servers can detect memory errors, so silent data corruption does not occur.
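To illustrate how ECC turns silent corruption into a detected (and often corrected) event, the sketch below uses a
toy 8-bit extended Hamming code that protects 4 data bits and is SEC-DED (single-error-correcting,
double-error-detecting). The choice of code is an assumption made for illustration only: the memory controllers in
Intel Xeon processors protect much wider words with their own codes, and the `encode` and `decode` names are
hypothetical.

```python
from typing import List, Tuple


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming (SEC-DED) codeword.

    Index 0 holds the overall parity bit; indices 1..7 are classic Hamming
    positions, with parity bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
    """
    d1, d2, d3, d4 = data
    word = [0, 0, 0, d1, 0, d2, d3, d4]
    word[1] = d1 ^ d2 ^ d4        # parity over positions 1, 3, 5, 7
    word[2] = d1 ^ d3 ^ d4        # parity over positions 2, 3, 6, 7
    word[4] = d2 ^ d3 ^ d4        # parity over positions 4, 5, 6, 7
    word[0] = sum(word[1:]) % 2   # overall parity makes the total even
    return word


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data): 'ok', 'corrected', or 'uncorrectable' (data = [])."""
    w = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if w[pos]:
            syndrome ^= pos           # XOR of set positions is 0 for a valid codeword
    overall = sum(w) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # An odd number of flips: treat it as a single-bit error and fix it.
        # Syndrome 0 means the overall parity bit itself was flipped.
        w[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two flips, detected but not fixable.
        return "uncorrectable", []
    return status, [w[3], w[5], w[6], w[7]]


if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])
    one_flip = stored.copy()
    one_flip[5] ^= 1
    two_flips = stored.copy()
    two_flips[3] ^= 1
    two_flips[5] ^= 1

    # Without ECC, the raw data bits are simply returned, wrong and unflagged.
    print([one_flip[i] for i in (3, 5, 6, 7)])  # [1, 1, 1, 1]
    print(decode(stored))      # ('ok', [1, 0, 1, 1])
    print(decode(one_flip))    # ('corrected', [1, 0, 1, 1])
    print(decode(two_flips))   # ('uncorrectable', [])
```

The last three lines show the three outcomes an ECC memory system distinguishes: no error, a correctable error,
and an error that is detected but cannot be corrected.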
Hard vs. Soft Errors
Errors caused by a persistent physical defect are traditionally called “hard” errors. A hard error may be caused
by an assembly defect, such as a solder bridge or a cracked solder joint, or by a defect in the memory chip itself.
Rewriting the memory location and retrying the read will not eliminate a hard error; the error will keep recurring.
Errors caused by a brief electrical disturbance, either inside the DRAM chip or on an external interface, are
called “soft” errors. Soft errors are transient and do not recur. If a soft error was caused by a disturbance
during the read operation, simply retrying the read may return correct data. If it was caused by a disturbance
that upset the contents of the memory array, rewriting the memory location will correct the error.
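The distinction above (retry the read, then rewrite the location, then check whether the error persists) can be
written as a small decision procedure. This is a minimal sketch under stated assumptions: the `SimulatedMemory`
model, its fault masks, and `classify_error` are hypothetical, and the code illustrates the reasoning in the text
rather than how UCS firmware or the memory controller actually classifies errors.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SimulatedMemory:
    """Toy word-addressed memory with injectable faults (hypothetical model)."""
    cells: Dict[int, int] = field(default_factory=dict)
    # Hard fault: addr -> mask of bits forced to 1 (a persistent physical defect).
    stuck_at_1: Dict[int, int] = field(default_factory=dict)
    # Soft fault on the interface: addr -> mask of bits flipped on the next read only.
    read_glitch: Dict[int, int] = field(default_factory=dict)

    def write(self, addr: int, value: int) -> None:
        self.cells[addr] = value | self.stuck_at_1.get(addr, 0)

    def read(self, addr: int) -> int:
        value = self.cells.get(addr, 0) | self.stuck_at_1.get(addr, 0)
        return value ^ self.read_glitch.pop(addr, 0)  # a glitch affects one read

    def upset(self, addr: int, mask: int) -> None:
        """Soft fault in the array: flip stored bits once."""
        self.cells[addr] = self.cells.get(addr, 0) ^ mask


def classify_error(mem: SimulatedMemory, addr: int, expected: int) -> str:
    if mem.read(addr) == expected:
        return "no error"
    if mem.read(addr) == expected:          # step 1: retry the read
        return "soft (transient read disturbance)"
    mem.write(addr, expected)               # step 2: rewrite the location
    if mem.read(addr) == expected:
        return "soft (memory array upset)"
    return "hard (persists after retry and rewrite)"


if __name__ == "__main__":
    hard = SimulatedMemory(stuck_at_1={0x10: 0b0100})
    hard.write(0x10, 0b0001)
    print(classify_error(hard, 0x10, 0b0001))   # hard (persists after retry and rewrite)

    glitch = SimulatedMemory()
    glitch.write(0x20, 0xAB)
    glitch.read_glitch[0x20] = 0x01
    print(classify_error(glitch, 0x20, 0xAB))   # soft (transient read disturbance)

    array_upset = SimulatedMemory()
    array_upset.write(0x30, 0xFF)
    array_upset.upset(0x30, 0x80)
    print(classify_error(array_upset, 0x30, 0xFF))  # soft (memory array upset)
```

The order of the checks matters: a retry distinguishes a disturbance during the read from a corrupted stored
value, and only an error that survives a rewrite is attributed to a persistent defect.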
Hard errors are typically detected by memory tests run by the UCS BIOS at boot time, and any DIMMs containing
hard errors are mapped out so that they cannot cause errors during runtime. UCS servers employ memory patrol
scrubbing to automatically detect and correct soft errors during runtime.
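The boot-time half of this process can be sketched as a pattern test that writes known values to every word of
each DIMM, reads them back, and maps out any DIMM that returns a mismatch. The test patterns, the stuck-at fault
model, and the `SimulatedDimm` and `boot_memory_test` names are hypothetical; the actual UCS BIOS memory test
algorithms are not described here.

```python
from typing import Dict, List, Optional, Tuple

# Illustrative test patterns for 8-bit words: all zeros, all ones, alternating bits.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


class SimulatedDimm:
    """Toy DIMM of 8-bit words with optional stuck-at faults (modeling hard errors)."""

    def __init__(self, name: str, size: int,
                 stuck_at_1: Optional[Dict[int, int]] = None,
                 stuck_at_0: Optional[Dict[int, int]] = None) -> None:
        self.name = name
        self.words = [0] * size
        self.stuck_at_1 = stuck_at_1 or {}  # addr -> bits forced to 1 on every write
        self.stuck_at_0 = stuck_at_0 or {}  # addr -> bits forced to 0 on every write

    def write(self, addr: int, value: int) -> None:
        value |= self.stuck_at_1.get(addr, 0)
        value &= ~self.stuck_at_0.get(addr, 0) & 0xFF
        self.words[addr] = value

    def read(self, addr: int) -> int:
        return self.words[addr]


def dimm_passes(dimm: SimulatedDimm) -> bool:
    """Write each pattern to every word, then read it back; any mismatch fails the DIMM."""
    for pattern in PATTERNS:
        for addr in range(len(dimm.words)):
            dimm.write(addr, pattern)
        for addr in range(len(dimm.words)):
            if dimm.read(addr) != pattern:
                return False
    return True


def boot_memory_test(dimms: List[SimulatedDimm]) -> Tuple[List[str], List[str]]:
    """Return (usable, mapped_out) DIMM names."""
    usable: List[str] = []
    mapped_out: List[str] = []
    for dimm in dimms:
        (usable if dimm_passes(dimm) else mapped_out).append(dimm.name)
    return usable, mapped_out


if __name__ == "__main__":
    dimms = [
        SimulatedDimm("A", 16),
        SimulatedDimm("B", 16, stuck_at_1={3: 0x01}),
        SimulatedDimm("C", 16, stuck_at_0={7: 0x80}),
    ]
    print(boot_memory_test(dimms))  # (['A'], ['B', 'C'])
```

Because a hard error repeats on every access, a deterministic write-and-verify pass is enough to expose it, which
is why this kind of test can run once at boot rather than continuously.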
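Patrol scrubbing can be sketched as a background loop that reads every location, corrects it, and writes the
corrected value back. In this toy sketch, a triple-copy bitwise majority vote stands in for the real ECC purely to
keep the code short; the names `tmr_encode`, `majority`, and `patrol_scrub` are hypothetical, and the sketch does
not describe the scrubbing engine in the memory controller.

```python
from typing import List, Tuple

# Stand-in "ECC" for this sketch only: each word is stored as three copies and
# corrected by a bitwise majority vote. Real memory uses far more efficient codes.
Word = Tuple[int, int, int]


def tmr_encode(value: int) -> Word:
    """Store three identical copies of the value."""
    return (value, value, value)


def majority(word: Word) -> int:
    """Bitwise majority vote: each bit takes the value held by at least two copies."""
    a, b, c = word
    return (a & b) | (a & c) | (b & c)


def patrol_scrub(memory: List[Word]) -> int:
    """Visit every location once, write back corrected data, and return how many
    locations needed correction."""
    corrected = 0
    for addr, word in enumerate(memory):
        good = majority(word)
        if word != tmr_encode(good):
            memory[addr] = tmr_encode(good)  # write-back removes the latent error
            corrected += 1
    return corrected


if __name__ == "__main__":
    memory = [tmr_encode(v) for v in (0x11, 0x22, 0x33, 0x44)]
    a, b, c = memory[2]
    memory[2] = (a ^ 0x04, b, c)   # inject a soft error into one copy
    print(patrol_scrub(memory))    # 1
    print(patrol_scrub(memory))    # 0
```

Without the write-back, a second upset of the same bit in another copy would outvote the correct copy and the
vote would silently return wrong data. Keeping correctable errors from accumulating into uncorrectable ones is the
usual motivation for scrubbing memory in the background instead of correcting data only when it is read.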
Correctable vs. Uncorrectable Errors
Whether a particular error is correctable or uncorrectable depends on the strength of the ECC scheme used in the
memory system. Dedicated hardware fixes correctable errors as they occur, with no impact on program execution.
Uncorrectable errors cannot be repaired by the ECC hardware and may make it impossible for the application or
operating system to continue execution.
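How "strength" translates into correctable and uncorrectable errors can be stated with a standard coding-theory
result. This is general background, not a statement about the specific ECC used in UCS servers; the SEC-DED
(single-error-correcting, double-error-detecting) case is included only as a worked example.

```latex
% d_min : minimum Hamming distance between any two valid codewords
% t     : number of bit errors the code is guaranteed to correct
% s     : number of bit errors the code is guaranteed to detect (s >= t)
t = \left\lfloor \frac{d_{\min} - 1}{2} \right\rfloor
\qquad \text{(correction only)}

d_{\min} \ge t + s + 1
\qquad \text{(correct up to } t \text{ and detect up to } s \text{ errors)}

\text{SEC-DED:}\quad t = 1,\; s = 2 \;\Rightarrow\; d_{\min} \ge 4
```

With a SEC-DED code, a single-bit error is correctable, a double-bit error is detected but uncorrectable, and
larger error patterns are not guaranteed to be detected at all. A stronger code (larger minimum distance) moves
more error patterns into the correctable class.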