Troubleshooting Guide for Cisco UCS B230 M2 Blade Server
UCS Memory Error Management
UCS Enhanced Memory Error Management
Enhanced Memory Error Management
As memory error rates have increased, so have false-positive DIMM replacements triggered by correctable errors.
Replacing these DIMMs can be expensive and time-consuming, and it results in unwanted system downtime.
In many cases, the replaced DIMMs had encountered soft errors and, when analyzed, received a
diagnosis of “No Trouble Found”. Those DIMMs could have continued to operate in the system without increasing
the likelihood of an uncorrectable error.
In response to these false positives, Cisco has developed an Enhanced Memory Error Management algorithm that
detects the DIMMs that put the system at increased risk of encountering an uncorrectable error and
recommends replacing only those DIMMs. The algorithm filters out DIMMs that encounter some
correctable errors but present a negligible risk of causing an uncorrectable error. It takes into
account the difference between hard (persistent) and soft (transient) errors. It also factors in the
robust ECC code, which can correct errors confined to a 4-bit symbol rather than only single-bit errors, and the
automatic patrol scrubbing that runs in the background. The algorithm has been validated through extensive data
collection and analysis of the correlation between correctable and uncorrectable errors in Cisco’s own data
centers and elsewhere.
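To illustrate why symbol-based correction matters, the following Python sketch contrasts a classic single-bit-correcting model with a simplified single-symbol-correcting model. It assumes aligned 4-bit symbols and treats any error spanning two or more symbols as uncorrectable; the function names are hypothetical, and this is neither Cisco's algorithm nor the exact ECC code used by the memory controller.

```python
# Simplified model, NOT Cisco's algorithm: symbols are assumed to be
# aligned 4-bit groups of the codeword, matching the "4-bit symbol" in the text.
SYMBOL_BITS = 4


def symbol_index(bit_position: int) -> int:
    """Return the index of the aligned 4-bit symbol containing this bit."""
    return bit_position // SYMBOL_BITS


def single_bit_correctable(flipped_bits: list[int]) -> bool:
    """Classic single-error-correcting model: at most one flipped bit."""
    return len(set(flipped_bits)) <= 1


def single_symbol_correctable(flipped_bits: list[int]) -> bool:
    """Single-symbol-correcting model: all flipped bits lie in one symbol.

    Errors spanning two or more symbols are treated as uncorrectable here;
    real codes may still detect some of them.
    """
    if not flipped_bits:
        return True  # no error at all
    return len({symbol_index(b) for b in flipped_bits}) == 1


if __name__ == "__main__":
    # Three flipped bits, all inside symbol 2 (bits 8-11).
    pattern = [8, 9, 11]
    print("single-bit model:", single_bit_correctable(pattern))
    print("single-symbol model:", single_symbol_correctable(pattern))
    # Two adjacent bits that straddle a symbol boundary (symbols 0 and 1).
    print("straddling pattern:", single_symbol_correctable([3, 4]))
```

A multi-bit pattern confined to one symbol is uncorrectable under the single-bit model but correctable under the symbol model, which is why counting raw bit errors alone would overstate the risk posed by such a DIMM.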
Using Cisco’s Enhanced Memory Error Management reduces operating costs and improves
system availability because fewer DIMMs are replaced unnecessarily.
Software Supported
Enhanced Memory Error Management is supported in the following releases:
UCS Manager Patch Release 2.2(1b) and newer
UCS Manager Patch Release 2.1(3c) and newer
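A minimal Python sketch for checking whether a UCS Manager release string meets these minimums. It reads the list per release train (2.1 and 2.2) and assumes release strings of the form `2.2(1b)`: major.minor, then a maintenance number and a single patch letter in parentheses. As a labeled assumption, any train newer than 2.2 is treated as supported and any train older than 2.1 as unsupported. The function names are hypothetical, and bundle suffixes or other build formats are not handled.

```python
import re

# Minimum patch release per UCS Manager train, taken from the list above.
MINIMUM_PATCH = {
    (2, 1): (3, "c"),  # 2.1(3c)
    (2, 2): (1, "b"),  # 2.2(1b)
}

# Assumed format: "<major>.<minor>(<maintenance><patch letter>)", e.g. "2.2(1b)".
_RELEASE_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z])\)$")


def parse_release(release: str) -> tuple[int, int, int, str]:
    """Split a release string such as '2.2(1b)' into comparable parts."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        raise ValueError(f"unrecognized release string: {release!r}")
    major, minor, maintenance, patch = match.groups()
    return int(major), int(minor), int(maintenance), patch


def supports_enhanced_memory_error_management(release: str) -> bool:
    """Return True if the release meets the minimum for its train."""
    major, minor, maintenance, patch = parse_release(release)
    train = (major, minor)
    if train in MINIMUM_PATCH:
        # Tuples compare element-wise: maintenance number first, then letter.
        return (maintenance, patch) >= MINIMUM_PATCH[train]
    # Assumption: trains newer than 2.2 include the feature; older ones do not.
    return train > max(MINIMUM_PATCH)


if __name__ == "__main__":
    for candidate in ("2.1(3b)", "2.1(3c)", "2.2(1a)", "2.2(1b)", "2.0(5a)"):
        print(candidate, supports_enhanced_memory_error_management(candidate))
```

Comparing tuples keeps the check simple: within a train, a higher maintenance number wins, and a later patch letter wins when the maintenance numbers are equal.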