Intel E7-4890 v2 CM8063601272412 User Manual
Product codes
CM8063601272412
Intel® Xeon® Processor E7-2800/4800/8800 v2 Product Family
Datasheet Volume Two: Functional Description, February 2014
Reliability, Availability, Serviceability, and Manageability
mode all the UC errors are reported as ‘Fatal’ and lead to an MCE¹ (Machine Check
Exception), which is an abort-class exception resulting in system reset. Such errors are
also called DUE (Detected but Uncorrected Error).
When the Intel Xeon processor E7 v2 product family is configured in Corrupt Data
Containment mode and certain types of UC error are detected, the error does not lead to
an MCE (Machine Check Exception) at the time of detection. Such errors are called UCR
(Uncorrected Recoverable) errors. Depending on the point of detection, a UCR error is
further classified as UCNA, SRAO, or SRAR, as described below:
• UCNA (Uncorrected No Action Required) - Data is detected with an
uncorrected error and an ‘Error Containment’ bit (also known as the Poison Bit) is
attached to the data. The data is allowed to reach its destination without any further
software or hardware action; an MCE is not triggered at the source of the uncorrected
error.
• SRAO (Software Recoverable Action Optional) - Data is detected with an
uncorrected error in a non-execution path. An SRAO-type UCR error triggers an
MCE, but a system reset is not required.
• SRAR (Software Recoverable Action Required) - Data or an instruction is
detected with a UCR error in the execution path within the core. An SRAR-type UCR
error triggers an MCE, and immediate action is required.
Some errors can still be detected but may be neither correctable nor recoverable; these
are considered either Catastrophic or Fatal. Such catastrophic and fatal errors are also
called “Detected but Uncorrected Errors” (DUE). All DUEs eventually lead to system
reset. The two kinds of DUE are signaled differently, which further assists in
identifying the source of the error.
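The classification described above can be summarized as a small lookup. The following Python sketch is illustrative only: the names `DetectionPoint`, `ErrorClass`, `Outcome`, and `classify` are hypothetical and do not correspond to any Intel interface or register layout. Which UC error types qualify as UCR is not listed here, so the caller supplies that as a flag, and the label `AT_SOURCE` is an assumed reading of where a UCNA error is detected.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DetectionPoint(Enum):
    # Hypothetical labels for "the point of detection" named in the text.
    AT_SOURCE = "poisoned at the source and forwarded"
    NON_EXECUTION_PATH = "non-execution path"
    EXECUTION_PATH = "execution path within the core"


class ErrorClass(Enum):
    UCNA = "Uncorrected No Action Required"
    SRAO = "Software Recoverable Action Optional"
    SRAR = "Software Recoverable Action Required"
    DUE = "Detected but Uncorrected Error"


@dataclass(frozen=True)
class Outcome:
    raises_mce: Optional[bool]  # None: the text gives no single answer
    software_action: str        # "none", "optional", "required", or "n/a"
    system_reset: bool          # True if the error ends in a system reset


OUTCOMES = {
    # Poison bit attached; no MCE at the source of the error.
    ErrorClass.UCNA: Outcome(False, "none", False),
    # MCE is triggered, but a system reset is not required.
    ErrorClass.SRAO: Outcome(True, "optional", False),
    # MCE is triggered and immediate action is required. Assumption:
    # no reset is needed as long as software takes that action.
    ErrorClass.SRAR: Outcome(True, "required", False),
    # Catastrophic and Fatal errors are signaled differently, so the
    # signaling is left unspecified; all DUEs end in a system reset.
    ErrorClass.DUE: Outcome(None, "n/a", True),
}

_UCR_BY_POINT = {
    DetectionPoint.AT_SOURCE: ErrorClass.UCNA,
    DetectionPoint.NON_EXECUTION_PATH: ErrorClass.SRAO,
    DetectionPoint.EXECUTION_PATH: ErrorClass.SRAR,
}


def classify(containment_mode: bool, is_ucr_type: bool,
             point: DetectionPoint) -> ErrorClass:
    """Classify an uncorrected (UC) error following the prose above.

    containment_mode: Corrupt Data Containment mode is enabled.
    is_ucr_type: the error is one of the UC types that may be treated
                 as recoverable (caller-supplied assumption).
    """
    if not containment_mode or not is_ucr_type:
        return ErrorClass.DUE
    return _UCR_BY_POINT[point]


if __name__ == "__main__":
    for point in DetectionPoint:
        cls = classify(True, True, point)
        print(point.name, cls.name, OUTCOMES[cls])
```

Keeping the outcomes in a table separate from `classify` mirrors the structure of the text: the point of detection decides the class, and the class alone decides the signaling and whether a reset follows.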
7.1.3 RASM Feature Summary
The Intel Xeon processor E7 v2 product family RAS features can be classified into the
following categories:
1. Core and Uncore Error Handling features: The processor core and uncore (including
Cbo/LLC, HA, iMC, Intel® QPI, and PCU) implement various types of error
detection, correction, containment, and reporting features.
2. Memory RASM features: Features incorporated in the HA and iMC modules
supporting robustness of the memory subsystem. Memory RASM features include
error detection, Error Correction Code (ECC), Sparing, Scrubbing, Mirroring,
Corrupt Data Containment, and MCA Recovery.
3. Intel® QPI RASM Features: Features include protocol protection via CRC, Corrupt
Data Containment, and error reporting.
4. IIO Module RASM Features: Integrated Input/Output (IIO) module RASM features
include error detection/correction, PCI Express CRC and retry, and Corrupt Data
Containment. The Intel Xeon processor E7 v2 product family IIO also supports IO MCA
to report IIO internal and PCIe uncorrected non-fatal and fatal errors from root
ports and downstream ports/devices.
5. System Level RASM and Miscellaneous Features: Platform or system level features
including in-band system management, out-of-band system management,
out-of-band access to MCA banks, socket migration, etc.
1. In this document, MCE (Machine Check Exception) and MCERR (Machine Check Error) are used
interchangeably.