Intel E7-8891 v2 CM8063601377422 User Manual

Intel® Xeon® Processor E7-2800/4800/8800 v2 Product Family
Datasheet Volume Two: Functional Description, February 2014
Reliability, Availability, Serviceability, and Manageability
mode, all UC errors are reported as ‘Fatal’ and lead to an MCE¹ (Machine Check
Exception), which is an abort-class exception resulting in a system reset. Such errors are
also called DUE (Detected but Uncorrected Error).
When the Intel Xeon processor E7 v2 product family is configured in Corrupt Data
Containment mode, certain types of UC errors do not cause an MCE (Machine Check
Exception) at the time of detection. Such errors are called UCR (Uncorrected
Recoverable) errors. Depending on where a UCR error is detected, it is further
classified as UCNA, SRAO, or SRAR, as described below:
• UCNA (Uncorrected No Action Required) - Data is detected with an
uncorrected error, and an ‘Error Containment’ bit (also known as the Poison Bit) is
attached to the data. The data is allowed to reach its destination without any further
software or hardware action; no MCE is triggered at the source of the uncorrected
error.
• SRAO (Software Recoverable Action Optional) - Data is detected with an
uncorrected error in a non-execution path. An SRAO-type UCR error triggers an
MCE, but a system reset is not required.
• SRAR (Software Recoverable Action Required) - Data or an instruction is
detected with a UCR error in the execution path within the core. An SRAR-type UCR
error triggers an MCE, and immediate action is required.
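The difference between UCNA and SRAR comes down to where the poisoned data is noticed: at its source it is only tagged, and action is needed once the core tries to use it. The toy Python model below sketches that flow under stated assumptions; it is not a description of how the hardware implements containment, and the names `Line`, `detect_at_source`, `forward`, `consume`, and `MachineCheck` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """Hypothetical unit of data moving through the system (toy model)."""
    value: int
    poisoned: bool = False  # models the 'Error Containment' (poison) bit


class MachineCheck(Exception):
    """Raised when poisoned data reaches the core's execution path."""


def detect_at_source(value: int, uncorrectable: bool) -> Line:
    # Corrupt Data Containment mode: no MCE at the point of detection;
    # the data is only tagged with the poison bit (UCNA behaviour).
    return Line(value, poisoned=uncorrectable)


def forward(line: Line) -> Line:
    # Assumed intermediate agent: passes the data on with its poison bit intact.
    return Line(line.value, line.poisoned)


def consume(line: Line) -> int:
    # Using poisoned data in the execution path requires immediate action,
    # which the classification above calls SRAR.
    if line.poisoned:
        raise MachineCheck("poisoned data consumed in execution path (SRAR)")
    return line.value
```

For example, `consume(forward(detect_at_source(42, uncorrectable=False)))` returns 42, while the same call with `uncorrectable=True` raises `MachineCheck` only at the `consume` step, never at `detect_at_source`.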
Some errors can still be detected but are neither correctable nor recoverable; these
are considered either Catastrophic or Fatal. Such catastrophic and fatal errors are also
called Detected but Uncorrected Errors (DUE). All DUEs eventually lead to a system
reset. The two kinds of DUEs are signaled differently, which further helps identify the
source of the error.
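Machine-check software on Intel processors commonly tells these classes apart using the UC, PCC, S, and AR bits of each bank's IA32_MCi_STATUS register, as described in the Intel 64 and IA-32 Architectures Software Developer's Manual. This datasheet excerpt does not define those bits, so the bit layout below is an assumption taken from the SDM, and the function and enum names are hypothetical. The sketch ignores the OVER and EN bits for brevity.

```python
from enum import Enum

# IA32_MCi_STATUS bit positions as documented in the Intel SDM, Vol. 3B
# (an assumption here: this datasheet excerpt does not list them).
MCI_STATUS_VAL = 1 << 63  # register holds a valid error
MCI_STATUS_UC = 1 << 61   # error was not corrected
MCI_STATUS_PCC = 1 << 57  # processor context corrupt
MCI_STATUS_S = 1 << 56    # an MCE was signaled for this UCR error
MCI_STATUS_AR = 1 << 55   # software recovery action is required


class ErrorClass(Enum):
    NONE = "no valid error logged"
    CE = "corrected error"
    UCNA = "uncorrected, no action required"
    SRAO = "software recoverable, action optional"
    SRAR = "software recoverable, action required"
    DUE = "detected but uncorrected, system reset"
    RESERVED = "reserved encoding"


def classify(status: int) -> ErrorClass:
    """Map one IA32_MCi_STATUS value to the error classes described above."""
    if not status & MCI_STATUS_VAL:
        return ErrorClass.NONE
    if not status & MCI_STATUS_UC:
        return ErrorClass.CE
    if status & MCI_STATUS_PCC:
        # Context is corrupt, so the error is not recoverable (fatal DUE).
        return ErrorClass.DUE
    signaled = bool(status & MCI_STATUS_S)
    action_required = bool(status & MCI_STATUS_AR)
    if signaled and action_required:
        return ErrorClass.SRAR
    if signaled:
        return ErrorClass.SRAO
    if not action_required:
        return ErrorClass.UCNA
    return ErrorClass.RESERVED  # S=0 with AR=1 is not a defined UCR class
```

As an illustration (not a value logged by this processor), `classify(0xBD80000000100134)` returns `ErrorClass.SRAR`, because that value has VAL, UC, S, and AR set and PCC clear. Checking PCC before S and AR mirrors the text above: once the processor context is corrupt, the error is a DUE regardless of the recovery bits.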
7.1.3 RASM Feature Summary
The Intel Xeon processor E7 v2 product family RAS features can be classified into the
following categories:
1. Core and Uncore Error Handling Features: The processor core and uncore (including
Cbo/LLC, HA, iMC, Intel® QPI, and PCU) implement various types of error
detection, correction, containment, and reporting features.
2. Memory RASM Features: Features incorporated in the HA and iMC modules that
support the robustness of the memory subsystem. Memory RASM features include
error detection, Error Correction Code (ECC), Sparing, Scrubbing, Mirroring,
Corrupt Data Containment, and MCA Recovery.
3. Intel® QPI RASM Features: Features include protocol protection via CRC, Corrupt
Data Containment, and error reporting.
4. IIO Module RASM Features: Integrated Input/Output (IIO) module RASM features
include error detection/correction, PCI Express CRC and retry, and Corrupt Data
Containment. The Intel Xeon processor E7 v2 product family IIO also supports IO MCA
to report IIO-internal errors as well as PCIe uncorrected non-fatal and fatal errors from
root ports and downstream ports/devices.
5. System Level RASM and Miscellaneous Features: Platform- or system-level features,
including in-band system management, out-of-band system management, out-of-band
access to MCA banks, and socket migration.
¹ In this document, MCE (Machine Check Exception) and MCERR (Machine Check Error) are
used interchangeably.