User Guide for Intel E7-8857 v2 CM8063601275912
Intel® Xeon® Processor E7-2800/4800/8800 v2 Product Family
Datasheet Volume Two: Functional Description, February 2014
Reliability, Availability, Serviceability, and Manageability
7.4.11 Memory Migration
The Intel Xeon processor E7 v2 product family supports migration of memory to a
spare FRU. Only one migration target in the system is supported at a time, which
means that there is only one master home and one slave home in the system.
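The single-target rule can be pictured as a guard that refuses a second migration while one is still active. The sketch below is illustrative only: `MigrationController`, its methods, and the home identifiers are hypothetical names introduced here and do not correspond to any register or firmware interface in the datasheet.

```python
from typing import Optional, Tuple


class MigrationController:
    """Hypothetical tracker that allows only one active migration."""

    def __init__(self) -> None:
        # (master home, slave home) of the migration in progress, if any.
        self._active: Optional[Tuple[str, str]] = None

    def start(self, master_home: str, slave_home: str) -> None:
        # Only one migration target is supported at a time, so a second
        # request is rejected until the first one has finished.
        if self._active is not None:
            raise RuntimeError("a migration is already in progress")
        self._active = (master_home, slave_home)

    def finish(self) -> None:
        # Clear the active pair so a new migration may begin.
        self._active = None

    @property
    def active(self) -> Optional[Tuple[str, str]]:
        return self._active
```

Under these assumptions, a caller would invoke `finish()` on the current migration before calling `start()` with a new pair of homes.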
7.5 IIO RAS
7.5.1 IIO RAS Overview
The IIO module RAS features aim to achieve the following:
• Error Containment
• PCI Express soft, uncorrectable error detection and recovery on links
7.5.2 IIO Module Error Reporting
The IIO module logs and reports detected errors by generating “system events”. In
the context of error reporting, a system event is an event that notifies the system of
the error. Two types of system events exist: an inband message to the CPU and
out-of-band signaling to the platform. Either or both can be generated for an error.
With inband messaging, the CPU is notified of the error by the inband message
(interrupt, failed response, and so forth). Out-of-band signaling (Error Pins) informs
an external agent of the error events. An external agent such as a BMC may collect
the errors from the error pins to determine the health of the system and send
interrupts to the CPU accordingly.
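The two notification paths can be modelled as independent flags on a system event, since an error may raise either or both. This is a minimal sketch under that assumption. `SystemEvent` and `route_event` are hypothetical names, and the returned strings only restate the paths described above, not actual IIO behavior.

```python
from enum import Flag, auto
from typing import List


class SystemEvent(Flag):
    """Hypothetical notification paths for a detected IIO error."""
    NONE = 0
    INBAND = auto()       # message to the CPU (interrupt, failed response, ...)
    OUT_OF_BAND = auto()  # error pin observed by an external agent such as a BMC


def route_event(paths: SystemEvent) -> List[str]:
    """Return the notifications that a given combination of paths produces."""
    actions = []
    if SystemEvent.INBAND in paths:
        actions.append("notify CPU via inband message")
    if SystemEvent.OUT_OF_BAND in paths:
        actions.append("assert error pin for external agent")
    return actions
```

For example, `route_event(SystemEvent.INBAND | SystemEvent.OUT_OF_BAND)` returns both notifications, while `route_event(SystemEvent.NONE)` returns an empty list.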
7.5.2.1 Error Severity Classification
In the IIO module, errors are classified into three severities: Correctable,
Uncorrectable (described below as Recoverable), and Fatal. This classification
separates errors that result in functional failures from errors that result in
degraded performance and from errors that result in system resets.
7.5.2.1.1 Correctable Errors (Severity 0 Error)
Hardware correctable errors include those error conditions where the system can
recover without any loss of information. Hardware corrects these errors and no
software intervention is required.
7.5.2.1.2 Recoverable Errors (Severity 1 Error)
Recoverable errors are software correctable or software/hardware uncorrectable errors
that make a particular transaction unreliable while the system hardware remains
otherwise fully functional. Isolating recoverable errors from fatal errors gives system
management software the opportunity to recover from the error without a reset and
without disturbing other transactions in progress. Devices not associated with the
transaction in error are not impacted by the error.
7.5.2.1.2.2 Software Correctable Errors
Software correctable errors are considered “recoverable” errors. These errors include
those error conditions where the system can recover without any loss of information.
Software intervention is required to correct these errors.
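The severity levels described so far can be summarized as an enumeration with a lookup for the expected handling. This is a sketch, not the IIO's actual encoding. `Severity` and `expected_handling` are hypothetical names, the values 0 and 1 come from the headings above, and both the value 2 for Fatal and the reset behavior attached to it are assumptions that this section does not state.

```python
from enum import IntEnum


class Severity(IntEnum):
    CORRECTABLE = 0  # "Severity 0 Error": hardware corrects, no software needed
    RECOVERABLE = 1  # "Severity 1 Error": software may recover without a reset
    FATAL = 2        # assumed value; not given in this section


def expected_handling(severity: Severity) -> str:
    """Return a short description of who handles an error of this severity."""
    if severity is Severity.CORRECTABLE:
        return "hardware corrects the error; no software intervention"
    if severity is Severity.RECOVERABLE:
        return "software handles the affected transaction; no reset"
    # Assumption: fatal errors are the ones that lead to a system reset.
    return "system reset (assumed)"
```

A software correctable error falls under `Severity.RECOVERABLE` in this model, because the text classifies such errors as “recoverable” and requires software intervention to correct them.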