Intel 253668-032US User Manual

15-18   Vol. 3
MACHINE-CHECK ARCHITECTURE
processor; the handler must be written to interpret P5_MC_TYPE encodings 
correctly.
15.4 ENHANCED CACHE ERROR REPORTING
Starting with Intel Core Duo processors, cache error reporting was 
enhanced. In earlier Intel processors, cache status was based on the 
number of correction events that occurred in a cache. In the new paradigm, 
called “threshold-based error status”, cache status is based on the number 
of lines (ECC blocks) in a cache that incur repeated corrections. The 
threshold is chosen by Intel, based on various factors. A processor that 
supports threshold-based error status sets IA32_MCG_CAP[11] (MCG_TES_P) 
to 1; a processor that does not support it reports 0 in this bit. 
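The capability check described above can be sketched in C as follows. This is a minimal illustration, not code from this manual: the function name mcg_has_tes and the macro names are hypothetical, and the caller is assumed to have already obtained the IA32_MCG_CAP value (MSR address 179H) through a privileged RDMSR, which the sketch does not perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_CAP lives at MSR address 179H; reading it requires RDMSR at
 * privilege level 0. This sketch only interprets a value already read. */
#define IA32_MCG_CAP_MSR  0x179u

/* Bit 11 of IA32_MCG_CAP is MCG_TES_P (threshold-based error status present). */
#define MCG_TES_P_BIT     11u

/* Hypothetical helper: returns true when the supplied IA32_MCG_CAP value
 * advertises support for threshold-based error status. */
bool mcg_has_tes(uint64_t mcg_cap)
{
    return ((mcg_cap >> MCG_TES_P_BIT) & 1u) != 0;
}
```

A handler would typically perform this check once during initialization and interpret the green/yellow indicator only when it returns true.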
A processor that supports enhanced cache error reporting contains hardware 
that tracks the operating status of certain caches and provides an indicator 
of their “health”. The hardware reports a “green” status when the 
number of lines that incur repeated corrections is at or below a pre-defined 
threshold, and a “yellow” status when the number of affected lines exceeds 
the threshold. Yellow status means that the cache reporting the event is 
operating correctly, but you should schedule the system for servicing within 
a few weeks.
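The following C sketch shows one way a handler might decode the green/yellow indicator. It relies on the IA32_MCi_STATUS layout described elsewhere in this manual, not in this section: bit 63 is VAL, bit 61 is UC, and, when MCG_TES_P is 1, bits 54:53 carry the threshold-based error status for corrected errors (00 = no tracking, 01 = green, 10 = yellow, 11 = reserved). The type and function names are hypothetical, and the sketch should be checked against the register description for the target processor.

```c
#include <stdint.h>

/* Hypothetical result type for the threshold-based error status field. */
typedef enum {
    TES_NOT_APPLICABLE = -1, /* field is not meaningful for this entry */
    TES_NO_TRACKING    = 0,  /* 00: no hardware status tracking        */
    TES_GREEN          = 1,  /* 01: at or below the threshold          */
    TES_YELLOW         = 2,  /* 10: threshold exceeded                 */
    TES_RESERVED       = 3   /* 11: reserved encoding                  */
} tes_status;

/* Decode bits 54:53 of an IA32_MCi_STATUS value.
 * tes_supported is the MCG_TES_P capability bit (nonzero if set).
 * The field is interpreted only for valid (VAL = 1), corrected (UC = 0)
 * errors on processors that advertise MCG_TES_P. */
tes_status decode_tes(uint64_t mci_status, int tes_supported)
{
    int valid = (int)((mci_status >> 63) & 1u);
    int uc    = (int)((mci_status >> 61) & 1u);

    if (!tes_supported || !valid || uc)
        return TES_NOT_APPLICABLE;

    return (tes_status)((mci_status >> 53) & 0x3u);
}
```

Under these assumptions, a TES_YELLOW result would be logged as a service-advisory event, not treated as a fatal condition.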
Intel recommends that you rely on this mechanism for structures supported 
by threshold-based error reporting. 
The CPU/system/platform response to a yellow event should be less severe 
than its response to an uncorrected error. An uncorrected error means that 
a serious error has actually occurred, whereas the yellow condition is a 
warning that the number of affected lines has exceeded the threshold but is 
not, in itself, a serious event: the error was corrected and system state was 
not compromised. 
The green/yellow status indicator is not a foolproof early warning for an 
uncorrected error resulting from the failure of two bits in the same ECC 
block. Such a failure can occur and cause an uncorrected error before the 
yellow threshold is reached. However, the chance of an uncorrected error 
increases as the number of affected lines increases. 
15.5 CORRECTED MACHINE CHECK ERROR INTERRUPT
Corrected machine-check error interrupt (CMCI) is an architectural 
enhancement to the machine-check architecture. It provides capabilities