IBM System p5 520 and 520Q Technical Overview and Introduction
3.1  Reliability, availability, and serviceability
Excellent quality and reliability are inherent in all aspects of the IBM System p5 processor 
design and manufacturing. The fundamental objective of the design approach is to minimize 
outages. The reliability, availability, and serviceability (RAS) features help to ensure that 
the system operates when required, performs reliably, and efficiently handles any failures 
that might occur. This is achieved using capabilities that both the hardware and the AIX 5L 
operating system provide.
As POWER5+ servers, the p5-520 and p5-520Q enhance the RAS capabilities that are 
implemented in POWER4-based systems. RAS enhancements available on POWER5 and 
POWER5+ servers are:
• Most firmware updates allow the system to remain operational.
• ECC has been extended to the inter-chip connections for the fabric and processor bus.
• Partial L2 cache deallocation is possible.
• The number of L3 cache line deletes has been increased from two to ten for better 
  self-healing capability.
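The effect of the larger line-delete budget can be pictured with a small Python model. This is only an illustrative sketch: the class name, its methods, and the behavior once the budget is exhausted are assumptions made for the example, not a description of the actual hardware policy. Only the limits of two and ten come from the text above.

```python
class L3LineDeleteTracker:
    """Illustrative model only: tracks cache lines retired after errors."""

    def __init__(self, max_deletes: int = 10):
        # Ten is the POWER5/POWER5+ limit quoted above; the earlier limit was two.
        self.max_deletes = max_deletes
        self.deleted = set()

    def delete_line(self, line_index: int) -> bool:
        """Retire a faulty line. Returns False once the budget is exhausted."""
        if line_index in self.deleted:
            return True  # already retired, so no budget is consumed
        if len(self.deleted) >= self.max_deletes:
            # Assumption: beyond the budget, self-healing can no longer absorb faults.
            return False
        self.deleted.add(line_index)
        return True


tracker = L3LineDeleteTracker()
for faulty_line in range(10):
    tracker.delete_line(faulty_line)
print(len(tracker.deleted))
```

In this model, a tracker built with `max_deletes=2` refuses a third faulty line, whereas the default budget of ten absorbs it.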
The following sections describe the concepts that form the basis of leadership RAS features 
of IBM System p5 systems in more detail.
3.1.1  Fault avoidance
IBM System p5 servers are built on a quality-based design that is intended to keep errors 
from happening. This design includes the following features:
• Reduced power consumption and cooler operating temperatures for increased reliability, 
  which is enabled by the use of copper circuitry, silicon-on-insulator, and dynamic clock 
  gating
• Mainframe-inspired components and technologies
3.1.2  First-failure data capture
If a problem occurs, the ability to diagnose it correctly is a fundamental requirement for 
improved availability. The p5-520 and p5-520Q incorporate advanced capability in start-up 
diagnostics and in run-time first-failure data capture (FFDC) based on strategic error 
checkers built into the processors.
Any errors detected by the pervasive error checkers are captured in Fault Isolation 
Registers (FIRs), which the service processor can interrogate. The service processor can 
access system components through special-purpose ports or by reading the error registers. 
Figure 3-1 on page 79 shows a schematic of a fault register implementation.
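The capture-and-interrogate flow described above can be sketched in Python as a register with one bit per error checker. This is a simplified model under stated assumptions: the class, the checker names, and the `first_error` field are hypothetical and do not reflect the actual FIR layout or the service processor interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FaultIsolationRegister:
    """Hypothetical FIR model: one latched bit per error checker."""
    checker_names: List[str]
    bits: int = 0
    first_error: Optional[str] = None  # assumption: the first checker to fire is recorded

    def report(self, checker: str) -> None:
        """Called by an error checker when it detects a fault."""
        index = self.checker_names.index(checker)
        if self.bits == 0:
            self.first_error = checker  # keep the first failure for diagnosis
        self.bits |= 1 << index  # the bit stays set until the register is read out

    def read(self) -> List[str]:
        """Service processor view: names of every checker that has fired."""
        return [name for i, name in enumerate(self.checker_names)
                if (self.bits >> i) & 1]


# Checker names are made up for the example.
fir = FaultIsolationRegister(["l2_parity", "fabric_ecc", "fpu_residue"])
fir.report("fabric_ecc")
fir.report("l2_parity")
print(fir.first_error, fir.read())
```

Because the bits are latched at the moment of failure, the diagnosis does not depend on reproducing the fault later, which is the idea behind first-failure data capture.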