FIC M785 Service Manual
Hardware Functional Overview
Level 1 Execution Trace Cache
In addition to the 8-KB data cache, the Pentium 4 processor includes an Execution Trace
Cache that stores up to 12 K decoded micro-ops in the order of program execution. This
increases performance by removing the decoder from the main execution loop, and it uses
cache storage more efficiently because instructions that are branched around are not
stored. The result is a high-volume stream of instructions to the processor's execution
units and less time spent recovering from mispredicted branches.
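The idea of storing instructions in execution order can be illustrated with a toy model. The sketch below is only an illustration: the program, the mnemonics, and the build_trace helper are hypothetical, and the real trace cache holds decoded micro-ops and uses far more elaborate build logic.

```python
# Toy program: (address, mnemonic, branch_target or None).
# All addresses and mnemonics are hypothetical, chosen for illustration only.
PROGRAM = [
    (0, "cmp", None),
    (1, "jmp_taken", 4),  # taken branch that skips addresses 2 and 3
    (2, "add", None),
    (3, "sub", None),
    (4, "mov", None),
    (5, "ret", None),
]


def build_trace(program, start=0):
    """Follow execution order and record only the instructions actually run."""
    by_addr = {addr: (op, target) for addr, op, target in program}
    trace = []
    pc = start
    while pc in by_addr:
        op, target = by_addr[pc]
        trace.append(op)
        if op == "ret":
            break
        # A taken branch redirects the program counter; otherwise fall through.
        pc = target if target is not None else pc + 1
    return trace


if __name__ == "__main__":
    # The instructions at addresses 2 and 3 are branched around,
    # so they never occupy space in the trace.
    print(build_trace(PROGRAM))
```

In this toy model the trace contains four entries instead of six, which is the storage saving the paragraph above describes.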
Rapid Execution Engine
Two Arithmetic Logic Units (ALUs) on the Pentium 4 processor are clocked at twice the core
processor frequency. This allows basic integer instructions such as Add, Subtract, Logical
AND, and Logical OR to execute in half a core clock cycle. For example, the Rapid Execution
Engine on a 2.80 GHz Pentium 4 processor runs at 5.60 GHz.
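The arithmetic behind the 5.60 GHz figure can be checked with a short sketch. The function names are hypothetical; the only inputs taken from the text are the 2x clock multiplier and the 2.80 GHz core frequency.

```python
def alu_clock_ghz(core_ghz: float, multiplier: int = 2) -> float:
    """Clock rate of the double-clocked ALUs, in GHz."""
    return core_ghz * multiplier


def simple_op_latency_ns(core_ghz: float) -> float:
    """Time for a basic integer op that takes half a core clock, in ns."""
    core_period_ns = 1.0 / core_ghz  # one core clock cycle in nanoseconds
    return core_period_ns / 2


if __name__ == "__main__":
    core = 2.80
    print(f"ALU clock: {alu_clock_ghz(core):.2f} GHz")
    print(f"Half-cycle latency: {simple_op_latency_ns(core):.4f} ns")
```

At 2.80 GHz one core clock lasts about 0.357 ns, so a half-cycle operation completes in roughly 0.179 ns.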
512-KB or 256-KB, Level 2 Advanced Transfer Cache
A 512-KB L2 Advanced Transfer Cache (ATC) is available on the 1.80A, 2A, 2.20, 2.26,
2.40, 2.50, 2.53, 2.60, 2.66 and 2.80 GHz processors. A 256-KB L2 ATC is available on the
1.70 GHz to 1.90 GHz processors. The Level 2 ATC provides a much higher-throughput data
channel between the Level 2 cache and the processor core. The Advanced Transfer Cache
consists of a 256-bit (32-byte) interface that transfers data on each core clock. As a
result, the Pentium 4 processor at 2.80 GHz can deliver a data transfer rate of 89.6 GB/s
(32 bytes x 2.80 GHz). This compares to a transfer rate of 16 GB/s on the Pentium III
processor at 1 GHz. Features of the ATC include:
• Non-blocking, full-speed, on-die Level 2 cache
• 8-way set associativity
• 256-bit data bus to the Level 2 cache
• Data clocked into and out of the cache every clock cycle
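The transfer rates quoted for the ATC follow from bus width times clock rate. The sketch below reproduces them; the function name and the transfers_per_clock parameter are hypothetical. The value of 0.5 used for the Pentium III is an assumption inferred from the 16 GB/s figure, not something this manual states.

```python
def l2_bandwidth_gb_per_s(core_ghz: float,
                          bus_bits: int = 256,
                          transfers_per_clock: float = 1.0) -> float:
    """Peak L2-to-core transfer rate in GB/s (1 GB = 10**9 bytes)."""
    bytes_per_transfer = bus_bits // 8  # 256 bits -> 32 bytes
    return bytes_per_transfer * transfers_per_clock * core_ghz


if __name__ == "__main__":
    p4 = l2_bandwidth_gb_per_s(2.80)
    # Assumption: one transfer every other clock matches the quoted 16 GB/s.
    p3 = l2_bandwidth_gb_per_s(1.0, transfers_per_clock=0.5)
    print(f"Pentium 4 at 2.80 GHz: {p4:.1f} GB/s")
    print(f"Pentium III at 1 GHz:  {p3:.1f} GB/s")
    print(f"Ratio: {p4 / p3:.1f}x")
```

Under these assumptions the Pentium 4 at 2.80 GHz has 5.6 times the peak L2 transfer rate of the Pentium III at 1 GHz.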
Advanced Dynamic Execution
The Advanced Dynamic Execution engine is a very deep, out-of-order speculative execution
engine that keeps the execution units busy executing instructions. The Pentium 4 processor
can view up to 126 instructions in flight and handle up to 48 loads and 24 stores in the
pipeline. It also includes an enhanced branch prediction algorithm that reduces the number
of branch mispredictions by about 33% compared with the branch prediction capability of
P6 generation processors. It does this by implementing a 4-KB branch target buffer that
stores more detail on the history of past branches, as well as by implementing a more
advanced branch prediction algorithm.
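To put the 33% figure in concrete terms, the sketch below applies it to a baseline misprediction count. The function name and the baseline of 1,000 mispredictions are hypothetical; only the 33% reduction comes from the text, and it is an approximate figure.

```python
def mispredictions_after_reduction(baseline: int, reduction: float = 0.33) -> int:
    """Estimated misprediction count after applying a fractional reduction."""
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return round(baseline * (1.0 - reduction))


if __name__ == "__main__":
    baseline = 1000  # hypothetical count on a P6 generation processor
    print(mispredictions_after_reduction(baseline))
```

For a workload that would mispredict 1,000 branches on a P6 generation processor, this estimate gives about 670 mispredictions on the Pentium 4.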
Enhanced Floating-Point and Multimedia Unit
The Pentium 4 processor expands the floating-point registers to a full 128 bits and adds an
additional register for data movement, which improves performance on both floating-point
and multimedia applications.