Intel IXP42X User Manual

Intel IXP42X Product Line of Network Processors and IXC1100 Control Plane Processor

September 2006

Order Number: 252480-006US


Intel XScale Processor - Intel IXP42X product line and IXC1100 control plane processors

3.7.4.4 Data/Bus Request Buffer Full Mode

The Data Cache has buffers available to service cache misses or uncacheable accesses. For every memory request that the Data Cache receives from the processor core, a buffer is speculatively allocated in case an external memory request is required or temporary storage is needed for an unaligned access. If no buffers are available, the Data Cache stalls the processor core. How often the Data Cache stalls depends on the performance of the bus external to the IXP42X product line and IXC1100 control plane processors and on the memory access latency of Data Cache miss requests to external memory. If that latency is high, possibly due to starvation, the Data Cache buffers become full. This performance monitoring mode shows whether the processors are being starved of the external bus, which affects the performance of the application running on them.

PMN0 accumulates the number of clock cycles the processor is stalled due to this condition, and PMN1 counts the number of times this condition occurs.

Statistics derived from these two events:

• The average number of cycles the processor stalled on a data-cache access that may overflow the data-cache buffers. This is calculated by dividing PMN0 by PMN1. This statistic shows whether the stall cycles are spread over many requests or attributable to just a few. If the average is high, the IXP42X product line and IXC1100 control plane processors may be starved of the external bus.

• The percentage of total execution cycles the processor stalled because a Data Cache request buffer was not available. This is calculated by dividing PMN0 by CCNT, which was used to measure total execution time.
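The two statistics above can be sketched as a small post-processing helper. This is a minimal illustration, assuming the PMN0, PMN1, and CCNT values have already been read from the performance monitoring unit by some other means; the function name `buffer_full_stats` and the sample counter values are hypothetical and do not come from the manual.

```python
def buffer_full_stats(pmn0, pmn1, ccnt):
    """Derive the two buffer-full statistics from raw counter values.

    pmn0: stall cycles caused by a full Data Cache request buffer
    pmn1: number of times the buffer-full condition occurred
    ccnt: total clock cycles in the measured interval
    """
    # Average stall length per occurrence; undefined if the condition never occurred.
    avg_stall = pmn0 / pmn1 if pmn1 else None
    # Share of all execution cycles lost to buffer-full stalls, as a percentage.
    stall_pct = 100.0 * pmn0 / ccnt if ccnt else None
    return avg_stall, stall_pct


# Made-up counter values, for illustration only.
avg, pct = buffer_full_stats(pmn0=120_000, pmn1=4_000, ccnt=2_000_000)
print(f"average stall: {avg:.1f} cycles, stalled {pct:.1f}% of execution time")
```

A high average with a low occurrence count would suggest a few long stalls, which is the pattern the text associates with bus starvation.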

3.7.4.5 Stall/Write-Back Statistics

When an instruction requires the result of a previous instruction and that result is not yet available, the IXP42X product line and IXC1100 control plane processors stall in order to preserve the correct data dependencies. PMN0 counts the number of stall cycles due to data dependencies. Not all data dependencies cause a stall; only the following dependencies incur a stall penalty:

• Load-use penalty: attempting to use the result of a load before the load completes. To avoid the penalty, software should delay using the result of a load until it is available. This penalty shows the latency effect of data-cache access.

• Multiply/Accumulate-use penalty: attempting to use the result of a multiply or multiply-accumulate operation before the operation completes. Again, to avoid the penalty, software should delay using the result until it is available.

• ALU-use penalty: there are a few isolated cases where back-to-back ALU operations may result in a one-cycle delay in execution. These cases are defined in Table 3.9, “Performance Considerations” on page 159.

PMN1 counts the number of write-back operations emitted by the data cache. These write-backs occur when the data cache evicts a dirty line of data to make room for a newly requested line or as the result of a clean operation (CP15, register 7).

Statistics derived from these two events:

• The percentage of total execution cycles the processor stalled because of a data dependency. This is calculated by dividing PMN0 by CCNT, which was used to measure total execution time. Often a compiler can reschedule code to avoid these penalties when given the right optimization switches.
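As a rough sketch of how the data-dependency percentage might be used, the snippet below compares two hypothetical measurement runs, one before and one after the compiler rescheduled the code. The function name `dependency_stall_percent` and all counter values are invented for illustration; only the PMN0 / CCNT ratio comes from the text.

```python
def dependency_stall_percent(pmn0, ccnt):
    """Percentage of execution cycles spent stalled on data dependencies.

    pmn0: stall cycles due to data dependencies
    ccnt: total clock cycles in the measured interval
    """
    if ccnt <= 0:
        raise ValueError("ccnt must be a positive cycle count")
    return 100.0 * pmn0 / ccnt


# Made-up counter values for two builds of the same workload.
baseline = dependency_stall_percent(pmn0=450_000, ccnt=3_000_000)
rescheduled = dependency_stall_percent(pmn0=150_000, ccnt=2_500_000)
print(f"baseline: {baseline:.1f}%  rescheduled: {rescheduled:.1f}%")
```

Comparing the percentage rather than the raw PMN0 count keeps the two runs comparable even when their total cycle counts differ.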