Microchip Technology MA320001 Data Sheet

Page of 214
PIC32MX3XX/4XX
DS61143H-page 38
© 2011 Microchip Technology Inc.
3.2
Architecture Overview
The MIPS32
®
 M4K
®
 Processor Core contains several
logic blocks working together in parallel, providing an
efficient high performance computing engine. The
following blocks are included with the core:
• Execution Unit
• Multiply/Divide Unit (MDU)
• System Control Coprocessor (CP0)
• Fixed Mapping Translation (FMT)
• Dual Internal Bus interfaces
• Power Management
• MIPS16e Support
• Enhanced JTAG (EJTAG) Controller
3.2.1
EXECUTION UNIT
The MIPS32
®
 M4K
®
 Processor Core execution unit
implements a load/store architecture with single-cycle
ALU operations (logical, shift, add, subtract) and an
autonomous multiply/divide unit. The core contains
thirty-two 32-bit general purpose registers used for
integer operations and address calculation. One addi-
tional register file shadow set (containing thirty-two reg-
isters) is added to minimize context switching overhead
during interrupt/exception processing. The register file
consists of two read ports and one write port and is fully
bypassed to minimize operation latency in the pipeline.
 The execution unit includes:
• 32-bit adder used for calculating the data address
• Address unit for calculating the next instruction
address
• Logic for branch determination and branch target
address calculation
• Load aligner
• Bypass multiplexers used to avoid stalls when
executing instructions streams where data
producing instructions are followed closely by
consumers of their results
• Leading Zero/One detect unit for implementing the
CLZ
 and CLO instructions
• Arithmetic Logic Unit (ALU) for performing bitwise
logical operations
• Shifter and Store Aligner
3.2.2
MULTIPLY/DIVIDE UNIT (MDU)
The MIPS32
®
 M4K
®
 Processor Core includes a multi-
ply/divide unit (MDU) that contains a separate pipeline
for multiply and divide operations. This pipeline oper-
ates in parallel with the integer unit (IU) pipeline and
does not stall when the IU pipeline stalls. This allows
MDU operations to be partially masked by system stalls
and/or other integer unit instructions.
The high-performance MDU consists of a 32x16 booth
recoded multiplier, result/accumulation registers (HI
and LO), a divide state machine, and the necessary
multiplexers and control logic. The first number shown
(‘32’ of 32x16) represents the rs operand. The second
number (‘16’ of 32x16) represents the rt operand. The
PIC32MX core only checks the value of the latter (rt)
operand to determine how many times the operation
must pass through the multiplier. The 16x16 and 32x16
operations pass through the multiplier once. A 32x32
operation passes through the multiplier twice.
The MDU supports execution of one 16x16 or 32x16
multiply operation every clock cycle; 32x32 multiply
operations can be issued every other clock cycle.
Appropriate interlocks are implemented to stall the
issuance of back-to-back 32x32 multiply operations.
The multiply operand size is automatically determined
by logic built into the MDU.
Divide operations are implemented with a simple 1 bit
per clock iterative algorithm. An early-in detection
checks the sign extension of the dividend (rs) operand.
If rs is 8 bits wide, 23 iterations are skipped. For a 16-
bit-wide rs, 15 iterations are skipped, and for a 24-bit-
wide rs, 7 iterations are skipped. Any attempt to issue
a subsequent MDU instruction while a divide is still
active causes an IU pipeline stall until the divide
operation is completed.
 lists the repeat rate (peak issue rate of cycles
until the operation can be reissued) and latency (num-
ber of cycles until a result is available) for the PIC32MX
core multiply and divide instructions. The approximate
latency and repeat rates are listed in terms of pipeline
clocks.