User Manual for AMD 250
Integer Optimizations
Chapter 8
25112
Rev. 3.06
September 2005
Software Optimization Guide for AMD64 Processors
8.9 Optimizing Integer Division
Optimization
When possible, use smaller data types for integer division.
Application
This optimization applies to:
• 32-bit software
• 64-bit software
Rationale
Division by a 16-bit value is significantly faster than division by a 32-bit value: a latency of about
26 clocks versus 42. Likewise, division by a 32-bit value is faster than division by a 64-bit value:
about 42 clocks versus 74. Refer to IDIV in Table 15. In algorithms where integer division accounts
for a substantial share of execution time, it may be beneficial to check whether a smaller divide
type can be used. Study the assembly language output generated by high-level language compilers to
verify that the desired code is generated. Compilers often generate code that converts 16-bit types into
32-bit values that are then used to perform 32-bit division, thus eliminating the advantage of using
16-bit integer types. If the compiler cannot be coerced into producing the desired code, then compiler
intrinsics or assembly language are required.
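
The check for a smaller divide type described above can be sketched in C. This is a minimal illustration, not code from this guide: the function name div_u64_narrow is hypothetical, and the sketch assumes unsigned operands and a nonzero divisor supplied by the caller. It narrows from 64 bits to 32 bits rather than to 16 bits because C's integer promotions widen 16-bit operands to int before dividing, which is one reason compilers tend to emit a 32-bit divide for 16-bit types.

```c
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions for this sketch).
 * Divides n by d, using a 32-bit division when both operands fit in
 * 32 bits and a 64-bit division otherwise.
 * Precondition: d != 0 (not checked here). */
uint64_t div_u64_narrow(uint64_t n, uint64_t d)
{
    /* If no bit above bit 31 is set in either operand, both values
     * are representable as uint32_t and the quotient is unchanged. */
    if (((n | d) >> 32) == 0) {
        return (uint64_t)((uint32_t)n / (uint32_t)d);
    }

    /* At least one operand needs more than 32 bits: full-width divide. */
    return n / d;
}
```

Whether the extra test and branch pay for themselves depends on how often the operands actually fit in 32 bits, so it is worth measuring on representative data and inspecting the generated assembly to confirm that the narrow path really uses a 32-bit divide instruction.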