User Manual

Table of Contents

Contents
List of Figures
Revision History

Chapter 1 Introduction
  1.1 Related Documents

Chapter 2 Experimental Setup
  2.1 System Used
    Figure 1. Quartet Topology
    Figure 2. Internal Resources Associated with a Quartet Node
  2.2 Synthetic Test
    Table 1. Data Access Rate Qualifiers
  2.3 Reading and Interpreting Test Graphs
    Figure 3. Write-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System
    2.3.1 X-Axis Display
    2.3.2 Labels Used
    2.3.3 Y-Axis Display

Chapter 3 Analysis and Recommendations
  3.1 Scheduling Threads
    3.1.1 Multiple Threads-Independent Data
    3.1.2 Multiple Threads-Shared Data
    3.1.3 Scheduling on a Non-Idle System
  3.2 Data Locality Considerations
    Figure 4. Read-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System
    Figure 5. Write-Only Thread Running on Node 0, Accessing Data from 0, 1 and 2 Hops Away on an Idle System
    3.2.1 Keeping Data Local by Virtue of first Touch
    3.2.2 Data Placement Techniques to Alleviate Unnecessary Data Sharing Between Nodes Due to First Touch
  3.3 Avoid Cache Line Sharing
  3.4 Common Hop Myths Debunked
    3.4.1 Myth: All Equal Hop Cases Take Equal Time.
      Figure 6. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case on an Idle System
      Figure 7. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case under a Low Background Load (High Subscription)
      Figure 8. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case under a Very High Background Load (High Subscription)
      Figure 9. Crossfire 1 Hop-1 Hop Case vs No Crossfire 1 Hop-1 Hop Case under a Very High Background Load (Full Subscription)
    3.4.2 Myth: Greater Hop Distance Always Means Slower Time.
      Figure 10. Both Read-Only Threads Running on Node 0 (Different Cores) on an Idle System
      Figure 11. Both Write-Only Threads Running on Node 0 (Different Cores) on an Idle System
      Figure 12. Both Write-Only Threads Running on Node 0 (Different Cores) under Low Background Load (High Subscription)
      Figure 13. Both Write-Only Threads Running on Node 0 (Different Cores) under Medium Background Load (High Subscription)
      Figure 14. Both Write-Only Threads Running on Node 0 (Different Cores) under High Background Load (High Subscription)
      Figure 15. Both Write-Only Threads Running on Node 0 (Different Cores) under Very High Background Load (High Subscription)
  3.5 Locks
  3.6 Parallelism Exposed by Compilers on AMD ccNUMA Multiprocessor Systems

Chapter 4 Conclusions

Appendix A
  A.1 Description of the Buffer Queues
    Figure 16. Internal Resources Associated with a Quartet Node
  A.2 Why Is the Crossfire Case Slower Than the No Crossfire Case on an Idle System?
    A.2.1 What Resources Are Used When a Single Read-Only or Write-Only Thread Accesses Remote Data?
    A.2.2 What Resources Are Used When Two Write-only Threads Fire at Each Other (Crossfire) on an Idle System?
    A.2.3 What Role Do Buffers Play in the Throughput Observed?
    A.2.4 What Resources Are Used When Write-Only Threads Do Not Fire at Each Other (No Crossfire) on an Idle System?
  A.3 Why Is the No Crossfire Case Slower Than the Crossfire Case on a System under a Very High Background Load (Full Subscription)?
  A.4 Why Is 0 Hop-0 Hop Case Slower Than the 0 Hop-1 Hop Case on an Idle System for Write-Only Threads?
  A.5 Why Is 0 Hop-1 Hop Case Slower Than 0 Hop-0 Hop Case on a System under High Background Load (High Subscription) for Write-Only Threads?
  A.6 Support for a ccNUMA-Aware Scheduler for AMD64 ccNUMA Multiprocessor Systems
  A.7 Tools and APIs for Thread/Process and Memory Placement (Affinity) for AMD64 ccNUMA Multiprocessor Systems
    A.7.1 Support Under Linux®
    A.7.2 Support under Solaris
    A.7.3 Support under Microsoft® Windows®
  A.8 Tools and APIs for Node Interleaving in Various OSs for AMD64 ccNUMA Multiprocessor Systems
    A.8.1 Support under Linux®
    A.8.2 Support under Solaris
    A.8.3 Support under Microsoft® Windows®
    A.8.4 Node Interleaving Configuration in the BIOS

Size: 632 KB
Pages: 48
Language: English