Intel ITB999ASGE1 User Manual

Additional components for Performance and Productivity

Parallel Algorithms
Generic implementation of
common patterns

Generic implementations of parallel patterns such as parallel loops, flow graphs, and pipelines can be an
easy way to achieve a scalable parallel implementation without developing a custom solution from scratch.

Concurrent Containers
Generic implementation of
common idioms for
concurrent access

Intel® Threading Building Blocks (Intel® TBB) concurrent containers are a concurrency-friendly alternative
to serial data containers. Serial data structures (such as C++ STL containers) often require a global lock to
protect them from concurrent access and modification. Intel TBB concurrent containers allow multiple
threads to access and update items in the container concurrently, which increases the available concurrency
and improves an application’s scalability.

Synchronization Primitives
Exception-safe locks,
condition variables, and
atomic operations

Intel TBB provides a comprehensive set of synchronization primitives with different properties that suit
common synchronization strategies. The exception-safe implementation of locks helps avoid
deadlock in programs that use C++ exceptions. Using Intel TBB atomic variables instead of the C-style
atomic API reduces the potential for data races.

Scalable Memory Allocators
Scalable memory manager
and false-sharing free
memory allocator

The scalable memory allocator avoids scalability bottlenecks by minimizing access to a shared memory
heap via per-thread memory pool management. Special management of large (≥8KB) blocks allows more
efficient resource usage, while still offering scalability and competitive performance. The cache-aligned
memory allocator avoids false sharing by ensuring that separately allocated blocks never share a cache line.

Create arbitrary task trees

When an algorithm cannot be expressed with high-level Intel TBB constructs, the user can choose to
create arbitrary task trees. Tasks can be spawned for better locality and performance, or enqueued to
maintain FIFO-like order and ensure starvation-resistant execution.

Conditional Numerical
Reproducibility

Ensure deterministic associativity for floating-point arithmetic results with the new Intel TBB template
function ‘parallel_deterministic_reduce’.

C++11 Support

Intel TBB can be used with C++11 compilers and supports lambda expressions. For developers using
parallel algorithms, lambda expressions reduce the time and code needed by removing the requirement for
separate objects or classes.

Select the right Intel® Threading Building Blocks (Intel® TBB) license

• Commercial Binary Distribution for customers who may require commercial support services. Attractive pricing available for academic, student and classroom usage.

• Open Source Distribution can be used under GPLv2 with the runtime exception allowing usage in proprietary applications. Allows support for additional OSs and hardware platforms. Both source and binary forms are available for download from http://threadingbuildingblocks.org

• Custom license available if you require the ability to modify or distribute the commercial source code of Intel TBB. Contact your Intel representative for more information.

What’s New in version 4.2

Feature

Benefit

Support for Latest Intel
Architectures

Take advantage of the newest features in Intel’s latest processors, including Transactional Synchronization
Extensions (TSX). Adds support for the Intel® Xeon Phi™ coprocessor for Windows and the Intel® Xeon® processor
(Ivy Bridge-EP).
Selecting the best models for your application today sets a path to take full advantage of
multicore and many-core performance without rewriting your code. Start by implementing
parallelism for today’s architecture and be ready for future architectures.

Lower memory overhead

Improved heuristics in the memory allocator reduce memory overhead by intelligently releasing unused or
stale memory.

Improved handling of large
memory requests

Improved handling of large (over 8KB, up to 128MB) memory requests results in better performance for
applications that make frequent large allocations. Use of big memory pages can now be explicitly enabled
via a function call or environment variable.

Better Fork Support

Fork safety through a user-enabled API that ensures Intel TBB worker threads have completed before
a fork is executed.

PPL* Compatibility

Improved compatibility with the Parallel Patterns Library (PPL) by adding the concurrent_unordered_multimap and
concurrent_unordered_multiset APIs.

Windows* Store

Customers who use Intel TBB in their applications can now submit and sell their apps through the Windows
Store.