Threaded Functions and Problems

The following Intel(R) Math Kernel Library (Intel(R) MKL) function domains are threaded:

Optimization Notice

The Intel® Math Kernel Library (Intel® MKL) contains functions that are more highly optimized for Intel microprocessors than for other microprocessors. While the functions in Intel® MKL offer optimizations for both Intel and Intel-compatible microprocessors, depending on your code and other factors, you will likely get extra performance on Intel microprocessors.

While the paragraph above describes the basic optimization approach for Intel® MKL as a whole, the library may or may not be optimized to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include Intel® Streaming SIMD Extensions 2 (Intel® SSE2), Intel® Streaming SIMD Extensions 3 (Intel® SSE3), and Supplemental Streaming SIMD Extensions 3 (Intel® SSSE3) instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors.

Intel recommends that you evaluate other library products to determine which best meets your requirements.

Threaded LAPACK Routines

In the following list, ? stands for the precision prefix of each variant of a routine: s, d, c, or z.

The following LAPACK routines are threaded:

Several other LAPACK routines that are built on threaded LAPACK or BLAS routines also make effective use of parallelism, for example:
?gesv, ?posv, ?gels, ?gesvd, ?syev, ?heev, cgegs/zgegs, cgegv/zgegv, cgges/zgges, cggesx/zggesx, cggev/zggev, and cggevx/zggevx.

Threaded BLAS Level 1 and Level 2 Routines

In the following list, ? stands for the precision prefix of each variant of a routine: s, d, c, or z.

The following routines are threaded for Intel(R) Core™2 Duo and Intel(R) Core™ i7 processors:

Threaded FFT Problems

The following characteristics of a specific problem determine whether your FFT computation may be threaded:

Most FFT problems are threaded. In particular, computation of multiple transforms in one call (number of transforms > 1) is threaded. Details of which transforms are threaded follow.

One-dimensional (1D) transforms

Many 1D transforms are threaded, depending on the transform type, data layout, size, and architecture.

1D complex-to-complex (c2c) transforms of size N using interleaved complex data layout are threaded under the following conditions depending on the architecture:

Architecture | Conditions
Intel(R) 64 | N is a power of 2, log2(N) > 9, the transform is double-precision out-of-place, and input/output strides equal 1.
IA-32 | N is a power of 2, log2(N) > 13, and the transform is single-precision.
IA-32 | N is a power of 2, log2(N) > 14, and the transform is double-precision.
Any | N is composite, log2(N) > 16, and input/output strides equal 1.
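The log2(N) bounds in the table translate directly into minimum transform lengths. For the rows that require N to be a power of 2, N = 2^k with integer k, so a strict bound log2(N) > m means k >= m + 1. The composite-size row applies to any N:

```latex
\begin{aligned}
N = 2^k,\ \log_2 N > 9  &\iff N \ge 2^{10} = 1024 \\
N = 2^k,\ \log_2 N > 13 &\iff N \ge 2^{14} = 16384 \\
N = 2^k,\ \log_2 N > 14 &\iff N \ge 2^{15} = 32768 \\
\log_2 N > 16           &\iff N > 2^{16} = 65536
\end{aligned}
```

Because 65537 = 2^16 + 1 is prime, the smallest length that satisfies the composite-size row is 65538.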

1D real-to-complex and complex-to-real transforms are not threaded.

1D complex-to-complex transforms using split-complex layout are not threaded.

Prime-size complex-to-complex 1D transforms are not threaded.

Multidimensional transforms

All multidimensional transforms on large-volume data are threaded.


Copyright © 2006 - 2010, Intel Corporation. All rights reserved.