The GAP Report in this example recommends using the -parallel option to enable parallelization. From the command line, execute make gap_par_report, or run the following:
icpc -c -guide -parallel scalar_dep.cpp
The compiler emits the following:
GAP REPORT LOG OPENED ON Wed Jul 28 14:33:09 2010

scalar_dep.cpp(51): remark #30523: (PAR) Loop at line 51 cannot be parallelized due to conditional assignment(s) into the following variable(s): b. This loop will be parallelized if the variable(s) become unconditionally initialized at the top of every iteration. [VERIFY] Make sure that the value(s) of the variable(s) read in any iteration of the loop must have been written earlier in the same iteration. [ALTERNATIVE] Another way is to use "#pragma parallel private(b)" to parallelize the loop. [VERIFY] The same conditions described previously must hold.

scalar_dep.cpp(51): remark #30525: (PAR) If the trip count of the loop at line 51 is greater than 188, then use "#pragma loop count min(188)" to parallelize this loop. [VERIFY] Make sure that the loop has a minimum of 188 iterations.

Number of advice-messages emitted for this compilation session: 2.
END OF GAP REPORT LOG
In the GAP Report, remark #30523 indicates that the loop at line 51 cannot be parallelized because the variable b is conditionally assigned. Remark #30525 indicates that the loop trip count must be greater than 188 for the compiler to parallelize the loop.
Apply the necessary changes after verifying that the GAP recommendations are appropriate and do not change the semantics of the program.
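The [ALTERNATIVE] advice in remark #30523 can be sketched as follows. This is a minimal illustration, not the tutorial's source: the function name scale_private, the float element type, and the use of std::vector are assumptions. The pragma is the one quoted in the GAP Report, and it is guarded with __INTEL_COMPILER so that other compilers skip it. The [VERIFY] condition appears to hold for this loop, because b is read only when A[i] > 1 after the first statement, which can happen only if the first branch ran and wrote b in the same iteration.

```cpp
#include <vector>

// Hypothetical stand-in for the loop in scalar_dep.cpp. The function name
// and the float element type are assumptions made for illustration.
void scale_private(std::vector<float>& A) {
    float b = 0.0f;
    const int n = static_cast<int>(A.size());
    // Pragma quoted in the GAP Report; applied only when building with the
    // Intel compiler, so other compilers see a plain serial loop.
#ifdef __INTEL_COMPILER
#pragma parallel private(b)
#endif
    for (int i = 0; i < n; i++) {
        // b is written here whenever A[i] > 0 ...
        if (A[i] > 0) { b = A[i]; A[i] = 1 / A[i]; }
        // ... and read here only when A[i] > 1, which implies the branch
        // above ran in this same iteration.
        if (A[i] > 1) { A[i] += b; }
    }
}
```

With this approach the loop body stays unchanged, and each thread gets its own copy of b. The value of b after the loop should not be relied on.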
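For this loop, the code path selected by defining TEST_GAP applies both GAP recommendations: b is assigned unconditionally at the top of every iteration, and the loop count pragma states the minimum trip count. With these changes the compiler can parallelize and vectorize the loop: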
#ifdef TEST_GAP
#pragma loop count min (188)
  for (i=0; i<n; i++) {
    b = A[i];
    if (A[i] > 0) {A[i] = 1 / A[i];}
    if (A[i] > 1) {A[i] += b;}
  }
#else
  for (i=0; i<n; i++) {
    if (A[i] > 0) {b=A[i]; A[i] = 1 / A[i]; }
    if (A[i] > 1) {A[i] += b;}
  }
#endif
}
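One way to gain confidence that the GAP-recommended rewrite does not change the program's semantics is to run both code paths on the same data and compare the results. The sketch below assumes float elements and uses hypothetical names (loop_original, loop_gap, variants_agree, make_input) that do not appear in scalar_dep.cpp. It checks one input set only, so it supports the argument rather than proving it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original code path (#else branch): b is assigned only when A[i] > 0.
void loop_original(std::vector<float>& A) {
    float b = 0.0f;
    for (std::size_t i = 0; i < A.size(); i++) {
        if (A[i] > 0) { b = A[i]; A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
}

// GAP-recommended code path (TEST_GAP branch): b is assigned unconditionally.
void loop_gap(std::vector<float>& A) {
    // Mirrors the [VERIFY] note of remark #30525: at least 188 iterations.
    assert(A.size() >= 188);
    float b;
    for (std::size_t i = 0; i < A.size(); i++) {
        b = A[i];
        if (A[i] > 0) { A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
}

// Hypothetical test data: 256 values from -4.0 upward in steps of 1/16,
// covering negatives, zero, values in (0, 1), exactly 1, and values above 1.
std::vector<float> make_input() {
    std::vector<float> v(256);
    for (std::size_t i = 0; i < v.size(); i++) {
        v[i] = (static_cast<float>(i) - 64.0f) / 16.0f;
    }
    return v;
}

// Returns true when both code paths leave identical arrays.
bool variants_agree(const std::vector<float>& input) {
    std::vector<float> x = input;
    std::vector<float> y = input;
    loop_original(x);
    loop_gap(y);
    return x == y;
}
```

Calling variants_agree(make_input()) from a test driver exercises every branch combination of the loop. The two paths differ only for elements that are not greater than 0, where the rewrite assigns b but never reads it, so the arrays are expected to match.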
To verify that the loop is parallelized and vectorized:
Add the compiler options -vec-report1 -par-report1.
Define TEST_GAP (with -DTEST_GAP) so that the GAP-recommended code path is compiled.
From the command line, execute make final, or run the following:
icpc -c -parallel -DTEST_GAP -vec-report1 -par-report1 scalar_dep.cpp
The compiler's -vec-report and -par-report options emit the following output, confirming that the loop is vectorized and parallelized:
scalar_dep.cpp(43) (col. 3): remark: LOOP WAS AUTO-PARALLELIZED.
scalar_dep.cpp(43) (col. 3): remark: LOOP WAS VECTORIZED.
scalar_dep.cpp(43) (col. 3): remark: LOOP WAS VECTORIZED.
For more information on using the -guide, -vec-report, and -par-report compiler options, see the Compiler Options section in the Compiler User Guide and Reference.
This completes the tutorial for Guided Auto-parallelization, where you have seen how the compiler can guide you to an optimized solution through auto-parallelization.
Copyright © 2010, Intel Corporation. All rights reserved.