Implementing Guided Auto-parallelization Recommendations

The GAP Report in this example recommends using the -parallel option to enable parallelization. From the command line, execute make gap_par_report, or run the following:

icpc -c -guide -parallel scalar_dep.cpp

The compiler emits the following:

GAP REPORT LOG OPENED ON Wed Jul 28 14:33:09 2010
scalar_dep.cpp(51): remark #30523: (PAR) Loop at line 51 cannot be parallelized
due to conditional assignment(s) into the following variable(s): b. This loop will
be parallelized if the variable(s) become unconditionally initialized at the top of
every iteration. [VERIFY] Make sure that the value(s) of the variable(s) read in any
iteration of the loop must have been written earlier in the same iteration.
[ALTERNATIVE] Another way is to use "#pragma parallel private(b)" to parallelize the
loop. [VERIFY] The same conditions described previously must hold.
scalar_dep.cpp(51): remark #30525: (PAR) If the trip count of the loop at line 51 is
greater than 188, then use "#pragma loop count min(188)" to parallelize this loop.
[VERIFY] Make sure that the loop has a minimum of 188 iterations.
Number of advice-messages emitted for this compilation session: 2.
END OF GAP REPORT LOG

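In the GAP Report, remark #30523 indicates that the loop at line 51 cannot be parallelized because the variable b is conditionally assigned. Remark #30525 indicates that the compiler will parallelize the loop only if it knows the trip count is large enough; adding "#pragma loop count min(188)" supplies that information, provided the loop really does run at least 188 iterations.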

Apply the necessary changes after verifying that the GAP recommendations are appropriate and do not change the semantics of the program.
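One way to gain confidence in the [VERIFY] condition for remark #30523 is to replay the original loop body with a small instrumented copy that tracks whether b was written before it is read in the same iteration. The following is a minimal sketch, not part of the tutorial sources: the function name b_written_before_read, the use of std::vector, and the element type double are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not in scalar_dep.cpp): replays the original loop body
// on a copy of the data and reports whether every read of b was preceded by a
// write to b in the same iteration. The element type double is an assumption.
bool b_written_before_read(std::vector<double> A) {
    double b = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        bool written = false;  // reset at the top of every iteration
        if (A[i] > 0) { b = A[i]; written = true; A[i] = 1 / A[i]; }
        if (A[i] > 1) {
            // A read without a prior write would pick up a value of b
            // left over from an earlier iteration.
            if (!written) return false;
            A[i] += b;
        }
    }
    return true;
}
```

Calling b_written_before_read on representative input data returns true when no iteration depends on a value of b carried over from an earlier iteration. A check like this only covers the inputs it is given; it supplements, rather than replaces, reasoning about the loop body.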

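For this loop, the recommended changes are guarded by the TEST_GAP macro. When TEST_GAP is defined, b is assigned unconditionally at the top of every iteration and the loop count pragma is applied, which enables parallelization and vectorization of the loop as recommended by GAP: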

#ifdef TEST_GAP
#pragma loop count min (188)
  for (i=0; i<n; i++) {
    b = A[i];
    if (A[i] > 0) {A[i] = 1 / A[i];}
    if (A[i] > 1) {A[i] += b;}
  }
#else
  for (i=0; i<n; i++) {
    if (A[i] > 0) {b=A[i]; A[i] = 1 / A[i];}
    if (A[i] > 1) {A[i] += b;}
  }
#endif

}
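To check that the rewritten loop does not change the semantics of the program, the two variants can be run on the same input and their results compared. The following is a minimal sketch under stated assumptions: loop_original, loop_gap, and make_input are hypothetical names, the element type double is assumed, and the loop count pragma is left out so that the sketch builds with any C++ compiler.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical wrapper around the original loop body (the #else branch).
std::vector<double> loop_original(std::vector<double> A) {
    double b = 0.0;
    for (std::size_t i = 0; i < A.size(); ++i) {
        if (A[i] > 0) { b = A[i]; A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
    return A;
}

// Hypothetical wrapper around the GAP-recommended loop body (the TEST_GAP
// branch): b is assigned unconditionally at the top of every iteration.
std::vector<double> loop_gap(std::vector<double> A) {
    for (std::size_t i = 0; i < A.size(); ++i) {
        double b = A[i];
        if (A[i] > 0) { A[i] = 1 / A[i]; }
        if (A[i] > 1) { A[i] += b; }
    }
    return A;
}

// Hypothetical test data: cycles through -0.5, -0.25, 0, 0.25, 0.5 so that
// negative, zero, and positive elements are all exercised.
std::vector<double> make_input(std::size_t n) {
    std::vector<double> A(n);
    for (std::size_t i = 0; i < n; ++i)
        A[i] = (static_cast<double>(i % 5) - 2.0) * 0.25;
    return A;
}
```

Comparing loop_original(make_input(200)) with loop_gap(make_input(200)) element by element should show identical results, because both variants perform the same arithmetic whenever b is read. The input length of 200 is chosen here only so that it exceeds the 188-iteration minimum named in the report.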

To verify that the loop is parallelized and vectorized, execute make final from the command line, or run the following:

icpc -c -parallel -DTEST_GAP -vec-report1 -par-report1 scalar_dep.cpp

The -vec-report1 and -par-report1 options cause the compiler to emit the following output, confirming that the loop is vectorized and parallelized:

scalar_dep.cpp(43) (col. 3): remark: LOOP WAS AUTO-PARALLELIZED.
scalar_dep.cpp(43) (col. 3): remark: LOOP WAS VECTORIZED.
scalar_dep.cpp(43) (col. 3): remark: LOOP WAS VECTORIZED.

For more information on using the -guide, -vec-report, and -par-report compiler options, see the Compiler Options section in the Compiler User Guide and Reference.

This completes the tutorial for Guided Auto-parallelization, where you have seen how the compiler can guide you to an optimized solution through auto-parallelization.


Copyright © 2010, Intel Corporation. All rights reserved.