About the High-Level Optimizer (HLO)

HLO takes advantage of the processor's cache.

How Cache Works

  1. The processor requests data from memory. It may take, for example, 100 cycles to obtain that data.
  2. The processor fetches not only the requested data but also a surrounding block of memory, called a cache line.
  3. The processor places the cache line into the cache, hoping that the code will subsequently request memory that is now sitting in cache. When the processor accesses the data from cache, it takes significantly fewer cycles than from memory.


On both the IA-32 and Itanium® architectures, -O3 invokes the High-Level Optimizer (HLO), which enables the -O2 option plus more aggressive optimizations, such as loop transformation and prefetching. HLO optimizes for maximum speed and may actually rewrite your algorithm to get as many cache hits as possible. See How Cache Works, above, for an explanation of how the cache operates.

As with the vectorizer, loops must meet the following criteria to qualify for HLO:

  • iteration independence—no iteration depends on the result of another
  • memory disambiguation—all memory references within the loop are provably distinct
  • high loop count

Pro

HLO optimizes for maximum speed.

Cons

  • Because HLO may rewrite your algorithm, it is less safe than the -O2 option, and it may not improve performance for some programs.
  • On IA-32, in conjunction with the vectorization options (-Qax[i|M|K|W] and -Qx[i|M|K|W]), this option causes the compiler to perform more aggressive data-dependency analysis than -O2 does, which may result in longer compilation times.