About the High-Level Optimizer (HLO)
HLO takes advantage of the processor's cache.
How Cache Works
- The processor requests data from memory. It may take, for example,
100 cycles to obtain that data.
- The processor fetches the requested data plus additional adjacent
data; together, this block of data is called a cache line.
- The processor places the cache line into the cache, hoping that
the code will subsequently request memory that is now sitting
in cache. When the processor accesses the data from cache, it
takes significantly fewer cycles than from memory.
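
To illustrate the behavior described above, the following C sketch
contrasts a unit-stride loop, which reuses each cache line it fetches,
with a large-stride loop that touches a new line on almost every access
and repeatedly pays the full memory latency. The array size, the stride,
and the 64-byte cache-line size mentioned in the comments are
illustrative assumptions, not details from the compiler documentation.

    #include <stddef.h>

    #define N (1 << 20)              /* illustrative array size */
    static double a[N];

    /* Unit-stride walk: the cache line fetched for a[i] also holds
       the next few elements, so most accesses hit in the cache. */
    double sum_sequential(void)
    {
        double sum = 0.0;
        for (size_t i = 0; i < N; i++)
            sum += a[i];
        return sum;
    }

    /* Large-stride walk: with a stride of 8 doubles (64 bytes) or
       more, nearly every access lands on a different cache line and
       pays the longer memory latency. */
    double sum_strided(size_t stride)
    {
        double sum = 0.0;
        for (size_t i = 0; i < N; i += stride)
            sum += a[i];
        return sum;
    }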
On both the IA-32 and Itanium® architectures, -O3
invokes the High-Level Optimizer (HLO), which enables the -O2
option plus more aggressive optimizations, such as loop transformation
and prefetching. HLO optimizes for maximum speed and may actually rewrite
your algorithm to get the most cache hits possible. See "How Cache Works"
above for an explanation of how the cache is used.
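
One example of the kind of rewrite HLO's loop transformations can
perform is a loop interchange that turns a column-major traversal into a
row-major one, so that consecutive iterations touch adjacent elements of
the same cache line. The C sketch below shows the general idea only; the
matrix dimensions and function names are illustrative, and the actual
transformation is performed internally by the compiler.

    #define ROWS 1024                /* illustrative dimensions */
    #define COLS 1024

    /* Column-major traversal: successive inner-loop accesses are COLS
       elements apart, so they rarely share a cache line. */
    void zero_by_columns(double m[ROWS][COLS])
    {
        for (int j = 0; j < COLS; j++)
            for (int i = 0; i < ROWS; i++)
                m[i][j] = 0.0;
    }

    /* After interchanging the loops, successive inner-loop accesses
       are adjacent in memory and fall in the same cache line. */
    void zero_by_rows(double m[ROWS][COLS])
    {
        for (int i = 0; i < ROWS; i++)
            for (int j = 0; j < COLS; j++)
                m[i][j] = 0.0;
    }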
As with the vectorizer, loops must meet the following criteria to qualify
for HLO (a qualifying loop is sketched after this list):
- iteration independence
- memory disambiguation: all memory references within the loop
are unique.
- high loop count
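
The C sketch below shows a loop that satisfies all three criteria; the
function name, the restrict qualifiers, and the assumption that n is
large are illustrative and not taken from the compiler documentation.

    #include <stddef.h>

    /* Each iteration writes a distinct element (iteration
       independence), the restrict qualifiers assert that dst and src
       do not overlap (memory disambiguation), and n is assumed to be
       large (high loop count). */
    void scale_add(double *restrict dst, const double *restrict src,
                   double k, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] += k * src[i];
    }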
Pro
HLO optimizes for maximum speed.
Cons
- Because using HLO may cause the compiler to rewrite your algorithm,
HLO is less safe than the -O2 option, and
it may not improve performance for some programs.
- On IA-32, in conjunction with the vectorization options (-Qax[i|M|K|W]
and -Qx[i|M|K|W]), this option causes the
compiler to perform more aggressive data dependency analysis than
for -O2, which may result in longer compilation
times.