A basic block is an instruction sequence in which control flow enters at the first instruction and leaves at the last, with no jumps in between (except possibly at the last instruction).
Transformations within a basic block preserve the block's semantics, so the block can be treated as an atomic entity.
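As an illustrative sketch (the B0..B4 labels are hypothetical annotations, not from the text), a compiler might split the following C function into basic blocks at the loop test and the conditional branch:

```c
#include <assert.h>

/* Sketch of basic-block boundaries; labels are illustrative only. */
int sum_positive(const int *v, int n) {
    int s = 0;                        /* B0: entry block, falls through   */
    for (int i = 0; i < n; i++) {     /* B1: loop header (test + branch)  */
        if (v[i] > 0)                 /* B2: loop body, ends in a branch  */
            s += v[i];                /* B3: conditional increment        */
    }
    return s;                         /* B4: exit block                   */
}
```

Each labeled region has a single entry and a single exit, which is what allows local optimizations to treat it atomically.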
Compiler optimization is an essential part of compiler design: it improves program performance by reducing execution time, memory consumption, and other costs. Optimization techniques can be applied at different stages of the compilation process, such as user, generic, peephole, local, inter-block, and global optimization. Each level has a different focus, ranging from a specific user algorithm to the entire code structure, with the goal of making the compiled code as efficient as possible.
User-level, that is, algorithm- and application-level, optimizations are specific to a particular algorithm or application. They are usually done by the developer who writes the program and knows the algorithm's specificities and the application's behavior. These optimizations aim to rewrite or refactor the code to improve its performance, often based on knowledge of the algorithm or application. For instance, a developer could optimize a sorting algorithm by choosing the most efficient one based on the data's expected characteristics.
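As a hedged sketch of such a developer-level choice (the scenario and function name are illustrative, not from the text): for data known to arrive nearly sorted, a developer might pick insertion sort, which runs in close to linear time in that case, over a general-purpose O(n log n) sort:

```c
#include <assert.h>
#include <stddef.h>

/* Insertion sort: close to O(n) on nearly sorted input, which can make
   it the better algorithm choice despite its O(n^2) worst case. */
void insertion_sort(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {  /* shift larger elements right */
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}
```

The compiler cannot make this kind of decision; it requires the developer's knowledge of the data's expected characteristics.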
Generic optimizations are carried out considering the intermediate code and are processor-independent. These optimizations are typically implemented in the middle-end of the compiler, after the frontend has parsed the code into an intermediate representation but before the backend generates machine code. Generic optimizations can include things like constant folding, dead code elimination, or loop optimization. Because they're not tied to a specific processor or machine architecture, these optimizations can be applied to any program being compiled.
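Dead code elimination, one of the generic optimizations mentioned above, can be sketched in C as a before/after pair (the function names are illustrative; the "after" version shows what a middle-end could emit):

```c
#include <assert.h>

/* Before: contains a computation whose result is never read and a
   branch whose condition can never hold. */
int f_before(int x) {
    int dead = x * 100;     /* result never read: dead store       */
    if (x < 0 && x > 0) {   /* condition always false: dead branch */
        dead = -dead;
    }
    return x + 1;
}

/* After: semantically equivalent, dead computations removed. */
int f_after(int x) {
    return x + 1;
}
```

Because the transformation only relies on the intermediate representation, it applies regardless of the target processor.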
Peephole optimization involves looking at a small window of consecutive instructions in the assembly or machine code and attempting to improve them. The compiler replaces these instructions with a more efficient sequence, reducing redundancy, and improving execution speed. This process could include eliminating unnecessary load and store operations, simplifying arithmetic operations, or replacing slow instructions with faster ones.
Local optimization is focused on enhancing the performance within a basic block (a sequence of code with one entry point and one exit point, with no possibility of branching in between). These blocks are identified after the program has been parsed into an intermediate representation. The optimization here focuses on improving the efficiency of each block, for example by eliminating redundant computations and dead code, and by performing register allocation.
Inter-block optimization takes into consideration the information flow between blocks, mainly cycles. In this phase, the compiler looks for optimization opportunities that involve more than one basic block, such as loop unrolling or loop invariant code motion. These optimizations can help to minimize redundant computations and enhance overall code efficiency.
Global optimizations focus on the entire function or program, optimizing across multiple jumps, including jumps to jumps and jumps to the next instruction. These optimizations are more complex than local optimizations, as they require a broader view of the program's control flow. Global optimizations may include function inlining, loop unrolling across multiple basic blocks, or interprocedural optimizations that span multiple functions or procedures. These kinds of optimizations can significantly improve execution speed but can also increase the size of the compiled code.
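Function inlining, mentioned above, can be sketched as a before/after pair in C (names are illustrative): the call is replaced by the callee's body, removing call overhead and exposing the substituted expression to further optimization:

```c
#include <assert.h>

int square(int x) { return x * x; }

/* Before inlining: the call site invokes square(). */
int square_call(int a) {
    return square(a) + 1;
}

/* After inlining: the body of square() is substituted in place of the
   call; the compiler can now optimize a * a + 1 in context. */
int square_inlined(int a) {
    return a * a + 1;
}
```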
Direct evaluation of expressions using only literal and known values:
int *x = (int *)malloc(1024 * 4);
int y = 3;
int z = 20 * y + 10;
Becomes:
int *x = (int *)malloc(4096);
int y = 3;
int z = 70;
Using temporary variables to store common results:
d = c * (a + b)
e = (a + b) / 2
Becomes:
t = a + b
d = c * t
e = t / 2
Unrolling loops to reduce iteration and branch overhead:
for (i = 1; i < n; i++)
a[i] = b[i] + 10;
Becomes:
for (i = 1; i + 3 < n; i += 4) {
a[i] = b[i] + 10;
a[i+1] = b[i+1] + 10;
a[i+2] = b[i+2] + 10;
a[i+3] = b[i+3] + 10;
}
// handle the remaining iterations
for (; i < n; i++)
a[i] = b[i] + 10;
Moving loop-invariant computations out of the loop:
for (i = 1; i < n; i++) {
a[i] = b[i] + c * d;
e = g[k];
}
Becomes:
t = c * d;
for (i = 1; i < n; i++) {
a[i] = b[i] + t;
}
e = g[k];
Replacing the multiplication of an induction variable with an addition (strength reduction):
for (i = 1; i < n; i++)
k = i * 4 + m;
Becomes:
k = m;
for (i = 1; i < n; i++)
k = k + 4;
Zeroing a register with a shorter, faster instruction:
xor eax,eax
instead of
mov eax,0
Replacing multiplication by a constant with a shift and an addition:
(x << 2) + x
instead of
x * 5
Reordering independent instructions to reduce register pressure and avoid register spilling.
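A C-level sketch of the idea (illustrative only; real scheduling operates on machine instructions): both functions compute the same result, but the second ordering finishes each product before loading the next pair, so fewer values are live at once and fewer registers are needed:

```c
#include <assert.h>

/* All four loads are live before any multiply: four values in flight. */
int dot2_many_live(const int *a, const int *b) {
    int a0 = a[0], a1 = a[1];
    int b0 = b[0], b1 = b[1];
    int p0 = a0 * b0;
    int p1 = a1 * b1;
    return p0 + p1;
}

/* Each product is completed before the next operands are loaded:
   at most two or three values in flight at any point. */
int dot2_few_live(const int *a, const int *b) {
    int p0 = a[0] * b[0];
    int p1 = a[1] * b[1];
    return p0 + p1;
}
```

When more values are live than there are registers, the compiler must spill some to memory; reordering like this can keep the live set below that threshold.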