Code generation (compiler)

In computing, code generation is part of the process chain of a compiler and converts intermediate representation of source code into a form (e.g., machine code) that can be readily executed by the target system.

Sophisticated compilers typically perform

code optimization are easier to apply one at a time, or because the input to one optimization relies on the completed processing performed by another optimization. This organization also facilitates the creation of a single compiler that can target multiple architectures, as only the last of the code generation stages (the backend) needs to change from target to target. (For more information on compiler design, see Compiler

.)

The input to the code generator typically consists of a

intermediate language such as three-address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a peephole optimization

pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)

Major tasks

In addition to the basic conversion from an intermediate representation into a linear sequence of machine instructions, a typical code generator tries to optimize the generated code in some way.

Tasks which are typically part of a sophisticated compiler's "code generation" phase include:

Instruction selection: which instructions to use.
pipelined
machines.
variables to processor registers^[2]

Debug data generation if required so the code can be debugged.

Instruction selection is typically carried out by doing a

postorder traversal

on the abstract syntax tree, matching particular tree configurations against templates; for example, the tree W := ADD(X,MUL(Y,Z)) might be transformed into a linear sequence of instructions by recursively generating the sequences for t1 := X and t2 := MUL(Y,Z), and then emitting the instruction ADD W, t1, t2.

In a compiler that uses an intermediate language, there may be two instruction selection stages—one to convert the parse tree into intermediate code, and a second phase much later to convert the intermediate code into instructions from the

language translator (for example, one that converts Java to C++

), then the second code-generation phase may involve building a tree from the linear intermediate code.

Runtime code generation

When code generation occurs at

finite state machine is often generated instead of a deterministic one, because usually the former can be created more quickly and occupies less memory space than the latter. Despite its generally generating less efficient code, JIT code generation can take advantage of profiling

information that is available only at runtime.

Related concepts

The fundamental task of taking input in one language and producing output in a non-trivially different language can be understood in terms of the core

YACC (Yet Another Compiler-Compiler) takes input in Backus–Naur form and converts it to a parser in C. Though it was originally created for automatic generation of a parser for a compiler, yacc is also often used to automate writing code that needs to be modified each time specifications are changed.^[3]

Many

Data transformation

.)

Reflection

In general, a syntax and semantic analyzer tries to retrieve the structure of the program from the source code, while a code generator uses this structural information (e.g.,

reflection

becomes difficult or even impossible. To counter this problem, code generators often embed syntactic and semantic information in addition to the code necessary for execution.