The Language Translation Pipeline

Compilers translate source text through lexical analysis, parsing, semantic analysis, and code generation before a program can run.

A compiler is not one giant pass. It is a staged translation pipeline.

Sebesta frames the compiler as a series of transformations, each responsible for a different class of questions. Characters become tokens, tokens become structure, structure becomes checked meaning, and meaning becomes executable instructions.
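
The first transformation, characters becoming tokens, can be sketched in a few lines. This is a minimal illustration rather than the lexer of any real compiler: the token categories (KEYWORD, IDENT, OP, PUNCT) and the function name lex are assumptions chosen to fit the toy language used on this page.

```python
import re

# Hypothetical token categories for the toy language on this page.
# Order matters: KEYWORD is tried before IDENT so "let" is not read as a name.
TOKEN_SPEC = [
    ("KEYWORD", r"\blet\b"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[=+]"),
    ("PUNCT",   r"[();]"),
    ("SKIP",    r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source):
    """Turn raw characters into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            # A character no token rule accepts is a lexical error.
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = lex("let total = price + tax;")
# tokens[0] is ("KEYWORD", "let") and tokens[-1] is ("PUNCT", ";")
```

The lexer knows nothing about grammar. It would happily tokenize "= = let ;", and rejecting that sequence is the parser's job.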

When to reach for this

Reach for this concept when you want to understand where syntax errors, semantic errors, and generated code each come from inside a compiler.

Why this matters

Once you see the compiler as a pipeline, error messages, tooling, and language implementation stop feeling mysterious. You can place each kind of bug in the stage that actually owns it.

The mental model

Each stage adds structure

The compiler never jumps straight from characters to machine code. Every stage makes the program representation richer and more checkable.

Each stage can halt the build

A malformed program does not need to reach later stages. Syntax failure stops before semantic checking. Semantic failure stops before code generation.
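
The halting behavior can be sketched as follows. The stage bodies here are deliberately crude stand-ins (a semicolon check for the parser, a name lookup for semantic analysis), and StageError, compile_source, and known_names are hypothetical names introduced only for this illustration. The point is the control flow: a failure in one stage means later stages never run.

```python
class StageError(Exception):
    """Raised by a stage to stop the pipeline; records which stage failed."""
    def __init__(self, stage, message):
        super().__init__(f"{stage}: {message}")
        self.stage = stage

def parse(source):
    # Stand-in syntax check: every statement must end with a semicolon.
    statements = [line.strip() for line in source.strip().splitlines()]
    for stmt in statements:
        if not stmt.endswith(";"):
            raise StageError("parser", f"missing ';' in {stmt!r}")
    return statements

def analyze(statements, known_names):
    # Stand-in semantic check: print(x) may only name a known variable.
    for stmt in statements:
        if stmt.startswith("print("):
            name = stmt[len("print("):-2]  # drop the trailing ");"
            if name not in known_names:
                raise StageError("semantic analysis", f"undefined name {name!r}")
    return statements

def generate(statements):
    # Stand-in code generator: one pseudo-instruction per statement.
    return [f"EXEC {stmt}" for stmt in statements]

def compile_source(source, known_names):
    """Return (output, stages_reached, failed_stage)."""
    reached = []
    try:
        reached.append("parser")
        tree = parse(source)
        reached.append("semantic analysis")
        checked = analyze(tree, known_names)
        reached.append("code generation")
        return generate(checked), reached, None
    except StageError as err:
        return None, reached, err.stage

# A missing semicolon stops the build in the parser; later stages are never reached.
_, reached, failed = compile_source("let total = price + tax\nprint(total);", {"total"})
```

In the failing call above, reached contains only "parser", which mirrors the claim that a syntax failure stops before semantic checking.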

Step through the concept

How to use this page

Follow the walkthrough one step at a time and connect each line of source code to the representation the compiler builds from it.

  • Compare the successful pipeline to the syntax-error pipeline so you can see exactly where the build stops.
  • Track how the representation changes: source text, tokens, parse tree, semantic facts, instructions.
  • Notice that each stage answers a different class of question.

Compiler pipeline: every stage succeeds

Tokens become syntax trees, then checked programs, then instructions.

Pipeline stages: Source → Lexer → Parser → Semantic Analysis → Codegen

Source program

let total = price + tax;
print(total);

Input

The compiler starts with raw source text.

What: The pipeline begins with source code as plain text.
Why: A compiler does not reason about raw characters in a single step. It passes the program through stages that each add more structure.

Compiler skeleton
tokens = lex(source)
tree = parse(tokens)
checked = analyze(tree)
output = generate(checked)
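
To make the skeleton concrete, here is one possible runnable version for the two-line source program above. Everything beyond the four function names is an assumption made for this sketch: the grammar accepts only "let NAME = NAME + NAME;" and "print(NAME);", the names price and tax are treated as predefined, and the LOAD / ADD / STORE / PRINT instructions belong to an invented stack machine, not a real target.

```python
import re

def lex(source):
    # Characters -> tokens. Characters the pattern does not match are
    # silently skipped in this sketch; a real lexer would report them.
    return re.findall(r"[A-Za-z_]\w*|[=+();]", source)

def parse(tokens):
    # Tokens -> structure, for the assumed two-statement grammar.
    tree, i = [], 0
    while i < len(tokens):
        if tokens[i] == "let":
            stmt = tokens[i:i + 7]
            if len(stmt) != 7 or stmt[2] != "=" or stmt[4] != "+" or stmt[6] != ";":
                raise SyntaxError(f"malformed let statement: {' '.join(stmt)}")
            tree.append(("let", stmt[1], ("add", stmt[3], stmt[5])))
            i += 7
        elif tokens[i] == "print":
            stmt = tokens[i:i + 5]
            if len(stmt) != 5 or stmt[1] != "(" or stmt[3] != ")" or stmt[4] != ";":
                raise SyntaxError(f"malformed print statement: {' '.join(stmt)}")
            tree.append(("print", stmt[2]))
            i += 5
        else:
            raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return tree

def analyze(tree, predefined=("price", "tax")):
    # Structure -> checked meaning: every name must be defined before use.
    defined = set(predefined)
    for node in tree:
        if node[0] == "let":
            _, target, (_, left, right) = node
            for name in (left, right):
                if name not in defined:
                    raise NameError(f"undefined name {name!r}")
            defined.add(target)
        elif node[1] not in defined:
            raise NameError(f"undefined name {node[1]!r}")
    return tree

def generate(checked):
    # Checked meaning -> instructions for a hypothetical stack machine.
    code = []
    for node in checked:
        if node[0] == "let":
            _, target, (_, left, right) = node
            code += [f"LOAD {left}", f"LOAD {right}", "ADD", f"STORE {target}"]
        else:
            code += [f"LOAD {node[1]}", "PRINT"]
    return code

source = "let total = price + tax;\nprint(total);"
tokens = lex(source)
tree = parse(tokens)
checked = analyze(tree)
output = generate(checked)
# output == ["LOAD price", "LOAD tax", "ADD", "STORE total", "LOAD total", "PRINT"]
```

Dropping the first semicolon makes parse raise SyntaxError, and printing a name that was never defined makes analyze raise NameError, so each kind of error surfaces in the stage that owns it.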

Different compiler stages solve different problems

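| Stage             | Primary question               | Typical output               |
|-------------------|--------------------------------|------------------------------|
| Lexer             | What are the tokens?           | Token stream                 |
| Parser            | Do the tokens fit the grammar? | Syntax tree                  |
| Semantic analysis | Does the program make sense?   | Annotated tree / diagnostics |
| Code generation   | How do we execute it?          | Instructions or target code  |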
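
The semantic analysis output, an annotated tree or a list of diagnostics, is the least obvious of these. The sketch below shows one way it might look. The Add node, the annotate function, and the "number" / "string" type names are all hypothetical, and type checking is only one example of the questions this stage can ask.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Add:
    """A syntax-tree node for `left + right`, as the parser might produce it."""
    left: str
    right: str
    type: Optional[str] = None  # unknown after parsing; filled in by semantic analysis

def annotate(node, symbols):
    """Return (node, diagnostics). The node's type is set only if no problem is found.

    `symbols` maps each known variable name to its (assumed) type name.
    """
    diagnostics = []
    for name in (node.left, node.right):
        if name not in symbols:
            diagnostics.append(f"undefined name '{name}'")
    if not diagnostics:
        left_type, right_type = symbols[node.left], symbols[node.right]
        if left_type != right_type:
            diagnostics.append(f"cannot add {left_type} and {right_type}")
        else:
            node.type = left_type  # the annotation later stages can rely on
    return node, diagnostics

symbols = {"price": "number", "tax": "number", "label": "string"}
ok_node, ok_diags = annotate(Add("price", "tax"), symbols)      # annotated, no diagnostics
bad_node, bad_diags = annotate(Add("price", "label"), symbols)  # one diagnostic, no annotation
```

Both inputs are grammatically valid, so the parser accepts them. Only this stage can tell that the second one does not make sense.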

The short version

  • The compiler is a sequence of stages, not one monolithic operation.
  • Lexing, parsing, semantic analysis, and code generation each answer a different question.
  • Errors stop the pipeline at the stage responsible for them.
  • Understanding the pipeline makes compiler diagnostics much easier to interpret.