Bootstrapping: self-compiling compilers
Notice that every compiler involves 3 programming languages:
- the source language processed by the compiler
- the object or target language output by the compiler
- the implementation language in which the compiler
itself is written.
It is sensible for the object language and implementation language to
be the same, although typically the object language is a machine
language or assembly language, and the implementation language is a
higher-level language compiled into the same object language. Usually,
the compiler and its object code are intended to run on the same
machine, but exceptions, called cross-compilers, exist.
Obviously, there is no value in
- a compiler whose source and object languages are the same
- a compiler whose source and implementation languages are the same.
since implementations of both the object and implementation languages
must exist before the source language can be
compiled. Obvious, but false. The first case, source and object
languages the same, is called a program transformer, or in
some cases an optimizer. It is usually intended to convert a
given program to a more efficient program in the same language, but it
might improve the program along other dimensions than run-time
efficiency (for example, there is some experimental success in
automatically converting interpreters, which are easier to write, to
compilers, which are more efficient for many purposes). The most
common optimizers operate on low-level code, such as assembly
language, machine language, or a specially designed intermediate
code. They are usually viewed as part of a larger compiler.
The use of compilers with the same source and implementation language
is quite common and important as an implementation tool. Suppose that we
have designed a new programming language, called Z. We might first
write an easy but inefficient compiler or interpreter for Z, called
EZZ, in whatever language Y is already implemented for us. EZZ might
implement only a portion of the language Z. Then, we write an
excellent compiler for Z, called XLZ in Z itself (using only
the portion of Z that our first inefficient implementation can
process). Now, we compile or interpret XLZ using EZZ, and execute XLZ
on itself. The result is high-quality compiled code for XLZ. This
trick is widely used, and XLZ in this case is called a "bootstrapping
compiler," or a "self-compiling compiler." (Exercise: can we gain
anything by recompiling XLZ using the higher-quality output of the
first compilation? Ability to answer this question quickly is a good
indicator that you understand the idea of bootstrapping.)
What was gained by the bizarre trick of bootstrapping?
- Many (but not all) programming languages are particularly well
suited to the problem of writing their own high-quality
compilers. And, it is much easier to write an inefficient
compiler or even interpreter in whatever language Y is available,
than to write a high-quality compiler directly. If some of the
difficult features of Z are not required for the XLZ compiler,
then a lot of effort may be saved in writing EZZ.
- The implementors of Z have spent a lot of time thinking about
Z, so they are relatively well prepared to use it.
- By using XLZ to compile itself, the implementors force
themselves to test it. Many errors are found at this stage.
Michael J. O'Donnell <odonnell@cs.uchicago.edu>
Last modified: Mon Jan 2 14:37:01 1995