Anecdote for CS 221

Bootstrapping: self-compiling compilers

Notice that every compiler involves 3 programming languages:

the source language processed by the compiler
the object or target language output by the compiler
the implementation language in which the compiler itself is written.

It is sensible for the object language and implementation language to be the same, although typically the object language is a machine language or assembly language, and the implementation language is a higher-level language compiled into the same object language. Usually, the compiler and its object code are intended to run on the same machine, but exceptions, called cross-compilers, exist.

Obviously, there is no value in

a compiler whose source and object languages are the same
a compiler whose source and implementation languages are the same.

since implementations of both the object and implementation languages must exist before the source language can be compiled. Obvious, but false. The first case, source and object languages the same, is called a program transformer, or in some cases an optimizer. It is usually intended to convert a given program to a more efficient program in the same language, but it might improve the program along other dimensions than run-time efficiency (for example, there is some experimental success in automatically converting interpreters, which are easier to write, to compilers, which are more efficient for many purposes). The most common optimizers operate on low-level code, such as assembly language, machine language, or a specially designed intermediate code. They are usually viewed as part of a larger compiler.

The use of compilers with the same source and implementation language is quite common and important as an implementation tool. Suppose that we have designed a new programming language, called Z. We might first write an easy but inefficient compiler or interpreter for Z, called EZZ, in whatever language Y is already implemented for us. EZZ might implement only a portion of the language Z. Then, we write an excellent compiler for Z, called XLZ in Z itself (using only the portion of Z that our first inefficient implementation can process). Now, we compile or interpret XLZ using EZZ, and execute XLZ on itself. The result is high-quality compiled code for XLZ. This trick is widely used, and XLZ in this case is called a "bootstrapping compiler," or a "self-compiling compiler." (Exercise: can we gain anything by recompiling XLZ using the higher-quality output of the first compilation? Ability to answer this question quickly is a good indicator that you understand the idea of bootstrapping.)

What was gained by the bizarre trick of bootstrapping?

Many (but not all) programming languages are particularly well suited to the problem of writing their own high-quality compilers. And, it is much easier to write an inefficient compiler or even interpreter in whatever language Y is available, than to write a high-quality compiler directly. If some of the difficult features of Z are not required for the XLZ compiler, then a lot of effort may be saved in writing EZZ.
The implementors of Z have spent a lot of time thinking about Z, so they are relatively well prepared to use it.
By using XLZ to compile itself, the implementors force themselves to test it. Many errors are found at this stage.

Michael J. O'Donnell <odonnell@cs.uchicago.edu>

Last modified: Mon Jan 2 14:37:01 1995