-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Compiler internals
Here we explain how the compiler works. We link to the relevant code whenever possible, but since code changes we will use this version, which is a recent one and where later changes have a low impact on the general algorithm.
The main file that is compiled to generate the compiler is src/compiler/crystal.cr. Here all source code relative to Crystal is required and then Crystal::Command.run
is executed. The Command module provides a command line interface to the compiler. According to command line options, it creates a Compiler, configures it and then uses it to compile one or more source files.
Let's see what the Compiler class does.
The main public method of the Compiler class is compile.
First of all a Program is instantiated. A Program represents the top level container of everything. It's like a top level module that can have classes and methods. It's similar to Ruby's main
when you do puts self
at the top-level. However, unlike Ruby, when you define a method at the top level it gets defined in this Program, not as a private method of Object.
As you can see in Program's source code some basic types common to all programs are defined, like Object, Nil and String.
The Program is also a container of data associated to a single compilation, so for example it keeps track of all the symbols that were used (symbols can't be dynamically created), as well as some configurations, like CRYSTAL_PATH (similar to Ruby's $LOAD_PATH, only it is immutable).
Going back to Compiler#compile
, a Program is created and configured. Then the source code is parsed (also here). After parsing each file into an AST it is normalized. Normalization consists of transforming some AST nodes into others. The most important transformation is transforming a require into AST nodes that result from actually requiring that file. Other transformations are, for example, transforming an unless to an if
by simply inverting the branches.
At the end of this stage we will have an AST node representing the whole program, with all requires expanded (the special "prelude" file is automatically required). This AST node will probably be an Expressions node, which just represents more than one AST node.
The next step is the most important one: type inference.
The name of this stage is actually misleading. It's called infer_type in the source code, but many things happen here. This has two important consequences: 1) The code is harder to follow and understand, because many concerns are mixed, and 2) The compiler is faster, as the whole program is traversed just once. We believe point 2 is more important than 1, as there will be more developers using the compiler than developers developing the compiler, and compile times are very important for us.
Let's take a look at what Program#infer_type(node) does.
The first and most important thing it does is to create a TypeVisitor to traverse that AST node. This makes use of the Visitor Pattern, which is one that is heavily used across the compiler and it's one of the most useful ways to deal with an AST node in a generic way. Because of Crystal's multi-dispatch feature implementing the Visitor Pattern is very easy, as no manual double-dispatch is needed.
To be continued...