In the previous article we discussed variable allocation, hidden classes and how V8 handles our JavaScript code. Now we're going to dive a little deeper into the compiling pipeline and the components V8 is made of.
Prior to the V8 5.9 release in 2017, V8 had an older execution pipeline composed of the full-codegen compiler and a JIT compiler called Crankshaft, which had two subcomponents called Hydrogen and Lithium. This image from Mathias Bynens illustrates the old pipeline well:
Let's talk about them a little bit.
The full-codegen compiler
The full-codegen compiler is a simple and very fast compiler that produces simple and relatively slow (unoptimised) machine code. Its main purpose is to compile as fast as possible, at the cost of writing pretty awful code. So it translates JS to machine code at the speed of light, but that code is not optimised and might be very slow. It also handles the type feedback, which collects information about data types and how our functions are used as the program runs.
It first takes our AST, walks over all the nodes and emits calls to a macro-assembler directly. The result: generic native code. That's it! Full-codegen has fulfilled its purpose. All the complex cases are handled by emitting calls to runtime procedures, and all local variables are stored on the heap, as usual. The magic starts when V8 notices hot and cold functions!
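To make that single pass more concrete, here is a toy code generator. It is only a sketch: the node shapes (`num`, `var`, `binary`) and the instruction names (`PUSH_CONST`, `LOAD_VAR`, `CALL_RUNTIME`) are made up for this example and are not V8's. The point is that nothing gets specialised; the `+` simply becomes a call to a generic runtime helper.

```javascript
// Toy single-pass code generator: walks an AST once and emits generic
// instructions. Node shapes and instruction names are hypothetical.
function emit (node, out = []) {
  switch (node.type) {
    case 'num':
      out.push(`PUSH_CONST ${node.value}`)
      break
    case 'var':
      out.push(`LOAD_VAR ${node.name}`)
      break
    case 'binary':
      emit(node.left, out)
      emit(node.right, out)
      // No type knowledge here, so defer to a generic runtime helper.
      out.push(`CALL_RUNTIME binary_op(${node.op})`)
      break
    default:
      throw new Error(`Unknown node type: ${node.type}`)
  }
  return out
}

// AST for: a + 2
const ast = {
  type: 'binary',
  op: '+',
  left: { type: 'var', name: 'a' },
  right: { type: 'num', value: 2 }
}

const code = emit(ast)
console.log(code.join('\n'))
```

Every node is visited exactly once and no analysis happens in between, which is why this kind of compiler is fast to run but produces slow code.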
A hot function is a function that is called many times during the execution of our program, so it deserves more optimisation than the others. A cold function is the exact opposite. That's when the Crankshaft compiler comes in.
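The idea of "hotness" can be sketched in plain JavaScript with a call counter. This is only an illustration: `profile` and `HOT_THRESHOLD` are names invented for this example, and V8's real heuristics are more involved than a fixed call count.

```javascript
// Hypothetical threshold; V8's real heuristics are more involved.
const HOT_THRESHOLD = 3

// Wraps a function and counts its calls, flagging it once it gets "hot".
function profile (fn) {
  const stats = { calls: 0, hot: false }
  const wrapped = (...args) => {
    stats.calls += 1
    if (stats.calls >= HOT_THRESHOLD) stats.hot = true
    return fn(...args)
  }
  wrapped.stats = stats
  return wrapped
}

const add = profile((a, b) => a + b)
const rarelyUsed = profile(() => 'cold')

for (let i = 0; i < 5; i++) add(i, 1)
rarelyUsed()

console.log(add.stats) // { calls: 5, hot: true }
console.log(rarelyUsed.stats) // { calls: 1, hot: false }
```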
Crankshaft
The Crankshaft compiler used to be the default JIT compiler that handled all the optimisation parts of JS.
After receiving the type and call information from the runtime that full-codegen created, it analyses the data and sees which functions have become hot. Crankshaft can then walk the AST, generating optimised code for these particular functions. Afterwards, the optimised function replaces the unoptimised one using what is called on-stack replacement (OSR).
However, this optimised function does not cover all cases, since it is only optimised to work with the types we were passing during execution. Let's imagine our readFile function. In the first lines we have this:
const readFileAsync = (filePath) => { /* ... */ }
Let's suppose this function is hot and filePath is a string, so Crankshaft will optimise it to work with a string. But now, let's imagine filePath is null, or maybe a number (who knows?). Then the optimised function would not fit this case, so Crankshaft will de-optimise it, replacing it with the original function.
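The optimise-then-bail-out behaviour can be imitated in userland with a type guard. This is a simulation under assumptions, not what V8 literally runs: `pathLength`, `stringOnlyLength` and `genericLength` are hypothetical names, and the real guard lives in generated machine code rather than in JavaScript.

```javascript
// Generic version: handles any input, like the unoptimised code would.
function genericLength (filePath) {
  return String(filePath).length
}

// "Optimised" version: only valid under the assumption seen so far (strings).
function stringOnlyLength (filePath) {
  return filePath.length
}

let deoptimised = false

function pathLength (filePath) {
  // Type guard: the specialised path is only safe for strings.
  if (!deoptimised && typeof filePath === 'string') {
    return stringOnlyLength(filePath)
  }
  // Assumption broken: bail out and keep using the generic version.
  deoptimised = true
  return genericLength(filePath)
}

console.log(pathLength('/tmp/file.txt')) // 13
console.log(deoptimised) // false
console.log(pathLength(null)) // 4, the length of "null"
console.log(deoptimised) // true
```

Once the guard fails, every later call takes the slow generic path, which is why feeding a hot function inconsistent types can be costly.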
In order to explain how this whole magic works, we need to understand a few parts inside Crankshaft.
Hydrogen compiler
The Hydrogen compiler takes the AST with type-feedback information as its input. Based on that information it generates what's called a high-level intermediate representation (HIR), which has a control-flow graph (CFG) in static single assignment (SSA) form. To see what that looks like, take this function:
function clamp (x, lower, upper) {
  if (x < lower) x = lower
  else if (x > upper) x = upper
  return x
}
An SSA translation would be:
entry:
  x0, lower0, upper0 = args;
  goto b0;
b0:
  t0 = x0 < lower0;
  goto t0 ? b1 : b2;
b1:
  x1 = lower0;
  goto exit;
b2:
  t1 = x0 > upper0;
  goto t1 ? b3 : exit;
b3:
  x2 = upper0;
  goto exit;
exit:
  x4 = phi(x0, x1, x2);
  return x4;
In SSA, variables are never assigned again; they are bound once to their value and that's it. This form breaks any procedure down into several basic blocks of computation, each ending with a branch to another block, whether that branch is conditional or not. As you can see, variables are bound to unique names at each assignment and, in the end, the phi function takes all the xs and merges them, yielding whichever one was assigned on the path that was actually taken.
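One way to read the phi is as a lookup keyed by the block we arrived from. The sketch below runs the SSA blocks of clamp by hand in plain JavaScript; `clampSSA` and the `pred` variable are names made up for this illustration, and a real compiler resolves the phi when generating code rather than with a runtime table.

```javascript
// Runs the SSA blocks for clamp by hand, tracking which block we came from.
function clampSSA (x0, lower0, upper0) {
  let pred // name of the block that jumped to `exit`
  let x1, x2

  // b0
  const t0 = x0 < lower0
  if (t0) {
    // b1
    x1 = lower0
    pred = 'b1'
  } else {
    // b2
    const t1 = x0 > upper0
    if (t1) {
      // b3
      x2 = upper0
      pred = 'b3'
    } else {
      pred = 'b2'
    }
  }

  // exit: phi picks the operand that matches the predecessor block
  const phi = { b2: x0, b1: x1, b3: x2 }
  const x4 = phi[pred]
  return x4
}

console.log(clampSSA(5, 0, 10)) // 5
console.log(clampSSA(-3, 0, 10)) // 0
console.log(clampSSA(42, 0, 10)) // 10
```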
While the HIR is being generated, Hydrogen applies several optimisations to the code, such as constant folding, method inlining and others we'll see at the end of this guide, which has a whole section on them.
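As a taste of what constant folding means, here is a minimal folder over a tiny expression tree. It is a sketch under assumptions: the `fold` function and the node shapes are invented for this example, and Hydrogen works on its own HIR, not on objects like these.

```javascript
// Folds binary nodes whose operands are both numeric literals.
// The AST shape is hypothetical, not V8's.
function fold (node) {
  if (node.type !== 'binary') return node
  const left = fold(node.left)
  const right = fold(node.right)
  if (left.type === 'num' && right.type === 'num') {
    switch (node.op) {
      case '+': return { type: 'num', value: left.value + right.value }
      case '*': return { type: 'num', value: left.value * right.value }
    }
  }
  // Not foldable: keep the node, but with its (possibly folded) children.
  return { ...node, left, right }
}

// x * (2 + 3) becomes x * 5
const folded = fold({
  type: 'binary',
  op: '*',
  left: { type: 'var', name: 'x' },
  right: {
    type: 'binary',
    op: '+',
    left: { type: 'num', value: 2 },
    right: { type: 'num', value: 3 }
  }
})

console.log(JSON.stringify(folded))
```

The addition is computed once at compile time, so the generated code only has to do the multiplication at run time.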
The result Hydrogen outputs is an optimised CFG which the next compiler, Lithium, takes as input to generate actual optimised code.
Lithium compiler
As we said, the Lithium compiler takes the HIR and translates it into a machine-specific low-level intermediate representation (LIR), which is conceptually similar to machine code but still mostly platform independent.
While this LIR is being generated, new code optimisations are applied, but this time those are low-level optimisations.
In the end, this LIR is read and Crankshaft generates a sequence of native instructions for every Lithium instruction, the OSR is applied and then the code is executed.
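The "sequence of native instructions for every Lithium instruction" step can be pictured as a lookup table of templates. Everything in this sketch is hypothetical: the LIR op names (`load-const`, `add-int`, `return`), the `lower` function and the x64-flavoured pseudo-assembly are invented for illustration and are not Crankshaft's actual instructions.

```javascript
// Hypothetical LIR ops mapped to hypothetical pseudo-assembly templates.
const lowering = {
  'load-const': ({ dst, value }) => [`mov ${dst}, ${value}`],
  'add-int': ({ dst, lhs, rhs }) => [`mov ${dst}, ${lhs}`, `add ${dst}, ${rhs}`],
  return: ({ src }) => [`mov rax, ${src}`, 'ret']
}

// Expands each LIR instruction into its sequence of native instructions.
function lower (lir) {
  return lir.flatMap((instr) => {
    const expand = lowering[instr.op]
    if (!expand) throw new Error(`No lowering for ${instr.op}`)
    return expand(instr)
  })
}

const native = lower([
  { op: 'load-const', dst: 'rbx', value: 2 },
  { op: 'add-int', dst: 'rcx', lhs: 'rbx', rhs: 'rdx' },
  { op: 'return', src: 'rcx' }
])

console.log(native.join('\n'))
```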
Conclusion
This is the first of two parts about the V8 compiling pipelines, so stay tuned for the next article in this series!
If you found any typos, errors or anything else wrong with this article, give me your feedback! All feedback is appreciated and helps me improve the quality of my content! <3