A List Apart

An illustration of a man fishing from a pier, yanking an enormous fish out of the water

Illustration by Dougal MacPherson

var to JIT

In our previous article we described how the browser uses CSS to render beautiful pixels to the user’s screen. Although modern CSS can (and should!) be used to create highly interactive user experiences, for the last mile of interactivity, we need to dynamically update the HTML document. For that, we’re going to need JavaScript.

Bundle to bytecode

Article Continues Below

For a modern web application, the JavaScript that the browser first sees will typically not be the JavaScript written by a developer. Instead, it will most likely be a bundle produced by a tool such as webpack. And it will probably be a rather large bundle containing a UI framework such as React, various polyfills (libraries that emulate new platform features in older browsers), and an assortment of other packages found on npm. The first challenge for the browser’s JavaScript engine is to convert that big bundle of text into instructions that can be executed on a virtual machine. It needs to parse the code, and because the user is waiting on JavaScript for all that interactivity, it needs to do it fast.

At a high level, the JavaScript engine parses code just like any other programming language compiler. First, the stream of input text is broken up into chunks called tokens. Each token represents a meaningful unit within the syntactic structure of the language, similar to words and punctuation in natural written language. Those tokens are then fed into a top-down parser that produces a tree structure representing the program. Language designers and compiler engineers like to call this tree structure an AST (abstract syntax tree). The resulting AST can then be analyzed to produce a list of virtual machine instructions called bytecode.

JavaScript is run through the abstract syntax tree, which produces byte code

The process of generating an AST is one of the more straightforward aspects of a JavaScript engine. Unfortunately, it can also be slow. Remember that big bundle of code we started out with? The JavaScript engine has to parse and build syntax trees for the entire bundle before the user can start interacting with the site. Much of that code may be unnecessary for the initial page load, and some may not even be executed at all!

Fortunately, our compiler engineers have invented a variety of tricks to speed things up. First, some engines parse code on a background thread, freeing up the main UI thread for other computations.  Second, modern engines will delay the creation of in-memory syntax trees for as long as possible by using a technique called deferred parsing or lazy compilation. It works like this: if the engine sees a function definition that might not be executed for a while, it will perform a fast, “throwaway” parse of the function body. This throwaway parse will find any syntax errors that might be lurking within the code, but it will not generate an AST. Later, when the function is called for the first time, the code will be parsed again. This time, the engine will generate the full AST and bytecode required for execution. In the world of JavaScript, doing things twice can sometimes be faster than doing things once!

The best optimizations, though, are the ones that allow us to bypass doing any work at all. In the case of JavaScript compilation, this means skipping the parsing step completely. Some JavaScript engines will attempt to cache the generated bytecode for later reuse in case the user visits the site again. This isn’t quite as simple as it sounds. JavaScript bundles can change frequently as websites are updated, and the browser must carefully weigh the cost of serializing bytecode against the performance improvements that come from caching.

Bytecode to runtime

Now that we have our bytecode, we’re ready to start execution. In today’s JavaScript engines, the bytecode that we generated during parsing is first fed into a virtual machine called an interpreter. An interpreter is a bit like a CPU implemented in software. It looks at each bytecode instruction, one at a time, and decides what actual machine instructions to execute and what to do next.

The structure and behavior of the JavaScript programming language is defined in a document formally known as ECMA-262. Language designers like to call the structure part “syntax” and the behavior part “semantics.” The semantics of almost every aspect of the language is defined by algorithms that are written using prose-like pseudo-code. For instance, let’s pretend we are compiler engineers implementing the signed right shift operator (>>). Here’s what the specification tells us:

ShiftExpression : ShiftExpression >> AdditiveExpression

  1. Let lref be the result of evaluating ShiftExpression.
  2. Let lval be ? GetValue(lref).
  3. Let rref be the result of evaluating AdditiveExpression.
  4. Let rval be ? GetValue(rref).
  5. Let lnum be ? ToInt32(lval).
  6. Let rnum be ? ToUint32(rval).
  7. Let shiftCount be the result of masking out all but the least significant 5 bits of rnum, that is, compute rnum & 0x1F.
  8. Return the result of performing a sign-extending right shift of lnum by shiftCount bits. The most significant bit is propagated. The result is a signed 32-bit integer.

In the first six steps we convert the operands (the values on either side of the >>) into 32-bit integers, and then we perform the actual shift operation. If you squint, it looks a bit like a recipe. If you really squint, you might see the beginnings of a syntax-directed interpreter.

Unfortunately, if we implemented the algorithms exactly as they are described in the specification, we’d end up with a very slow interpreter. Consider the simple operation of getting a property value from a JavaScript object.

Objects in JavaScript are conceptually like dictionaries. Each property is keyed by a string name. Objects can also have a prototype object.

A JavaScript object with a prototype, an arrow pointing to an object.prototype, an arrow pointing to obj, an arrow pointing to obj2

If an object doesn’t have an entry for a given string key, then we need to look for that key in the prototype. We repeat this operation until we either find the key that we’re looking for or get to the end of the prototype chain.

That’s potentially a lot of work to perform every time we want to get a property value out of an object!

The strategy used in JavaScript engines for speeding up dynamic property lookup is called inline caching. Inline caching was first developed for the language Smalltalk in the 1980s. The basic idea is that the results from previous property lookup operations can be stored directly in the generated bytecode instructions.

To see how this works, let’s imagine that the JavaScript engine is a towering gothic cathedral. As we step inside, we notice that the engine is chock full of objects swarming around. Each object has an identifiable shape that determines where its properties are stored.

Now, imagine that we are following a series of bytecode instructions written on a scroll. The next instruction tells us to get the value of the property named “x” from some object. You grab that object, turn it over in your hands a few times to figure out where “x” is stored, and find out that it is stored in the object’s second data slot.

It occurs to you that any object with this same shape will have an “x” property in its second data slot.  You pull out your quill and make a note on your bytecode scroll indicating the shape of the object and the location of the “x” property. The next time you see this instruction you’ll simply check the shape of the object. If the shape matches what you’ve recorded in your bytecode notes, you’ll know exactly where the data is located without having to inspect the object. You’ve just implemented what’s known as a monomorphic inline cache!

But what happens if the shape of the object doesn’t match our bytecode notes? We can get around this problem by drawing a small table with a row for each shape we’ve seen. When we see a new shape, we use our quill to add a new row to the table. We now have a polymorphic inline cache. It’s not quite as fast as the monomorphic cache, and it takes up a little more space on the scroll, but if there aren’t too many rows, it works quite well.

If we end up with a table that’s too big, we’ll want to erase the table, and make a note to remind ourselves to not worry about inline caching for this instruction. In compiler terms, we have a megamorphic callsite.

In general, monomorphic code is very fast, polymorphic code is almost as fast, and megamorphic code tends to be rather slow. Or, in haiku form:

One shape, flowing wind
Several shapes, jumping fox
Many shapes, turtle

Interpreter to just-in-time (JIT)

The great thing about an interpreter is that it can start executing code quickly, and for code that is run only once or twice, this “software CPU” performs acceptably fast. But for “hot code” (functions that are run hundreds, thousands, or millions of times) what we really want is to execute machine instructions directly on the actual hardware. We want just-in-time (JIT) compilation.

As JavaScript functions are executed by the interpreter, various statistics are gathered about how often the function has been called and what kinds of arguments it is called with. If the function is run frequently with the same kinds of arguments, the engine may decide to convert the function’s bytecode into machine code.

Let’s step once again into our hypothetical JavaScript engine, the gothic cathedral. As the program executes, you dutifully pull bytecode scrolls from carefully labeled shelves. For each function, there is roughly one scroll. As you follow the instructions on each scroll, you record how many times you’ve executed the scroll. You also note the shapes of the objects encountered while carrying out the instructions. You are, in effect, a profiling interpreter.

When you open the next scroll of bytecode, you notice that this one is “hot.” You’ve executed it dozens of times, and you think it would run much faster in machine code. Fortunately, there are two rooms full of scribes that are ready to perform the translation for you. The scribes in the first room, a brightly lit open office, can translate bytecode into machine code quite fast. The code that they produce is of good quality and is concise, but it’s not as efficient as it could be. The scribes in the second room, dark and misty with incense, work more carefully and take a bit longer to finish. The code that they produce, however, is highly optimized and about as fast as possible.

In compiler-speak, we refer to these different rooms as JIT compilation tiers. Different engines have different numbers of tiers depending on the tradeoffs they’ve chosen to make.

You decide to send the bytecode to the first room of scribes. After working on it for a bit, using your carefully recorded notes, they produce a new scroll containing machine instructions and place it on the correct shelf alongside the original bytecode version. The next time you need to execute the function, you can use this faster set of instructions.

The only problem is that the scribes made quite a few assumptions when they translated our scroll. Perhaps they assumed that a variable would always hold an integer. What happens if one of those assumptions is invalidated?

In that case we must perform what’s known as a bailout. We pull the original bytecode scroll from the shelf, and figure out which instruction we should start executing from. The machine code scroll disappears in a puff of smoke and the process starts again.

To infinity and beyond

Today’s high-performance JavaScript engines have evolved far beyond the relatively simple interpreters that shipped with Netscape Navigator and Internet Explorer in the 1990s. And that evolution continues. New features are incrementally added to the language. Common coding patterns are optimized. WebAssembly is maturing. A richer standard module library is being developed. As developers, we can expect modern JavaScript engines to deliver fast and efficient execution as long as we keep our bundle sizes in check and try to make sure our performance-critical code is not overly dynamic.


About the Author

No Comments (yet)