Emscripten's compiled Web Assembly, used minimally

Sam Thorogood - Jul 3 '18 - - Dev Community

(This is a technical counterpart to my post on using Wuffs to decode GIFs in the browser. If you'd just like to see fast GIF decoding, go here! 🖼️➡️👍)

Converting C/C++ to Web Assembly (WASM) isn't easy. The de-facto toolchain as of July 2018 is Emscripten, an amazing but challenging set of tools which does the work for us. When Emscripten runs on your code, it generates a few things:

  • the compiled .wasm file itself—containing your code plus a runtime
  • boilerplate .js alongside that
  • (optional) a .html harness for running the program

For an average build, the .js file (which is what you include to run WASM) can be 100k or so. Aside from the obvious bloat, why is this a bad thing?

  • it adds a singleton Module variable to your global scope—not suitable for libraries, only for monolithic apps—the docs even call this out
  • calling JS back from WASM needs EM_ASM macros, which call global (!) JS methods: this is poor for modularizing code, and it hides JS idioms like .bind(this)
  • it performs its own window.fetch for the .wasm—not ideal in Node or if you want to contain everything in a single file 🙅
  • and, somewhat subjectively, it's quite hard to read/parse/modify.

To Be Fair

The boilerplate Emscripten generates isn't unreasonable. The Web Assembly format is actually quite simple, so even the concepts of having a 'heap', or dealing with strings or complex data types, are something compilers need to provide themselves. There are no inbuilt concepts here, and Emscripten goes a long way to get you moving fast. 😲

But everything it does could also be unwieldy. Here, I'm going to document it, so you can instantiate the .wasm yourself. But beware! This doesn't provide any of the EM_.. magic, like binding, exposing C++ objects in JavaScript, etc. This sugar is sometimes great, but it comes at a cost. And if you're wrapping simple C (or even C++, which you could wrap) then you don't need it.

First Steps

Let's build a simple C library that reads a PNG file, fills a passed struct with metadata, and calls JavaScript back via an extern method. This is a totally artificial demo which shows off some challenging tasks:

  • passing arbitrary sized data into WASM, as well as using malloc and free
  • dealing with C structs
  • calling back to JS
  • returning a string error description.

Save the above file to disk. To build, we can run emcc like this—note that we must include -O1, to remove some needless methods:

emcc -O1 -s WASM=1 -s EXPORTED_FUNCTIONS="['_parse_png','_malloc','_free']" png.c -o png.js

Great! We now have png.wasm and png.js. For me, the JS is 62k, and the WASM is 10k. Actually, setting -O3 will bring this down quite a lot, but let's keep going with this for now.


Default, Minimal Harness

If you include png.js in an HTML file and load it from a local web server, you'll get access to the global Module var, which has a bunch of properties, including _parse_png. But for the reasons above, we don't want to use this boilerplate: it's too prescriptive. So here's the actual minimal JS needed:

const memoryPages = 256;
const memory = new WebAssembly.Memory({initial: memoryPages, maximum: memoryPages});
const stackTop = 2672;  // WARNING: Different per-program.
const env = {
  DYNAMICTOP_PTR: stackTop,
  STACKTOP: stackTop + 16,
  STACK_MAX: 1024 * 1024 * 5,
  abort() { throw new Error('abort'); },
  abortOnCannotGrowMemory() { throw new Error('abortOnCannotGrowMemory'); },
  enlargeMemory() { throw new Error('enlargeMemory'); },
  getTotalMemory() { return memory.buffer.byteLength; },
  ___setErrNo(v) { throw new Error('errno'); },
  _emscripten_memcpy_big(dst, src, num) {
    // byte view over the memory we pass to WASM
    const view = new Uint8Array(memory.buffer);
    view.set(view.subarray(src, src + num), dst);
    return dst;
  },
  _chunk_callback() { /* our callback method */ },
  memory: memory,
};

// tell Emscripten's malloc where to start
(new Uint32Array(memory.buffer))[env.DYNAMICTOP_PTR >> 2] = env.STACK_MAX;

Promise.resolve(true).then(async () => {
  // use async so we can await for Promises to finish
  const buffer = await (await self.fetch('png.wasm')).arrayBuffer();
  const module = await WebAssembly.instantiate(buffer, {env});

  // save exports on window so you can debug them
  const exports = module.instance.exports;
  console.info('got exports', exports);
  return window._exports = exports;

  // ... but if you were writing a library, you'd continue using the Promise
}).catch((err) => console.error('oh no!', err));

Great. So, let's start with the environment being passed to Web Assembly. It's not simple, like some "pure" WASM examples—that's because Emscripten's runtime expects a lot from us. The methods on the object are pretty self-explanatory:

  • abort, abortOnCannotGrowMemory, ___setErrNo are to deal with failures
  • enlargeMemory isn't implemented—when our code runs out of memory, it will crash
  • getTotalMemory does what it says on the tin
  • and _emscripten_memcpy_big implements the C method memcpy().

Your code might need more, depending on whether it calls other C methods. In many cases, as you see here, we can just throw an Error: they're often unexpected conditions that we don't really have to deal with.
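One way to find out what your build expects is to ask the compiled module itself. The sketch below is not part of the harness above: stubEnv is a hypothetical helper, and the bytes are a tiny hand-assembled stand-in for png.wasm (an assumption, it imports only env.abort and env.memory). It uses WebAssembly.Module.imports to list what the module wants, and fills every function import with a stub that throws.

```javascript
// Hand-assembled stand-in for png.wasm (an assumption for this sketch):
// it imports a function env.abort and a 256-page memory env.memory.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,  // "\0asm", version 1
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,              // type section: type 0 is () -> ()
  0x02, 0x1e, 0x02,                                // import section, 2 entries
  0x03, 0x65, 0x6e, 0x76,                          // "env"
  0x05, 0x61, 0x62, 0x6f, 0x72, 0x74,              // "abort"
  0x00, 0x00,                                      // kind: function, type 0
  0x03, 0x65, 0x6e, 0x76,                          // "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,        // "memory"
  0x02, 0x01, 0x80, 0x02, 0x80, 0x02,              // kind: memory, initial 256, maximum 256
]);

// Hypothetical helper: build an env where every imported function throws,
// then layer the real implementations (and the memory) on top.
function stubEnv(wasmModule, overrides) {
  const env = {};
  for (const imp of WebAssembly.Module.imports(wasmModule)) {
    if (imp.module === 'env' && imp.kind === 'function') {
      env[imp.name] = () => { throw new Error('unimplemented import: ' + imp.name); };
    }
  }
  return Object.assign(env, overrides);
}

// Synchronous compile keeps the sketch short; in a browser you'd use the
// Promise-based WebAssembly.instantiate as in the harness above.
const wasmModule = new WebAssembly.Module(bytes);
const memory = new WebAssembly.Memory({initial: 256, maximum: 256});
const env = stubEnv(wasmModule, {memory});
const instance = new WebAssembly.Instance(wasmModule, {env});
console.info(WebAssembly.Module.imports(wasmModule).map((imp) => imp.kind + ' ' + imp.name));
```

With a real build, you'd pass your working implementations (like _emscripten_memcpy_big) as the overrides, and any stub that fires tells you by name which import you still need to write.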

But what about the properties above that—those ones with the magic numbers? So, they all revolve around memory, and if you don't have them correct, your WASM will probably fail. Let's learn about them.

Memory in Web Assembly

[Diagram: Web Assembly has an inbuilt stack, at top; we can also provide it memory via WebAssembly.Memory, at bottom]

Web Assembly has two types of memory. First, its inbuilt stack, used for local variables in methods; this isn't accessible by your JS. Secondly, the WebAssembly.Memory object, which most of these magic numbers refer to, and which can be used by the runtime (in this case Emscripten) in literally any way it wants.

memoryPages: Emscripten's fixed memory

In our boilerplate code, this variable is used to create a WebAssembly.Memory of this many 'pages'. By default, with no flags, Emscripten requests 16MB. Each page is 64KiB (65,536 bytes), so that's 256 pages. (You can request more with e.g., -s TOTAL_MEMORY=32mb)
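The arithmetic is small enough to keep in code rather than hard-coding 256. A minimal sketch, assuming the default 16MB (PAGE_SIZE and totalMemory are names I've made up here):

```javascript
const PAGE_SIZE = 64 * 1024;           // one Web Assembly page is 65,536 bytes
const totalMemory = 16 * 1024 * 1024;  // assumed: Emscripten's default request of 16MB
const memoryPages = totalMemory / PAGE_SIZE;

// fixed-size memory, as in the harness: initial and maximum are the same
const memory = new WebAssembly.Memory({initial: memoryPages, maximum: memoryPages});
console.info(memoryPages, memory.buffer.byteLength);
```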

But turns out, our WASM file actually knows what it requires. You can load your WASM file in wasm2wat: look for a line like the following.

  (import "env" "memory" (memory $env.memory 256 256))

Why this value is in the WASM file, yet I also need to specify it, I'll never know. Although if you know, please email me. 🤔💭
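For what it's worth, the declared limits are used: the module imports its memory rather than creating it, so the host has to supply one, and the engine checks what you pass against those limits at instantiation. Here's a small sketch of that check, using a hand-assembled module (a stand-in, not png.wasm) that contains nothing but the import line above. The accepts helper is hypothetical.

```javascript
// Stand-in module: its only content is (import "env" "memory" (memory 256 256)).
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,  // "\0asm", version 1
  0x02, 0x12, 0x01,                                // import section, 1 entry
  0x03, 0x65, 0x6e, 0x76,                          // "env"
  0x06, 0x6d, 0x65, 0x6d, 0x6f, 0x72, 0x79,        // "memory"
  0x02, 0x01, 0x80, 0x02, 0x80, 0x02,              // kind: memory, initial 256, maximum 256
]);
const wasmModule = new WebAssembly.Module(bytes);

// Hypothetical helper: does the module accept a memory with this descriptor?
function accepts(descriptor) {
  try {
    const memory = new WebAssembly.Memory(descriptor);
    new WebAssembly.Instance(wasmModule, {env: {memory}});
    return true;
  } catch (err) {
    console.info('rejected:', err.message);
    return false;
  }
}

const tooSmall = accepts({initial: 1, maximum: 1});      // fewer pages than declared
const matching = accepts({initial: 256, maximum: 256});  // what the harness passes
console.info({tooSmall, matching});
```

So the number in the file acts as a constraint on what you provide, not as a request the engine fulfils for you.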

STACKTOP: Where the "stack" begins

This variable is a misnomer. As I mentioned above, Web Assembly has its own internal stack for its own calls—one that is not exposed in any memory we pass into the object. It has an intentionally ambiguous size, and it's only used for variables local to functions—and even then, only when they're one of the four primitive Web Assembly types (i32, i64, f32, f64).

So, what is the "stack" (quotes intentional), then? Well, it's generated by Emscripten, which uses it for allocating larger types (e.g. structs, char x[10] = "I'm long\n";). Emscripten also adds stack-related methods to its exports (e.g. exports.stackAlloc). How does it use this? Well, a decompiled method looks a bit like:

  (func $_test_stack (export "_test_stack") (type $t5) (param $p0 i32) (param $p1 i32) (result i32)
    (local $l0 i32) (local $l1 i32) (local $l2 i32) (local $l3 i32)
    (set_local $l1        ;; save stackTop
      (get_global $g4))
    (set_global $g4       ;; modify stackTop while in this function
      (i32.add
        (get_global $g4)
        (i32.const 32)))  ;; we want 32 bytes of stack
    ;;
    ;; ... rest of function removed
    ;;
    (set_global $g4       ;; restore old stackTop
      (get_local $l1))
    (get_local $p0))      ;; return value

The stackTop variable is stored in $g4, so we save it into $l1, add 32 bytes to it, and then continue. We restore it at the end. During the method, we have free rein over these 32 bytes.
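If the WAT is hard to follow, here is the same save/bump/restore pattern modelled in plain JS. This is only an illustration of the idea, not Emscripten's code: withStackFrame is a hypothetical name, and 2688 is just the example STACKTOP from the harness (2672 + 16).

```javascript
// A plain-JS model of the pattern (not Emscripten's real implementation).
let stackTop = 2688;  // plays the role of $g4

function withStackFrame(bytes, fn) {
  const saved = stackTop;  // save stackTop (like $l1)
  stackTop += bytes;       // claim `bytes` of "stack"
  try {
    return fn(saved);      // the frame occupies [saved, saved + bytes)
  } finally {
    stackTop = saved;      // restore old stackTop
  }
}

// Record what the function sees while its 32-byte frame is live.
const during = withStackFrame(32, (frameAt) => ({frameAt, stackTop}));
console.info(during, stackTop);
```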

But why 2672 (plus 16)?

The value of stackTop in my example—2672—is the value that was built into Emscripten's JS harness (you can find it by searching for STATICTOP = STATIC_BASE +—STATIC_BASE is 1024, so just add the numbers together).

(We add 16 to 2672, but ignore this for now. I'll explain below under DYNAMICTOP_PTR.)

This is a number that's bigger than the constants in the Web Assembly program (a), plus room for Emscripten's malloc() implementation to work (b), which seemingly stores fixed information about what's currently been allocated.

a. Your program will have a fixed size of constants, and you can discover this by looking at the wasm2wat output again: at the very bottom of the file, we see our data section. Look for the last line (although in our example, we only have one).

  (data (i32.const 1024) "\89PNG\0d\0a\1a\0ainvalid header\00chunk too large\00IHDR\00IHDR has wrong size\00couldn't malloc for copy\00header not found"))

This says that at 1024, we have just over 100 bytes of data. (The last string isn't explicitly null-terminated, because all memory starts off zeroed.)

b. How much space does malloc() require? This is unclear, and seemingly an implementation detail of Emscripten. The values it uses are actually inlined into the generated WASM code, so there's no way to configure it.

Unfortunately, the best way to find out our magic constant is to look into the generated .js, as above. If you want to wing it though, I'd suggest looking at your constants, and maybe adding ~10k of buffer just to be safe.
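As a sketch of "winging it": count the bytes in the data section shown earlier, add the slack, and round up. The 16-byte rounding and the variable names are my assumptions, and the result is only a guess at a safe value, not the number Emscripten would generate.

```javascript
const STATIC_BASE = 1024;  // the (i32.const 1024) offset of the data segment
const segments = [
  '\x89PNG\r\n\x1a\n',
  'invalid header\0',
  'chunk too large\0',
  'IHDR\0',
  'IHDR has wrong size\0',
  "couldn't malloc for copy\0",
  'header not found',
];
// every character here is a single byte, so string length equals byte length
const dataBytes = segments.reduce((total, s) => total + s.length, 0);
const dataEnd = STATIC_BASE + dataBytes;

// constants plus ~10k of slack, rounded up to a multiple of 16 (assumed alignment)
const slack = 10 * 1024;
const guessedStackTop = Math.ceil((dataEnd + slack) / 16) * 16;
console.info(dataBytes, dataEnd, guessedStackTop);
```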

STACK_MAX: Where the "stack" ends

This tells Emscripten where to stop allocating larger types on the "stack". It's a fixed number controlled by a compile-time flag, -s TOTAL_STACK=....

However, this isn't actually checked unless you enable assertions, passing -s ASSERTIONS=1 to your compile. Without this, you can blow the stack and allocate into the heap.

DYNAMICTOP_PTR: Where the heap lives

Emscripten's implementation of malloc will begin its allocation at the value pointed to by DYNAMICTOP_PTR. If you look at our boilerplate, we set a value:

// tell Emscripten's malloc where to start
(new Uint32Array(memory.buffer))[env.DYNAMICTOP_PTR >> 2] = env.STACK_MAX;

This sets, at DYNAMICTOP_PTR (we divide it by four, as we're indexing 32-bit ints), the ending point of Emscripten's "stack". Everything after here will be used for malloc.

Emscripten puts this value in its generated .js code, at the top of the stack. If you were wondering why we set DYNAMICTOP_PTR to stackTop, but then STACKTOP to stackTop + 16, this is why—we just give it 16 bytes to play with (which is obviously more than the four it needs for a 32-bit number).

Notably, this value is read at instantiation time and used as the 'bottom' of the heap. So you must set its value before you call WebAssembly.instantiate. Emscripten's runtime will set the value later, so you can examine the value at DYNAMICTOP_PTR to see how much the heap has grown (aka how much your program has malloced).
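Reading that value back might look like the sketch below. There's no WASM here: a plain WebAssembly.Memory stands in, and one line fakes what the runtime would do when malloc claims more heap. heapUsed is a hypothetical helper, and the number it reports is what the runtime has claimed, which isn't necessarily the exact sum of your malloc calls.

```javascript
const memory = new WebAssembly.Memory({initial: 256, maximum: 256});
const u32 = new Uint32Array(memory.buffer);  // safe to keep: this memory never grows
const DYNAMICTOP_PTR = 2672;                 // from the harness (different per-program)
const STACK_MAX = 1024 * 1024 * 5;           // from the harness

// before instantiation: the heap starts where the "stack" ends
u32[DYNAMICTOP_PTR >> 2] = STACK_MAX;

// Hypothetical helper: bytes of heap claimed so far
function heapUsed() {
  return u32[DYNAMICTOP_PTR >> 2] - STACK_MAX;
}
const usedBefore = heapUsed();

// Stand-in for the runtime: pretend malloc claimed 64KiB by moving the pointer.
u32[DYNAMICTOP_PTR >> 2] += 64 * 1024;
const usedAfter = heapUsed();
console.info(usedBefore, usedAfter);
```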

Function Table

Some builds require you to pass what's known as a table import. There are some great posts about what this is, but effectively it's a safe way to specify function pointers, in a way that wouldn't be possible if they shared the WebAssembly.Memory object.

Unfortunately, if any of the code you use or import uses function pointers, even just as an implementation detail, you'll need to provide a table (one quick way to demonstrate this is to add a call to sprintf()).

If so, update the env to add something like:

const env = {
  // ... rest
  table: new WebAssembly.Table({initial: 2, maximum: 2, element: 'anyfunc'}),
  tableBase: 0,
}

You'll need to modify this until WASM is happy. Again, this information is in the compiled WASM format, so it's not clear why we need to specify it, but here we are!


We've digressed enormously, to explain the requirements of setting up Web Assembly with Emscripten. But what about actually using our method? Well, let's now do it.

That code we compiled has two interesting methods on its exports: _parse_png, and _malloc. Let's actually use them in JS. We can add an extra .then to our setup:

.then((exports) => {
  const b64PNG = 'iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAACklEQVR4nGMAAQAABQABDQottAAAAABJRU5ErkJggg==';
  const buf = Uint8Array.from(atob(b64PNG), (c) => c.charCodeAt(0));
  const at = exports._malloc(buf.length);
  const view = new Uint8ClampedArray(memory.buffer);
  view.set(buf, at);

  const infoAt = exports._malloc(4 * 3);  // alloc space for info_t, 3x4-byte int
  const outStringAt = exports._parse_png(at, buf.length, infoAt);
  console.info('got output:', readString(outStringAt));
})

And update our chunk_callback method from the environment:

  _chunk_callback(type, len, dataAt) {
    console.info('chunk', type, len, dataAt);
  },

Strings

The _parse_png method returns the address of a char * from the consts section of the WASM program if an error occurs. We need a readString method to pull that out: it finds the first NULL byte and converts everything before it to a JS string, using TextDecoder:

function readString(start) {
  const view = new Uint8Array(memory.buffer);
  let end = start;
  while (view[end]) { ++end; }  // stop at the first NULL byte
  // subarray gives a window onto memory without copying all of it
  return (new TextDecoder()).decode(view.subarray(start, end));
}

If we try it out but muck with the base64-encoded image, we'll now see an error like this, containing a string:

[Screenshot: error message with invalid header]

Of course, if we do it right, you'll see something like this, detailing the chunks in the image (and no string, as we return NULL):

[Screenshot: successful read of chunks]

Structs

In the sample code above, we allocate space for the info_t struct defined in the program. This is probably the most awkward part of not using Emscripten's generated JS code. Because our struct has three int values, we know that it takes up 12 bytes of memory, so we can malloc that space:

  const infoAt = exports._malloc(4 * 3);  // alloc space for info_t, 3x4-byte int

To read our individual values, we need to load them using a view:

  const infoAt = exports._malloc(4 * 3);  // alloc space for info_t, 3x4-byte int
  const outStringAt = exports._parse_png(at, buf.length, infoAt);
  console.info('got output:', readString(outStringAt));

  // .. add this code
  const infoView = new Uint32Array(memory.buffer, infoAt, 3);  // 3 ints, starting at byte infoAt
  console.info('count', infoView[0]);   // contains 'chunks'
  console.info('width', infoView[1]);   // contains 'w'
  console.info('height', infoView[2]);  // contains 'h'

The three values are in the order defined in the C struct, and native types (int32 or int64) are aligned on 4 or 8 byte boundaries. As an aside, using int64 with JavaScript is a challenge: the default JS number type can't accurately represent it (as it's a 64-bit floating point number). Try not to pass these across the JS/WASM boundary.

Information about structs is the sort of thing that Emscripten can generate for us at compile-time: helpers that literally convert memory to a nice, friendly JS structure containing our three values with names. But for most libraries, you're going to need this only a few times, so it's not infeasible to write manually.
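A hand-written helper for our struct could look something like this. readInfo is a hypothetical name, the field names come from the comments in the sample code, and I'm assuming the three fields are plain signed 32-bit ints with no padding. The usage part doesn't touch WASM at all: it writes three values into a stand-in memory and reads them back.

```javascript
// Hypothetical helper: turn the 12 bytes of info_t into a named JS object.
function readInfo(memory, infoAt) {
  const view = new DataView(memory.buffer, infoAt, 12);
  return {
    chunks: view.getInt32(0, true),  // true = little-endian, which Web Assembly memory always is
    w: view.getInt32(4, true),
    h: view.getInt32(8, true),
  };
}

// Usage with a stand-in memory: pretend C filled in the struct for a 1x1 image.
const memory = new WebAssembly.Memory({initial: 1});
const infoAt = 64;  // pretend this came from exports._malloc(4 * 3)
const writer = new DataView(memory.buffer, infoAt, 12);
writer.setInt32(0, 3, true);  // chunks
writer.setInt32(4, 1, true);  // w
writer.setInt32(8, 1, true);  // h

const info = readInfo(memory, infoAt);
console.info(info);
```

Using DataView with an explicit little-endian flag means the helper doesn't depend on the byte order of the machine running the JS, which a Uint32Array view would.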


Further Thoughts

This post came out of building fastgif as a small, fast library for decoding GIFs. I wanted to use Emscripten, but its generated code was too bloated and didn't work well to provide a compartmentalized ES6 module.

Fixed WASM

The work you do to hand-write your JS around a compiled bit of WASM is 'fixed'. The WASM has no external dependencies: Emscripten is something you use to build, and the code it generates doesn't change over time.

In fastgif, we have a hand-written env object that we pass to WebAssembly.instantiate, just like the one we detailed above. Once it works, though, it doesn't need to be changed, because WASM is so limited and the core language doesn't change over time.

Promise-based

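It's worth calling out that any library that uses Web Assembly should be based on a Promise or some async work, because fundamentally WebAssembly.instantiate returns a Promise. (Synchronous constructors do exist, but some browsers restrict them on the main thread for anything beyond tiny modules.)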

Fastgif solves this by making the API itself Promise-based, depending first on the instantiation before doing any further work. By accepting that all our APIs are async, it could also allow us to work more freely with other, dependent APIs.

  decode(buffer) {
    return this._exports.then((exports) => {
      const buf = new Uint8Array(buffer);
      const at = exports._malloc(buf.length);
      // more stuff
      return result;
    });
  }

Size 🗜️ + Shipping ⛴️

By dropping the JS generated by Emscripten, we can reduce the size of code you ship to clients—that's been one of the main themes of this post. Fastgif also ships with just one JS file, by encoding the WASM code itself in base64 and decoding it at creation time.
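The embedding itself is only a few lines. This is a sketch of the idea rather than fastgif's actual code: wasmBase64 here is the smallest valid module (just the 8-byte header) standing in for real compiled output, and instantiateEmbedded is a hypothetical helper.

```javascript
// The smallest valid module (just the 8-byte header), standing in for real code.
const wasmBase64 = 'AGFzbQEAAAA=';

// Same decoding trick used for the PNG earlier in the post.
function decodeBase64(b64) {
  return Uint8Array.from(atob(b64), (c) => c.charCodeAt(0));
}

// Hypothetical helper: instantiate from the embedded string, with no fetch.
function instantiateEmbedded(imports) {
  return WebAssembly.instantiate(decodeBase64(wasmBase64), imports)
      .then((result) => result.instance.exports);
}

const wasmBytes = decodeBase64(wasmBase64);
console.info(wasmBytes.length, WebAssembly.validate(wasmBytes));
instantiateEmbedded({}).then((ex) => console.info('exports', Object.keys(ex)));
```

For a real build, you'd pass {env} as the imports, exactly as in the harness above.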

Without the WASM code itself, the JS wrapper for fastgif is about 4.3k (uncompressed). With the WASM code bundled as base64, it's ~44k (uncompressed) or ~20k (compressed), and if we encoded in say, base128 or base192, the size could be even smaller.

This is a bit smaller than the default Emscripten output, which totals ~70k for our demo program above. And our approach of including the code in the JS, even though base64 adds overhead, means that we only have to do one network request to fetch our library, and bundlers can be happier as they're not trying to include this random related .fetch-ed resource.


Fin

If you've got this far, I'm proud of you! This has been a very long post about some of Emscripten's internals as of July 2018. 😴

Why did I write this post? Emscripten is a wonderful tool, but it has a long history (for asm.js), and isn't perfect. I think it errs too much on the side of "magic", and many posts rave about how it's so easy to EM_ASM_ or use binding-fu, but this all comes at a cost, and can introduce huge amounts of inadvertent overhead—think copying huge memory buffers around because we're trying to make them immutable or easily exposed.

Every language that is being compiled to Web Assembly needs a runtime—whether it be Go, or Rust, or C/C++ as we have here. I don't believe that we'll ever really be able to directly import Web Assembly via ES2015 modules, at least not without changes on the JS side. But it behooves us to write the smallest one we possibly can.
