Compiling C++ to JavaScript: Emscripten vs. Cheerp

Your JavaScript code is slow or needs too much memory? No problem, just rewrite it in C++ and compile back to JavaScript — you will get much better performance and the code will still run in any browser (or Node.js). Well, at least that’s what C++ to JavaScript compilers like Emscripten and Cheerp promise you. And often they can deliver, primarily thanks to heavy usage of typed arrays which allow modern JavaScript engines to optimize the resulting code much better than more traditional JavaScript. Also, the code is already preoptimized, with the C++ compiler recognizing calculations yielding constant results as well as inlining short functions.

I tried both Emscripten and Cheerp but the following isn’t exactly a fair comparison. For one, I spent much more time learning Emscripten than Cheerp, so I might have missed some Cheerp tweaks. Then again, I might have missed some Emscripten tweaks as well as I am by no means an expert in it. If you are still interested, enjoy the reading!

Does it work?

Sometimes. Of course, no amount of automated optimization will save you if you are using inefficient algorithms — these will be slow/memory intensive no matter what programming language you use. Also, my experience was that STL container classes produce too many memory allocations and copying operations to be fast (your mileage may vary). But there are plenty of areas where JavaScript isn’t the fastest, string processing being one example.

I tried calculating the FNV-1a hash, here is the corresponding C++ code:

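#include <cstdint>
#include <cstring>

// buffer1 and buffer2 are two NUL-terminated strings defined elsewhere
uint32_t hashCalc(int index)
{
  const char* buffer = (index ? buffer1 : buffer2);
  uint32_t result = 2166136261;  // FNV offset basis
  for (int i = 0; i < std::strlen(buffer); i++)
    result = (result ^ buffer[i]) * 16777619;  // XOR the byte, then multiply by the FNV prime
  return result;
}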

Note that I am choosing from two strings here, depending on the parameter — otherwise the C++ compiler would happily optimize away the entire calculation since it is working on constant input only anyway. The same code in JavaScript would look like this:

function hashCalc(index)
{
  var buffer = (index ? buffer1 : buffer2);
  var result = 2166136261;
  for (var i = 0; i < buffer.length; i++)
    result = Math.imul(result ^ buffer.charCodeAt(i), 16777619);
  return result;
}

Here Cheerp and Emscripten are roughly equal, with Emscripten being slightly faster but the difference being almost too small to measure. Regular JavaScript code on the other hand takes around 2.5 times as long, the charCodeAt() calls being slow compared to accessing typed arrays.

Calling C++ from JavaScript

Of course, the C++ code above was only so fast because it already had the string in a typed array. What if it had to take a JavaScript string and convert it to a typed array first? That conversion would likely eat away the entire performance advantage. This is one important lesson I learned: if your compiled C++ code is being accessed from regular JavaScript then the transition better be extremely efficient or it will become the major factor slowing you down.
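To make the two steps visible, here is a minimal sketch in plain JavaScript. It is not the benchmark code, and the names toBytes, hashBytes and hashString are made up for this example. It also assumes single-byte characters, which is a simplification. The hash loop itself only touches a typed array, while the conversion has to walk the string with charCodeAt() once more and allocate a new array on every call.

```javascript
// Hypothetical helper: copy a string into a typed array.
// Assumes every character fits into one byte (a simplification).
function toBytes(str)
{
  const bytes = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++)
    bytes[i] = str.charCodeAt(i) & 0xff;
  return bytes;
}

// FNV-1a over a typed array, roughly what the compiled code gets to work with.
function hashBytes(bytes)
{
  let result = 2166136261;
  for (let i = 0; i < bytes.length; i++)
    result = Math.imul(result ^ bytes[i], 16777619);
  // Math.imul returns a signed 32-bit value, convert it to unsigned like uint32_t
  return result >>> 0;
}

// Crossing the boundary on every call: the conversion cost is paid each time.
function hashString(str)
{
  return hashBytes(toBytes(str));
}
```

If the same data is hashed repeatedly, converting it once and keeping the typed array around avoids paying for the transition on each call.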

Emscripten offers multiple ways of exposing an API to the JavaScript code. Out of these, Embind is the most convenient. With a tiny bit of C++ interface description your C++ classes will automagically appear in JavaScript, and even smart pointers are handled transparently (you still have to call .delete() in JavaScript code). This convenience comes at a massive cost however: Embind generates the JavaScript bindings at runtime. For one, this increases the size of your compiled code significantly: all the Embind classes are compiled in, and they need enough information to create the JavaScript bindings. In addition, the bindings themselves are generated by calling eval() (problematic for Chrome extensions for example) and cost performance (20% on top of the direct call performance measured above).

The other proposed solution is the WebIDL binder which will take an interface definition in WebIDL format and compile it into C++ and JavaScript code that you can then add to your application. This approach should be more efficient but also makes compiling the application more complicated. I didn’t really try it however because I noticed that it maps DOMString to std::string, converting all JavaScript strings to UTF-8.

You can always go with low-level calls however. Adding EMSCRIPTEN_KEEPALIVE to a function or method declaration will automatically export it, meaning that it can be called from JavaScript. You have to figure out the mangled name of the function yourself (easiest solution: extern "C" to disable mangling) and you have to keep an eye on calling conventions but it is very fast and efficient.

As to Cheerp, there appears to be only one way to call from JavaScript into C++, by specifying the [[cheerp::jsexport]] attribute on a class. It’s really that simple, you don’t need to do anything in addition and the generated bindings seem pretty efficient. The downside: the restrictions are currently severe. You cannot have a destructor or virtual methods, and static methods are dropped silently. Private methods on the other hand are happily exported. It seems that the only way to call a function without instantiating a class is declaring it as a non-static member of an exported class, not accessing this anywhere and calling it as Class.prototype.myFunction.

Memory model

C++ isn’t meant to run in a browser: it assumes a contiguous address space that it can access. Memory blocks are allocated, used, then freed. How is that mapped to JavaScript? The approaches used by Emscripten and Cheerp are radically different.

Emscripten actually allocates a single contiguous typed array to serve as the application’s memory. This makes it relatively easy to map C++ concepts to JavaScript: pointers represent an offset in the typed array, memory accesses are represented as accesses to this typed array. The downside is that Emscripten has to allocate a fairly large buffer (16 MB by default) from the start so that the application doesn’t run out of memory. The ALLOW_MEMORY_GROWTH parameter makes sure that the amount of available memory can increase but this is a slow operation (a new typed array has to be allocated and the existing contents copied into it). In return, you get very low-level control over the memory allocations and powerful profiling functionality for that.
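The idea can be sketched in a few lines of plain JavaScript. This is only an illustration, not Emscripten’s actual runtime: the view names mirror the HEAP8/HEAP32 arrays visible in the generated code, but the 64-byte buffer, the bump allocator and the helper names (malloc, growMemory, store32, load32, memorySize) are assumptions made for this example.

```javascript
// One buffer serves as the whole address space, with typed-array views onto it.
let buffer = new ArrayBuffer(64);       // tiny on purpose, not the 16 MB default
let HEAPU8 = new Uint8Array(buffer);
let HEAP32 = new Int32Array(buffer);
let nextFree = 8;                       // offset 0 stays unused so it can act as a null pointer

// Growing means allocating a new buffer and copying everything over.
function growMemory(minSize)
{
  let newSize = buffer.byteLength;
  while (newSize < minSize)
    newSize *= 2;
  const newBuffer = new ArrayBuffer(newSize);
  new Uint8Array(newBuffer).set(HEAPU8); // copy the existing contents
  buffer = newBuffer;
  HEAPU8 = new Uint8Array(buffer);       // all views have to be recreated
  HEAP32 = new Int32Array(buffer);
}

// Trivial bump allocator: a "pointer" is just a byte offset into the buffer.
function malloc(size)
{
  const ptr = nextFree;
  nextFree += (size + 7) & ~7;           // keep allocations 8-byte aligned
  if (nextFree > buffer.byteLength)
    growMemory(nextFree);
  return ptr;
}

// *(int32_t*)ptr = value
function store32(ptr, value)
{
  HEAP32[ptr >> 2] = value;
}

// return *(int32_t*)ptr
function load32(ptr)
{
  return HEAP32[ptr >> 2];
}

function memorySize()
{
  return buffer.byteLength;
}
```

Storing a value with store32(malloc(4), 12345) and then requesting a block larger than the remaining space forces growMemory() to run. The stored value survives, but only because every byte was copied into the new buffer.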

Cheerp on the other hand attempts to map C++ data structures to JavaScript data structures. So creating a C++ class instance will instantiate a JavaScript object. Class properties become properties of that JavaScript object, with pointers typically becoming references to a typed array. The obvious side-effect is that Cheerp is unable to reflect all C++ concepts properly, especially when it comes to pointers, e.g. the Cheerp-specific warning “Using values casted to unrelated types is undefined behaviour unless the destination type is the actual type of the value” comes up whenever pointers are cast. This function illustrates the problem nicely:

uint16_t sum(uint32_t dword)
{
  uint16_t* ptr = reinterpret_cast<uint16_t*>(&dword);
  return ptr[0] + ptr[1];
}

Compiling this with Emscripten and calling sum(0x10002) will correctly interpret the parameter as two 16-bit integers and return their sum (meaning 3). With Cheerp on the other hand &dword creates an array with dword as its only element. reinterpret_cast<uint16_t*> then merely changes the way array elements are interpreted so when we say ptr[0] we get the first element of the array trimmed down to 16 bits. ptr[1] on the other hand yields 0 because there is no second element, so the result is 2 instead of 3.
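Both behaviors can be imitated in plain JavaScript. The following is a hand-written sketch of the two memory models as described above, not actual compiler output, and the function names are made up for this example.

```javascript
// Linear memory, Emscripten style: the same four bytes are viewed
// once as a 32-bit value and once as two 16-bit values.
function sumLinearMemory(dword)
{
  const memory = new ArrayBuffer(8);
  const HEAPU32 = new Uint32Array(memory);
  const HEAPU16 = new Uint16Array(memory);
  const ptr = 4;                         // pretend dword lives at address 4
  HEAPU32[ptr >> 2] = dword;
  // ptr[0] + ptr[1], truncated to uint16_t
  return (HEAPU16[ptr >> 1] + HEAPU16[(ptr >> 1) + 1]) & 0xffff;
}

// Object model, Cheerp style as described above: &dword is an array
// with a single element, the cast only changes how elements are read.
function sumObjectModel(dword)
{
  const ptr = [dword];
  // ptr[1] is undefined, which the bitwise operation turns into 0
  return ((ptr[0] & 0xffff) + (ptr[1] & 0xffff)) & 0xffff;
}
```

Calling sumLinearMemory(0x10002) gives 3 whereas sumObjectModel(0x10002) gives 2, matching the difference described above.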

So while Emscripten is often used to bring existing C/C++ code to the web, doing the same with Cheerp is very unlikely to succeed: the limitations require writing custom, Cheerp-optimized code. Also, the reliance on JavaScript objects means that as soon as classes or pointers are involved the performance of Cheerp drops drastically. This can be illustrated with the following code:

uint32_t doSomething(int index)
{
  std::string str(index ? buffer1 : buffer2);
  str.append("something to add");
  str.erase(2, 12);
  return str.length();
}

Here Emscripten is four times faster than Cheerp. The equivalent JavaScript code is even ten times faster by the way: std::string will allocate a memory buffer on the heap and later reallocate it again, only to free it immediately. This doesn’t perform well, a stack-allocated string class would be able to achieve a much better performance here.
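The JavaScript version isn’t shown above. A plausible equivalent might look like the following sketch. The contents of buffer1 and buffer2 are placeholders (only their lengths, 24 and 30, are taken from the size arguments in the generated code shown further down), and combining two slice() calls is just one way to mimic std::string::erase.

```javascript
// Placeholder inputs, the real benchmark strings are not given in the text
const buffer1 = "abcdefghijklmnopqrstuvwx";        // 24 characters
const buffer2 = "abcdefghijklmnopqrstuvwxyz0123";  // 30 characters

function doSomething(index)
{
  let str = index ? buffer1 : buffer2;
  str += "something to add";                  // str.append(...)
  str = str.slice(0, 2) + str.slice(2 + 12);  // str.erase(2, 12)
  return str.length;
}
```

JavaScript strings are immutable, so no buffer is explicitly allocated, resized and freed here. The engine is free to represent the intermediate results however it likes.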

Code size

Something else worth mentioning is the code size. Cheerp seems to be the clear winner here, a minimal application compiles to a few hundred bytes. Emscripten on the other hand needs 150 kB even for the most trivial application, at least with the default parameters. Once you specify the NO_BROWSER=1, NO_FILESYSTEM=1, and NODE_STDOUT_FLUSH_WORKAROUND=0 options things start to look better, and it once again gets better if you use the SHELL_FILE parameter to specify your own shell file. However, the code size will always be counted in tens of kilobytes.

For larger applications the distance gets much smaller. Compiling my Emscripten codebase with Cheerp created a JavaScript file with almost the same size, and I’m not even sure that part of the code wasn’t optimized away. So it seems that for a large application the Emscripten output will be smaller than Cheerp’s.

Hey, what about asm.js?

Oh, right, Emscripten compiles to asm.js. This means that JavaScript engines understanding asm.js (currently this refers only to Firefox but Google Chrome and Microsoft Edge want to add support soon) should be able to process it more efficiently. However, support for growing available memory has been dropped from asm.js recently. I don’t know about you but the applications I typically work on don’t know in advance how much memory is “enough.” I don’t really want to choose between crashing because the user loaded too much data and allocating way too much memory that merely sits there unused for almost everybody. So I use Emscripten with the ASM_JS=2 parameter which makes it emit "almost asm" instead of "use asm". And guess what? At least in my case it doesn’t seem to make a difference.

Bonus reading: what does the code look like?

The generated code can be tough to read, but it’s definitely simpler to understand than assembler. Here I want to show the output of a single code line:

std::string str(index ? buffer1 : buffer2);

Here is Emscripten’s output, beautified and with variables and functions renamed for reading convenience:

var savedStackPointer = 0,
    str = 0;
savedStackPointer = stackPointer;
stackPointer = stackPointer + 16 | 0;
str = savedStackPointer;
index = (index | 0) != 0;
basic_string__init(str, index ? 896 : 921, index ? 24 : 30);
// template <class _CharT, class _Traits, class _Allocator>
// void
// basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_type __sz) 
function basic_string__init(this_, __s, __sz)
{
    this_ = this_ | 0;
    __s = __s | 0;
    __sz = __sz | 0;
    // pointer __p; 
    var __p = 0,
        __cap = 0;
    // if (__sz > max_size()) this->__throw_length_error(); 
    if (__sz >>> 0 > 4294967279) __throw_length_error(this_);
    // if (__sz < __min_cap) 
    if (__sz >>> 0 < 11)
    {
        // __set_short_size(__sz);
        HEAP8[this_ >> 0] = __sz << 1;
        // __p = __get_short_pointer();
        this_ = this_ + 1 | 0
    }
    else
    {
        // size_type __cap = __recommend(__sz);
        __cap = __sz + 16 & -16;
        // __p = __alloc_traits::allocate(__alloc(), __cap+1);
        __p = bb(__cap) | 0;
        //__set_long_pointer(__p); 
        HEAP32[this_ + 8 >> 2] = __p;
        // __set_long_cap(__cap+1);
        HEAP32[this_ >> 2] = __cap | 1;
        // __set_long_size(__sz);
        HEAP32[this_ + 4 >> 2] = __sz;
        this_ = __p
    }
    // traits_type::copy(_VSTD::__to_raw_pointer(__p), __s, __sz);
    rc(this_ | 0, __s | 0, __sz | 0) | 0;
    // traits_type::assign(__p[__sz], value_type());
    HEAP8[this_ + __sz >> 0] = 0;
    return
}

No big surprises here, C++ code was mapped to JavaScript almost 1:1. The string length parameter has been precalculated, a few calls have been inlined, and the this_ parameter reused as a local variable. But the original code is relatively easy to recognize if you compare.

And now the Cheerp code:

var str = aSlot = {
    i0: 0,
    i1: 0,
    a2: nullArray
};
basic_string__init(str, (((index >> 0) !== 0) ? buffer1 : buffer2),
                   (((index >> 0) !== 0) ? 0 >> 0 : 0 >> 0) >> 0,
                   (((index >> 0) !== 0) ? 24 : 30) >> 0);
// template <class _CharT, class _Traits, class _Allocator>
// void
// basic_string<_CharT, _Traits, _Allocator>::__init(const value_type* __s, size_type __sz)
function basic_string__init(this_, __s, __offset, __sz)
{
    var label = 0;
    // if (__sz > max_size()) this->__throw_length_error();
    // size_type __cap = __recommend(__sz);
    if (((__sz >>> 0) < 11))
        var a = 11;
    else
        var a = (((__sz >> 0) + (16 >> 0) >> 0) & -16);
    // __p = __alloc_traits::allocate(__alloc(), __cap+1);
    var __p = new Uint8Array(a / 1 >> 0);
    // __set_long_pointer(__p);
    this_.a2 = __p;
    // __set_long_cap(__cap+1); 
    this_.i0 = ((a | 1) >> 0);
    // __set_long_size(__sz);
    this_.i1 = (__sz >> 0);
    // traits_type::copy(_VSTD::__to_raw_pointer(__p), __s, __sz);
    if (!(((__sz >> 0) === 0)))
    {
        var a = 0;
        while (1)
        {
            __p[(0 >> 0) + (a >> 0) >> 0] = ((__s[(__offset >> 0) + (a >> 0) >> 0] & 255) & 255);
            var a = ((a >> 0) + (1 >> 0) >> 0);
            if (((a >> 0) === (__sz >> 0)))
                break;
        }
    }
    // traits_type::assign(__p[__sz], value_type());
    __p[(0 >> 0) + (__sz >> 0) >> 0] = 0;
    return;
}

One thing that got me confused at first was the additional __offset parameter (at least that’s how I named it). Eventually I realized that each pointer is represented as two variables for Cheerp: one holding the reference to a typed array and another denoting the offset inside that array. So the initial __s parameter got converted into two parameters in the generated code. This conversion of pointers into two variables must have happened after the optimization step, otherwise the optimizer would have recognized that (((index >> 0) !== 0) ? 0 >> 0 : 0 >> 0) >> 0 is always zero and removed the pointless parameter from the function altogether.
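The two-variable pointer representation is easy to imitate by hand. The sketch below is not Cheerp output and the function name strlenPair is made up. It shows a strlen-like function taking a pointer as an (array, offset) pair, so that "pointer arithmetic" only ever changes the offset.

```javascript
// A C-style "const char*" passed as two values: the typed array and an offset into it.
function strlenPair(array, offset)
{
  let length = 0;
  // Stop at the terminating zero byte, or at the end of the array as a safety net
  while (offset + length < array.length && array[offset + length] !== 0)
    length++;
  return length;
}

// "hello\0world\0" as raw bytes
const text = Uint8Array.from([104, 101, 108, 108, 111, 0, 119, 111, 114, 108, 100, 0]);
```

With this representation, strlenPair(text, 0) measures "hello", while strlenPair(text, 6) corresponds to passing text + 6 in C++ and measures "world". Both calls share the same array reference.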

The major difference however is the this_ parameter: it’s a real JavaScript object with the fields i0 and i1 representing string capacity and length respectively, and field a2 a reference to the typed array holding the data. While I didn’t measure the impact of it, I have a strong suspicion that this approach of creating many small JavaScript objects is wasting lots of memory.

There are some minor differences worth noting: the branch allocating the string buffer on the stack is missing, Cheerp removed it from the include file (it makes little sense without a stack concept). Also, variable a is pointlessly declared four times throughout the function. It’s a common tendency with Cheerp-generated code but apparently it gets away with this despite declaring "use strict".
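The reason this works is that strict mode doesn’t forbid repeated var declarations of the same name within a function. Only let and const are strict about redeclaration. A small hand-written sketch (the function name and logic are made up, loosely following the capacity calculation above):

```javascript
function recommendCapacity(size)
{
  "use strict";
  if (size < 11)
    var a = 11;
  else
    var a = (size + 16) & -16;  // second declaration of the same variable
  var a = a | 1;                // third declaration, still no error
  return a;
}
```

All three declarations refer to one function-scoped variable, so the repeated var keywords are redundant but harmless.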

Comments

Venkataramanan S 2016-02-06 17:37

Good and helpful analysis. Did you verify the compilation times on larger source? It would be interesting to get experimental data on how the compilation time on both vary by source size.

Wladimir Palant

Unfortunately – no. The large codebase wasn’t really usable with Cheerp so impossible to compare.