How bad is a buffer overflow in an Emscripten-compiled application?

Emscripten allows compiling C++ code to JavaScript. It is an interesting approach allowing porting large applications (games) and libraries (crypto) to the web relatively easily. It also promises better performance and memory usage for some scenarios (something we are currently looking into for Adblock Plus core). These beneficial effects largely stem from the fact that the “memory” Emscripten-compiled applications work with is a large uniform typed array. The side-effect is that buffer overflows, use-after-free bugs and similar memory corruption mistakes are introduced to JavaScript that was previously safe from them. But are these really security-relevant?

Worst-case scenario are obviously memory corruption bugs that can be misused in order to execute arbitrary code. At the first glance, this don’t seem to be possible here — even with Emscripten the code is still running inside the JavaScript sandbox and cannot escape. In particular, it can only corrupt data but not change any code because code is kept separately from the array serving as “memory” to the application. Then again, native applications usually cannot modify code either due to protection mechanisms of modern processors. So memory corruption bugs are typically abused by manipulating function pointers such as those found on the stack.

Now Emscripten isn’t working with return pointers on the stack. I could identify one obvious place where function pointers are found: virtual method tables. Consider the following interface for example:

class Database {
  virtual User* LookupUser(char* userName) = 0;
  virtual bool DropTable(char* tableName) = 0;
  ...
};

Note how both methods are declared with the virtual keyword. In C++ this means that the methods should not be resolved at compile time but rather looked up when the application is running. Typically, that’s because there isn’t a single Database class but rather multiple possible implementations for the Database interface, and it isn’t known in advance which one will be used (polymorphism). In practice this means that each subclass of the Database interface will have a virtual method table with pointers to its implementations of LookupUser and DropTable methods. And that’s the memory area an attacker would try to modify. If the virtual method table can be changed in such a way that the pointer to LookupUser is pointing to DropTable instead, in the next step the attacker might make the application try to look up user "users" and the application will inadvertently remove the entire table.

There are some limitations here coming from the fact that function pointers in Emscripten aren’t actual pointers (remember, code isn’t stored in memory so you cannot point to it). Instead, they are indexes in the function table that contains all functions with the same signature. Emscripten will only resolve the function pointer against a fixed function table, so the attacker can only replace a function pointer by a pointer to another function with the same signature. Note that the signature of the two methods above is identical as far as Emscripten is concerned: both have an int-like return value (as opposed to void, float or double), both have an int-like value as the first parameter (the implicit this pointer) and another int-like value as the second parameter (a string pointer). Given that most types end up as an int-like values, you cannot really rely on this limitation to protect your application.

But the data corruption alone can already cause significant security issues. Consider for example the following memory layout:

char incomingMessage[256];
bool isAdmin = false;

If the application fails to check the size of incoming messages properly, the data will overflow into the following isAdmin field and the application might allow operations that aren’t safe. It is even possible that in some scenarios confidential data will leak, e.g. with this memory layout:

char response[256];
char sessionToken[32];

If you are working with zero-terminated strings, you should be really sure that the response field will always contain the terminating zero character. For example, if you are using some moral equivalent of the _snprintf function in Microsoft Visual C++ you should always check the function return value in order to verify that the buffer is large enough, because this function will not write the terminating zero when confronted with too much data. If the application fails to check for this scenario, an attacker might trick it into producing an overly large response, meaning that the secret sessionToken field will be sent along with the response due to missing terminator character.

These are the problematic scenarios I could think of, there might be more. Now all this might be irrelevant for your typical online game, if you are only concerned about are cheaters then you likely have bigger worries — cheaters have much easier ways to mess with code that runs on their end. A website on the other hand, which might be handling data from a third-party site (typically received via URL or window.postMessage()), is better be more careful. And browser extensions are clearly endangered if they are processing website data via Emscripten-compiled code.

Comments

There are currently no comments on this article.