Buckets:
| %\documentclass[11pt]{proc} | |
| \documentclass[preprint,10pt]{sigplanconf} | |
| \usepackage{amsmath} | |
| \usepackage{url} | |
| \usepackage{graphicx} | |
| \begin{document} | |
| \title{Emscripten: An LLVM-to-JavaScript Compiler} | |
| \conferenceinfo{Splash '11}{??-2011, Portland.} | |
| \copyrightyear{2011} | |
| \copyrightdata{[to be supplied]} | |
| \titlebanner{} % These are ignored unless | |
| \preprintfooter{} % 'preprint' option specified. | |
| \authorinfo{Alon Zakai} | |
| {Mozilla} | |
| {azakai@mozilla.com} | |
| \maketitle | |
| %\title{Emscripten: An LLVM-to-JavaScript Compiler} | |
| %\subtitle{} | |
| %\authorinfo{Alon Zakai} | |
| % {Mozilla} | |
| % {azakai@mozilla.com} | |
| %\author{Alon Zakai \\ Mozilla \\ \url{azakai@mozilla.com}} | |
| %\maketitle | |
| \begin{abstract} | |
| We present Emscripten, a compiler from LLVM (Low Level Virtual Machine) assembly to JavaScript. This | |
| opens up two avenues for running code written | |
| in languages other than JavaScript on the web: (1) Compile code directly into LLVM assembly, and | |
| then compile that into JavaScript using Emscripten, or (2) Compile | |
| a language's entire runtime into LLVM and then JavaScript, as in the previous | |
| approach, and then use the compiled runtime to run code written in that language. For example, the | |
| former approach can work for C and C++, while the latter can work for Python; all three | |
| examples open up new opportunities for running code on the web. | |
| Emscripten itself is written in JavaScript and is available under the MIT | |
| license (a permissive open source license), at \url{http://www.emscripten.org}. | |
| As a compiler from LLVM to JavaScript, the challenges in designing | |
| Emscripten are somewhat the reverse of the norm -- one must go from a low-level | |
| assembly into a high-level language, and recreate parts of the original | |
| high-level structure of the code that were lost in the compilation to | |
| low-level LLVM. We detail the methods used in | |
| Emscripten to deal with those challenges, and in particular present and prove | |
| the validity of Emscripten's Relooper | |
| algorithm, which recreates high-level loop structures from low-level | |
| branching data. | |
| \end{abstract} | |
| %\category{CR-number}{subcategory}{third-level} | |
| %\terms | |
| %term1, term2 | |
| %\keywords | |
| %keyword1, keyword2 | |
| \bigskip | |
| %\copyright 2011 Alon Zakai. License: Creative Commons Attribution-ShareAlike (CC BY-SA), \url{http://creativecommons.org/licenses/by-sa/3.0/} | |
| \section{Introduction} | |
| Since the mid 1990's, JavaScript~\cite{js} has been present in most web browsers (sometimes | |
| with minor variations and under slightly different names, e.g., JScript in Internet | |
| Explorer), and today it is | |
| well-supported on essentially all web browsers, from desktop browsers like | |
| Internet Explorer, Firefox, Chrome and Safari, to mobile browsers on smartphones | |
| and tablets. Together with HTML and CSS, JavaScript forms the standards-based | |
| foundation of the web. | |
| Running other programming languages on the web has been suggested many times, | |
| and browser plugins have allowed doing so, e.g., via the Java | |
| and Flash plugins. However, plugins must be manually installed and do not integrate in | |
| a perfect way with the outside HTML. Perhaps more problematic is that they cannot run | |
| at all on some platforms, for example, Java and Flash cannot run on iOS devices such as the iPhone | |
| and iPad. For those reasons, JavaScript remains | |
| the primary programming language of the web. | |
| There are, however, reasonable motivations for running code from | |
| other programming languages on the web, for example, if one has a large | |
| amount of existing code already written in another language, or if one | |
| simply has a strong preference for another language and perhaps is | |
| more productive in it. As a consequence, there has been work on tools to compile languages | |
| \textbf{into} JavaScript. Since JavaScript is present in essentially all web | |
| browsers, by compiling one's language of choice into JavaScript, one | |
| can still generate content that will run practically everywhere. | |
| Examples of the approach of compiling into JavaScript include | |
| the Google Web Toolkit~\cite{gwt}, which compiles Java into JavaScript; | |
| Pyjamas\footnote{\url{http://pyjs.org/}}, which compiles Python into JavaScript; | |
| SCM2JS \cite{hop}, which compiles Scheme to JavaScript, | |
| Links \cite{links}, which compiles an ML-like language into JavaScript; | |
| and AFAX \cite{afax}, which compiles F\# to JavaScript; | |
| see also \cite{ashkenas} for additional examples. | |
| While useful, such tools usually only allow a subset of the original language to | |
| be compiled. For example, multithreaded code (with shared memory) is | |
| not possible on the web, so compiling code of that sort is | |
| not directly possible. There are also often limitations of the conversion | |
| process, for example, Pyjamas compiles Python to JavaScript in a nearly | |
| 1-to-1 manner, and as a consequence the underlying semantics are those of JavaScript, | |
| not Python, so for example division of integers can yield unexpected results | |
| (it should yield an integer in Python 2.x, | |
| but in JavaScript and in Pyjamas a floating-point number can be generated). | |
| In this paper we present another project along those lines: \textbf{Emscripten}, | |
| which compiles LLVM (Low Level Virtual Machine\footnote{\url{http://llvm.org/}}) assembly into JavaScript. | |
| LLVM is a compiler project primarily focused on C, C++ and | |
| Objective-C. It compiles those languages through a \emph{frontend} (the | |
| main ones of which are Clang and LLVM-GCC) into the | |
| LLVM intermediary representation (which can be machine-readable | |
| bitcode, or human-readable assembly), and then passes it | |
| through a \emph{backend} which generates actual machine code for a particular | |
| architecture. Emscripten plays the role of a backend which targets JavaScript. | |
| By using Emscripten, potentially many languages can be | |
| run on the web, using one of the following methods: | |
| \begin{itemize} | |
| \item Compile \textbf{code} in a language recognized by one of the existing LLVM frontends | |
| into LLVM, and then compile that | |
| into JavaScript using Emscripten. Frontends for various languages | |
| exist, including many of the most popular programming languages such as C and | |
| C++, and also various new and emerging languages (e.g., Rust\footnote{\url{https://github.com/graydon/rust/}}). | |
| \item Compile the \textbf{runtime} used to parse and execute code in | |
| a particular language into LLVM, then compile that into JavaScript using | |
| Emscripten. It is then possible to run code in that runtime on the web. | |
| This is a useful approach if | |
| a language's runtime is written in a language for which an LLVM | |
| frontend exists, but the language itself has no such frontend. For | |
| example, there is currently no frontend for Python, however | |
| it is possible to compile CPython -- the standard implementation of | |
| Python, written in C -- into JavaScript, and run Python code on that | |
| (see Section~\ref{sec:examples}). | |
| \end{itemize} | |
| From a technical standpoint, one challenge in designing and implementing | |
| Emscripten is that it compiles a low-level language -- LLVM assembly -- into | |
| a high-level one -- JavaScript. This is somewhat the reverse of the usual | |
| situation one is in when building a compiler, and leads to some unique | |
| difficulties. For example, to get good performance in JavaScript one must | |
| use natural JavaScript code flow structures, like loops and ifs, but | |
| those structures do not exist in LLVM assembly (instead, what is present | |
| there is a `soup of code fragments': blocks of code with branching information | |
| but no high-level structure). | |
| Emscripten must therefore reconstruct a high-level | |
| representation from the low-level data it receives. | |
| In theory that issue could have been avoided by compiling a higher-level | |
| language into JavaScript. For example, if compiling Java into JavaScript | |
| (as the Google Web Toolkit does), then one can benefit from the fact | |
| that Java's loops, ifs and so forth generally have a very direct parallel | |
| in JavaScript. But of course the downside in that approach is it yields a | |
| compiler only for Java. In Section~\ref{sec:relooper} | |
| we present the `Relooper' algorithm, which generates high-level loop structures from the low-level | |
| branching data present in LLVM assembly. It is similar to loop recovery algorithms used in decompilation | |
| (see, for example, \cite{Cifuentes98assemblyto}, \cite{pro97}). | |
| The main difference between the Relooper and standard loop recovery algorithms | |
| is that the Relooper generates loops in a different language than that which was compiled originally, whereas | |
| decompilers generally assume they are returning to the original language. The Relooper's | |
| goal is not to accurately recreate the original source code, but rather to generate | |
| native JavaScript control flow structures, which can then be implemented | |
| efficiently in modern JavaScript engines. | |
| Another challenge in Emscripten is to maintain accuracy (that is, to | |
| keep the results of the compiled code the same as the original) | |
| while not sacrificing performance. LLVM assembly | |
| is an abstraction of how modern CPUs are programmed for, and its basic | |
| operations are not all directly possible in JavaScript. For example, if in | |
| LLVM we are to add two unsigned 8-bit numbers $x$ and $y$, with overflowing (e.g., 255 | |
| plus 1 should give 0), then there is no single operation in JavaScript which | |
| can do this -- we cannot just write $x+y$, as that would use the normal JavaScript | |
| semantics. It is possible to emulate a CPU in JavaScript, however doing so | |
| is very slow. Emscripten's approach is to allow such emulation, but to try to | |
| use it as little as possible, and to provide tools that help one find out | |
| which parts of the compiled code actually need such full emulation. | |
| We conclude this introduction with a list of this paper's main contributions: | |
| \begin{itemize} | |
| \item We describe Emscripten itself, during | |
| which we detail its approach in compiling LLVM into JavaScript. | |
| \item We give details of Emscripten's Relooper algorithm, mentioned earlier, which generates | |
| high-level loop structures from low-level branching data, and prove | |
| its validity. | |
| \end{itemize} | |
| In addition, the following are the main contributions of Emscripten | |
| itself, that to our knowledge were not previously possible: | |
| \begin{itemize} | |
| \item It allows compiling a very large subset of C and C++ code into | |
| JavaScript, which can then be run on the web. | |
| \item By compiling their runtimes, it allows running languages such as Python | |
| on the web (with their normal semantics). | |
| \end{itemize} | |
| The remainder of this paper is structured as follows. In Section~\ref{sec:compapp} we | |
| describe the approach Emscripten takes to compiling LLVM assembly into JavaScript, | |
| and show some benchmark data. | |
| In Section~\ref{sec:emarch} we describe Emscripten's internal design and in | |
| particular elaborate on the Relooper algorithm. | |
| In Section~\ref{sec:examples} we give several example uses of | |
| Emscripten. In Section~\ref{sec:summary} we summarize and give directions for future | |
| work. | |
| \section{Compilation Approach} | |
| \label{sec:compapp} | |
| Let us begin by considering what the challenge is, when we want to compile LLVM assembly | |
| into JavaScript. Assume we are given the | |
| following simple example of a C program: | |
| \begin{verbatim} | |
| #include <stdio.h> | |
| int main() | |
| { | |
| int sum = 0; | |
| for (int i = 1; i <= 100; i++) | |
| sum += i; | |
| printf("1+...+100=%d\n", sum); | |
| return 0; | |
| } | |
| \end{verbatim} | |
| This program calculates the sum of the integers from 1 to 100. When | |
| compiled by Clang, the generated LLVM | |
| assembly code includes the following: | |
| \label{code:examplellvm} | |
| \begin{verbatim} | |
| @.str = private constant [14 x i8] | |
| c"1+...+100=%d\0A\00" | |
| define i32 @main() { | |
| %1 = alloca i32, align 4 | |
| %sum = alloca i32, align 4 | |
| %i = alloca i32, align 4 | |
| store i32 0, i32* %1 | |
| store i32 0, i32* %sum, align 4 | |
| store i32 1, i32* %i, align 4 | |
| br label %2 | |
| ; <label>:2 | |
| %3 = load i32* %i, align 4 | |
| %4 = icmp sle i32 %3, 100 | |
| br i1 %4, label %5, label %12 | |
| ; <label>:5 | |
| %6 = load i32* %i, align 4 | |
| %7 = load i32* %sum, align 4 | |
| %8 = add nsw i32 %7, %6 | |
| store i32 %8, i32* %sum, align 4 | |
| br label %9 | |
| ; <label>:9 | |
| %10 = load i32* %i, align 4 | |
| %11 = add nsw i32 %10, 1 | |
| store i32 %11, i32* %i, align 4 | |
| br label %2 | |
| ; <label>:12 | |
| %13 = load i32* %sum, align 4 | |
| %14 = call i32 (i8*, ...)* | |
| @printf(i8* getelementptr inbounds | |
| ([14 x i8]* @.str, i32 0, i32 0), | |
| i32 %13) | |
| ret i32 0 | |
| } | |
| \end{verbatim} | |
| At first glance, this may look more difficult to translate into | |
| JavaScript than the original C++. However, compiling C++ in | |
| general would require writing code to handle preprocessing, | |
| classes, templates, and all the idiosyncrasies and complexities | |
| of C++. LLVM assembly, while more verbose in this example, is | |
| lower-level and simpler to work on. Compiling it also has the benefit we | |
| mentioned earlier, which | |
| is one of the main goals of Emscripten, that it allows many languages can | |
| be compiled into LLVM and not just C++. | |
| A detailed overview of LLVM assembly is beyond our scope here (see \url{http://llvm.org/docs/LangRef.html}). Briefly, | |
| though, the example assembly above can be seen to define a | |
| function main(), then allocate some values on the stack (alloca), | |
| then load and store various values (load and store). We do not have | |
| the high-level code structure as we had in C++ (with a loop), instead | |
| we have labeled code fragments, called LLVM basic blocks, and code flow moves | |
| from one to another by branch (br) instructions. (Label 2 is the | |
| condition check in the loop; label 5 is the body, label 9 is the | |
| increment, and label 12 is the final part of the function, outside | |
| of the loop). | |
| Conditional branches | |
| can depend on calculations, for example the results of comparing | |
| two values (icmp). Other numerical operations include addition (add). | |
| Finally, printf is called (call). The challenge, then, is to convert | |
| this and things like it into JavaScript. | |
| In general, Emscripten's main approach is to translate each line of LLVM | |
| assembly into JavaScript, 1 to 1, into `normal' JavaScript | |
| as much as possible. So, for example, an \emph{add} operation becomes | |
| a normal JavaScript addition, a function call becomes a JavaScript | |
| function call, etc. This 1 to 1 translation generates JavaScript | |
| that resembles the original assembly code, for example, the LLVM assembly code shown | |
| before for main() would be compiled into the following: | |
| \label{code:example} | |
| \begin{verbatim} | |
| function _main() { | |
| var __stackBase__ = STACKTOP; | |
| STACKTOP += 12; | |
| var __label__ = -1; | |
| while(1) switch(__label__) { | |
| case -1: | |
| var $1 = __stackBase__; | |
| var $sum = __stackBase__+4; | |
| var $i = __stackBase__+8; | |
| HEAP[$1] = 0; | |
| HEAP[$sum] = 0; | |
| HEAP[$i] = 0; | |
| __label__ = 0; break; | |
| case 0: | |
| var $3 = HEAP[$i]; | |
| var $4 = $3 <= 100; | |
| if ($4) { __label__ = 1; break; } | |
| else { __label__ = 2; break; } | |
| case 1: | |
| var $6 = HEAP[$i]; | |
| var $7 = HEAP[$sum]; | |
| var $8 = $7 + $6; | |
| HEAP[$sum] = $8; | |
| __label__ = 3; break; | |
| case 3: | |
| var $10 = HEAP[$i]; | |
| var $11 = $10 + 1; | |
| HEAP[$i] = $11; | |
| __label__ = 0; break; | |
| case 2: | |
| var $13 = HEAP[$sum]; | |
| var $14 = _printf(__str, $13); | |
| STACKTOP = __stackBase__; | |
| return 0; | |
| } | |
| } | |
| \end{verbatim} | |
| Some things | |
| to take notice of: | |
| \begin{itemize} | |
| \item A switch-in-a-loop construction is used in order to let the flow | |
| of execution move between basic blocks of code in an arbitrary manner: We set | |
| \emph{\_\_label\_\_} to the (numerical representation of the) label of | |
| the basic block we want to reach, and do a break, which leads to the proper | |
| basic block being reached. Inside each basic block, every line of code corresponds to a line of | |
| LLVM assembly, generally in a very straightforward manner. | |
| \item Memory is implemented by \emph{HEAP}, a JavaScript array. Reading from | |
| memory is a read from that array, and writing to memory is a write. | |
| \emph{STACKTOP} is the current position of the stack. (Note that we | |
| allocate 4 memory locations for 32-bit integers on the stack, but only | |
| write to 1 of them. See Section~\ref{sec:lsc} for why.) | |
| \item LLVM assembly functions become JavaScript functions, and function calls | |
| are normal JavaScript function calls. In general, we attempt to generate | |
| as `normal' JavaScript as possible. | |
| \item We implemented the LLVM \emph{add} operation using simple addition in JavaScript. | |
| As mentioned earlier, the semantics of that code are not entirely identical to | |
| those of the original LLVM assembly code (in this case, overflows will have very | |
| different effects). We will explain Emscripten's approach to that problem in | |
| Section~\ref{sssec:realworldcode}. | |
| \end{itemize} | |
| \subsection{Performance} | |
| \label{sec:perf} | |
| In this section we will deal with several topics regarding | |
| Emscripten's approach to generating high-performance JavaScript code. | |
| \subsubsection{Load-Store Consistency (LSC)} | |
| \label{sec:lsc} | |
| We saw before that Emscripten's memory usage allocates the usual number | |
| of bytes on the stack for variables (4 bytes for a 32-bit integer, etc.). | |
| However, we only wrote values into the first location, which appeared odd. | |
| We will now see the reason for that. | |
| To get there, we must first step back, and note that | |
| Emscripten does not aim to achieve perfect compatibility with all possible | |
| LLVM assembly (and correspondingly, with all possible C or C++ code, etc.); | |
| instead, Emscripten targets a large subset of LLVM assembly code, which is portable | |
| and does not make crucial assumptions about the underlying CPU architecture | |
| on which the code is meant to run. That subset is meant to encompass the | |
| vast majority of real-world code that would be compiled into LLVM, | |
| while also being compilable into very | |
| performant JavaScript. | |
| More specifically, Emscripten assumes that the LLVM assembly code it is | |
| compiling has \textbf{Load-Store Consistency} (LSC), which is the requirement that | |
| after a value with a specific type is written to a memory location, loads from | |
| that memory location will be of the same type (until a value with a different | |
| type is written there). | |
| Normal C and C++ | |
| code generally does so: If $x$ is a variable containing a 32-bit floating | |
| point number, then both loads and stores of $x$ will be of 32-bit floating | |
| point values, and not 16-bit unsigned integers or anything else. | |
| To see why this is important for performance, consider the following | |
| C code fragment, which does \emph{not} have LSC: | |
| \begin{verbatim} | |
| int x = 12345; | |
| printf("first byte: %d\n", *((char*)&x)); | |
| \end{verbatim} | |
| Assuming an architecture with more than 8 bits, this code will read | |
| the first byte of \emph{x}. (This might, for example, be used to detect the | |
| endianness of the CPU.) To compile this into JavaScript in | |
| a way that will run properly, we must do more than a single operation | |
| for either the read or the write, for example we could do this: | |
| \begin{verbatim} | |
| var x_value = 12345; | |
| var x_addr = stackAlloc(4); | |
| HEAP[x_addr] = (x_value >> 0) & 255; | |
| HEAP[x_addr+1] = (x_value >> 8) & 255; | |
| HEAP[x_addr+2] = (x_value >> 16) & 255; | |
| HEAP[x_addr+3] = (x_value >> 24) & 255; | |
| [...] | |
| printf("first byte: %d\n", HEAP[x_addr]); | |
| \end{verbatim} | |
| Here we allocate space for the value of \emph{x} on the stack, and | |
| store that address in \emph{x\_addr}. The stack itself is part of | |
| the `memory space', which is the array \emph{HEAP}. In order for | |
| the read on the final line to give the proper value, we must go to | |
| the effort of doing 4 store operations, each of the value of a | |
| particular byte. In other words, \emph{HEAP} is an array of bytes, | |
| and for each store into memory, we must deconstruct the value into | |
| bytes.\footnote{Note that we can use JavaScript typed arrays with a shared memory | |
| buffer, which would work as expected, assuming (1) we are running | |
| in a JavaScript engine which supports typed arrays, and (2) we | |
| are running on a CPU with the same architecture as we expect. This | |
| is therefore dangerous as the generated code may run differently on | |
| different JavaScript engines and different CPUs. | |
| Emscripten currently has optional experimental support for typed arrays.} | |
| Alternatively, we can store the value in a single operation, and | |
| deconstruct into bytes as we load. This will be faster in some | |
| cases and slower in others, but is still more overhead | |
| than we would like, generally speaking -- for if the code \textbf{does} have | |
| LSC, then we can translate that code fragment into | |
| the far more optimal | |
| \begin{verbatim} | |
| var x_value = 12345; | |
| var x_addr = stackAlloc(4); | |
| HEAP[x_addr] = x_value; | |
| [...] | |
| printf("first byte: %d\n", HEAP[x_addr]); | |
| \end{verbatim} | |
| (Note that even this can be optimized even more -- we can store | |
| \emph{x} in a normal JavaScript variable. We will discuss such | |
| optimizations in Section \ref{sec:codeopt}; for now we are | |
| just clarifying why it is useful to assume we are compiling code | |
| that has LSC.) | |
| In practice the vast majority of C and C++ code does have LSC. Exceptions | |
| do exist, however, for example: | |
| \begin{itemize} | |
| \item Code that detects CPU features like endianness, the behavior of floats, etc. In general such code can be disabled | |
| before running it through Emscripten, as it is not actually needed. | |
| \item \emph{memset} and related functions typically work on values of one kind, | |
| regardless of the underlying values. For example, memset may write 64-bit | |
| values on a 64-bit CPU since that is usually faster than writing individual | |
| bytes. This tends to | |
| not be a problem, as with \emph{memset} the most common case is setting to | |
| 0, and with \emph{memcpy}, the values end up copied properly anyhow (with | |
| a proper implementation of \emph{memcpy} in Emscripten's generated code). | |
| \item Even LSC-obeying C or C++ code may turn into LLVM assembly that does not, | |
| after being optimized. For example, when storing two 32-bit integers constants into | |
| adjoining locations in a structure, the optimizer may generate a single | |
| 64-bit store of an appropriate constant. In other words, optimization can | |
| generate nonportable code, which runs faster on the current CPU, but | |
| nowhere else. Emscripten currently assumes that optimizations of this form | |
| are not being used. | |
| \end{itemize} | |
| In practice it may be hard to know if code has LSC or not, and requiring | |
| a time-consuming code audit is obviously impractical. Emscripten therefore has | |
| a compilation option, SAFE\_HEAP, which generates code that checks that LSC holds, and warns if it | |
| doesn't. It also warns about other memory-related issues like | |
| reading from memory before a value was written (somewhat similarly to tools | |
| like Valgrind\footnote{\url{http://valgrind.org/}}). When such problems are detected, possible solutions are to ignore the issue (if it has no actual | |
| consequences), or alter the source code. | |
| Note that it is somewhat wasteful to allocate 4 memory locations for | |
| a 32-bit integer, and use only one of them. It is possible to change | |
| that behavior with the QUANTUM\_SIZE parameter to Emscripten, however, | |
| the difficulty is that LLVM assembly has hardcoded values that depend on | |
| the usual memory sizes being used. We are looking into modifications | |
| to LLVM itself to remedy that. | |
| \subsubsection{Emulating Code Semantics} | |
| \label{sssec:realworldcode} | |
| As mentioned in the introduction, the semantics of LLVM assembly and JavaScript are not identical: The former | |
| is very close to that of a modern CPU, while the latter is a high-level | |
| dynamic language. Both are of course Turing-complete, so it is possible to | |
| precisely emulate each in the other, but doing so with good performance is | |
| more challenging. For example, if we want to convert | |
| \begin{verbatim} | |
| add i8 %1, %2 | |
| \end{verbatim} | |
| (add two 8-bit integers) to JavaScript, then to be completely accurate we must emulate the | |
| exact same behavior, in particular, we must handle overflows properly, which would not be the case if we just implement | |
| this as $\%1 + \%2$ in JavaScript. For example, with inputs of $255$ and $1$, the | |
| correct output is 0, but simple addition in JavaScript will give us 256. We | |
| can of course emulate the proper behavior by adding additional code. | |
| This however significantly degrades performance, | |
| because modern JavaScript engines can often translate something like $z = x + y$ into | |
| native code containing a single instruction (or very close to that), but if instead we had | |
| something like $z = (x + y)\&255$ (in order to correct overflows), the JavaScript engine | |
| would need to generate additional code to perform the AND operation.\footnote{ | |
| In theory, the JavaScript engine could determine that we are implicitly working | |
| on 8-bit values here, and generate machine code that no longer needs the AND operation. | |
| However, most or all modern JavaScript engines have just two internal numeric types, doubles and | |
| 32-bit integers. This is so because they are tuned for `normal' JavaScript code | |
| on the web, which in most cases is served well by just those two types. | |
| In addition, even if JavaScript engines did analyze code containing $\&255$, etc., | |
| in order to deduce that a variable can be implemented | |
| as an 8-bit integer, there is a cost to including all the necessary $\&255$ text | |
| in the script, because code size is a significant factor on the web. Adding even | |
| a few characters for every single mathematic operation, in a large JavaScript file, | |
| could add up to a significant increase in download size.} | |
| Emscripten's approach to this problem is to allow the generation of both accurate code, | |
| that is identical in behavior to LLVM assembly, and inaccurate code which is | |
| faster. In practice, most addition operations in LLVM do not overflow, | |
| and can simply be translated into $\%1 + \%2$. Emscripten | |
| provides tools that make it straightforward to find which code does require | |
| the slower, more accurate code, and to generate that code in those locations, as follows: | |
| \begin{itemize} | |
| \item Compile the code using Emscripten with special options that generate runtime checking. | |
| CHECK\_OVERFLOWS adds runtime checks for integer overflows, CHECK\_SIGNS | |
| checks for signing issues (the behavior of signed and unsigned integers can | |
| be different, and JavaScript does not natively support that difference), and | |
| CHECK\_ROUNDINGS checks for rounding issues (in C and C++, the convention is | |
| to round towards 0, while in JavaScript there is no simple operation that does | |
| the same). | |
| \item Run the compiled code on a representative sample of inputs, and notice which | |
| lines are warned about by the runtime checks. | |
| \item Recompile the code, telling Emscripten to add corrections (using CORRECT\_SIGNS, CORRECT\_OVERFLOWS | |
| or CORRECT\_ROUNDINGS) only on the specific lines that actually need it. | |
| \end{itemize} | |
| This method is not guaranteed to work, as if we do not run on a truly representative | |
| sample of possible inputs, we may not compile with all necessary corrections. It is | |
| of course possible to compile with all corrections applied to all the code, to make | |
| sure things will work properly (this is the default compilation setting), however, in | |
| practice the procedure above works quite well, and results in code is significantly faster. | |
| \subsubsection{Emscripten Code Optimizations} | |
| \label{sec:codeopt} | |
| When comparing the example program from page~\pageref{code:example}, | |
| the generated code was fairly complicated | |
| and cumbersome, and unsurprisingly it performs quite poorly. There | |
| are two main reasons for that: First, that the code is simply | |
| unoptimized -- there are many variables declared when fewer could | |
| suffice, for example, and second, that the code does not use `normal' | |
| JavaScript, which JavaScript engines are optimized for -- it | |
| stores all variables in an array (not normal JavaScript variables), | |
| and it controls the flow of execution using a switch-in-a-loop, not | |
| normal JavaScript loops and ifs. | |
| Emscripten's approach to generating fast-performing code is as | |
| follows. Emscripten doesn't do any | |
| optimizations that can be done by other tools: | |
| LLVM can be used to perform optimizations before Emscripten, and | |
| the Closure Compiler\footnote{\url{http://code.google.com/closure/compiler/}} | |
| can perform optimizations on the generated JavaScript afterwards. Those | |
| tools will perform standard useful optimizations like removing unneeded variables, dead code elimination, | |
| function inlining, etc. | |
| That leaves two major optimizations that are left for Emscripten | |
| to perform: | |
| \begin{itemize} | |
| \item \textbf{Variable nativization}: Convert variables | |
| that are on the stack -- which is implemented using addresses in the \emph{HEAP} array | |
| as mentioned earlier -- into native JavaScript variables (that is to say, \emph{var x;} and so forth). In general, | |
| a variable will be nativized unless it is used | |
| outside that function, e.g., if its address is taken and stored somewhere | |
| or passed to another function. When optimizing, Emscripten tries to nativize | |
| as many variables as possible. | |
| \item \textbf{Relooping}: Recreate high-level loop and if structures | |
| from the low-level code block data that appears in LLVM assembly. | |
| We describe Emscripten's Relooper algorithm in Section~\ref{sec:relooper}. | |
| \end{itemize} | |
| When run with Emscripten's optimizations, the code on page \pageref{code:example} looks | |
| like this: | |
| \begin{verbatim} | |
| function _main() { | |
| var __label__; | |
| var $1; | |
| var $sum; | |
| var $i; | |
| $1 = 0; | |
| $sum = 0; | |
| $i = 0; | |
| $2$2: while(1) { | |
| var $3 = $i; | |
| var $4 = $3 <= 100; | |
| if (!($4)) { __label__ = 2; break $2$2; } | |
| var $6 = $i; | |
| var $7 = $sum; | |
| var $8 = $7 + $6; | |
| $sum = $8; | |
| var $10 = $i; | |
| var $11 = $10 + 1; | |
| $i = $11; | |
| __label__ = 0; continue $2$2; | |
| } | |
| var $13 = $sum; | |
| var $14 = _printf(__str, $13); | |
| return 0; | |
| } | |
| \end{verbatim} | |
| If in addition the Closure Compiler is run on that output, we get | |
| \begin{verbatim} | |
| function K() { | |
| var a, b; | |
| b = a = 0; | |
| a:for(;;) { | |
| if(!(b <= 100)) { | |
| break a | |
| } | |
| a += b; | |
| b += 1; | |
| } | |
| _printf(J, a); | |
| return 0; | |
| } | |
| \end{verbatim} | |
| which is fairly close to the original C++ (the differences, of | |
| having the loop's condition inside the loop instead of inside | |
| the for() expression at the top of the original loop, are not important to performance). Thus, it is possible | |
| to recreate the original high-level structure of the code that | |
| was compiled into LLVM assembly. | |
| \subsection{Benchmarks} | |
| \label{sec:benchmarks} | |
| We will now take a look at some performance benchmarks: | |
| \bigskip | |
| \begin{tabular}{ l | c | c | c || c } | |
| \hline | |
| \textbf{benchmark} & \textbf{SM} & \textbf{V8} & \textbf{gcc} & \textbf{ratio} \\ | |
| \hline | |
| fannkuch (10) & 1.158 & \textbf{0.931} & 0.231 & 4.04 \\ | |
| fasta (2100000) & \textbf{1.115} & 1.128 & 0.452 & 2.47 \\ | |
| primes & \textbf{1.443} & 3.194 & 0.438 & 3.29 \\ | |
| raytrace (7,256) & \textbf{1.930} & 2.944 & 0.228 & 8.46 \\ | |
| dlmalloc (400,400) & 5.050 & \textbf{1.880} & 0.315 & 5.97 \\ | |
| \hline | |
| \end{tabular} | |
| \bigskip | |
| The first column is the name of the benchmark, and in parentheses any | |
| parameters used in running it. The source code to all the benchmarks | |
| can be found at \url{https://github.com/kripken/emscripten/tree/main/tests} | |
| (each in a separate file with its name, except for `primes', which is | |
| embedded inside runner.py in the function test\_primes). A brief summary of | |
| the benchmarks is as follows: | |
| \begin{itemize} | |
| \item \textbf{fannkuch} and \textbf{fasta} are commonly-known benchmarks, appearing for example | |
| on the Computer Language Benchmarks Game\footnote{\url{http://shootout.alioth.debian.org/}}. | |
| They use a mix of mathematic operations (integer in the former, floating-point in the latter) and memory access. | |
| \item \textbf{primes} is the simplest benchmark in terms of code. It is basically just a tiny loop that calculates prime numbers. | |
| \item \textbf{raytrace} is real-world code, from the sphereflake raytracer\footnote{\url{http://ompf.org/ray/sphereflake/}}. This benchmark has a combination of memory access and floating-point math. | |
| \item \textbf{dlmalloc} (Doug Lea's malloc\footnote{\url{http://en.wikipedia.org/wiki/Malloc#dlmalloc_and_its_derivatives}}) is a well-known real-world implementation of malloc and free. This benchmark does a large amount of calls to malloc and free in an intermixed way, which tests memory access and integer calculations. | |
| \end{itemize} | |
| Returning to the table of results, the second | |
| column is the elapsed time (in seconds) when running the compiled code (generated using all Emscripten and LLVM | |
| optimizations as well as the Closure Compiler) in the SpiderMonkey JavaScript | |
| engine (specifically the JaegerMonkey branch, checked out June 15th, 2011). | |
| The third column is the elapsed time when running the same JavaScript code in the V8 JavaScript engine | |
| (checked out Jun 15th, 2011). In both the second and third column lower values | |
| are better; the best of the two is in bold. | |
| The fourth column is the elapsed time when running the original code compiled with \emph{gcc -O3}, | |
| using GCC 4.4.4. The last column is the ratio, that is, how much slower the JavaScript code | |
| (running in the faster of the two engines for that test) is | |
| when compared to gcc. All the tests were run on a MacBook Pro with | |
| an Intel i7 CPU clocked at 2.66GHz, running on Ubuntu 10.04. | |
| Clearly the results greatly vary by the benchmark, with the generated JavaScript running from 2.47 to 8.46 times | |
| slower. There are also significant differences between the two JavaScript engines, with each better | |
| at some of the benchmarks. | |
| It appears that code that does simple numerical operations -- like | |
| the primes test -- can run fairly fast, while code that has a lot of memory | |
| accesses, for example due to using structures -- like the raytrace test -- | |
| will be slower. (The main issue with structures is that Emscripten does not | |
| `nativize' them yet, as it does to simple local variables.) | |
| Being 2.47 to 8.46 times slower than the most-optimized C++ code | |
| is a significant slowdown, but it is still more than fast enough for | |
| many purposes, and the main point of course is that the code can run | |
| anywhere the web can be accessed. Further work on Emscripten is expected to | |
| improve the speed as well, as are improvements to LLVM, the Closure | |
| Compiler, and JavaScript engines themselves; see further discussion | |
| in the Summary. | |
| \subsection{Limitations} | |
| Emscripten's compilation approach, as has been described in this Section so far, | |
| is to generate `natural' JavaScript, as close as possible to normal JavaScript | |
| on the web, so that modern JavaScript engines perform well on it. In particular, | |
| we try to generate `normal' JavaScript operations, like regular addition and | |
| multiplication and so forth. This is a very | |
| different approach than, say, emulating a CPU on a low level, or for the case | |
| of LLVM, writing an LLVM bitcode interpreter in JavaScript. The latter approach | |
| has the benefit of being able to run virtually any compiled code, at the cost | |
| of speed, whereas Emscripten makes a tradeoff in the other direction. We will | |
| now give a summary of some of the limitations of Emscripten's approach. | |
| \begin{itemize} | |
| \item \textbf{64-bit Integers}: JavaScript numbers are all 64-bit doubles, with engines | |
| typically implementing them as 32-bit integers where possible for speed. | |
| A consequence of this is that it is impossible to directly implement | |
| 64-bit integers in JavaScript, as integer values larger than 32 bits will become doubles, | |
| with only 53 bits for the significand. Thus, when Emscripten uses normal | |
| JavaScript addition and so forth for 64-bit integers, it runs the risk of | |
| rounding effects. This could be solved by emulating 64-bit integers, | |
| but it would be much slower than native code. | |
| \item \textbf{Multithreading}: JavaScript has Web Workers, which are additional | |
| threads (or processes) that communicate via message passing. There is no | |
| shared state in this model, which means that it is not directly possible | |
| to compile multithreaded code in C++ into JavaScript. A partial solution | |
| could be to emulate threads, without Workers, by manually controlling | |
| which blocks of code run (a variation on the switch in a loop construction | |
| mentioned earlier) and manually switching between threads every so often. | |
| However, in that case there would not be any utilization | |
| of additional CPU cores, and furthermore performance would be slow due to not | |
| using normal JavaScript loops. | |
| \end{itemize} | |
| After seeing these limitations, it is worth noting that some advanced LLVM instructions turn out to be | |
| surprisingly easy to implement. For example, C++ exceptions are represented in | |
| LLVM by \emph{invoke} and \emph{unwind}, where \emph{invoke} is a call to a function that will | |
| potentially trigger an \emph{unwind}, and \emph{unwind} returns to the earliest invoke. | |
| If one were to implement those | |
| in a typical compiler, doing so would require careful work. In Emscripten, however, | |
| it is possible to do so using JavaScript exceptions in a straightforward manner: | |
| \emph{invoke} becomes a function call wrapped in a \emph{try} block, and \emph{unwind} | |
| becomes \emph{throw}. This is a case where compiling to a high-level language turns | |
| out to be quite convenient. | |
| \section{Emscripten's Architecture} | |
| \label{sec:emarch} | |
| In the previous section we saw a general overview of Emscripten's approach | |
| to compiling LLVM assembly code into JavaScript. We will now get into more detail | |
| into how Emscripten itself is implemented. | |
| Emscripten is written in JavaScript. The primary reason for that decision | |
| was convenience: Two simple examples of the benefits of that approach are that (1) | |
| the compiler can create JavaScript objects that represent constant structures from the original | |
| assembly code, and convert them to a string using JSON.stringify() | |
| in a trivial manner, | |
| and (2) the compiler can simplify numerical operations by simply | |
| eval()ing the code (so ``1+2'' would become ``3'', etc.). In both examples, | |
| the development of Emscripten was made simpler by having the exact same environment | |
| during compilation as the executing code will have. This also helps in more | |
| complex ways, for example when the same code needs to be run at compile time | |
| and at runtime, and makes various dynamic compilation techniques possible in the future. | |
| Emscripten's compilation has three main phases: | |
| \begin{itemize} | |
| \item The \textbf{intertyper}, which converts from LLVM assembly into | |
| Emscripten's internal representation. | |
| \item The \textbf{analyzer}, which inspects the internal representation | |
| and generates various useful information for the final phase, | |
| including type and variable information, stack usage analysis, | |
| optional data for optimizations | |
| (variable nativization and relooping), etc. | |
| \item The \textbf{jsifier}, which does the final conversion of the | |
| internal representation plus additional analyzed data into JavaScript. | |
| \end{itemize} | |
| \subsection{The Runtime Environment} | |
| \label{sec:runtime} | |
| Code generated from Emscripten is meant to run in a JavaScript engine, | |
| typically in a web browser. This has implications for the kind of | |
| runtime environment we can generate for it, for example, there is no | |
| direct access to the local filesystem. | |
| Emscripten comes with a partial implementation of a C library, | |
| mostly written from scratch in JavaScript, with parts compiled from an | |
| existing C library\footnote{newlib, \url{http://sourceware.org/newlib/}}. Some aspects of the runtime environment, as | |
| implemented in that C library, are: | |
| \begin{itemize} | |
| \item An emulated filesystem is available, with files stored in memory. | |
| \item Emscripten allows writing pixel data to an HTML5 canvas element, | |
| using a subset of the SDL API. That is, one can write an application in C or C++ using | |
| SDL, and that same application can be compiled normally and run | |
| locally, or compiled using Emscripten and run on the web. See, for | |
| example, Emscripten's raytracing demo at \url{http://syntensity.com/static/raytrace.html}. | |
| \item \emph{sbrk()} is implemented using the \emph{HEAP} array which | |
| was mentioned previously. This allows a normal \emph{malloc()} | |
| implementation written in C to be compiled to JavaScript. | |
| \end{itemize} | |
| \subsection{The Relooper: Recreating high-level loop structures} | |
| \label{sec:relooper} | |
| The Relooper the most complex module in Emscripten. It receives | |
| a `soup of blocks', which is a set of labeled fragments of code, each | |
| ending with a branch operation, and the goal is to generate normal | |
| high-level JavaScript code flow structures such as loops and ifs. | |
| Generating such code structures is essential to producing good-performing code, | |
| since JavaScript engines are tuned to run such code very quickly (for | |
| example, a tracing JIT as in SpiderMonkey will only trace normal loops). | |
| Returning to the LLVM assembly code on page~\pageref{code:examplellvm}, it | |
| has the following structure (where arrows denote potential paths of execution): | |
| \includegraphics[width=80mm]{graph.png} | |
| In this simple example, it is fairly straightforward to see that a natural way to implement it | |
| using normal loop structures is | |
| \newpage | |
| \begin{verbatim} | |
| ENTRY | |
| while (true) do | |
| 2 | |
| if (condition) break | |
| 5 | |
| 9 | |
| 12 | |
| \end{verbatim} | |
| In general though, this is not always easy or even practical -- there may | |
| not be a straightforward high-level loop structure corresponding to the low-level one, if | |
| for example the original C code relied heavily on \emph{goto} instructions. | |
| In practice, however, almost all real-world C and C++ code tends to | |
| be amenable to loop recreation. | |
| We now begin to describe the Relooper algorithm. As mentioned before, it takes as input a `soup of labeled LLVM blocks' as described above, | |
| and generates a structured set of Emscripten code blocks, which are each a set of LLVM blocks | |
| with some logical structure. For simplicity we call LLVM blocks `labels' and Emscripten | |
| blocks `blocks' in the following. | |
| There are three types of Emscripten blocks: | |
| \begin{itemize} | |
| \item \textbf{Simple block}: A block with | |
| \begin{itemize} | |
| \item One \textbf{Internal} label, and | |
| \item a \textbf{Next} | |
| block, which the internal label branches to. The block is later | |
| translated simply into the code for that label, and the Next | |
| block appears right after it. | |
| \end{itemize} | |
| \item \textbf{Loop}: A block that represents a basic loop, comprised of | |
| two internal sub-blocks: | |
| \begin{itemize} | |
| \item \textbf{Inner}: A block that will appear inside | |
| the loop, i.e., when execution reaches the end of that block, | |
| flow will return to the beginning. Typically a loop will contain | |
| a conditional \emph{break} defining where it is exited. When we | |
| exit, we reach the Next block, below. | |
| \item \textbf{Next}: A block that will appear just outside | |
| the loop, in other words, that will be reached when the loop is exited. | |
| \end{itemize} | |
| \item \textbf{Multiple}: A block that represents a divergence into several | |
| possible branches, that eventually rejoin. A Multiple block can | |
| implement an `if', an `if-else', a `switch', etc. It is comprised of: | |
| \begin{itemize} | |
| \item \textbf{Handled blocks}: A set of blocks to which execution can | |
| enter. When we reach the multiple block, we check which of them | |
| should execute, and go there. When execution of that block is | |
| complete, or if none of the handled blocks was selected for | |
| execution, we proceed to the Next block, below. | |
| \item \textbf{Next}: A block that will appear just after the Handled blocks, | |
| in other words, that will be reached after code flow | |
| exits the Handled blocks. | |
| \end{itemize} | |
| \end{itemize} | |
| To clarify these definitions, the example LLVM assembly code we have | |
| been working with could be represented in a natural way as | |
| \begin{verbatim} | |
| Simple | |
| entry | |
| Loop | |
| Simple | |
| 2 | |
| Simple | |
| 5 | |
| Simple | |
| 9 | |
| null | |
| Simple | |
| 12 | |
| null | |
| \end{verbatim} | |
| where the first indented line in a Simple block is the Internal label in that Simple block, | |
| the second indented line is its Next block, and so forth. | |
| Continuing to describe the Relooper algorithm, we will use the term `entry' to | |
| mean a label that can be reached immediately in a block. In other | |
| words, a block consists of labels $l_1,..,l_n$, and the entries | |
| are a subset of those labels, specifically the ones that execution | |
| can directly reach when we reach that block. With that | |
| definition, the Relooper algorithm can | |
| then be written as follows: | |
| \begin{itemize} | |
| \item \textbf{Receive a set of labels and which of them are entry points.} | |
| We wish to create a block comprised of all those labels. | |
| \item \textbf{Calculate, for each label, which other labels it \emph{can} | |
| reach}, i.e., which labels we are able to reach if we start | |
| at the current label and follow one of the possible paths | |
| of execution. | |
| \item \textbf{If we have a single entry, and cannot return to it (by some other | |
| label later on branching to it) then create a Simple block}, with the entry | |
| as its internal label, and the Next block comprised of all | |
| the other labels. The entries for the Next block are the entries | |
| to which the internal label can branch. | |
| \item \textbf{If we can return to all of the entries, create a | |
| Loop block}, whose Inner block is comprised of all labels that | |
| can reach one of the entries, and whose Next block is | |
| comprised of all the others. The entry labels for the current | |
| block become entry labels for the Inner block (note that | |
| they must be in the Inner block by definition, as each one can reach | |
| itself). The Next block's entry labels are all the labels | |
| in the Next block that can be reached by the Inner block. | |
| \item \textbf{If we have more than one entry, try to create a Multiple block}: For each entry, find all | |
| the labels it reaches that cannot be reached by any other | |
| entry. If at least one entry has such labels, return a | |
| Multiple block, whose Handled blocks are blocks for those | |
| labels (and whose entries are those labels), and whose Next block is all the rest. | |
| Entries for the next block are entries that did not become part of the Handled | |
| blocks, and also labels that can be reached from the Handled blocks. | |
| \item \textbf{If we could not create a Multiple block, then create a Loop block as described above} | |
| (see proof below of why creating a Loop block is possible, i.e., why the labels contain a loop). | |
| \end{itemize} | |
| Note that we first create a Loop only if we must, then try to create a | |
| Multiple, then create a Loop if we have no other choice. We could have slightly simplified this in | |
| various ways, but the algorithm as presented above has given overall better | |
| results in practice, in terms of the `niceness' of the shape of the | |
| generated code, both subjectively and at least in some simple benchmarks. | |
| Additional details of the algorithm include | |
| \begin{itemize} | |
| \item The technical mechanism by which execution flow is controlled in the generated code involves | |
| the \emph{\_\_label\_\_} variable, mentioned earlier. Whenever we enter a block with more than one | |
| entry, we set \emph{\_\_label\_\_} before we branch into it, and we | |
| check its value when we enter that block. So, for example, when we | |
| create a Loop block, its Next block can have multiple entries -- any | |
| label to which we branch out from the loop. By creating a Multiple | |
| block after the loop, we can enter the proper label when the loop is | |
| exited. (Having a \emph{\_\_label\_\_} variable does add some overhead, | |
| but it greatly simplifies the problem that the Relooper needs to solve | |
| and allows us to only need three kinds of blocks as described above. | |
| Of course, it is possible to optimize | |
| away writes and reads to \emph{\_\_label\_\_} in many or even most cases.) | |
| \item As the Relooper processes labels, it replaces branch | |
| instructions accordingly. For example, when we create a Loop | |
| block, then all branch instructions to the outside of the loop are | |
| converted into \emph{break} commands (since a break instruction in JavaScript | |
| will indeed get us to outside of the loop), and all branch | |
| instructions to the beginning of the loop are converted into | |
| \emph{continue} commands, etc. Those commands are then | |
| ignored when called recursively to generate the Inner block (that is, | |
| the \emph{break} and \emph{continue} | |
| commands are guaranteed, by the semantics of JavaScript, to get us to | |
| where we need to go -- they do not need any further work | |
| for them to work properly). | |
| \item Emscripten also does an additional pass after what has been | |
| described thus far, which was the first pass. The first pass is guaranteed to produce valid output (see below), while | |
| the second pass takes that valid output and optimizes it, by | |
| making minor changes such as removing | |
| \emph{continue} commands that occur at the very end of loops | |
| (where they are not needed), etc. | |
| \end{itemize} | |
| We now turn to an analysis of the Relooper algorithm. It is straightforward to see that the output of the algorithm, | |
| assuming it completes successfully -- that is, that if finishes in finite time, and does | |
| not run into an error in the last part (where it is claimed that | |
| if we reach it we can return to at least one of the entry labels) -- | |
| is correct in the sense of code execution being carried out | |
| as in the original data. We will now prove that the algorithm must | |
| in fact complete successfully. | |
| First, note that if we | |
| successfully create a block, then we simplify the remaining | |
| problem, where the `complexity' of the problem for our purposes | |
| here is the sum of labels plus the sum of branching operations: | |
| \begin{itemize} | |
| \item This is trivial for Simple blocks (since we now have a Next block | |
| which is strictly smaller). | |
| \item It is true for Loop blocks simply by removing branching | |
| operations (there must be a branching back to an entry, which | |
| becomes a \emph{continue}). | |
| \item For Multiple blocks, if the Next block is non-empty then we have split into strictly | |
| smaller blocks (in number of labels) than before. If the next block | |
| is empty, then since we built the Multiple block from a set of labels | |
| with more than one entry, then the Handled blocks are strictly smaller | |
| than the current one. | |
| \end{itemize} | |
| We have seen that whenever we successfully create a block, we simplify the remaining problem | |
| as defined above, which means that we must eventually halt successfully (since | |
| we strictly decrease a nonnegative integer). | |
| The remaining issue is whether we can reach a situation where we \emph{cannot} | |
| successfully create a block, which is if we reach the final part of the relooper algorithm, but cannot create a | |
| Loop block there. For that to occur, we must not be able | |
| to return to any of the entries (or else we would create a Loop | |
| block). Assume that indeed we cannot return to any of the entries. But if that is so, we can create a Multiple | |
| block with Handled blocks that each include one of the entries (and possibly additional labels as well), since each entry | |
| label cannot be reached from any other label by our assumption earlier, thus | |
| contradicting that assumption | |
| and concluding the proof. | |
| (We have not, of course, proven that the shape of the blocks is optimal | |
| in any sense. However, even if it is possible to optimize them further, the Relooper | |
| already gives a very substantial speedup due to the move from the switch-in-a-loop | |
| construction to more natural JavaScript code flow structures.) | |
| \section{Example Uses} | |
| \label{sec:examples} | |
| Emscripten has been run successfully on several real-world codebases. We present | |
| some examples here to give an idea of the various opportunities made possible | |
| by Emscripten. | |
| \begin{itemize} | |
| \item \textbf{Python}: It is possible to run variants of Python on | |
| the web in various ways, including Pyjamas, IronPython on SilverLight and | |
| Jython in Java. However, all of these are slightly nonstandard in the | |
| Python code they run, while the latter two also require plugins to be | |
| installed. With Emscripten, on the other hand, it is possible to compile | |
| CPython itself -- the standard, reference implementation of Python -- and | |
| run that on the web, which allows running standard Python code. An online | |
| demo is available at \url{http://syntensity.com/static/python.html}. | |
| (Another example of a language runtime that Emscripten can convert to run | |
| on the web is Lua; an online demo is available at \url{http://syntensity.com/static/lua.html}.) | |
| \item \textbf{Poppler and FreeType}: Poppler\footnote{\url{http://poppler.freedesktop.org/}} is an open source PDF | |
| rendering library. In combination with FreeType\footnote{\url{http://www.freetype.org/}}, an open source font | |
| engine, it can be used to render PDF files. By compiling it with Emscripten, | |
| PDF files can be viewed on the web, without the use of plugins or external | |
| applications. An online demo is available at \url{http://syntensity.com/static/poppler.html} | |
| \item \textbf{Bullet}: The Bullet Physics library\footnote{\url{http://bulletphysics.org/wordpress/}} is | |
| an open source physics engine, used in many open source and proprietary applications. An online | |
| demo is available at \url{http://syntensity.com/static/bullet.html}, showing a physics | |
| simulation of falling blocks that uses Bullet compiled to JavaScript. Bullet has in the | |
| past been ported to JavaScript\footnote{\url{http://pl4n3.blogspot.com/2010/07/bulletjs-javascript-physics-engine.html}}, by porting JBullet (a port of Bullet to Java). The main difference in the approaches is that with Emscripten, there is no need for | |
| time-consuming manual conversion of C++ to Java and then to JavaScript, and consequently, | |
| the latest Bullet code can be run in JavaScript and not an earlier version (JBullet lags | |
| several versions behind the latest Bullet release). | |
| \end{itemize} | |
| \section{Summary} | |
| \label{sec:summary} | |
| We presented Emscripten, an LLVM-to-JavaScript compiler, which opens up | |
| numerous opportunities for running code written in languages other | |
| than JavaScript on the web, including some not previously possible. | |
| Emscripten can be used to, among other | |
| things, compile real-world C and C++ code and run that on the web. In | |
| addition, by compiling the runtimes of languages which are implemented in C and C++, | |
| we can run them on the web as well, for example Python and Lua. | |
| Perhaps the largest future goal of Emscripten is to improve the performance of | |
| the generated code. As we have seen, speeds of around $1/10$th that of | |
| GCC are possible, which is already good enough for many purposes, but | |
| can be improved much more. The code Emscripten generates will become faster | |
| `for free' as JavaScript engines get | |
| faster, and also by improvements in the optimizations done by LLVM and the Closure | |
| Compiler. However there is also a lot of room for additional optimizations in | |
| Emscripten itself, in particular in how it nativizes variables and structures, | |
| which can potentially lead to very significant speedups. | |
| When we compile a language's entire runtime into JavaScript, as mentioned | |
| before, there is another way to improve performance. | |
| Assume that we are compiling a C or C++ runtime of a language | |
| into JavaScript, and that that runtime uses JIT compilation to generate machine code. Typically | |
| code generators for JITs are written for the main CPU architectures, today | |
| x86, x86\_64 and ARM. However, it would be possible for a JIT to | |
| generate JavaScript instead. Thus, the runtime would be compiled using | |
| Emscripten, and at runtime it would pass the JIT-generated JavaScript to | |
| \emph{eval}. In this | |
| scenario, JavaScript is used as a low-level intermediate representation in | |
| the runtime, and the final conversion to machine code is left to the underlying | |
| JavaScript engine. This approach can potentially allow languages that | |
| greatly benefit from a JIT (such as Java, Lua, etc.) to be run on the web | |
| efficiently. | |
| Getting back to the issue of high-performing code in general, it is worth comparing | |
| Emscripten to Portable Native Client (\cite{pnacl},~\cite{nacl}), a project in development which aims | |
| to allow an LLVM-like format to be distributed and run securely | |
| on the web, with speed comparable to native code. | |
| Both Emscripten | |
| and PNaCl aim to allow code written in languages like C and C++ to | |
| be run on the web, but in very different ways: Emscripten compiles code into JavaScript, and | |
| PNaCl compiles into an LLVM-like format which is then | |
| run in a special PNaCl runtime. As a consequence, Emscripten's generated code can run on all web browsers, | |
| since it is standard JavaScript, while PNaCl's generated code | |
| requires the PNaCl runtime to be installed; another major | |
| difference is that JavaScript engines do not yet run code at | |
| near-native speeds, while PNaCl does. In a broad summary, Emscripten's | |
| approach allows the code to be run in more places, while PNaCl's | |
| allows the code to run faster. | |
| However, as mentioned earlier, improvements in JavaScript engines and compiler | |
| technology may narrow the speed | |
| gap. Also, when considering the speed of JavaScript engines, for purposes of Emscripten we do not need to | |
| care about \emph{all} JavaScript, but only the kind generated by | |
| Emscripten. Such code is \textbf{implicitly statically typed}, that is, | |
| types are not mixed, despite JavaScript in general allowing assigning, e.g., an | |
| integer to a variable and later a floating point value or even an object to that same variable. Implicitly statically | |
| typed code can be statically analyzed and converted into | |
| machine code that has no runtime type checks at all. While such | |
| static analysis can be time-consuming, there are practical ways for | |
| achieving similar results quickly, such as tracing and type inference, which | |
| would help on such code very significantly, and are already in use | |
| or being worked on in mainstream JavaScript engines (e.g., SpiderMonkey). As | |
| a consequence, it may soon be possible to run code written in languages such as | |
| C and C++ on the web with near-native speed. | |
| %\appendix | |
| %\section{Appendix Title} | |
| %This is the text of the appendix, if you need one. | |
| \acks | |
| We thank the following people for their contributions to Emscripten: David LaPalomento, Daniel Heres, Brian Crowder, Brian McKenna, dglead and tuba. | |
| \bibliographystyle{abbrvnat} | |
| % The bibliography should be embedded for final submission. | |
| \begin{thebibliography}{} | |
| \softraggedright | |
| \bibitem[Ashkenas. (2011)]{ashkenas} J. Ashkenas. | |
| List of languages that compile into JavaScript. Available at \url{https://github.com/jashkenas/coffee-script/wiki/List-of-languages-that-compile-to-JS}. | |
| Retrieved April 2011. | |
| \bibitem[Cifuentes et~al. (1998)]{Cifuentes98assemblyto} C. Cifuentes, D. Simon and A. Fraboulet. | |
| Assembly to High-Level Language Translation. In Int. Conf. on Softw. Maint, pp. 228--237, IEEE-CS Press, 1998. | |
| \bibitem[Cooper et~al. (2006)]{links} E. Cooper, S. Lindley, P. Wadler and J. Yallop. | |
| Links: Web programming without tiers. In 5th International Symposium on Formal Methods for Components and Objects (FMCO), 2006. | |
| \bibitem[Donovan et~al. (2010)]{pnacl} A. Donovan, R. Muth, B. Chen and D. Sehr. | |
| PNaCl: Portable Native Client Executables. Available at | |
| \url{http://nativeclient.googlecode.com/svn/data/site/pnacl.pdf}. Retrieved April 2011. | |
| \bibitem[Flanagan (2006)]{js} D. Flanagan. | |
| JavaScript: The Definitive Guide. O'Reilly Media, 2006. | |
| \bibitem[Loitsch et~al. (2008)]{hop} F. Loitsch and M. Serrano. | |
| Hop Client-Side Compilation. In Trends in Functional Programming, vol. 8, pp. 141--158, Seton Hall University, Intellect Bristol, 2008. | |
| \bibitem[Petříček et~al (2011)]{afax} T. Petříček and D. Syme. | |
| AFAX: Rich client/server web applications in F\#. Draft. Available at \url{http://tomasp.net/academic/fswebtools.aspx}. Retrieved April 2011. | |
| \bibitem[Prabhakar (2007)]{gwt} C. Prabhakar. | |
| Google Web Toolkit: GWT Java Ajax Programming. Packt Publishing, 2007. | |
| \bibitem[Proebsting et~al. (1997)]{pro97} T.A. Proebsting and S. A. Watterson. | |
| Krakatoa: Decompilation in Java (Does Bytecode Reveal Source?) In Third USENIX Conference on Object-Oriented Technologies and Systems (COOTS), 1997. | |
| \bibitem[Yee et~al. (2009)]{nacl} B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. | |
| Okasaka, N. Narula, and N. Fullagar. Native Client: A Sandbox for | |
| Portable, Untrusted x86 Native Code. In IEEE Symposium on | |
| Security and Privacy, May 2009. | |
| \end{thebibliography} | |
| \end{document} | |
Xet Storage Details
- Size:
- 60.3 kB
- Xet hash:
- e45f8edc462946234cc7480119b88e4d5957329e645e7d8ecae150cf61ed74a0
·
Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.