Why and how we wrote a compiler in Rust (blog post series 1/X): the context
This blog post is the first one of a series, it will give you more context to understand why we decided to write our own compiler, as well as where and how it is used in our use case. At Cosmian we are continuously striving to build the most comprehensive and reliable technological stack based on state-of-the-art cryptographic techniques. We are using them to open new horizons for companies, new possibilities to run computation together without making compromises on data security. That’s why we created CipherCompute, one of our products to run collaborative computation over sensitive data between different entities/companies. To provide this ability we made the known cryptographic technique named Secure Multi Party Computation (sMPC) easy to use for data scientists.
Secure Multi Party Computation
Let’s start with a small introduction to sMPC. Secure Multi Party Computation (sMPC) is a cryptographic protocol for performing computation between participants who do not or can not (legally) trust each other with their raw data. The goal of sMPC is to achieve the same result as if the computation was performed by a central third party but in a distributed manner between the participants without any of the participants having to reveal their input data in clear text to any other participant.
There are various forms of sMPC with different security models; the most useful is the one achieving passive security with up to n-1 corrupt participants. In this model (which is the one provided by Cipher Compute ) participating attackers follow the protocol but try to learn more than they are entitled to.
Multiple recent cryptographic techniques have been developed to achieve sMPC. The main 4 are based on Oblivious Transfers and Garbled Circuits to achieve 2-party computation and Shamir Secret Sharing (SSS) or Fully Homomorphic Encryption (FHE) for multi party computation. Using SSS, participants can share secrets using a property of polynomials that n points are enough to uniquely determine a polynomial of degree n-1 (using Lagrange interpolation). To share a secret between n participants, one of them will generate a random polynomial of degree n-1 where f(0) is the secret and share n-1 points (x, f(x)) to the other participants. Recombining all the shares will allow them to recover the polynomial using interpolation and hence recover the secret by computing f(0).
We are working in collaboration with COSIC from KU Leuven and Nigel Smart (a member of the scientific advisor board of Cosmian) who created Scale (Secure Computation Algorithms from LEuven). It’s a virtual machine who takes bytecode to run sMPC computations. Under the hood, Scale is developed in C++ and is in charge of making all the cryptographic work and exchanges between players during the computation. If you need more information about how it’s done under the hood and how the Scale VM works I suggest to browse this mirrored git repository on GitHub including the official documentation. When we tried Scale at this time we spotted a big drawback. To be able to write a sMPC program and to execute it on the Scale VM we needed to learn a new language which was neither widespread nor standard. This language is called Scale Mamba, it’s like essentially a code generator written in python, with some convenience libraries on top that make it mostly a DSL on top of python. It’s quite convenient but our goal at Cosmian is to bring advanced cryptographic techniques to the enterprise world with standards coming from software engineering in order to open new possibilities and allow for a wider audience. Learning a new specific programming language to write a simple program was a kind of entry barrier we don’t want to have. That’s one of the reasons we decided to write a Rust compiler to compile Rust programs into bytecode understandable by the Scale VM. Why Rust? Because at Cosmian we love Rust, our technical stack is already written in Rust and we are pretty happy and confident thanks to this language. It’s also a well-known language and could be used in another context than only Scale. As Rust is a pretty low level language we were able to develop our own memory allocator and our own standard library. So the main goal to develop this compiler is to let developers write their own sMPC algorithm entirely written in Rust.
At Cosmian we decided to go further in order to bring secure multi party computation in companies by adding a step in our compilation flow. Indeed instead of compiling Rust directly to Scale assembly and then bytecode we compile Rust to WebAssembly (WASM) then use the WASM bytecode and compile it into Scale assembly. Why? Because we wanted to benefit from all the existing WASM implementations in other languages, for example today many languages are able to compile to WebAssembly bytecode. Then, thanks to this we will be able later to add a new language than just Rust to write sMPC programs. We could think about Golang for example and then we theoretically will need to just provide a standard library for Golang to run on Scale virtual machine. Our compiler is called WASCA because it takes WASM and compiles it to Scale Assembly. So the step to compile Rust to WASM is not our job at all, it’s already existing and has good support thanks to the LLVM backend compiler. Here is the full steps of our compilation pipeline (some steps about optimisations are intentionally omitted):
As explained before, with the ability to write sMPC programs in Rust we provide the WASCA compiler and the standard library for Rust. Indeed during this blog post series we will see that it’s very similar to an embedded environment without a standard library (in a no_std environment) because we execute our code on a specific VM. The Scale standard library provides helpers for developers to iterate over input data, send data to a specific player/participant but also is in charge of handling heap allocations. Memory allocation will be a big part of the next blog post in this series because we have a not so common memory mapping in the Scale VM. Then in our standard library we redefined some data structures such as slices and arrays to be able to allocate memory on the heap as we wish. Here is a little illustration to recap what are responsibilities of the Rust core lib and the Scale standard library in a secure multi party computing program.
To be continued…
Here was the first blog post of this series just to let you know the context, what we are building and why we are building it in that way. The next post will be more focused on technical details, especially how we manage memory allocations in the WASCA compiler. We’ll see that we have encrypted memory zones in addition to the classical heap and the stack. We’ll explore together what we have already done and what we plan for the future. If you want to be notified when it’s out, feel free to follow Cosmian’s twitter account.