Introduction to Privacy-Preserving Smart Contracts
The advent of smart contracts has opened up an entirely new avenue of applications. Ethereum, the leading smart contract platform, has facilitated the creation of compelling decentralized applications such as decentralized exchanges, prediction markets, autonomous organizations, and CryptoKitties. However, the Ethereum blockchain is completely transparent, and data from any of these applications is accessible to anyone. This closes the door on an entire range of applications that rely on privacy. For example, a voting application should not reveal anything about the votes while the voting process is under way, and should only reveal the winner once the process has closed. Voting is an extremely compelling use case for trustless, private smart contracts: you can trust that the votes have been tallied correctly and that no information about any vote was leaked before the process was complete. Because of this, multiple teams have taken different approaches to solving the privacy problem; this post analyzes those approaches and their respective tradeoffs.
The ideal privacy-preserving smart contract should be able to hide inputs from everyone except the person supplying them, perform arbitrary computation without revealing any state, and return outputs according to the program's specification (broadcast a single public output, return outputs to specific users, or produce no visible output). This poses a few challenges:
- How do we hide inputs to a smart contract?
- How do we keep state hidden from everyone, yet ensure program correctness?
- How do we do arbitrary computations on hidden (or encrypted) data?
The most ambitious way to achieve all of the above is to use a cryptographic scheme called homomorphic encryption. This type of encryption allows users to encrypt their private data and send it to a cloud computer, which performs computations directly on the encrypted data and returns encrypted outputs that only the users can decrypt. However, homomorphic encryption is still highly impractical today and can, at best, be used to perform simple arithmetic on integers. Hence, teams are forced to use other methods.
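To illustrate what "simple arithmetic on encrypted integers" looks like, here is a minimal sketch of textbook Paillier, an additively homomorphic scheme where multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The tiny hardcoded primes are illustrative only; this is not how a production system would be keyed or implemented:

```python
import math, random

# Textbook Paillier with toy primes (illustration only; never use key sizes like this).
p, q = 293, 433
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)        # Carmichael function of n
g = n + 1
mu = pow(lam, -1, n)                # modular inverse of lambda mod n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

c1, c2 = encrypt(20), encrypt(22)
# Homomorphic addition: the product of two ciphertexts decrypts to the sum of plaintexts.
print(decrypt((c1 * c2) % n2))      # -> 42, computed without ever decrypting c1 or c2
```

Addition of encrypted integers is about the limit of what is cheap today; fully homomorphic schemes that support arbitrary programs remain far too slow for smart-contract workloads.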
1. Trusted Execution Environments (TEE)
A trusted execution environment is an isolated area on the main processor of a device that is separate from the main operating system. It ensures that data is stored, processed and protected in a trusted environment. The most popular TEE to date is the Intel SGX, which is included out of the box in many Intel chips today.
These TEEs allow programs to run in secure enclaves, which act like black boxes where the state of the program is completely hidden and inaccessible to anyone. This is interesting because one can generate a public/private key pair within an enclave and encrypt a file with that public key, making the file accessible only from within the enclave. Naturally, this sets the stage for private computation: users can encrypt their inputs, send them to the TEE, and let the TEE perform the computation without the inputs ever being exposed. Because the execution of these programs is trustless, smart contract execution can be moved off-chain, which also brings scaling benefits to blockchains. The most notable projects using this approach are Enigma and Oasis Labs.
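As a rough mental model of that flow (and nothing more; this is not how SGX is actually programmed), imagine the enclave as an object that keeps its private key internal, with textbook RSA and toy primes standing in for the enclave's real sealing keys:

```python
class ToyEnclave:
    def __init__(self):
        # Key pair generated *inside* the enclave; the private key never leaves it.
        p, q = 61, 53                      # toy primes (illustration only)
        self._n, self._d = p * q, 2753     # d satisfies e*d = 1 mod lcm(p-1, q-1)
        self.public_key = (self._n, 17)    # (n, e) is all the user ever sees

    def run(self, encrypted_input):
        # Decrypt and compute entirely within the "black box".
        secret = pow(encrypted_input, self._d, self._n)
        return secret * 2                  # some arbitrary private computation

def encrypt_for_enclave(public_key, message):
    n, e = public_key
    return pow(message, e, n)

enclave = ToyEnclave()
ciphertext = encrypt_for_enclave(enclave.public_key, 123)  # input hidden in transit
print(enclave.run(ciphertext))             # -> 246; the plaintext input is never exposed
```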
This general approach to private computing is the most scalable today. Many computers ship with Intel SGX enabled, and there is no significant performance slowdown from running a program in a secure enclave. However, this comes with significant trade-offs that may render TEEs completely unviable for highly sensitive data. Firstly, the majority of TEEs on the market today are Intel SGX chips. This is a centralization risk because it is unclear what the precise architecture of the chips is, and whether Intel may have weakened the security of SGX to improve the overall performance of its chips, given the niche market for SGX today. Secondly, to prove that a computer has an Intel SGX, the computer has to contact Intel's remote attestation service. Again, this is a centralized cloud service that keeps track of all the chips that have been manufactured with Intel SGX technology. A hacker who breaches that cloud service could insert fake IDs that "prove" their computers contain Intel SGX even though they do not. If a smart contract runs on one of these fake TEEs, the operator will be able to see all the information.
These are not just theoretical attacks on Intel SGX. To date, there have been two major vulnerabilities that undermine the security of Intel SGX: Meltdown/Spectre and Foreshadow. New TEE designs have been devised, but trusted hardware will always be exposed to the risk of being attacked. Instead of relying on hardware guarantees, the other cryptographic approaches are backed by math and computer science, which is likely a sounder foundation.
2. Secure Multi-party Computation (sMPC)
The second approach to privacy-preserving smart contracts uses a cryptographic technique called secure multi-party computation. In sMPC, if we want to perform a secret computation, we can split the data into multiple pieces in a very specific way and have individuals perform arithmetic operations on those pieces without revealing anything about the original data. The pieces can then be recombined to produce the final result.
For example, let’s say we want to calculate A + B without revealing what A and B are. We can split A into 3 shares ([A]₁, [A]₂, [A]₃) that sum to A, and B into 3 shares ([B]₁, [B]₂, [B]₃) that sum to B. We distribute the shares to 3 different people, and on their own the shares are completely meaningless. Each person x computes [C]ₓ = [A]ₓ + [B]ₓ. After calculating their respective [C]ₓ values, the 3 people can come together and add up their [C]ₓ values, producing C = A + B. Through this process, the 3 people were able to compute C together without any of them ever knowing what A or B was. Multiplications are more difficult but can be achieved with some tricks.
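Here is a minimal sketch of that additive secret-sharing scheme; the prime modulus and three-party setup are my own illustrative choices rather than any particular project's protocol:

```python
import random

P = 2**61 - 1  # a large prime; all shares live in the field Z_P

def share(secret, n=3):
    """Split `secret` into n additive shares that sum to it mod P."""
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

A, B = 1234, 5678
a_shares, b_shares = share(A), share(B)

# Each party x locally computes [C]_x = [A]_x + [B]_x without ever seeing A or B.
c_shares = [(a + b) % P for a, b in zip(a_shares, b_shares)]

# Recombining the [C]_x values reveals only the result, A + B.
C = sum(c_shares) % P
print(C == A + B)   # True
```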
Theoretically, once we can do additions and multiplications over an sMPC protocol, we can achieve any arbitrary computation. This is a very exciting way of doing privacy-preserving computation, because the only way to expose the underlying data is for every single node to collude. If even one node is honest, the other nodes cannot make any useful sense of the secret-shared data, which makes the scheme extremely resistant to adversarial behavior. However, the tradeoff is that sMPC protocols are slow and expensive because of the communication cost between nodes. All the nodes must also execute the program correctly on the secret shares they were given: if a single node replaces its output [C]ₓ with a random value, the combined C value becomes meaningless as well. By doing this, an attacker can waste the computational effort of all the other participants.
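One such trick for multiplication is a Beaver triple: a precomputed, secret-shared triple (x, y, z) with z = x·y that lets the parties multiply shared values using only local arithmetic plus the opening of two masked differences. A rough sketch under the same illustrative three-party setup as above:

```python
import random

P = 2**61 - 1

def share(secret, n=3):
    shares = [random.randrange(P) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

A, B = 1234, 5678
a, b = share(A), share(B)

# Precomputed Beaver triple: random x, y and z = x*y, all secret-shared in advance.
x_val, y_val = random.randrange(P), random.randrange(P)
x, y, z = share(x_val), share(y_val), share(x_val * y_val % P)

# The parties open d = A - x and e = B - y; these reveal nothing about A or B
# because x and y are uniformly random masks.
d = reconstruct([(ai - xi) % P for ai, xi in zip(a, x)])
e = reconstruct([(bi - yi) % P for bi, yi in zip(b, y)])

# Local step: [A*B]_i = [z]_i + d*[y]_i + e*[x]_i, with d*e added by one party.
c = [(zi + d * yi + e * xi) % P for zi, yi, xi in zip(z, y, x)]
c[0] = (c[0] + d * e) % P

print(reconstruct(c) == A * B)  # True
```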
The two projects approaching privacy-preserving smart contracts with sMPC are Keep Network and Enigma. The original Enigma white paper laid out a design for a decentralized private computation platform using sMPC, but the team has since changed its roadmap to use TEEs instead, which speaks to the practicalities of using TEEs rather than sMPC in the short term. However, the future is bright for sMPC, and many optimizations are being made to make it more practical. Enigma plans to reintroduce sMPC into their current protocol, but that is still at least a year out.
3. Zero-knowledge Proofs (ZKP)
The final approach uses zero-knowledge proofs, a cryptographic technique for proving the validity of a statement without revealing any other information. This technique is useful in private computation because it allows computing nodes to create verifiable proofs that they performed the computation honestly with the correct inputs, and hence that the output must be correct.
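To give a flavor of the idea, here is a toy Schnorr-style proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. This is not the zkSNARK machinery that production systems use, and the tiny group parameters are illustrative only:

```python
import hashlib, random

# Toy group parameters: q divides p - 1 and g generates a subgroup of order q.
p, q = 23, 11
g = 4                      # 4 has order 11 mod 23

secret = 7                 # the prover's witness: x such that y = g^x mod p
y = pow(g, secret, p)      # the public statement

def fiat_shamir(*values):
    data = ",".join(str(v) for v in values).encode()
    return int(hashlib.sha256(data).hexdigest(), 16) % q

# --- Prover: demonstrate knowledge of `secret` without revealing it ---
r = random.randrange(q)
t = pow(g, r, p)                   # commitment
c = fiat_shamir(g, y, t)           # challenge derived by hashing (non-interactive)
s = (r + c * secret) % q           # response
proof = (t, c, s)

# --- Verifier: checks g^s == t * y^c, learning nothing about `secret` ---
t, c, s = proof
assert c == fiat_shamir(g, y, t)
print(pow(g, s, p) == (t * pow(y, c, p)) % p)   # True
```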
The only project working on this approach is Origo Network. In this protocol, all computing nodes are off-chain and must submit a zero-knowledge proof that they executed the program correctly. This allows the public to verify that computations were done correctly without any private data ever being on-chain. However, my largest criticism of the project is that it only minimizes the amount of private data being put on-chain; computing nodes can still see all the inputs and outputs. Furthermore, creating these zero-knowledge proofs requires a trusted setup beforehand. Performing a trusted setup for each different smart contract is very expensive, and there are limits to how complex a program can be before generating a zkSNARK for it becomes infeasible. This is where cutting-edge technology like STARKs comes into play, but these are still years away from any meaningful use in production.
The Gold Standard
Different projects use different techniques to achieve privacy-preserving smart contracts, each with its own pros and cons. However, I do believe these three techniques can be combined in a coherent way in the future.
One of the largest drawbacks of sMPC is that the protocol cannot tolerate adversarial behavior. If one of the nodes uses the wrong inputs or performs a wrong computation, the output becomes meaningless. Hence, we can envision a new protocol where each node in an sMPC protocol is also required to submit a zero-knowledge proof showing that it performed the computation honestly. We can also run these sMPC protocols inside TEEs, so that even in the worst-case scenario where all the computing nodes collude, they will still be unable to uncover the underlying private data.
However, each one of these approaches is a colossal effort in itself and requires an entire company to build. The zero-knowledge proof and sMPC techniques are also still very nascent, and lots of cutting-edge research is being produced by top cryptographers to make these protocols more efficient and usable in real-life applications.
In the future, we can envision even large companies starting to use some of these techniques. Facebook, which gathers data about us and uses machine-learning algorithms to serve us the most effective ads, could use privacy-preserving techniques to run its algorithms on our encrypted personal data. This would also enable entire industries such as healthcare and banking to share encrypted data with each other, vastly improving the quality of products like insurance plans, loans, and even personal health services.
Thanks to Haseeb Qureshi, Ivan Bogatyy, Dani Grant, Howard Wu, Guy Zyskind, and others for feedback and conversations which led to this post.