CrypTen: A new research tool for secure machine learning with PyTorch

October 10, 2019

Despite the AI community’s tremendous recent progress in advancing the applications of machine learning, only very limited tools currently exist for building ML systems that can work with encrypted data. This constrains the use of ML in domains where data must be encrypted for security, such as work that involves sensitive medical information, and in cases where people would simply prefer to encrypt their data for added privacy. Building secure ML systems to address these use cases today is difficult or even impossible because powerful, easy-to-use frameworks don’t work effectively with encrypted data.

To address this need and accelerate progress in this area, Facebook AI researchers have built and are now open-sourcing CrypTen, a new, easy-to-use software framework built on PyTorch to facilitate research in secure and privacy-preserving machine learning.

CrypTen enables ML researchers, who typically aren’t cryptography experts, to easily experiment with ML models using secure computing techniques. By leveraging and integrating with PyTorch, CrypTen lowers the barrier for ML researchers and developers who are already familiar with its API.

AI researchers can use CrypTen to train a PyTorch model such as ResNet using encrypted data while maintaining the familiar look and feel of Torch tensors. For example:

    x = torch.tensor([1, 2, 3])
    y = torch.tensor([4, 5, 6])
    z = x + y

This code can be modified to work on encrypted tensors as follows:

    x_enc = crypten.cryptensor([1, 2, 3])
    y_enc = crypten.cryptensor([4, 5, 6])
    z_enc = x_enc + y_enc

CrypTen offers a bridge between the PyTorch platform that is already familiar to thousands of ML researchers and the long history of academic research on algorithms and systems that work effectively with encrypted data. There is a long road ahead for the AI research community exploring this field, since secure computing techniques have various trade-offs, such as higher compute and communication requirements or a restricted space of functions. But we believe CrypTen will help researchers in academia and private industry advance toward a future where secure computing techniques are an integral part of ML frameworks themselves, so researchers and engineers can seamlessly transition to privacy-preserving ML whenever necessary.

We are making CrypTen available along with other new research and development tools being showcased at the PyTorch Developer’s Conference. CrypTen is available here, and additional technical details are available at the CrypTen website.

The fundamental components of CrypTen

CrypTen currently implements a cryptographic method called secure multiparty computation (MPC), and we expect to add support for homomorphic encryption and secure enclaves in future releases. MPC is different from RSA, AES and other cryptographic protocols that are in wide use today in that it allows computation on encrypted data while preserving privacy. We implement MPC in the “honest but curious” model (that assumes the absence of malicious and adversarial agents) that is used frequently in cryptographic research, but additional safeguards must be added before CrypTen is ready to be used in production settings.

This graphic provides a high-level overview of CrypTen, in which both the data and the model are encrypted using MPC.

Compared with prior implementations of secure computation protocols, CrypTen offers three main benefits to ML researchers.

  1. It is machine learning first. The framework presents the protocols via a CrypTensor object that looks and feels exactly like a PyTorch tensor. This allows the user to use automatic differentiation and neural network modules akin to those in PyTorch. This helps make secure protocols accessible to anyone who has worked in PyTorch.

  2. CrypTen is library-based. Unlike other software in this space, we are not implementing a compiler but instead implement a tensor library just as PyTorch does. This makes it easier for people to debug, experiment, and explore ML models.

  3. The framework is built with real-world challenges in mind. CrypTen does not scale back or oversimplify the implementation of the secure protocols. Parties run in separate processes that communicate with one another. The parties can run on separate machines as well. While CrypTen is not currently production-ready, it can provide realistic insights into the compute and communication requirements of doing machine learning using secure protocols, thereby facilitating high-quality research.

The following code snippet shows an example of inference on an encrypted model with encrypted data.

PyTorch main snippet:

      data = torch.load(DATA_PATH)
      model = torch.load(MODEL_PATH)

      model.eval()
      output = model(data)

CrypTen main snippet:

      # Load the data from party 1 and the model from party 0
      data_enc = crypten.load(DATA_PATH, src=1)
      model = crypten.load(MODEL_PATH, dummy_model=ModelClass(), src=0)

      # Convert the PyTorch model to a CrypTen model and encrypt it
      dummy_input = torch.empty(data_enc.size())
      private_model = crypten.nn.from_pytorch(model, dummy_input).encrypt(src=0)

      private_model.eval()
      output_enc = private_model(data_enc)

At launch, CrypTen covers models starting from simple linear models to ResNets, and we are working toward reaching parity with the full universe of PyTorch models.

This example shows how MPC encrypts information by dividing data between more than one party, each of which can perform calculations on its share (5 and 7, respectively) but is not able to read the original data (12). Each party then applies the computation (“multiply by 3”) to its own share. When the outputs are combined, the result (36) is identical to the result of performing the calculation on the data directly. Since Party A and Party B do not know the end result (36), they cannot deduce the original data point (12).
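
The arithmetic in this example can be reproduced in a few lines of plain Python. The following is a minimal sketch of additive secret sharing, not CrypTen's implementation: the helper names (`share`, `reconstruct`, `multiply_by_public`) and the choice of a 64-bit ring are assumptions made for illustration.

```python
import secrets

MODULUS = 2 ** 64  # ring size; an assumption for this sketch


def share(value, n_parties=2):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    """Combine all shares to recover the shared value."""
    return sum(shares) % MODULUS


def multiply_by_public(shares, k):
    """Each party multiplies its own share by the public constant k."""
    return [(s * k) % MODULUS for s in shares]


# The toy split from the text: 12 = 5 + 7
party_outputs = multiply_by_public([5, 7], 3)
print(party_outputs)               # [15, 21]
print(reconstruct(party_outputs))  # 36

# With random shares, a single share reveals nothing about the value
random_shares = share(12)
print(reconstruct(multiply_by_public(random_shares, 3)))  # 36
```

Multiplying by a public constant needs no communication because the operation distributes over the sum of the shares. Multiplying two secret values together is harder and requires the parties to exchange messages.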

This graphic shows how MPC can be used to adjust a photo without either of the parties having access to the content of the image.

The need for secure computing tools for machine learning

Machine learning systems today can often be run securely on-device — for example, to transcribe speech to text or translate one language into another. But before these models are deployed, they are typically trained on publicly available data, such as Wikipedia entries, or datasets that have been licensed for use, such as ImageNet. In many cases, however, either the data needed for training is too sensitive to share or there are security, privacy, policy, or legal roadblocks.

For example, medical researchers often face a difficult challenge when performing population studies on genetic data, because such data is very sensitive and cannot be easily shared between research institutions. Similarly, studying the gender pay gap across companies is difficult because of privacy concerns with sharing salary data. Secure computation techniques like MPC provide a potential solution to these problems by allowing parties to encrypt their data in a secure way while still permitting ML computations on the encrypted data in aggregate.

Even though MPC enables such use cases, it has been challenging to do ML research using MPC because of the absence of familiar ML frameworks that abstract away the complexity of the technology. CrypTen addresses this need by exposing an abstraction that is familiar to ML researchers.

Example uses and applications

CrypTen can load a pretrained PyTorch model, giving users the flexibility to load an existing model to do inference using encrypted data. Users can also train an encrypted model using the familiar PyTorch API. The framework supports a rapidly increasing subset of PyTorch tensor operators that users can use to build models like ResNet.

This graphic shows how CrypTen allows researchers to easily use PyTorch models by replacing the standard PyTorch tensor with an encrypted tensor. The dotted lines show potential additions to CrypTen that we expect to add at a future date.

CrypTen can perform MPC with any number of parties and can also convert between arithmetic and binary (XOR) sharing, thus enabling common nonlinearities like ReLU without any approximations. This also enables operations like max, which is necessary for contextual bandit models.
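
To make the two sharing schemes concrete, here is a small plain-Python sketch under stated assumptions. It does not implement CrypTen's conversion protocol, and all function names and the 64-bit ring are hypothetical choices for illustration. It only shows that the same value can be held as shares that add up to it or as shares that XOR to it, and why the binary form helps with comparisons: each party can extract a share of the sign bit locally.

```python
import secrets
from functools import reduce
from operator import xor

BITS = 64            # ring width; an assumption for this sketch
MODULUS = 1 << BITS


def encode(value):
    """Map a signed integer into the ring (two's complement)."""
    return value % MODULUS


def arithmetic_share(value, n_parties):
    """Shares that sum to the encoded value modulo 2**BITS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((encode(value) - sum(shares)) % MODULUS)
    return shares


def binary_share(value, n_parties):
    """Shares whose bitwise XOR equals the encoded value."""
    shares = [secrets.randbits(BITS) for _ in range(n_parties - 1)]
    shares.append(reduce(xor, shares, encode(value)))
    return shares


def reveal_arithmetic(shares):
    return sum(shares) % MODULUS


def reveal_binary(shares):
    return reduce(xor, shares, 0)


def sign_bit_shares(binary_shares):
    """Each party keeps only the top bit of its own XOR share (a local step)."""
    return [s >> (BITS - 1) for s in binary_shares]


x = -3
b = binary_share(x, n_parties=3)
# Revealed here only for illustration; a real protocol would keep this
# bit secret-shared and use it to mask the arithmetic shares of x.
is_negative = reveal_binary(sign_bit_shares(b))
relu_x = 0 if is_negative else x
```

With arithmetic shares, the sign of the value depends on carries across all shares, so no party can compute its piece of the sign bit alone. That is the reason comparisons, ReLU, and max are typically evaluated on binary shares.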

Secure computing techniques come with their own challenges of reduced performance due to the increase in computation and communication. CrypTen works to address this by moving functionality into PyTorch core as necessary, for example, by adding support for data types like int64 in PyTorch itself. Deep learning systems must be able to operate efficiently at scale, so we will continue to make performance improvements in future releases.

These capabilities have allowed Facebook researchers to build privacy-preserving contextual bandit models using CrypTen. Contextual bandit models are used in many recommendation systems where parties select an arm given certain context about their environment and then receive a reward for their action. The parties use the reward as a learning signal and aim to maximize the total reward, thereby solving recommendation or ranking problems.
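
The select-arm, receive-reward, learn loop described above can be sketched in plaintext as follows. This is a toy, unencrypted, single-process illustration, not the model the Facebook researchers built: the epsilon-greedy policy, the per-arm linear scorer, and the `toy_reward` environment are all assumptions chosen to keep the example short.

```python
import random


def predict(weights, arm, context):
    """Linear score of one arm for the given context."""
    return sum(w * c for w, c in zip(weights[arm], context))


def select_arm(weights, context, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else take the best arm."""
    n_arms = len(weights)
    if rng.random() < epsilon:
        return rng.randrange(n_arms)
    return max(range(n_arms), key=lambda a: predict(weights, a, context))


def update(weights, arm, context, reward, lr):
    """One SGD step on squared error for the arm that was pulled."""
    error = reward - predict(weights, arm, context)
    weights[arm] = [w + lr * error * c for w, c in zip(weights[arm], context)]


def toy_reward(arm, context):
    """Hypothetical environment: arm i pays off when feature i is active."""
    return 1.0 if context[arm] == 1.0 else 0.0


rng = random.Random(0)
contexts = [[1.0, 0.0], [0.0, 1.0]]
weights = [[0.0, 0.0], [0.0, 0.0]]  # one weight vector per arm

for step in range(2000):
    context = contexts[step % 2]
    arm = select_arm(weights, context, epsilon=0.2, rng=rng)
    reward = toy_reward(arm, context)
    update(weights, arm, context, reward, lr=0.1)
```

The `max` over arm scores in `select_arm` and the arithmetic in `update` are the operations that would have to run on encrypted values in a privacy-preserving version.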

In a privacy-preserving variant of this scenario, each party has context it is unwilling to share with other parties. CrypTen has allowed us to train a model while respecting this requirement. Each party encrypts its context and uses an encrypted model to select an arm. The selected arm is revealed only to the party that pulls the arm. Another party then receives a reward and this reward signal is encrypted and used to learn an encrypted model, thus closing the learning loop.

Contextual bandit models have several characteristics that are challenging for many MPC-based systems. There can be any number of parties, but many MPC protocols are specialized for exactly two parties. These models also involve computation that is challenging for MPC, such as division, exponentiation, and finding the maximum. CrypTen’s ability to work with many parties, its efficient implementations of operators, its ability to go back and forth between additive and XOR sharing, which enables operators like max, and its familiar PyTorch API enabled us to quickly train such models.

Accelerating research on secure computing frameworks for ML

Machine learning has made tremendous progress in the past decade due in part to the availability of data and compute and the development of easy-to-use frameworks. We hope that by developing tools like CrypTen and lowering the bar for entry for other researchers, we can help foster and accelerate research in developing new secure computing techniques for machine learning.

Written By

David Gunning

Technical Program Manager

Awni Hannun

Research Scientist

Brian Knott

Research Engineer

Laurens van der Maaten

Research Scientist

Vinicius Reis

Research Engineer

Shubho Sengupta

Software Engineer

Shobha Venkataraman

Software Engineer

Xing Zhou

AI Research Engineering