Encrypted Inference with Moose and HuggingFace

deep learning
mpc
cryptography
Author

Yann Dupis

Published

January 9, 2023

The other day, I was very inspired by the blog post Sentiment Analysis on Encrypted Data with Homomorphic Encryption, co-written by Zama and HuggingFace. Zama has created an excellent encrypted machine learning library, Concrete-ML, based on fully homomorphic encryption (FHE). Concrete-ML enables data scientists to easily turn their machine learning models into a homomorphic equivalent in order to perform inference on encrypted data. In the blog post, the authors demonstrate how easily you can perform sentiment analysis on encrypted data with this library. As you can imagine, you will sometimes need to perform sentiment analysis on text containing sensitive information. With FHE, the data remains encrypted during the entire computation, which enables data scientists to provide a machine learning service to a user while maintaining data confidentiality.

Over the last several years, I have been very fortunate to also work at the intersection of machine learning and cryptography. One collaboration with Morten Dahl, Jason Mancuso, Dragos Rotaru and Lex Verona that I am very excited about is Moose. Moose is a distributed dataflow framework for encrypted machine learning and data processing. Moose's cryptographic protocol is based on secure multi-party computation (MPC). Depending on the scenario, FHE and MPC have different pros and cons. Currently, MPC tends to be more performant, but the protocol requires two or three non-colluding parties (e.g., a data owner and a data scientist) willing to perform the computation together. If you want to learn about MPC in the context of machine learning, I highly recommend this very comprehensive blog post in which Morten implements an MPC protocol from scratch for deep learning.
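To give a flavor of how MPC keeps data confidential, here is a toy sketch of additive secret sharing in plain Python. This is not Moose's actual replicated secret sharing protocol, just the underlying idea: a value is split into random shares that individually reveal nothing, yet the parties can still compute on the shares and only reconstruct the final result.

import secrets

MOD = 2**64  # toy ring; real protocols also work over modular rings like this

def share(secret, n_parties=3):
    # Split an integer into additive shares that sum to the secret mod 2^64.
    shares = [secrets.randbelow(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

x_shares = share(5)
y_shares = share(7)

# Each party adds the shares it holds locally; no single party ever sees x or y.
z_shares = [(xs + ys) % MOD for xs, ys in zip(x_shares, y_shares)]
print(reconstruct(z_shares))  # 12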

In the rest of this blog post, I will show how you can perform encrypted inference with Moose using the sentiment analysis use case from Zama and HuggingFace’s blog post.

Model Training

The sentiment analysis model will be trained on the Twitter US Airline Sentiment dataset from Kaggle. To train the model, we will use the code provided in the blog post. The sentiment model consists of a RoBERTa transformer (Liu et al., 2019) to extract features from the text, and an XGBoost model on top of it to classify the tweets into positive, negative, or neutral classes.

import os
import tqdm

import numpy as np
import pandas as pd
import torch

from sklearn.metrics import average_precision_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split
from xgboost.sklearn import XGBClassifier

from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer

Let’s first load the dataset.

if not os.path.isfile("local_datasets/twitter-airline-sentiment/Tweets.csv"):
    raise ValueError("Please launch the `download_data.sh` script to get datasets")

train = pd.read_csv("local_datasets/twitter-airline-sentiment/Tweets.csv", index_col=0)
text_X, y = train["text"], train["airline_sentiment"] 
y = y.replace(["negative", "neutral", "positive"], [0, 1, 2])

pos_ratio = y.value_counts()[2] / y.value_counts().sum()
neg_ratio = y.value_counts()[0] / y.value_counts().sum()
neutral_ratio = y.value_counts()[1] / y.value_counts().sum()

print(f"Proportion of positive examples: {round(pos_ratio * 100, 2)}%")
print(f"Proportion of negative examples: {round(neg_ratio * 100, 2)}%")
print(f"Proportion of neutral examples: {round(neutral_ratio * 100, 2)}%")
Proportion of positive examples: 16.14%
Proportion of negative examples: 62.69%
Proportion of neutral examples: 21.17%

As you can see the tweets are classified into three categories: positive, negative and neutral.

For the feature extractor, the authors of the blog post use a RoBERTa transformer pre-trained on tweets.

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the tokenizer (converts text to tokens)
tokenizer = AutoTokenizer.from_pretrained(
    "cardiffnlp/twitter-roberta-base-sentiment-latest"
)

# Load the pre-trained model
transformer_model = AutoModelForSequenceClassification.from_pretrained(
    "cardiffnlp/twitter-roberta-base-sentiment-latest"
)
Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

The function below will be responsible for extracting the features from the tweets.

# Function that transforms a list of texts to their representation
# learned by the transformer.
def text_to_tensor(
    list_text_X_train: list,
    transformer_model: AutoModelForSequenceClassification,
    tokenizer: AutoTokenizer,
    device: str,
) -> np.ndarray:
    # Tokenize each text in the list one by one
    tokenized_text_X_train_split = []
    for text_x_train in list_text_X_train:
        tokenized_text_X_train_split.append(
            tokenizer.encode(text_x_train, return_tensors="pt")
        )

    # Send the model to the device
    transformer_model = transformer_model.to(device)
    output_hidden_states_list = []

    for tokenized_x in tqdm.tqdm(tokenized_text_X_train_split):
        # Pass the tokens through the transformer model and get the hidden states
        # Only keep the last hidden layer state for now
        output_hidden_states = transformer_model(
            tokenized_x.to(device), output_hidden_states=True
        )[1][-1]
        # Average over the tokens axis to get a representation at the text level.
        output_hidden_states = output_hidden_states.mean(dim=1)
        output_hidden_states = output_hidden_states.detach().cpu().numpy()
        output_hidden_states_list.append(output_hidden_states)

    return np.concatenate(output_hidden_states_list, axis=0)

We are now ready to run the feature extractor on the training and testing set, then train the XGBoost model on the feature extractor’s output.

# Split in train test
text_X_train, text_X_test, y_train, y_test = train_test_split(
    text_X, y, test_size=0.1, random_state=42
)

# Let's vectorize the text using the transformer
list_text_X_train = text_X_train.tolist()
list_text_X_test = text_X_test.tolist()

X_train_transformer = text_to_tensor(
    list_text_X_train, transformer_model, tokenizer, device
)
X_test_transformer = text_to_tensor(
    list_text_X_test, transformer_model, tokenizer, device
)

# Let's build our model
model = XGBClassifier()

# A gridsearch to find the best parameters
parameters = {
    "max_depth": [1],
    "n_estimators": [10, 30, 50],
    "n_jobs": [-1],
}

# Now we have a representation for each tweet, we can train a model on these.
grid_search = GridSearchCV(model, parameters, cv=3, n_jobs=1, scoring="accuracy")
grid_search.fit(X_train_transformer, y_train)

# Check the accuracy of the best model
print(f"Best score: {grid_search.best_score_}")

# Check best hyperparameters
print(f"Best parameters: {grid_search.best_params_}")

# Extract best model
best_model = grid_search.best_estimator_

# Compute the metrics for each class

y_proba = best_model.predict_proba(X_test_transformer)

# Compute the accuracy
y_pred = np.argmax(y_proba, axis=1)
accuracy_transformer_xgboost = np.mean(y_pred == y_test)
print(f"Accuracy: {accuracy_transformer_xgboost:.4f}")

y_pred_positive = y_proba[:, 2]
y_pred_negative = y_proba[:, 0]
y_pred_neutral = y_proba[:, 1]

ap_positive_transformer_xgboost = average_precision_score(
    (y_test == 2), y_pred_positive
)
ap_negative_transformer_xgboost = average_precision_score(
    (y_test == 0), y_pred_negative
)
ap_neutral_transformer_xgboost = average_precision_score((y_test == 1), y_pred_neutral)

print(
    f"Average precision score for positive class: "
    f"{ap_positive_transformer_xgboost:.4f}"
)
print(
    f"Average precision score for negative class: "
    f"{ap_negative_transformer_xgboost:.4f}"
)
print(
    f"Average precision score for neutral class: "
    f"{ap_neutral_transformer_xgboost:.4f}"
)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
    - Avoid using `tokenizers` before the fork if possible
    - Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
100%|██████████| 13176/13176 [11:32<00:00, 19.02it/s]
100%|██████████| 1464/1464 [01:17<00:00, 18.78it/s]
Best score: 0.844869459623558
Best parameters: {'max_depth': 1, 'n_estimators': 50, 'n_jobs': -1}
Accuracy: 0.8559
Average precision score for positive class: 0.9015
Average precision score for negative class: 0.9675
Average precision score for neutral class: 0.7517

Excellent, we have a sentiment analysis model with an 85% accuracy. We can run the model on a sample tweet.

tested_tweet = ["AirFrance is awesome, almost as much as Zama!"]
X_tested_tweet = text_to_tensor(tested_tweet, transformer_model, tokenizer, device)
np.save("data/x_tested_tweet.npy", X_tested_tweet)
clear_proba = best_model.predict_proba(X_tested_tweet)
print(f"Proba prediction in plaintext {clear_output}")
100%|██████████| 1/1 [00:00<00:00, 10.14it/s]
Clear_proba [[0.02582786 0.02599407 0.94817805]]

Encrypted Inference with Moose

Now that we have a trained model, we are ready to serve encrypted inference with Moose. For simplicity, we will start by locally prototyping the computation between the different parties using pm.LocalMooseRuntime.

To serve encrypted inference, we will have to perform the following steps:

  • Convert the trained model to ONNX format.
  • Convert the model from ONNX to a Moose computation.
  • Run encrypted inference by evaluating the Moose computation.

Let’s get started!

from onnxmltools.convert import convert_xgboost
from skl2onnx.common import data_types as onnx_dtypes

import pymoose as pm

Convert to ONNX

We can convert the XGBoost model into an ONNX proto using the convert_xgboost method from onnxmltools.

n_features = X_test_transformer[0].shape[0]
initial_type = ("float_input", onnx_dtypes.FloatTensorType([None, n_features]))
onnx_proto = convert_xgboost(best_model, initial_types=[initial_type])
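As an optional sanity check (not part of the original post, and assuming onnxruntime is installed), you can run the ONNX graph directly and compare its outputs with the XGBoost model's probabilities. The converted classifier typically exposes two outputs, a predicted label and per-class probabilities:

import onnxruntime as ort

sess = ort.InferenceSession(
    onnx_proto.SerializeToString(), providers=["CPUExecutionProvider"]
)
sample = X_test_transformer[:5].astype(np.float32)
labels, probas = sess.run(None, {"float_input": sample})
print(labels)   # predicted classes from the ONNX graph
print(probas)   # per-class probabilities (format depends on converter options, often a list of dicts)
print(best_model.predict_proba(X_test_transformer[:5]))  # XGBoost's own probabilities for comparison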

Convert ONNX to Moose Predictor

PyMoose provides several predictor classes to translate an ONNX model into a PyMoose DSL program. Because the trained model is an XGBoost model, we can use the tree_ensemble.TreeEnsembleClassifier class. Its from_onnx method parses the ONNX model and returns a callable object; when called, it computes the forward pass of the XGBoost model.

predictor = pm.predictors.TreeEnsembleClassifier.from_onnx(onnx_proto)

Define Moose Computation

To express this computation, Moose offers a Python DSL (internally referred to as the eDSL, i.e. "embedded" DSL). As you will notice, the syntax is very similar to that of the scientific computing library NumPy.

The main difference is the notion of placements. There are two types of placements: host placement and replicated placement. With Moose, every operation under a host placement context is computed on plaintext values (not encrypted). Every operation under a replicated placement is performed on secret shared values (encrypted).

We will compute the inference between three different players, each of them representing a host placement: a data owner, a data scientist, and a third party. The three players are grouped under the replicated placement to perform the encrypted computation. Currently, Moose's MPC protocol expects three parties, although other MPC schemes can work with two. In practice, the third party could be a secure enclave that neither the data scientist nor the data owner can access.

When we instantiated the pm.predictors.TreeEnsembleClassifier class, three host placements were instantiated under the hood: alice, bob, and carole. For our use case, alice will represent the data owner, bob the model owner, and carole the third party.
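You can quickly confirm these placements on the predictor object; their names are the identities we will hand to the runtime later on:

for plc in predictor.host_placements:
    print(plc.name)
# expected output (assuming the default naming): alice, bob, carole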

The Moose computation below performs the following steps:

  • Load the tweet (after running the feature extractor) in plaintext from alice's (the data owner's) storage.
  • Secret share (encrypt) the data.
  • Compute the XGBoost inference on the secret shared data.
  • Reveal the prediction only to alice (the data owner) and save it into her storage.
@pm.computation
def moose_predictor_computation():
    # Alice (data owner) load their data in plaintext
    # Then the data gets converted from float to fixed-point
    with predictor.alice:
        x = pm.load("x", dtype=pm.float64)
        x_fixed = pm.cast(x, dtype=pm.predictors.predictor_utils.DEFAULT_FIXED_DTYPE)
    # The data gets secret shared when moving from host placement
    # to replicated placement.
    # Then compute the XGBoost inference on secret shared data
    with predictor.replicated:
        y_pred = predictor(x_fixed, pm.predictors.predictor_utils.DEFAULT_FIXED_DTYPE)

    # The predictions gets revealed only to Alice (the data owner)
    # Convert the data from fixed-point to floats and save the data in the storage
    with predictor.alice:
        y_pred = pm.cast(y_pred, dtype=pm.float64)
        y_pred = pm.save("y_pred", y_pred)
    return y_pred

Evaluate the computation

For simplicity, we will use pm.LocalMooseRuntime to locally simulate this computation running across hosts. To do so, we need to provide: the Moose computation, the list of host identities to simulate, and a mapping of the data stored by each simulated host.

Since the data owner is represented by alice, we will place the featurized tweet in alice's storage.

Once you have instantiated the pm.LocalMooseRuntime with the identities and the storage mapping, and set the runtime as the default, you can simply call the Moose computation to evaluate it.

executive_storage = {
    "alice": {"x": X_tested_tweet.astype(np.float64)},
    "bob": {},
    "carole": {},
}
identities = [plc.name for plc in predictor.host_placements]

runtime = pm.LocalMooseRuntime(identities, storage_mapping=executive_storage)
runtime.set_default()

_ = moose_predictor_computation()

Once the computation is done, we can extract the results. The predictions have been stored in alice’s storage. We can extract the value from the storage with read_value_from_storage.

y_pred = runtime.read_value_from_storage("alice", "y_pred")
print(f"Plaintext Prediction: {clear_proba}")
print(f"Moose Prediction: {y_pred}")
Plaintext Prediction: [[0.02582786 0.02599407 0.94817805]]
Moose Prediction: [[0.02581358 0.02598119 0.94782831]]

Excellent! As you can see, Moose returns essentially the same prediction as plaintext XGBoost, up to a small fixed-point rounding error. However, with Moose, we were able to compute the inference on the data owner's data while keeping the data encrypted during the entire process!
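To make that claim concrete, we can measure the gap introduced by the fixed-point encoding. This is a quick check using the two arrays we already have; the tolerance below is an arbitrary, loose bound.

# clear_proba: plaintext XGBoost probabilities; y_pred: Moose's fixed-point result.
max_diff = np.abs(clear_proba - y_pred).max()
print(f"Max absolute difference: {max_diff:.6f}")
assert np.allclose(clear_proba, y_pred, atol=1e-3)  # loose tolerance for fixed-point rounding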

If you want to learn about how to run Moose over the network with gRPC, you can check out this tutorial.

Conclusion

I hope that thanks to this tutorial you have a better idea of how you can perform encrypted inference with Moose. Thanks to libraries like Concrete-ML and Moose, we’re entering an exciting time where data scientists and machine learning engineers can maintain the confidentiality of sensitive datasets using encryption, without having to become experts in cryptography.

Thank you to the Moose team for this amazing contribution and for reviewing this blog post.

Resources: