DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)

This repository contains a quantized, production-ready version of the distilroberta-sms-spam-detector model. The model has been converted to the ONNX format and its weights have been quantized to 8-bit integers (INT8) for optimal performance on edge devices like mobile phones.

Quantization shrank the file roughly 4x (from ~313 MB to ~79 MB) and sped up inference, with only a marginal drop in accuracy (overall F1 from 1.00 to 0.99 on the test set).

This is the model intended for direct deployment in mobile applications.

The original full-precision (FP32) model, distilroberta-sms-spam-detector, is available in the main model repository.

Model Description

Model type: Quantized ONNX graph of a fine-tuned distilroberta-base model.
Intended Use: On-device spam classification for mobile applications.
Language(s): English
License: MIT
File Size: ~79 MB

This repository also contains a version.txt file for use with Over-the-Air (OTA) update systems.
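
The card does not specify the format of version.txt, so the following is only a minimal sketch of how an OTA client might use it, assuming the file holds a single dotted version string such as 1.2.0. The helper names parse_version and needs_update are hypothetical and not part of this repository.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a dotted version string such as '1.2.0' into a tuple of ints.

    Assumption: version.txt contains exactly one such string, possibly
    followed by a newline. The real file format is not documented here.
    """
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(local_version: str, remote_version: str) -> bool:
    """Return True when the remote model is newer than the one on the device."""
    # Tuples compare element by element, so (1, 10, 0) > (1, 9, 0),
    # which a plain string comparison would get wrong.
    return parse_version(remote_version) > parse_version(local_version)
```

In such a setup, the client would fetch the remote version.txt (for example with hf_hub_download), compare it with the version stored next to the local model file, and download the new ONNX file only when needs_update returns True.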

How to Use (with ONNX Runtime)

This model is designed to be used with onnxruntime.

import numpy as np
import onnxruntime as ort
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX file from the Hub (cached locally after the first call)
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)

# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

# Create the ONNX Runtime inference session
session = ort.InferenceSession(model_path)

# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)

# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0] # Get the raw logits

# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)

labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
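
The logits-to-probabilities step above is a softmax followed by an argmax. Where scipy is not available, the same post-processing can be sketched with the standard library alone. The helper names softmax and classify are hypothetical, the logits in the usage note are made-up values rather than real model output, and the label order follows the labels list above.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat sequence of logits."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: Sequence[float],
             labels: Sequence[str] = ("HAM", "SPAM")) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, classify([-2.0, 3.0]) returns the label "SPAM" together with its probability, since the second logit is the larger one.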

Quantization Procedure

The original FP32 PyTorch model was first exported to ONNX format using the optimum library. Subsequently, dynamic quantization was applied using the onnxruntime.quantization toolkit to convert the model's weights to INT8.

Library: onnxruntime
Method: quantize_dynamic
Weight Type: QuantType.QInt8

Performance & Trade-offs

The primary goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, run for both the original and the quantized model on the same 558-example test set, shows how that trade-off played out.

File Size:

Original (FP32): ~313 MB
Quantized (INT8): ~79 MB (3.96x smaller)

Accuracy Comparison:

Model	Class	Precision	Recall	F1-Score
Original (FP32)	HAM	1.00	1.00	1.00
Original (FP32)	SPAM	1.00	0.97	0.99
Original (FP32)	Overall	1.00	1.00	1.00
Quantized (INT8)	HAM	0.99	1.00	1.00
Quantized (INT8)	SPAM	1.00	0.96	0.98
Quantized (INT8)	Overall	0.99	0.99	0.99

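As shown, the quantized model keeps SPAM precision at 1.00 and HAM precision at 0.99, while SPAM recall drops only slightly, from 0.97 to 0.96. In other words, messages it flags as spam are almost never legitimate, and it misses marginally more spam than the FP32 model, which makes it reliable for on-device deployment.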
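
For readers who want to reproduce per-class figures like those in the table, the following is a minimal sketch of how precision, recall, and F1 are derived from predictions. The function name is hypothetical, and the tiny label lists in the usage note are illustrative only; they are not the 558-example test set.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[str],
                        y_pred: Sequence[str],
                        positive: str) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

As a toy illustration, with true labels SPAM, SPAM, SPAM, HAM, HAM and predictions SPAM, SPAM, HAM, HAM, HAM, the SPAM class has perfect precision but imperfect recall, because one spam message was missed. This is the same pattern the table shows for the quantized model, at a much smaller scale.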

Evaluation results

Self-reported on the ucirvine/sms_spam test split:

F1 (Weighted, Quantized): 0.990
Accuracy (Quantized): 0.990
