DistilRoBERTa for SMS Spam Detection (INT8 ONNX, Quantized)
This repository contains a quantized, production-ready version of the distilroberta-sms-spam-detector model. The model has been converted to the ONNX format and its weights have been quantized to 8-bit integers (INT8) for optimal performance on edge devices like mobile phones.
This optimization reduced the file size by about 4x (~313 MB to ~79 MB) and significantly improved inference speed, with only a marginal decrease in accuracy (overall F1 drops from 1.00 to 0.99 on the test set).
This is the model intended for direct deployment in mobile applications.
The original, full-precision (FP32) model can be found in the main model repository.
Model Description
- Model type: Quantized ONNX graph of a fine-tuned distilroberta-base model.
- Intended Use: On-device spam classification for mobile applications.
- Language(s): English
- License: MIT
- File Size: ~79 MB
This repository also contains a version.txt file for use with Over-the-Air (OTA) update systems.
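As an illustration of how an OTA client might use version.txt, here is a minimal sketch. The file's format is not documented here, so the sketch assumes it holds a single dotted numeric version string such as "1.2.0"; the helper names `parse_version` and `update_available` are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string like '1.2.0' into a comparable tuple.

    Assumption: version.txt contains only dot-separated integers.
    """
    return tuple(int(part) for part in text.strip().split("."))


def update_available(local_version: str, remote_version: str) -> bool:
    """Return True when the remote version is strictly newer than the local one."""
    return parse_version(remote_version) > parse_version(local_version)


# Tuples compare element by element, so 1.10.0 is correctly newer than 1.9.3.
print(update_available("1.9.3", "1.10.0"))
```

In an app, the remote string could come from downloading version.txt from this repository, and the model file would only be re-downloaded when the check returns True.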
How to Use (with ONNX Runtime)
This model is designed to be used with onnxruntime.
import numpy as np
import onnxruntime as ort
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"
# Download the quantized ONNX file from the Hub (cached locally after the first call)
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)
# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
session = ort.InferenceSession(model_path)
# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)
# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0] # Get the raw logits
# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)
labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
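The logits-to-probabilities step above relies on SciPy. Where that dependency is unwanted, the same computation can be sketched with the standard library alone. The label order follows the snippet above; the `spam_threshold` parameter is an assumption added for illustration and is not part of the original model.

```python
import math

LABELS = ["HAM", "SPAM"]  # same index order as in the usage snippet


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, spam_threshold=0.5):
    """Return (label, spam_probability) for one pair of raw logits.

    spam_threshold is a hypothetical knob: raising it makes the classifier
    flag fewer messages as spam.
    """
    spam_prob = softmax(logits)[LABELS.index("SPAM")]
    label = "SPAM" if spam_prob >= spam_threshold else "HAM"
    return label, spam_prob


label, spam_prob = classify([-3.0, 4.0])
print(label, round(spam_prob, 4))
```

Passing `scores` from the snippet above to `classify` should give the same label as the argmax whenever the threshold is left at 0.5.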
Quantization Procedure
The original FP32 PyTorch model was first exported to ONNX format using the optimum library. Subsequently, dynamic quantization was applied using the onnxruntime.quantization toolkit to convert the model's weights to INT8.
- Library: onnxruntime
- Method: quantize_dynamic
- Weight Type: QuantType.QInt8
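The two steps described above could look roughly like the following sketch. It is not the exact script used for this repository: the function names and paths are hypothetical, the FP32 model ID is left as a parameter, and the sketch assumes optimum writes the exported graph as model.onnx in the output directory.

```python
from pathlib import Path


def export_to_onnx(fp32_model_id: str, out_dir: str) -> Path:
    """Export a fine-tuned PyTorch checkpoint to ONNX via optimum."""
    # Imported here so the heavy dependency is only needed when exporting.
    from optimum.onnxruntime import ORTModelForSequenceClassification

    model = ORTModelForSequenceClassification.from_pretrained(
        fp32_model_id, export=True
    )
    model.save_pretrained(out_dir)
    # Assumption: the exported graph is saved under the default name.
    return Path(out_dir) / "model.onnx"


def quantize_to_int8(fp32_onnx_path, int8_onnx_path) -> Path:
    """Apply dynamic quantization so that weights are stored as INT8."""
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=str(fp32_onnx_path),
        model_output=str(int8_onnx_path),
        weight_type=QuantType.QInt8,
    )
    return Path(int8_onnx_path)
```

A typical call sequence would be `quantize_to_int8(export_to_onnx(model_id, "onnx_fp32"), "model.quant.onnx")`, where `model_id` is the ID of the original FP32 repository.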
Performance & Trade-offs
The primary goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set, demonstrates the success of this trade-off.
File Size:
- Original (FP32): ~313 MB
- Quantized (INT8): ~79 MB (3.96x smaller)
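The roughly 4x figure follows from storage width: an FP32 weight takes 4 bytes and an INT8 weight takes 1. The sketch below illustrates this with a simple symmetric per-tensor scheme on a random matrix. It is a conceptual example only; the exact scheme used by onnxruntime may differ, and the matrix shape and values are made up.

```python
import numpy as np


def quantize_symmetric_int8(weights: np.ndarray):
    """Map float weights to int8 with one scale per tensor (illustrative scheme)."""
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from int8 values and the scale."""
    return q.astype(np.float32) * scale


# Hypothetical weight matrix, not taken from the actual model.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=(768, 768)).astype(np.float32)

q, scale = quantize_symmetric_int8(w)
size_ratio = w.nbytes / q.nbytes                  # 4 bytes vs 1 byte per weight
max_error = float(np.max(np.abs(w - dequantize(q, scale))))

print(f"size ratio: {size_ratio:.1f}x, max rounding error: {max_error:.6f}")
```

The rounding error per weight is bounded by half the scale. A whole-file ratio slightly under 4x, such as the 3.96x reported above, would be consistent with some tensors and metadata remaining in floating point, though that is an assumption here.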
Accuracy Comparison:
| Model | Class | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Original (FP32) | HAM | 1.00 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.97 | 0.99 |
| | Overall | 1.00 | 1.00 | 1.00 |
| Quantized (INT8) | HAM | 0.99 | 1.00 | 1.00 |
| | SPAM | 1.00 | 0.96 | 0.98 |
| | Overall | 0.99 | 0.99 | 0.99 |
As shown, the quantized model keeps SPAM precision at 1.00 and HAM precision at 0.99, while SPAM recall drops only slightly (0.97 to 0.96). Legitimate messages are therefore almost never flagged as spam, which makes the model well suited to on-device deployment.
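For readers who want to check how the F1 column relates to the other two, F1 is the harmonic mean of precision and recall. The short sketch below recomputes the quantized SPAM row from the table. Because the table values are already rounded to two decimals, recomputing other rows this way can differ from the reported F1 in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Quantized (INT8), SPAM row: precision 1.00, recall 0.96
quantized_spam_f1 = f1_score(precision=1.00, recall=0.96)
print(f"{quantized_spam_f1:.2f}")  # prints 0.98
```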
Evaluation results
- F1 (Weighted, Quantized) on ucirvine/sms_spam (test split): 0.990 (self-reported)
- Accuracy (Quantized) on ucirvine/sms_spam (test split): 0.990 (self-reported)