DistilRoBERTa for SMS Spam Detection (Quantized INT8 ONNX)

This repository contains a quantized, production-ready version of the distilroberta-sms-spam-detector model. The model has been converted to ONNX format and its weights quantized to 8-bit integers (INT8) for efficient inference on edge devices such as mobile phones.

This optimization yields a ~4x reduction in file size and a substantial improvement in inference speed, at the cost of only a marginal decrease in accuracy (see Performance & Trade-offs below).

This is the model intended for direct deployment in mobile applications.

The original, full-precision (FP32) model is available in the main model repository.

Model Description

  • Model type: Quantized ONNX graph of a fine-tuned distilroberta-base model.
  • Intended Use: On-device spam classification for mobile applications.
  • Language(s): English
  • License: MIT
  • File Size: ~79 MB

This repository also contains a version.txt file for use with Over-the-Air (OTA) update systems.
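
A minimal sketch of how an app might consume version.txt for an OTA check, assuming the file contains a single version string (the exact file format is an assumption here, and needs_update is a hypothetical helper):

from huggingface_hub import hf_hub_download

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"

def needs_update(local_version: str) -> bool:
    # Fetch version.txt from the Hub (served from the local cache when unchanged)
    path = hf_hub_download(repo_id=REPO_ID, filename="version.txt")
    with open(path) as f:
        remote_version = f.read().strip()
    return remote_version != local_version

if needs_update("1.0.0"):
    # Pull the updated quantized model for the on-device runtime
    model_path = hf_hub_download(repo_id=REPO_ID, filename="model.quant.onnx")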

How to Use (with ONNX Runtime)

This model is designed to be used with onnxruntime.

import onnxruntime as ort
import numpy as np
import scipy.special
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

REPO_ID = "SharpWoofer/distilroberta-sms-spam-detector-onnx-quantized"
ONNX_MODEL_NAME = "model.quant.onnx"

# Download the quantized ONNX model from the Hub
model_path = hf_hub_download(repo_id=REPO_ID, filename=ONNX_MODEL_NAME)

# Load the tokenizer from the same repository
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)

# Create an ONNX Runtime inference session
session = ort.InferenceSession(model_path)

# Prepare text
text = "Congratulations! You've won a $1000 gift card. Click now!"
inputs = tokenizer(text, return_tensors="np", padding="max_length", truncation=True)

# Run inference
outputs = session.run(None, dict(inputs))
scores = outputs[0][0] # Get the raw logits

# Convert logits to probabilities
probabilities = scipy.special.softmax(scores)
prediction = np.argmax(probabilities)

labels = ["HAM", "SPAM"]
print(f"Prediction: {labels[prediction]}, Confidence: {probabilities[prediction]:.4f}")
# >> Prediction: SPAM, Confidence: 0.99...
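
For batched inference, several messages can be scored in one call. A minimal sketch reusing the tokenizer and session above (this assumes the exported graph has dynamic batch and sequence axes, which is the default for optimum exports):

texts = [
    "Free entry! Text WIN to claim your prize",
    "Are we still on for lunch tomorrow?",
]
# Pad to the longest sequence in the batch
batch = tokenizer(texts, return_tensors="np", padding=True, truncation=True)
logits = session.run(None, dict(batch))[0]
probs = scipy.special.softmax(logits, axis=-1)
preds = probs.argmax(axis=-1)  # 0 = HAM, 1 = SPAM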

Quantization Procedure

The original FP32 PyTorch model was first exported to ONNX format using the optimum library. Subsequently, dynamic quantization was applied using the onnxruntime.quantization toolkit to convert the model's weights to INT8.

  • Library: onnxruntime
  • Method: quantize_dynamic
  • Weight Type: QuantType.QInt8
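
A minimal sketch of this pipeline; the FP32 source repo ID and the local paths below are illustrative assumptions:

from optimum.onnxruntime import ORTModelForSequenceClassification
from onnxruntime.quantization import QuantType, quantize_dynamic

# Export the fine-tuned PyTorch model to ONNX via optimum
# (the FP32 repo ID below is an assumption)
model = ORTModelForSequenceClassification.from_pretrained(
    "SharpWoofer/distilroberta-sms-spam-detector", export=True
)
model.save_pretrained("onnx_model")  # writes onnx_model/model.onnx

# Dynamic quantization: weights are converted to INT8 offline,
# activations stay FP32 and are quantized on the fly at inference time
quantize_dynamic(
    model_input="onnx_model/model.onnx",
    model_output="onnx_model/model.quant.onnx",
    weight_type=QuantType.QInt8,
)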

Performance & Trade-offs

The primary goal of quantization is to trade a small amount of precision for a large gain in efficiency. The evaluation below, conducted on the same 558-example test set used for the original FP32 model, demonstrates the success of this trade-off.

File Size:

  • Original (FP32): ~313 MB
  • Quantized (INT8): ~79 MB (3.96x smaller)

Accuracy Comparison:

| Model            | Class   | Precision | Recall | F1-Score |
|------------------|---------|-----------|--------|----------|
| Original (FP32)  | HAM     | 1.00      | 1.00   | 1.00     |
|                  | SPAM    | 1.00      | 0.97   | 0.99     |
|                  | Overall | 1.00      | 1.00   | 1.00     |
| Quantized (INT8) | HAM     | 0.99      | 1.00   | 1.00     |
|                  | SPAM    | 1.00      | 0.96   | 0.98     |
|                  | Overall | 0.99      | 0.99   | 0.99     |

As shown, the quantized model maintains perfect precision for SPAM and near-perfect precision for HAM, with SPAM recall slipping only from 0.97 to 0.96, making it well suited for reliable on-device deployment.
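
The speed gain depends heavily on the target hardware, so it is worth measuring on the actual device. A rough timing loop, reusing the tokenizer and session from the usage example above (the numbers it prints are purely illustrative):

import time

inputs = tokenizer("Is this spam?", return_tensors="np", padding="max_length", truncation=True)
session.run(None, dict(inputs))  # warm-up run

start = time.perf_counter()
for _ in range(100):
    session.run(None, dict(inputs))
print(f"Mean latency: {(time.perf_counter() - start) / 100 * 1000:.2f} ms")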
