# DistilBERT Toxic Comment Classifier 🛡️
This is a DistilBERT-based binary classifier fine-tuned to detect toxic vs. non-toxic comments using the Cleaned Toxic Comments dataset.
## Model Performance

- Accuracy: ~94%
- Class metrics:
  - Non-toxic (0): Precision 0.96 | Recall 0.95 | F1 0.95
  - Toxic (1): Precision 0.90 | Recall 0.91 | F1 0.91
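To sanity-check these numbers on a held-out split of your own, the per-class report can be recomputed with scikit-learn. The snippet below is only a sketch: `texts` and `labels` are placeholders, and the `"toxic"` label string is assumed to match the example output in the usage section.

```python
# Sketch: recomputing per-class precision/recall/F1 with scikit-learn.
# `texts` and `labels` are placeholders for a real held-out test split.
from sklearn.metrics import classification_report
from transformers import pipeline

classifier = pipeline("text-classification", model="YamenRM/distilbert-toxic-comments")

texts = ["Have a great day!", "I hate everyone, you're the worst!"]
labels = [0, 1]  # 0 = non-toxic, 1 = toxic

preds = [1 if out["label"] == "toxic" else 0 for out in classifier(texts)]
print(classification_report(labels, preds, target_names=["non-toxic", "toxic"]))
```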
## Dataset

- Name: Cleaned Toxic Comments (FizzBuzz @ Kaggle)
- Language: English
- Classes:
  - 0 = Non-toxic
  - 1 = Toxic
- Balancing: To reduce class imbalance, undersampling was applied to the majority (non-toxic) class.
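The balancing step could look roughly like the sketch below; the file path and column names are illustrative rather than the exact preprocessing used for this model.

```python
# Sketch: undersampling the majority (non-toxic) class with pandas.
import pandas as pd

df = pd.read_csv("cleaned_toxic_comments.csv")    # illustrative path
toxic = df[df["toxic"] == 1]                      # "toxic" column name is assumed
non_toxic = df[df["toxic"] == 0].sample(n=len(toxic), random_state=42)  # drop excess majority rows

balanced = pd.concat([toxic, non_toxic]).sample(frac=1, random_state=42).reset_index(drop=True)
print(balanced["toxic"].value_counts())           # classes are now equally represented
```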
## Training Details

| Hyperparameter | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Epochs | 3 |
| Batch size | 32 |
| Learning rate | 2e-5 |
| Loss function | CrossEntropyLoss (with undersampling) |
- Optimizer: AdamW
- Framework: Hugging Face Transformers
- Hardware: Google Colab GPU
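A minimal fine-tuning sketch with these settings is shown below. It is not the exact training script: the tiny in-memory dataset stands in for the balanced Kaggle split, and `Trainer` supplies the AdamW optimizer and cross-entropy loss noted above by default.

```python
# Sketch: fine-tuning distilbert-base-uncased with the hyperparameters above.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# Toy placeholder data; replace with the balanced train split.
train_ds = Dataset.from_dict({"text": ["you are great", "I hate you"], "label": [0, 1]})
train_ds = train_ds.map(lambda x: tokenizer(x["text"], truncation=True, max_length=128))

args = TrainingArguments(
    output_dir="distilbert-toxic-comments",
    num_train_epochs=3,
    per_device_train_batch_size=32,
    learning_rate=2e-5,
)

Trainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer).train()
```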
## How to Use

Load with the Hugging Face `pipeline`:

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="YamenRM/distilbert-toxic-comments")
print(classifier("I hate everyone, you're the worst!"))
# [{'label': 'toxic', 'score': 0.97}]
```
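For batched inference or custom post-processing, the model can also be loaded without the pipeline; the following is a sketch of that route:

```python
# Sketch: manual loading for batched inference with explicit probabilities.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("YamenRM/distilbert-toxic-comments")
model = AutoModelForSequenceClassification.from_pretrained("YamenRM/distilbert-toxic-comments")
model.eval()

texts = ["Have a nice day!", "I hate everyone, you're the worst!"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    probs = model(**inputs).logits.softmax(dim=-1)

for text, p in zip(texts, probs):
    label = model.config.id2label[int(p.argmax())]
    print(f"{label}: {p.max():.3f}  |  {text}")
```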
## Considerations

- Because the non-toxic class was undersampled during training, the model may be less robust on very large, highly imbalanced datasets in real-world settings.
- If toxic content is very rare in your target domain, the model may produce more false positives or negatives than expected (see the thresholding sketch below).
- The model was trained only on English text; performance may drop on non-English or mixed-language input.
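One practical mitigation is to read the raw score and apply a decision threshold tuned to your own domain. The snippet below is a sketch: the threshold value is illustrative, and the `"toxic"` label string follows the example output above.

```python
# Sketch: applying a custom decision threshold instead of the default argmax.
from transformers import pipeline

classifier = pipeline("text-classification", model="YamenRM/distilbert-toxic-comments")

THRESHOLD = 0.8  # illustrative; tune on labelled data from your target domain

def is_toxic(text: str) -> bool:
    out = classifier(text)[0]                     # e.g. {'label': 'toxic', 'score': 0.97}
    toxic_score = out["score"] if out["label"] == "toxic" else 1.0 - out["score"]
    return toxic_score >= THRESHOLD

print(is_toxic("I hate everyone, you're the worst!"))
```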
## Acknowledgements & License

- Thanks to the Kaggle community for sharing the Cleaned Toxic Comments dataset.
- Built using Hugging Face’s transformers & datasets libraries.
- License: Apache-2.0
## Contact & Feedback
If you find issues, want improvements (e.g. support for other languages, finer toxicity categories), or want to collaborate, feel free to open an issue or contact me at yamenrafat132@gmail.com.
## Evaluation Results

Self-reported on the Cleaned Toxic Comments (Kaggle) test set:

- Accuracy: 0.940
- F1: 0.930
- Precision: 0.930
- Recall: 0.930