# DistilBERT IMDb Sentiment Classifier
A fine-tuned DistilBERT model for binary sentiment analysis on movie reviews.
## Model Description
This model was fine-tuned from distilbert-base-uncased on 5,000 IMDb movie reviews for 3 epochs. It classifies text as POSITIVE or NEGATIVE sentiment.
## Training Data
- Source: IMDb Large Movie Review Dataset (stored in SQLite, queried with pandas)
- Train: 5,000 samples | Validation: 1,000 samples
- Label balance: approximately 50% positive, 50% negative
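
The data preparation described above can be sketched as follows. This is a minimal sketch, assuming a table named `reviews` with `text` and `label` columns (hypothetical names; the actual SQLite schema is not documented here) and a single seeded shuffle to carve out disjoint 5,000-sample training and 1,000-sample validation sets. It reads the table with `pandas.read_sql_query` over a standard `sqlite3` connection and checks label balance with `value_counts(normalize=True)`.

```python
import sqlite3

import pandas as pd

# ASSUMPTION: table and column names are hypothetical; the card does not
# document the real SQLite schema.
QUERY = "SELECT text, label FROM reviews"


def load_reviews(conn: sqlite3.Connection) -> pd.DataFrame:
    """Read every review into a DataFrame with 'text' and 'label' columns."""
    return pd.read_sql_query(QUERY, conn)


def split_train_val(df: pd.DataFrame, n_train: int = 5000, n_val: int = 1000,
                    seed: int = 42) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Shuffle once with a fixed seed, then slice disjoint train/val sets."""
    if len(df) < n_train + n_val:
        raise ValueError("not enough rows for the requested split sizes")
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    train = shuffled.iloc[:n_train]
    val = shuffled.iloc[n_train:n_train + n_val]
    return train, val


def label_balance(df: pd.DataFrame) -> pd.Series:
    """Fraction of rows per label, e.g. to confirm a roughly 50/50 split."""
    return df["label"].value_counts(normalize=True)


if __name__ == "__main__":
    # Toy in-memory database standing in for the real IMDb SQLite file.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reviews (text TEXT, label INTEGER)")
    conn.executemany(
        "INSERT INTO reviews VALUES (?, ?)",
        [(f"review {i}", i % 2) for i in range(6000)],
    )
    conn.commit()

    train, val = split_train_val(load_reviews(conn))
    print(len(train), len(val))
    print(label_balance(train))
```

Shuffling once and slicing guarantees that no review appears in both splits, which a pair of independent random samples would not.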
## Evaluation Results
| Metric | Score |
|---|---|
| Accuracy | 88.4% |
| F1 Score | 0.893 |
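
For reference, the two reported metrics can be computed as below. This sketch assumes the F1 score is binary F1 with POSITIVE as the positive class; the card does not say whether it is binary, macro, or weighted F1. Accuracy is correct predictions divided by total predictions, and F1 is the harmonic mean of precision P and recall R, 2PR / (P + R).

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of predictions that exactly match the reference labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[str], y_pred: Sequence[str],
             positive: str = "POSITIVE") -> float:
    """Binary F1 = 2PR / (P + R) with `positive` as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        # No true positives: precision or recall is 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE"]
    y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]
    print(accuracy(y_true, y_pred))            # 0.6
    print(round(f1_score(y_true, y_pred), 3))  # 0.667
```

In the toy example, 3 of 5 predictions match (accuracy 0.6), and precision and recall are both 2/3, so F1 is also 2/3.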
## Baseline Comparison
| Model | Accuracy |
|---|---|
| TF-IDF + Logistic Regression | 86.4% |
| DistilBERT (this model) | 92.3% |
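
A TF-IDF + Logistic Regression baseline of the kind listed above can be assembled with scikit-learn's `TfidfVectorizer`, `LogisticRegression`, and `make_pipeline`. The settings here (word unigrams and bigrams, `max_iter=1000`) are assumptions for illustration; the card does not report the configuration behind the 86.4% figure, so this sketch will not necessarily reproduce it.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_baseline():
    """TF-IDF features feeding a logistic-regression classifier."""
    # ASSUMPTION: illustrative hyperparameters, not the card's actual setup.
    return make_pipeline(
        TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )


if __name__ == "__main__":
    # Tiny toy corpus; in practice, fit on the same 5,000 training reviews.
    texts = [
        "great movie, loved every minute",
        "wonderful acting and a great plot",
        "terrible and boring film",
        "an awful waste of time",
    ]
    labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

    model = build_baseline()
    model.fit(texts, labels)
    print(model.predict(["a great plot", "boring waste"]))
```

Wrapping both steps in one pipeline ensures the vectorizer is fitted only on training text, so validation reviews never influence the vocabulary or IDF weights.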
## Intended Use
Product review analysis, feedback classification, general English sentiment tasks.
## Limitations and Bias
- Trained only on English movie reviews; performance on other domains may vary
- May not handle Urdu, Roman Urdu, or code-switched text well
- Sarcasm with no obvious negative words may be misclassified
- Very short texts (under 5 words) have lower confidence scores
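
One way to act on the short-text and low-confidence limitations is to wrap the classifier and flag results for manual review. In this sketch, `classify_with_review_flag` and the `min_score=0.80` threshold are hypothetical choices, not part of this model; `min_words=5` mirrors the five-word figure above. Any callable that returns the `[{'label': ..., 'score': ...}]` shape of a `transformers` text-classification pipeline works, including the `classifier` object from How to Use.

```python
from typing import Callable, Dict, List

# Result shape of a text-classification pipeline for a single input string,
# e.g. [{'label': 'POSITIVE', 'score': 0.997}].
Prediction = List[Dict[str, object]]


def classify_with_review_flag(
    text: str,
    classify: Callable[[str], Prediction],
    min_words: int = 5,       # matches the "under 5 words" limitation
    min_score: float = 0.80,  # HYPOTHETICAL confidence threshold
) -> Dict[str, object]:
    """Classify `text`, flagging inputs that are short or low-confidence."""
    result = classify(text)[0]
    score = float(result["score"])
    too_short = len(text.split()) < min_words
    low_confidence = score < min_score
    return {
        "label": result["label"],
        "score": score,
        "needs_review": too_short or low_confidence,
    }


if __name__ == "__main__":
    # Stand-in classifier so the sketch runs without downloading a model.
    def fake_classifier(text: str) -> Prediction:
        return [{"label": "POSITIVE", "score": 0.95}]

    print(classify_with_review_flag("Great!", fake_classifier))
```

Flagging rather than overriding keeps the model's label intact and leaves the decision to a downstream process. Note that this check cannot catch confident errors such as misread sarcasm.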
## How to Use
```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline('text-classification', model='YOUR-USERNAME/distilbert-imdb-sentiment')
result = classifier('This movie was absolutely incredible!')
print(result)
# Output: [{'label': 'POSITIVE', 'score': 0.997}]
```