DistilBERT IMDb Sentiment Classifier

A fine-tuned DistilBERT model for binary sentiment analysis on movie reviews.

Model Description

This model was fine-tuned from distilbert-base-uncased on 5,000 IMDb movie reviews for 3 epochs. It classifies text as POSITIVE or NEGATIVE sentiment.

Training Data

Source: IMDb Large Movie Review Dataset (stored in SQLite, queried with pandas)
Train: 5,000 samples | Validation: 1,000 samples
Label balance: approximately 50% positive, 50% negative
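The data was stored in SQLite and queried with pandas. The loading step could look roughly like the sketch below. It assumes a hypothetical table `reviews(text, label)` with 1 = positive and 0 = negative, plus a simple shuffled split. The table name, column names, seed and the helper `load_splits` are illustrative, not the author's code. The in-memory database only stands in for the real IMDb data.

```python
import sqlite3

import pandas as pd


def load_splits(conn, n_train=5000, n_val=1000, seed=42):
    """Query reviews from SQLite and return shuffled, non-overlapping train/validation DataFrames."""
    # Hypothetical schema: reviews(text TEXT, label INTEGER), 1 = positive, 0 = negative
    df = pd.read_sql_query("SELECT text, label FROM reviews", conn)
    df = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)  # shuffle once
    train = df.iloc[:n_train]
    val = df.iloc[n_train:n_train + n_val]
    return train, val


# Small in-memory database standing in for the real IMDb table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (text TEXT, label INTEGER)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(f"review {i}", i % 2) for i in range(60)],
)
conn.commit()

train, val = load_splits(conn, n_train=50, n_val=10)
print(len(train), len(val))  # 50 10
print(round(train["label"].mean(), 2))  # fraction of positive labels in the training split
```

With the real table, `load_splits(conn)` would return the 5,000 / 1,000 split. A stratified split would guarantee the roughly 50/50 balance rather than relying on shuffling.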

Evaluation Results

Metric	Score
Accuracy	88.4%
F1 Score	0.893
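The card does not say how F1 was averaged. The sketch below shows how both metrics are computed, assuming binary F1 with POSITIVE as the positive class. This is the common default, for example scikit-learn's `f1_score` with `average='binary'`. The helper name and the toy labels are hypothetical.

```python
def accuracy_and_f1(y_true, y_pred, positive="POSITIVE"):
    """Return (accuracy, F1), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    correct = sum(t == p for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


y_true = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "POSITIVE"]
y_pred = ["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(f"{acc:.2f} {f1:.3f}")  # 0.60 0.667
```

Because F1 ignores true negatives, it can sit above or below accuracy depending on the error mix.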

Baseline Comparison

Model	Accuracy
TF-IDF + Logistic Regression	86.4%
DistilBERT (this model)	92.3%
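The card does not give the baseline's settings. A TF-IDF + Logistic Regression baseline could look like the sketch below, built with scikit-learn. The unigram+bigram features and `max_iter=1000` are assumptions, and the eight toy reviews only stand in for the real training split.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the real training reviews (1 = positive, 0 = negative)
train_texts = [
    "a wonderful, moving film with great acting",
    "brilliant story and a fantastic cast",
    "i loved every minute of it",
    "an absolute delight from start to finish",
    "boring plot and terrible dialogue",
    "a complete waste of time",
    "awful acting, i hated it",
    "dull, predictable and far too long",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed settings: unigram + bigram TF-IDF features feeding a logistic regression
baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

preds = baseline.predict(["a fantastic, moving story", "terrible and boring"])
print(preds)
```

A bag-of-words baseline like this cannot see word order or context. That is one plausible reason a fine-tuned transformer outperforms it on reviews with negation or mixed sentiment.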

Intended Use

Product review analysis, feedback classification, general English sentiment tasks.

Limitations and Bias

Trained only on English movie reviews; performance on other domains may vary
May not handle Urdu, Roman Urdu, or code-switched text well
Sarcasm with no obvious negative words may be misclassified
Very short texts (under 5 words) have lower confidence scores
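Application code can guard against the short-text and low-confidence cases listed above. The sketch below assumes the `label`/`score` dictionaries returned by the `transformers` text-classification pipeline. The helper `needs_review` and the 0.9 threshold are hypothetical choices, not part of this model. No simple rule like this catches sarcasm.

```python
def needs_review(text: str, prediction: dict, min_words: int = 5, min_score: float = 0.9) -> bool:
    """Flag a prediction for manual review when the input is very short or the score is low."""
    # min_words mirrors the "under 5 words" limitation; min_score = 0.9 is an arbitrary assumption
    too_short = len(text.split()) < min_words
    low_confidence = prediction["score"] < min_score
    return too_short or low_confidence


print(needs_review("Great!", {"label": "POSITIVE", "score": 0.98}))  # True (under 5 words)
print(needs_review("This movie was absolutely incredible!", {"label": "POSITIVE", "score": 0.997}))  # False
```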

How to Use

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline('text-classification', model='YOUR-USERNAME/distilbert-imdb-sentiment')

result = classifier('This movie was absolutely incredible!')
print(result)
# Output: [{'label': 'POSITIVE', 'score': 0.997}]
```
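The pipeline reports only the score of the winning label. A single "probability of positive" number can be easier to threshold or average. The sketch below assumes the pipeline applies softmax over the two labels, which is the default for single-label classifiers, so the two label scores sum to 1. The helper `positive_probability` is hypothetical.

```python
def positive_probability(result: dict) -> float:
    """Convert one pipeline result into P(POSITIVE) for this two-label model."""
    # With two softmax-normalised labels, P(POSITIVE) = 1 - P(NEGATIVE)
    if result["label"] == "POSITIVE":
        return result["score"]
    return 1.0 - result["score"]


outputs = [{"label": "POSITIVE", "score": 0.997}, {"label": "NEGATIVE", "score": 0.8}]
probs = [round(positive_probability(r), 3) for r in outputs]
print(probs)  # [0.997, 0.2]
```

Passing a list of strings, for example `classifier(['Great film.', 'Terrible film.'])`, returns one such dictionary per input. That output can be fed straight into the helper.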


Model Details

Format: Safetensors
Model size: 67M parameters
Tensor type: F32
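Parameter count and tensor type together give a rough estimate of the weight file size, since each F32 parameter takes 4 bytes. The figure below is approximate because 67M is itself rounded. The helper name is illustrative.

```python
def fp32_size_mb(num_params: int) -> float:
    """Approximate weight storage in megabytes (10**6 bytes) at 4 bytes per F32 parameter."""
    return num_params * 4 / 1e6


print(fp32_size_mb(67_000_000))  # 268.0
```

So the Safetensors weights should occupy roughly 268 MB on disk. Halving the bytes per parameter (for example F16) would roughly halve that.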

Repository: Asmatullah-AI-Engineer/distilbert-imdb-sentiment