Model Card for distilbert-imdb-sentiment
This model is a DistilBERT-based binary sentiment classifier, fine-tuned on the IMDb movie review dataset. It predicts whether a given piece of English text expresses a Positive or Negative sentiment, specifically optimized for movie review contexts.
Model Details
Model Description
This is a fine-tuned version of the distilbert-base-uncased-finetuned-sst-2-english model, further adapted for binary sentiment classification using the IMDb Large Movie Review Dataset. The base model, DistilBERT, is a smaller, faster, and lighter version of BERT, which makes this model efficient at inference time while retaining strong performance.
The model processes input text and outputs logits for two classes: 0 (Negative) and 1 (Positive).
- Developed by: Anthony Nguyen (@DeepAxion)
- Model type: Text Classification (Sentiment Analysis)
- Language(s) (NLP): English
- License: MIT
- Fine-tuned from model: distilbert-base-uncased-finetuned-sst-2-english (this checkpoint had already been fine-tuned on SST-2; we fine-tuned it further on IMDb)
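Because the model returns two raw logits, a softmax converts them into class probabilities and an argmax picks the label. The plain-Python sketch below makes that mapping explicit without any framework dependency; the helper name `logits_to_sentiment` is hypothetical, and the label order (0 = Negative, 1 = Positive) is taken from this card.

```python
import math

# Label order as documented on this card: index 0 = Negative, index 1 = Positive
LABELS = ("Negative", "Positive")


def logits_to_sentiment(logits):
    """Hypothetical helper: map a pair of raw logits to (label, probability).

    Uses a numerically stable softmax (subtracting the largest logit before
    exponentiating), then returns the class with the highest probability.
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, prob = logits_to_sentiment([-2.0, 3.0])
print(label, round(prob, 4))
```

Only the difference between the two logits matters: shifting both by the same constant leaves the probabilities unchanged, which is why subtracting the maximum is safe.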
Uses
Direct Use
This model is intended for direct use in applications requiring binary sentiment classification of English text, particularly in domains related to movie reviews, literary critiques, or general consumer feedback where a positive/negative distinction is relevant. It can be integrated into web applications, chatbots, data analysis pipelines, or research projects.
Downstream Use
This model can serve as a strong baseline for further fine-tuning on highly specific sentiment analysis tasks (e.g., product reviews for a niche industry) or as a component within larger NLP systems (e.g., content moderation, recommender systems, customer support automation).
Out-of-Scope Use
This model is not intended for:
- Multilingual sentiment analysis: It's trained only on English.
- Sarcasm or irony detection: While it can infer sentiment, it may struggle with subtle human communication nuances like sarcasm.
- Fine-grained sentiment: It only provides binary (positive/negative) classification, not granular scores or emotion detection (e.g., joy, anger, sadness).
- Sensitive contexts: Do not use this model for high-stakes decisions without thorough domain-specific validation and human oversight, especially in areas like medical diagnoses, legal judgments, or financial advice.
- Generating text: This is a classification model, not a generative model.
Bias, Risks, and Limitations
- Dataset Bias: The model's performance and biases are influenced by the IMDb dataset. This dataset is primarily focused on movie reviews and may not generalize perfectly to other domains (e.g., product reviews, news articles) without further fine-tuning. It may also reflect biases present in the original dataset (e.g., demographic biases in movie reviews).
- Language Nuances: Despite its strong overall performance, the model may misinterpret highly nuanced, ambiguous, or context-dependent language.
- Toxic Content: Training on general movie reviews does not guarantee robust performance at identifying or classifying toxic, hateful, or abusive language. The model's primary function is sentiment classification.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
- Domain Adaptation: For optimal performance on text outside of movie reviews, consider further fine-tuning on domain-specific data.
- Human Oversight: Always incorporate human review for critical applications.
- Bias Auditing: If deploying in sensitive applications, conduct thorough bias auditing on relevant demographic or linguistic subgroups.
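A simple starting point for a bias audit is to compare accuracy across subgroups of a labeled evaluation set. The sketch below assumes you already have the model's predictions, the gold labels, and a subgroup tag for each example; the function names and the group tags are hypothetical. A large gap between groups is a signal to investigate further, not a complete audit.

```python
from collections import defaultdict


def accuracy_by_group(groups, y_true, y_pred):
    """Return {group: accuracy} for parallel lists of group tags, gold labels, and predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for g, t, p in zip(groups, y_true, y_pred):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group):
    """Largest difference in accuracy between any two groups."""
    return max(per_group.values()) - min(per_group.values())


# Toy example with hypothetical subgroup tags (e.g. dialect or text source)
groups = ["A", "A", "A", "B", "B", "B"]
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1]
per_group = accuracy_by_group(groups, y_true, y_pred)
print(per_group)
print(max_accuracy_gap(per_group))
```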
How to Get Started with the Model
You can use this model directly with the Hugging Face transformers library.
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "DeepAxion/distilbert-imdb-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Put the model in evaluation mode (disables dropout)
model.eval()

# Example inference
text = "This movie totally blew me away, absolutely brilliant acting and a fantastic plot!"
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Disable gradient tracking: it is not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
probabilities = torch.softmax(logits, dim=-1)
prediction = torch.argmax(probabilities, dim=-1).item()

sentiment_labels = {0: "Negative", 1: "Positive"}
print(f"Input Text: \"{text}\"")
print(f"Predicted Sentiment: {sentiment_labels[prediction]}")
print(f"Confidence (Negative): {probabilities[0][0].item():.4f}")
print(f"Confidence (Positive): {probabilities[0][1].item():.4f}")
```
Training Details
Training Data
The model was fine-tuned on the IMDb Large Movie Review Dataset. This dataset consists of 50,000 highly polar movie reviews (25,000 for training, 25,000 for testing), labeled as either positive or negative. Reviews with a score of <= 4 out of 10 are labeled negative, and those with a score of >= 7 out of 10 are labeled positive.
Dataset Card: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews (this link may differ from the official IMDb dataset release)
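The rating-to-label rule above can be written as a small function. This is a sketch of the labeling rule as described, not the dataset's own build script; the function name is hypothetical. Reviews in the neutral band (5 or 6 out of 10) fall outside both thresholds and so receive no label, which is what makes the dataset "highly polar".

```python
def imdb_label_from_rating(rating):
    """Hypothetical helper: map a 1-10 review rating to a binary label.

    Returns 0 (Negative) for ratings <= 4, 1 (Positive) for ratings >= 7,
    and None for the neutral band (5-6), which receives no label.
    """
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be in 1..10, got {rating}")
    if rating <= 4:
        return 0
    if rating >= 7:
        return 1
    return None


print([imdb_label_from_rating(r) for r in range(1, 11)])
```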
Preprocessing
Text was tokenized using the DistilBertTokenizerFast associated with the base model. Input sequences were truncated to a maximum length of 512 tokens and padded to the longest sequence in the batch. Labels were mapped to 0 for negative and 1 for positive.
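Truncating to 512 tokens and padding to the longest sequence in each batch (dynamic padding) can be illustrated in plain Python on lists of token IDs. This is a sketch of the behavior, not the tokenizer's own code: the helper name is hypothetical, the pad ID of 0 is an assumption (it matches the `[PAD]` token of the uncased BERT vocabulary that DistilBERT uses), and the real tokenizer also preserves the `[CLS]` and `[SEP]` special tokens when truncating, which this sketch ignores. With the transformers tokenizer, the same effect comes from passing `truncation=True, max_length=512, padding="longest"`.

```python
def truncate_and_pad(batch_ids, max_length=512, pad_id=0):
    """Hypothetical helper: truncate each sequence to max_length, then pad every
    sequence to the length of the longest one in the batch.

    Returns (input_ids, attention_mask); the mask is 1 for real tokens and
    0 for padding. pad_id=0 is an assumption for the uncased BERT vocabulary.
    """
    truncated = [ids[:max_length] for ids in batch_ids]
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return input_ids, attention_mask


# Illustrative token-ID lists of different lengths
ids, mask = truncate_and_pad([[101, 2023, 102], [101, 2307, 3185, 999, 102]])
print(ids)
print(mask)
```

Padding per batch rather than always to 512 keeps short batches small, which saves compute on the many IMDb reviews that are well under the limit.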
Training Hyperparameters
Training regime: Mixed precision (fp16) was likely used for faster training and a reduced memory footprint; this has not been confirmed.
Optimizer: AdamW
Learning Rate: A learning-rate scheduler was used; the initial learning rate and schedule type are not reported.
Epochs: 3
Batch Size: 8
Hardware: Google Colab A100 GPU
Framework: PyTorch
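Since a learning-rate scheduler was used, it helps to know how many optimizer steps the schedule spans. A back-of-the-envelope calculation is sketched below, taking the 25,000 training reviews, batch size 8, and 3 epochs from this card, and assuming a single device with no gradient accumulation (neither is stated on this card). The function name is hypothetical.

```python
import math


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer updates over training, assuming one update per batch
    (no gradient accumulation) and counting the last partial batch as a step."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


# 25,000 training reviews, batch size 8, 3 epochs (values from this card)
print(total_optimizer_steps(25_000, 8, 3))
```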
Speeds, Sizes, Times
Training Time: Not reported.
Model Size: The model.safetensors file is approximately 255 MB.
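The ~255 MB file size appears consistent with storing roughly 67 million parameters as 32-bit floats, measured in mebibytes (2^20 bytes). The sketch below recomputes the parameter count from the standard DistilBERT-base dimensions (vocabulary 30,522, hidden size 768, feed-forward size 3,072, 6 layers, 512 positions) plus a pre-classifier and classifier head as used by `DistilBertForSequenceClassification`. These dimensions and fp32 storage are assumptions about this checkpoint, and the safetensors header adds a negligible amount on top.

```python
def distilbert_cls_param_count(vocab=30522, dim=768, hidden=3072, layers=6,
                               max_pos=512, num_labels=2):
    """Parameter count for DistilBERT plus a two-layer classification head.

    Defaults are DistilBERT-base dimensions (an assumption for this checkpoint).
    """
    def linear(n_in, n_out):
        return n_in * n_out + n_out  # weight matrix + bias

    layer_norm = 2 * dim  # scale + shift

    embeddings = vocab * dim + max_pos * dim + layer_norm  # token + position embeddings
    per_layer = (
        4 * linear(dim, dim)      # query, key, value, and output projections
        + layer_norm              # post-attention LayerNorm
        + linear(dim, hidden)     # feed-forward expansion
        + linear(hidden, dim)     # feed-forward projection
        + layer_norm              # post-feed-forward LayerNorm
    )
    head = linear(dim, dim) + linear(dim, num_labels)  # pre-classifier + classifier
    return embeddings + layers * per_layer + head


params = distilbert_cls_param_count()
size_mib = params * 4 / 2**20  # 4 bytes per fp32 parameter
print(params, round(size_mib, 1))
```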
Metrics
The primary evaluation metrics used were:
- Accuracy: The proportion of correctly classified samples.
- F1-Score (weighted/macro): The harmonic mean of precision and recall, averaged across the two classes (macro: unweighted mean; weighted: mean weighted by class frequency).
- Recall: For each class, the proportion of actual positive (or negative) samples that were correctly identified.
- Precision: For each class, the proportion of samples predicted as positive (or negative) that were actually positive (or negative).
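For a binary task, all of these metrics can be derived from per-class counts of true positives, false positives, and false negatives. The plain-Python sketch below computes accuracy plus macro- and support-weighted precision, recall, and F1. It follows the usual definitions (the same averaging modes as scikit-learn's `precision_recall_fscore_support`) but is not the evaluation code used for this card, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred, labels=(0, 1)):
    """Accuracy plus macro- and weighted-averaged precision, recall, and F1."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    per_class = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn  # number of true samples of class c
        per_class[c] = (precision, recall, f1, support)

    def average(index, weighted):
        if weighted:  # weight each class by its support
            return sum(m[index] * m[3] for m in per_class.values()) / n
        return sum(m[index] for m in per_class.values()) / len(labels)

    names = ("precision", "recall", "f1")
    return {
        "accuracy": accuracy,
        "macro": {name: average(i, False) for i, name in enumerate(names)},
        "weighted": {name: average(i, True) for i, name in enumerate(names)},
    }


metrics = classification_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 1])
print(metrics)
```

In this toy example the positive class has F1 0.75 and the negative class 0.5, so macro F1 is 0.625 while weighted F1 is about 0.667: the two averages diverge whenever classes differ in size and in per-class quality.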
Results
- Accuracy: 94%
- Recall: 94%
- Precision: 94%
- F1: 93%
Summary
The fine-tuned DistilBERT model performs strongly on the IMDb sentiment classification task, reaching 94% accuracy, precision, and recall and a 93% F1-score on the test set.