Model Card for distilbert-imdb-sentiment

This model is a DistilBERT-based binary sentiment classifier, fine-tuned on the IMDb movie review dataset. It predicts whether a given piece of English text expresses a Positive or Negative sentiment, specifically optimized for movie review contexts.

Model Details

Model Description

This is a fine-tuned version of the distilbert-base-uncased-finetuned-sst-2-english model, further adapted for binary sentiment classification using the IMDb Large Movie Review Dataset. The base model, DistilBERT, is a smaller, faster, and lighter version of BERT, making this model efficient for inference while retaining strong performance.

The model processes input text and outputs logits for two classes: 0 (Negative) and 1 (Positive).
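
With only two classes, the softmax over the logits reduces to a sigmoid of the logit difference, and the predicted class is simply the larger logit. The plain-Python sketch below illustrates this with made-up logit values (not real model output); `positive_probability` is a hypothetical helper, and the Getting Started code later in this card does the same thing with `torch.softmax`.

```python
import math

LABELS = {0: "Negative", 1: "Positive"}


def positive_probability(logit_neg, logit_pos):
    """Two-class softmax, written as a numerically stable sigmoid of the logit gap."""
    gap = logit_pos - logit_neg
    if gap >= 0:
        return 1.0 / (1.0 + math.exp(-gap))
    # For negative gaps, use exp(gap) / (1 + exp(gap)) to avoid overflow in exp(-gap).
    e = math.exp(gap)
    return e / (1.0 + e)


# Hypothetical logits for one input, ordered [Negative, Positive].
logits = [-1.2, 2.3]
p_pos = positive_probability(*logits)
p_neg = 1.0 - p_pos
# Softmax is monotonic, so the argmax of the logits equals the argmax of the probabilities.
prediction = max(range(len(logits)), key=lambda i: logits[i])

print(LABELS[prediction], round(p_neg, 4), round(p_pos, 4))
```

Because of this monotonicity, the probabilities are only needed when a confidence score is wanted; the label alone can be read off the logits.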

Developed by: Anthony Nguyen (@DeepAxion)
Model type: Text Classification (Sentiment Analysis)
Language(s) (NLP): English
License: MIT
Finetuned from model: distilbert-base-uncased-finetuned-sst-2-english (This model was already fine-tuned on SST-2, and we further fine-tuned it on IMDb.)

Uses

Direct Use

This model is intended for direct use in applications requiring binary sentiment classification of English text, particularly in domains related to movie reviews, literary critiques, or general consumer feedback where a positive/negative distinction is relevant. It can be integrated into web applications, chatbots, data analysis pipelines, or research projects.

Downstream Use

This model can serve as a strong baseline for further fine-tuning on highly specific sentiment analysis tasks (e.g., product reviews for a niche industry) or as a component within larger NLP systems (e.g., content moderation, recommender systems, customer support automation).

Out-of-Scope Use

This model is not intended for:

Multilingual sentiment analysis: It's trained only on English.
Sarcasm or irony detection: While it can infer sentiment, it may struggle with subtle human communication nuances like sarcasm.
Fine-grained sentiment: It only provides binary (positive/negative) classification, not granular scores or emotion detection (e.g., joy, anger, sadness).
Sensitive contexts: Do not use this model for high-stakes decisions without thorough domain-specific validation and human oversight, especially in areas like medical diagnoses, legal judgments, or financial advice.
Generating text: This is a classification model, not a generative model.

Bias, Risks, and Limitations

Dataset Bias: The model's performance and biases are influenced by the IMDb dataset. This dataset is primarily focused on movie reviews and may not generalize perfectly to other domains (e.g., product reviews, news articles) without further fine-tuning. It may also reflect biases present in the original dataset (e.g., demographic biases in movie reviews).
Language Nuances: Despite its strong test-set results, the model may misinterpret highly nuanced, ambiguous, or context-dependent language.
Toxic Content: The model's training on general movie reviews does not guarantee robust performance on identifying or classifying toxic, hateful, or abusive language. Its primary function is sentiment.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.

Domain Adaptation: For optimal performance on text outside of movie reviews, consider further fine-tuning on domain-specific data.
Human Oversight: Always incorporate human review for critical applications.
Bias Auditing: If deploying in sensitive applications, conduct thorough bias auditing on relevant demographic or linguistic subgroups.

How to Get Started with the Model

You can use this model directly with the Hugging Face transformers library.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "DeepAxion/distilbert-imdb-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Put the model in evaluation mode (disables dropout)
model.eval()

# Example inference
text = "This movie totally blew me away, absolutely brilliant acting and a fantastic plot!"

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Disable gradient tracking, which is not needed for inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1).item()

sentiment_labels = {0: "Negative", 1: "Positive"}

print(f"Input Text: \"{text}\"")
print(f"Predicted Sentiment: {sentiment_labels[prediction]}")
print(f"Confidence (Negative): {probabilities[0][0].item():.4f}")
print(f"Confidence (Positive): {probabilities[0][1].item():.4f}")
```

Training Details

Training Data

The model was fine-tuned on the IMDb Large Movie Review Dataset. This dataset consists of 50,000 highly polar movie reviews (25,000 for training, 25,000 for testing), labeled as either positive or negative. Reviews with a score of <= 4 out of 10 are labeled negative, and those with a score of >= 7 out of 10 are labeled positive.
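
The rating thresholds above can be written as a small labeling function. This only illustrates the rule: the standard IMDb release already ships with labels, and `imdb_label_from_rating` is a hypothetical helper, not part of any dataset loader. Ratings of 5 or 6 fall outside both ranges, which is why the dataset contains no neutral reviews.

```python
from typing import Optional


def imdb_label_from_rating(rating: int) -> Optional[int]:
    """Map a 1-10 star rating to a binary label: <= 4 -> 0 (Negative), >= 7 -> 1 (Positive).

    Ratings 5 and 6 are neutral and excluded from the polar dataset, so None is returned.
    """
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be in 1..10, got {rating}")
    if rating <= 4:
        return 0
    if rating >= 7:
        return 1
    return None


print({r: imdb_label_from_rating(r) for r in range(1, 11)})
```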

Dataset Card: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews

Preprocessing

Text was tokenized using the DistilBertTokenizerFast associated with the base model. Input sequences were truncated to a maximum length of 512 tokens and padded to the longest sequence in the batch. Labels were mapped to 0 for negative and 1 for positive.
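
In practice, this truncation and per-batch padding is handled by the tokenizer (`truncation=True`, `max_length=512`) together with a collator such as `DataCollatorWithPadding`. The plain-Python sketch below only illustrates what those steps amount to. `encode_and_pad` is a hypothetical helper, the word-piece ids are made up, and the special-token ids (101 for [CLS], 102 for [SEP], 0 for [PAD]) are assumed from the bert-base-uncased vocabulary, so check `tokenizer.cls_token_id`, `tokenizer.sep_token_id`, and `tokenizer.pad_token_id` before relying on them.

```python
from typing import Dict, List

# Assumed special-token ids from the bert-base-uncased vocabulary that
# DistilBERT-uncased inherits; verify with tokenizer.cls_token_id, etc.
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0


def encode_and_pad(batch: List[List[int]], max_length: int = 512) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: truncate, add [CLS]/[SEP], pad to the longest sequence in the batch."""
    if max_length < 2:
        raise ValueError("max_length must leave room for [CLS] and [SEP]")
    # Truncate the content so that [CLS] + content + [SEP] fits in max_length.
    sequences = [[CLS_ID] + ids[: max_length - 2] + [SEP_ID] for ids in batch]
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        input_ids.append(seq + [PAD_ID] * n_pad)
        # 1 marks real tokens; 0 marks padding that attention should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up word-piece ids standing in for two tokenized reviews.
batch = [[2023, 3185, 2001, 2307], [6659]]
encoded = encode_and_pad(batch, max_length=5)
print(encoded)
```

Padding to the longest sequence in each batch, rather than always to 512 tokens, keeps batches of short reviews cheap; the attention mask tells the model which positions are padding.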

Training Hyperparameters

Training regime: Mixed precision (fp16) was likely used for faster training and a reduced memory footprint; this has not been confirmed.
Optimizer: AdamW
Learning Rate: Not reported; a learning rate scheduler was used.
Epochs: 3
Batch Size: 8
Hardware: Google Colab A100 GPU
Framework: PyTorch
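
The epoch count and batch size above determine how many optimizer updates the run performs. A minimal sketch of that arithmetic, assuming the full 25,000-review training split, a single device, and no gradient accumulation. Because the card does not name the scheduler or the peak learning rate, the sketch assumes linear decay to zero without warmup (the Hugging Face `Trainer` default) and uses 5e-5 (the `Trainer` default learning rate) purely as a placeholder; neither is the author's reported setting.

```python
import math

# Values from the card: 25,000 training reviews, batch size 8, 3 epochs.
TRAIN_EXAMPLES = 25_000
BATCH_SIZE = 8
EPOCHS = 3
# Hypothetical placeholder: the card does not report the learning rate.
PEAK_LR = 5e-5

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def linear_decay_lr(step: int, peak_lr: float = PEAK_LR, total: int = total_steps) -> float:
    """Learning rate after `step` updates under an assumed linear decay to 0 with no warmup."""
    return peak_lr * max(0.0, (total - step) / total)


print(steps_per_epoch, total_steps)  # 3125 9375
for step in (0, total_steps // 2, total_steps):
    print(step, linear_decay_lr(step))
```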

Speeds, Sizes, Times

Training Time: Not reported.

Model Size: The model.safetensors file is approximately 255 MB.
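
The file size can be sanity-checked from the architecture. The sketch below counts the parameters of a DistilBERT-base sequence classifier, assuming the standard distilbert-base-uncased configuration (vocabulary 30,522, 512 positions, hidden size 768, feed-forward size 3,072, 6 layers), the two-layer classification head (`pre_classifier` and `classifier`), and float32 storage.

```python
# Assumed DistilBERT-base (uncased) configuration values.
VOCAB, MAX_POS, DIM, HIDDEN, LAYERS, NUM_LABELS = 30522, 512, 768, 3072, 6, 2


def linear(n_in: int, n_out: int) -> int:
    """Weight matrix plus bias vector of a fully connected layer."""
    return n_in * n_out + n_out


layer_norm = 2 * DIM  # scale and shift vectors

embeddings = VOCAB * DIM + MAX_POS * DIM + layer_norm
per_layer = (
    4 * linear(DIM, DIM)      # query, key, value and output projections
    + layer_norm              # post-attention LayerNorm
    + linear(DIM, HIDDEN)     # feed-forward expansion
    + linear(HIDDEN, DIM)     # feed-forward projection back to DIM
    + layer_norm              # post-feed-forward LayerNorm
)
head = linear(DIM, DIM) + linear(DIM, NUM_LABELS)  # pre_classifier + classifier

total_params = embeddings + LAYERS * per_layer + head
size_mib = total_params * 4 / 2**20  # 4 bytes per float32 weight

print(f"{total_params:,} parameters, about {size_mib:.1f} MiB in float32")
```

This gives 66,955,010 parameters, or about 255.4 MiB (267.8 MB in decimal units), which is consistent with the ~255 MB figure if that size is measured in binary megabytes.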

Metrics

The primary evaluation metrics used were:

Accuracy: The proportion of correctly classified samples.
F1-Score (weighted/macro): The harmonic mean of precision and recall, averaged over the two classes either weighted by class support or unweighted (macro).
Recall: The proportion of actual positive/negative samples that were correctly identified.
Precision: The proportion of samples classified as positive/negative that were actually positive/negative.
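
The metrics above can be computed directly from gold labels and predictions. A plain-Python sketch with hypothetical labels follows; `binary_metrics` is an illustrative helper, not the evaluation code behind this card, and it reports both macro and weighted averages because the card does not say which F1 average was used.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy plus macro- and weighted-averaged precision, recall and F1 for labels {0, 1}."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n

    per_class = {}
    for c in (0, 1):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        support = tp + fn  # number of gold examples of class c
        per_class[c] = (precision, recall, f1, support)

    result = {"accuracy": accuracy}
    for i, name in enumerate(("precision", "recall", "f1")):
        # Macro: unweighted mean over classes. Weighted: mean weighted by class support.
        result[f"{name}_macro"] = sum(v[i] for v in per_class.values()) / 2
        result[f"{name}_weighted"] = sum(v[i] * v[3] for v in per_class.values()) / n
    return result


# Hypothetical, deliberately imbalanced labels so macro and weighted averages differ.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

With scikit-learn, `sklearn.metrics.precision_recall_fscore_support(y_true, y_pred, average="macro")` (or `average="weighted"`) should give the same averages. On the balanced IMDb test set (12,500 reviews per class), the macro and weighted averages coincide.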

Results

Accuracy: 94%
Recall: 94%
Precision: 94%
F1: 93%

Summary

The fine-tuned DistilBERT model demonstrates strong performance on the IMDb sentiment classification task, achieving 94% accuracy, 94% precision, 94% recall, and a 93% F1-score on the test set.
