---
library_name: transformers
license: mit
datasets:
- ajaykarthick/imdb-movie-reviews
language:
- en
metrics:
- accuracy
- f1
- recall
- precision
base_model:
- distilbert/distilbert-base-uncased-finetuned-sst-2-english
---

# Model Card for distilbert-imdb-sentiment

This model is a DistilBERT-based binary sentiment classifier, fine-tuned on the IMDb movie review dataset. It predicts whether a given piece of English text expresses a **Positive** or **Negative** sentiment, specifically optimized for movie review contexts.

## Model Details

### Model Description

This is a fine-tuned version of the `distilbert-base-uncased-finetuned-sst-2-english` model, further adapted for binary sentiment classification using the IMDb Large Movie Review Dataset. The base model, DistilBERT, is a smaller, faster, and lighter version of BERT, making this model efficient for inference while retaining strong performance.

The model processes input text and outputs logits for two classes: 0 (Negative) and 1 (Positive).

-   **Developed by:** Anthony Nguyen (@DeepAxion)
-   **Model type:** Text Classification (Sentiment Analysis)
-   **Language(s) (NLP):** English
-   **License:** MIT
-   **Finetuned from model:** `distilbert-base-uncased-finetuned-sst-2-english` (This model was already fine-tuned on SST-2, and we further fine-tuned it on IMDb.)

## Uses

### Direct Use

This model is intended for direct use in applications requiring binary sentiment classification of English text, particularly in domains related to movie reviews, literary critiques, or general consumer feedback where a positive/negative distinction is relevant. It can be integrated into web applications, chatbots, data analysis pipelines, or research projects.
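
For quick integration, the `pipeline` API wraps tokenization and inference in a single call. A minimal sketch, using the model id from the getting-started section below:

```python
from transformers import pipeline

# Model id as shown in "How to Get Started" below
classifier = pipeline(
    "text-classification",
    model="DeepAxion/distilbert-imdb-sentiment",
)

result = classifier("A tedious, overlong film with no redeeming qualities.")
print(result)  # e.g. [{'label': ..., 'score': ...}]; label names follow the model config
```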

### Downstream Use

This model can serve as a strong baseline for further fine-tuning on highly specific sentiment analysis tasks (e.g., product reviews for a niche industry) or as a component within larger NLP systems (e.g., content moderation, recommender systems, customer support automation).
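
A minimal sketch of further fine-tuning from this checkpoint, assuming a domain-specific dataset with `text` and `label` columns (the IMDb slice below is only a stand-in for your own corpus):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "DeepAxion/distilbert-imdb-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Stand-in corpus: replace with your domain-specific dataset
train_data = load_dataset("imdb", split="train[:1%]").map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-sentiment", num_train_epochs=1),
    train_dataset=train_data,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
```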

### Out-of-Scope Use

This model is **not** intended for:
-   **Multilingual sentiment analysis:** It's trained only on English.
-   **Sarcasm or irony detection:** While it can infer sentiment, it may struggle with subtle human communication nuances like sarcasm.
-   **Fine-grained sentiment:** It only provides binary (positive/negative) classification, not granular scores or emotion detection (e.g., joy, anger, sadness).
-   **Sensitive contexts:** Do not use this model for high-stakes decisions without thorough domain-specific validation and human oversight, especially in areas like medical diagnoses, legal judgments, or financial advice.
-   **Generating text:** This is a classification model, not a generative model.

## Bias, Risks, and Limitations

* **Dataset Bias:** The model's performance and biases are influenced by the IMDb dataset. This dataset is primarily focused on movie reviews and may not generalize perfectly to other domains (e.g., product reviews, news articles) without further fine-tuning. It may also reflect biases present in the original dataset (e.g., demographic biases in movie reviews).
* **Language Nuances:** While strong, the model may misinterpret highly nuanced, ambiguous, or context-dependent language.
* **Toxic Content:** The model's training on general movie reviews does not guarantee robust performance on identifying or classifying toxic, hateful, or abusive language. Its primary function is sentiment classification.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
* **Domain Adaptation:** For optimal performance on text outside of movie reviews, consider further fine-tuning on domain-specific data.
* **Human Oversight:** Always incorporate human review for critical applications.
* **Bias Auditing:** If deploying in sensitive applications, conduct thorough bias auditing on relevant demographic or linguistic subgroups.

## How to Get Started with the Model

You can use this model directly with the Hugging Face `transformers` library.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch

# Load the model and tokenizer from the Hugging Face Hub
model_name = "DeepAxion/distilbert-imdb-sentiment" 
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# put the model in eval mode
model.eval()

# Example Inference
text = "This movie totally blew me away, absolutely brilliant acting and a fantastic plot!"

inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# disable gradient tracking for inference
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1).item()

sentiment_labels = {0: "Negative", 1: "Positive"}

print(f"Input Text: \"{text}\"")
print(f"Predicted Sentiment: {sentiment_labels[prediction]}")
print(f"Confidence (Negative): {probabilities[0][0].item():.4f}")
print(f"Confidence (Positive): {probabilities[0][1].item():.4f}")
```

## Training Details

### Training Data
The model was fine-tuned on the IMDb Large Movie Review Dataset. This dataset consists of 50,000 highly polar movie reviews (25,000 for training, 25,000 for testing), labeled as either positive or negative. Reviews with a score of <= 4 out of 10 are labeled negative, and those with a score of >= 7 out of 10 are labeled positive.

Dataset Card: https://huggingface.co/datasets/ajaykarthick/imdb-movie-reviews
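
For reference, the canonical IMDb dataset is also available on the Hub under the `imdb` id and can be inspected directly (a sketch; the mirror linked above may differ slightly in layout):

```python
from datasets import load_dataset

imdb = load_dataset("imdb")  # canonical copy; the card above links a mirror

print(imdb["train"].num_rows, imdb["test"].num_rows)  # 25000 25000
example = imdb["train"][0]
print(example["text"][:200])  # raw review text
print(example["label"])       # 0 = negative, 1 = positive
```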

### Preprocessing
Text was tokenized using the DistilBertTokenizerFast associated with the base model. Input sequences were truncated to a maximum length of 512 tokens and padded to the longest sequence in the batch. Labels were mapped to 0 for negative and 1 for positive.
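
This corresponds roughly to the following sketch; the argument values come from the description above, with batch-time padding handled by a padding collator:

```python
from transformers import DataCollatorWithPadding, DistilBertTokenizerFast

tokenizer = DistilBertTokenizerFast.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)

def preprocess(batch):
    # Truncate to at most 512 tokens; padding is deferred to the collator
    return tokenizer(batch["text"], truncation=True, max_length=512)

# Pads each batch to its longest sequence, as described above
collator = DataCollatorWithPadding(tokenizer=tokenizer)

label2id = {"negative": 0, "positive": 1}  # label mapping from the card
```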

### Training Hyperparameters
- Training regime: Mixed precision (fp16) was likely used for faster training and a reduced memory footprint.

- Optimizer: AdamW

- Learning Rate: managed by a learning rate scheduler (see the `TrainingArguments` sketch after this list)

- Epochs: 3

- Batch Size: 8

- Hardware: Google Colab A100 GPU

- Framework: PyTorch
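
These settings map roughly onto the following `TrainingArguments`. This is a hedged reconstruction, not the exact training script: the learning rate and scheduler type were not recorded, so those values are marked as assumptions:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-imdb-sentiment",
    num_train_epochs=3,              # from the list above
    per_device_train_batch_size=8,   # from the list above
    optim="adamw_torch",             # AdamW optimizer
    learning_rate=2e-5,              # assumption: a common DistilBERT fine-tuning default
    lr_scheduler_type="linear",      # assumption: the library's default schedule
    fp16=True,                       # mixed precision, as noted above
)
```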

### Speeds, Sizes, Times
Training Time: approximately 1-2 hours on a single Colab GPU (estimate; the exact time was not recorded)

Model Size: The model.safetensors file is approximately 255 MB.

## Metrics
The primary evaluation metrics used were:

- Accuracy: The proportion of correctly classified samples.
- F1-Score (weighted/macro): The harmonic mean of precision and recall, useful for a balanced assessment.
- Recall: The proportion of actual positive/negative samples that were correctly identified.
- Precision: The proportion of samples classified as positive/negative that were actually positive/negative (a computation sketch follows this list).
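
These can be computed with `scikit-learn`, for example as a `compute_metrics` function for the `Trainer`. A sketch; the `weighted` average is an assumption consistent with the F1 description above:

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="weighted"),
        "recall": recall_score(labels, preds, average="weighted"),
        "precision": precision_score(labels, preds, average="weighted"),
    }
```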

### Results
- Accuracy: 94%
- Recall: 94%
- Precision: 94%
- F1: 93%

## Summary
The fine-tuned DistilBERT model demonstrates strong performance on the IMDb sentiment classification task, achieving roughly 94% accuracy, precision, and recall and a 93% F1-score on the held-out test set.