---
library_name: transformers
tags:
- autotrain
- text-classification
base_model: spacesedan/autotrain-iz7hp-zi6ki
widget:
- text: "I love AutoTrain"
datasets:
- spacesedan/goemotions-5point-sentiment
---

# 5-Point Sentiment Classifier (Longformer) — by spacesedan

A fine-tuned **Longformer** model for **5-point sentiment classification** of long-form user-generated content such as **Reddit posts**. It rates text on a scale from *very negative* to *very positive*, which captures more nuance than a binary positive/negative label.

---

## Labels

| Label Index | Sentiment        |
|-------------|------------------|
| 0           | Very Negative     |
| 1           | Negative          |
| 2           | Neutral           |
| 3           | Positive          |
| 4           | Very Positive     |
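
As a minimal sketch of how the label indices above can be applied, the snippet below turns one vector of five raw logits into a label name and a probability. The `ID2LABEL` dictionary and the `logits_to_label` helper are illustrative names, not part of the released model, and the example assumes the logits are ordered by label index.

```python
import math

# Index-to-name mapping copied from the label table above.
ID2LABEL = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}


def logits_to_label(logits):
    """Return (label name, probability) for one 5-element logit vector."""
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per label")
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


label, confidence = logits_to_label([0.1, 0.2, 0.3, 2.5, 0.4])
print(label, round(confidence, 3))
```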

---

## Datasets Used

This model was fine-tuned on a combination of three datasets:

1. **[GoEmotions](https://huggingface.co/datasets/go_emotions)** by Google  
   → Converted 27 emotion labels into a 5-point sentiment scale.

2. **[Amazon Reviews (fine-grained)](https://huggingface.co/datasets/yassiracharki/Amazon_Reviews_for_Sentiment_Analysis_fine_grained_5_classes)**  
   → Large-scale consumer review dataset with fine-grained sentiment labels.

3. **[Kaggle: Twitter and Reddit Sentimental Analysis Dataset](https://www.kaggle.com/datasets/charangowda/twitter-and-reddit-sentimental-analysis-dataset)**  
   → Adapted first into a 3-class format and then into the 5-class format for compatibility with the other datasets.
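
The conversion of emotion labels into a 5-point scale can be sketched as follows. This is only an illustration of the general idea: the buckets in `EMOTION_TO_SCORE` are hypothetical and are not the mapping used to train this model, which this card does not publish. The emotion names are GoEmotions categories, and multi-label examples are assumed to be resolved by averaging.

```python
import math

# Hypothetical buckets for illustration only; NOT the author's actual mapping.
EMOTION_TO_SCORE = {
    "grief": 0, "disgust": 0, "anger": 0,
    "sadness": 1, "annoyance": 1, "disappointment": 1,
    "neutral": 2, "curiosity": 2, "realization": 2,
    "approval": 3, "optimism": 3, "relief": 3,
    "joy": 4, "love": 4, "admiration": 4,
}


def to_five_point(emotions):
    """Map a list of emotion names to a 0-4 score, or None if none are known."""
    scores = [EMOTION_TO_SCORE[e] for e in emotions if e in EMOTION_TO_SCORE]
    if not scores:
        return None
    mean = sum(scores) / len(scores)
    # Round half up explicitly; Python's round() rounds halves to even.
    return math.floor(mean + 0.5)


print(to_five_point(["joy", "approval"]))
```

Averaging is one of several possible ways to handle examples with more than one emotion; taking the most extreme label or dropping ambiguous examples would be equally defensible choices.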

---

## Training Configuration

| Setting                | Value              |
|------------------------|--------------------|
| Model Base             | Longformer (4096)  |
| Max Sequence Length    | 1024 tokens        |
| Epochs                 | 4                  |
| Batch Size             | 8                  |
| Gradient Accumulation  | 4                  |
| Optimizer              | `adamw_torch`      |
| Learning Rate          | `2e-5`             |
| Scheduler              | Linear             |
| Mixed Precision        | FP16               |
| Weight Decay           | 0.01               |
| Warmup Proportion      | 0.1                |
| Early Stopping         | patience=5, threshold=0.01 |
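
To make the interaction between these settings concrete, the sketch below derives the effective batch size, the approximate number of optimizer steps, and the learning rate at a given step under linear warmup followed by linear decay. The training-set size of 32,000 examples and the single-device setup are assumptions chosen for illustration; neither is stated in this card, and the step count is an approximation of what a real trainer would report.

```python
import math

# Values taken from the configuration table above.
BATCH_SIZE = 8
GRAD_ACCUM = 4
EPOCHS = 4
LEARNING_RATE = 2e-5
WARMUP_PROPORTION = 0.1


def schedule_summary(num_examples, num_devices=1):
    """Approximate step counts; num_examples and num_devices are assumptions."""
    effective_batch = BATCH_SIZE * GRAD_ACCUM * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * EPOCHS
    warmup_steps = round(total_steps * WARMUP_PROPORTION)
    return effective_batch, total_steps, warmup_steps


def lr_at_step(step, total_steps, warmup_steps):
    """Linear warmup from 0 to the peak rate, then linear decay back to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)


effective, total, warmup = schedule_summary(num_examples=32_000)
print(effective, total, warmup)
print(lr_at_step(warmup, total, warmup))
```

With a per-device batch of 8 and 4 accumulation steps, each optimizer update sees 32 examples on a single device.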

---

## Final Evaluation Metrics

| Metric               | Score   |
|----------------------|---------|
| Accuracy             | **0.671** |
| F1 Score (Macro)     | 0.642   |
| F1 Score (Weighted)  | 0.673   |
| Precision (Macro)    | 0.642   |
| Recall (Macro)       | 0.646   |
| Loss                 | 0.882   |
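
The macro and weighted F1 scores above differ because macro F1 averages per-class scores equally, while weighted F1 weights each class by its number of true examples. The following is a small pure-Python sketch of both averages on made-up labels; it is not the evaluation code used for this model, and the toy data is purely illustrative.

```python
from collections import Counter


def f1_scores(y_true, y_pred, num_classes=5):
    """Return (macro F1, weighted F1) averaged over all num_classes labels."""
    support = Counter(y_true)
    per_class = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples gets F1 = 0 here.
        per_class.append(2 * tp / denom if denom else 0.0)
    macro = sum(per_class) / num_classes
    weighted = sum(per_class[c] * support[c] for c in range(num_classes)) / len(y_true)
    return macro, weighted


# Toy data: the model predicts "Neutral" (2) for everything.
macro, weighted = f1_scores([2, 2, 2, 0], [2, 2, 2, 2])
print(round(macro, 3), round(weighted, 3))
```

On imbalanced data the weighted score is dominated by the frequent classes, which is why it can sit above the macro score, as it does in the table.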

---

## Use Cases

- Tracking sentiment across **Reddit posts**, especially for **news** or **trending headlines**.
- Analyzing **long-form product reviews**.
- Building a sentiment dashboard for user forums or blogs.
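
For the dashboard use case, one simple aggregation is to shift the label indices to a scale centered on neutral and average them per group. The sketch below assumes predictions are already available as `(group, label_index)` pairs; the group names and the `mean_sentiment_by_group` helper are hypothetical.

```python
from collections import defaultdict


def mean_sentiment_by_group(records):
    """Average sentiment per group on a -2 (very negative) to +2 (very positive) scale."""
    totals = defaultdict(lambda: [0, 0])  # group -> [sum of scores, count]
    for group, label in records:
        if not 0 <= label <= 4:
            raise ValueError(f"label index out of range: {label}")
        totals[group][0] += label - 2  # shift so that Neutral (2) becomes 0
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical predictions for posts in two forums.
predictions = [("news", 4), ("news", 2), ("tech", 0)]
print(mean_sentiment_by_group(predictions))
```

Treating the labels as evenly spaced numbers is itself an assumption; reporting the per-class distribution for each group avoids it at the cost of a less compact summary.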

---

## Limitations

- The model is trained on **English text** only.
- Sentiment is **subjective**, especially in edge cases such as sarcasm or dark humor, so some predictions will disagree with a human reader.
- The **5-class mapping** from GoEmotions is heuristic and may introduce some overlap between adjacent classes.

---

## Acknowledgements

Special thanks to the original dataset creators:
- Google (GoEmotions)
- Yassir Acharki (Amazon Reviews fine-grained)
- Charan Gowda et al. (Kaggle Reddit/Twitter Sentiment Dataset)

---

## License

This model is available under the same license as the base model (Longformer) and is intended for research and educational use.

---

✅ Created and maintained by [spacesedan](https://huggingface.co/spacesedan)