---
library_name: transformers
tags:
- autotrain
- text-classification
base_model: spacesedan/autotrain-iz7hp-zi6ki
widget:
- text: "I love AutoTrain"
datasets:
- spacesedan/goemotions-5point-sentiment
---
# 5-Point Sentiment Classifier (Longformer) — by spacesedan
A fine-tuned **Longformer** model for **5-point sentiment classification**, optimized to analyze long-form user-generated content like **Reddit posts**. This model is ideal for understanding nuanced sentiment across a spectrum from *very negative* to *very positive*.
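A minimal inference sketch using the standard `transformers` pipeline API (the repo id below is taken from this card's metadata; substitute the published model id if it differs):

```python
from transformers import pipeline

# Hypothetical repo id, copied from this card's metadata.
classifier = pipeline("text-classification", model="spacesedan/autotrain-iz7hp-zi6ki")

# Truncate to the 1024-token limit the model was trained with.
result = classifier(
    "Long Reddit post text goes here ...",
    truncation=True,
    max_length=1024,
)
print(result)
```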
---
## Labels
| Label Index | Sentiment |
|-------------|------------------|
| 0 | Very Negative |
| 1 | Negative |
| 2 | Neutral |
| 3 | Positive |
| 4 | Very Positive |
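After inference, the label indices above can be mapped back to sentiment strings. A minimal sketch, assuming the model's raw output follows the index order in this table:

```python
# Map the classifier's label indices to the 5-point sentiment scale.
ID2LABEL = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}

def label_from_logits(logits):
    """Return the sentiment string for the highest-scoring class."""
    idx = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[idx]

# Example with hypothetical logits for a mildly positive text:
print(label_from_logits([-1.2, -0.4, 0.3, 1.8, 0.6]))  # Positive
```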
---
## Datasets Used
This model was fine-tuned on a combination of three publicly available datasets:
1. **[GoEmotions](https://huggingface.co/datasets/go_emotions)** by Google
→ Converted 27 emotion labels into a 5-point sentiment scale.
2. **[Amazon Reviews (fine-grained)](https://huggingface.co/datasets/yassiracharki/Amazon_Reviews_for_Sentiment_Analysis_fine_grained_5_classes)**
→ Large-scale consumer review dataset with fine-grained sentiment labels.
3. **[Kaggle: Twitter and Reddit Sentimental Analysis Dataset](https://www.kaggle.com/datasets/charangowda/twitter-and-reddit-sentimental-analysis-dataset)**
→ Originally a 3-class dataset; remapped to the 5-class format used here for compatibility.
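The GoEmotions conversion is heuristic (see Limitations). The exact mapping used in training is not published here; the groupings below are purely illustrative assumptions of what such an emotion-to-sentiment mapping might look like:

```python
# Illustrative (NOT the actual) emotion -> 5-point sentiment grouping.
EMOTION_TO_SENTIMENT = {
    "anger": 0, "disgust": 0,          # Very Negative
    "disappointment": 1, "fear": 1,    # Negative
    "neutral": 2, "confusion": 2,      # Neutral
    "approval": 3, "optimism": 3,      # Positive
    "joy": 4, "love": 4,               # Very Positive
}

def to_sentiment(emotion: str) -> int:
    """Map a GoEmotions label to a 5-point class, defaulting to Neutral."""
    return EMOTION_TO_SENTIMENT.get(emotion, 2)

print(to_sentiment("joy"))  # 4
```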
---
## Training Configuration
| Setting | Value |
|------------------------|--------------------|
| Model Base | Longformer (4096) |
| Max Sequence Length | 1024 tokens |
| Epochs | 4 |
| Batch Size | 8 |
| Gradient Accumulation | 4 |
| Optimizer | `adamw_torch` |
| Learning Rate | `2e-5` |
| Scheduler | Linear |
| Mixed Precision | FP16 |
| Weight Decay | 0.01 |
| Warmup Proportion | 0.1 |
| Early Stopping | patience=5, threshold=0.01 |
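The settings above correspond roughly to the following `transformers` `TrainingArguments` (a sketch, not the exact AutoTrain invocation; the output directory is hypothetical):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="longformer-5pt-sentiment",  # hypothetical path
    num_train_epochs=4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size of 32
    optim="adamw_torch",
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    fp16=True,
    weight_decay=0.01,
    warmup_ratio=0.1,
)
```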
---
## Final Evaluation Metrics
| Metric | Score |
|----------------------|---------|
| Accuracy | **0.671** |
| F1 Score (Macro) | 0.642 |
| F1 Score (Weighted) | 0.673 |
| Precision (Macro) | 0.642 |
| Recall (Macro) | 0.646 |
| Loss | 0.882 |
---
## Use Cases
- Tracking sentiment across **Reddit posts**, especially for **news** or **trending headlines**.
- Analyzing **long-form product reviews**.
- Building a sentiment dashboard for user forums or blogs.
---
## Limitations
- Model is trained on **English text** only.
- Sentiment can be **subjective**, especially across edge cases (e.g., sarcasm or dark humor).
- **5-class mapping** from GoEmotions is heuristic and might introduce some overlap.
---
## Acknowledgements
Special thanks to the original dataset creators:
- Google (GoEmotions)
- Yassir Acharki (Amazon Reviews fine-grained)
- Charan Gowda et al. (Kaggle Reddit/Twitter Sentiment Dataset)
---
## License
This model is available under the same license as the base model (Longformer) and is intended for research and educational use.
---
✅ Created and maintained by [spacesedan](https://huggingface.co/spacesedan)