---
library_name: transformers
tags:
- autotrain
- text-classification
base_model: spacesedan/autotrain-iz7hp-zi6ki
widget:
- text: "I love AutoTrain"
datasets:
- spacesedan/goemotions-5point-sentiment
---
# 5-Point Sentiment Classifier (Longformer) — by spacesedan
A fine-tuned **Longformer** model for **5-point sentiment classification**, optimized to analyze long-form user-generated content like **Reddit posts**. This model is ideal for understanding nuanced sentiment across a spectrum from *very negative* to *very positive*.
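A minimal inference sketch using the standard `transformers` pipeline API (the repo id below is taken from this card's metadata; substitute the published model id if it differs):

```python
from transformers import pipeline

# Hypothetical repo id, copied from this card's metadata.
classifier = pipeline("text-classification", model="spacesedan/autotrain-iz7hp-zi6ki")

# Truncate to the 1024-token limit the model was trained with.
result = classifier(
    "Long Reddit post text goes here ...",
    truncation=True,
    max_length=1024,
)
print(result)
```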
---
## Labels
| Label Index | Sentiment |
|-------------|------------------|
| 0 | Very Negative |
| 1 | Negative |
| 2 | Neutral |
| 3 | Positive |
| 4 | Very Positive |
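After inference, the label indices above can be mapped back to sentiment strings. A minimal sketch, assuming the model's raw output follows the index order in this table:

```python
# Map the classifier's label indices to the 5-point sentiment scale.
ID2LABEL = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}

def label_from_logits(logits):
    """Return the sentiment string for the highest-scoring class."""
    idx = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[idx]

# Example with hypothetical logits for a mildly positive text:
print(label_from_logits([-1.2, -0.4, 0.3, 1.8, 0.6]))  # Positive
```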
---
## Datasets Used
This model was fine-tuned on a combination of three publicly available datasets:
1. **[GoEmotions](https://huggingface.co/datasets/go_emotions)** by Google
→ Converted 27 emotion labels into a 5-point sentiment scale.
2. **[Amazon Reviews (fine-grained)](https://huggingface.co/datasets/yassiracharki/Amazon_Reviews_for_Sentiment_Analysis_fine_grained_5_classes)**
→ Large-scale consumer review dataset with fine-grained sentiment labels.
3. **[Kaggle: Twitter and Reddit Sentimental Analysis Dataset](https://www.kaggle.com/datasets/charangowda/twitter-and-reddit-sentimental-analysis-dataset)**
→ Originally a 3-class dataset; remapped to the 5-class format used here for compatibility.
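The GoEmotions conversion is heuristic (see Limitations). The exact mapping used in training is not published here; the groupings below are purely illustrative assumptions of what such an emotion-to-sentiment mapping might look like:

```python
# Illustrative (NOT the actual) emotion -> 5-point sentiment grouping.
EMOTION_TO_SENTIMENT = {
    "anger": 0, "disgust": 0,          # Very Negative
    "disappointment": 1, "fear": 1,    # Negative
    "neutral": 2, "confusion": 2,      # Neutral
    "approval": 3, "optimism": 3,      # Positive
    "joy": 4, "love": 4,               # Very Positive
}

def to_sentiment(emotion: str) -> int:
    """Map a GoEmotions label to a 5-point class, defaulting to Neutral."""
    return EMOTION_TO_SENTIMENT.get(emotion, 2)

print(to_sentiment("joy"))  # 4
```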
---
## Training Configuration
| Setting | Value |
|------------------------|--------------------|
| Model Base | Longformer (4096) |
| Max Sequence Length | 1024 tokens |
| Epochs | 4 |
| Batch Size | 8 |
| Gradient Accumulation | 4 |
| Optimizer | `adamw_torch` |
| Learning Rate | `2e-5` |
| Scheduler | Linear |
| Mixed Precision | FP16 |
| Weight Decay | 0.01 |
| Warmup Proportion | 0.1 |
| Early Stopping | patience=5, threshold=0.01 |
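The settings above correspond roughly to the following `transformers` `TrainingArguments` (a sketch, not the exact AutoTrain invocation; the output directory is hypothetical):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="longformer-5pt-sentiment",  # hypothetical path
    num_train_epochs=4,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,  # effective batch size of 32
    optim="adamw_torch",
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    fp16=True,
    weight_decay=0.01,
    warmup_ratio=0.1,
)
```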
---
## Final Evaluation Metrics
| Metric | Score |
|----------------------|---------|
| Accuracy | **0.671** |
| F1 Score (Macro) | 0.642 |
| F1 Score (Weighted) | 0.673 |
| Precision (Macro) | 0.642 |
| Recall (Macro) | 0.646 |
| Loss | 0.882 |
---
## Use Cases
- Tracking sentiment across **Reddit posts**, especially for **news** or **trending headlines**.
- Analyzing **long-form product reviews**.
- Building a sentiment dashboard for user forums or blogs.
---
## Limitations
- Model is trained on **English text** only.
- Sentiment can be **subjective**, especially across edge cases (e.g., sarcasm or dark humor).
- **5-class mapping** from GoEmotions is heuristic and might introduce some overlap.
---
## Acknowledgements
Special thanks to the original dataset creators:
- Google (GoEmotions)
- Yassir Acharki (Amazon Reviews fine-grained)
- Charan Gowda et al. (Kaggle Reddit/Twitter Sentiment Dataset)
---
## License
This model is available under the same license as the base model (Longformer) and is intended for research and educational use.
---
✅ Created and maintained by [spacesedan](https://huggingface.co/spacesedan)