library_name: transformers
tags:
  - autotrain
  - text-classification
base_model: spacesedan/autotrain-iz7hp-zi6ki
widget:
  - text: I love AutoTrain
datasets:
  - spacesedan/goemotions-5point-sentiment

5-Point Sentiment Classifier (Longformer) — by spacesedan

A Longformer model fine-tuned for 5-point sentiment classification of long-form user-generated content such as Reddit posts. It assigns each input one of five labels, ranging from very negative to very positive.


Labels

| Label Index | Sentiment     |
|-------------|---------------|
| 0           | Very Negative |
| 1           | Negative      |
| 2           | Neutral       |
| 3           | Positive      |
| 4           | Very Positive |
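
As a minimal sketch of how the label indices listed above might be used at inference time, the helpers below turn five raw logits into a sentiment name. The `classify` function is an assumption about usage, not the author's documented method: `model_id` is a placeholder for this model's Hub id or a local path, and the label strings a pipeline returns depend on the model's own configuration.

```python
import math

# Label indices as listed in the Labels section.
ID2LABEL = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_label(logits):
    """Return (index, sentiment name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, ID2LABEL[index], probs[index]


def classify(text, model_id):
    """Run the classifier on one text.

    model_id is a placeholder (assumption): pass this model's Hub id or a local path.
    Requires the transformers library and a backend such as PyTorch.
    """
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import pipeline

    clf = pipeline("text-classification", model=model_id)
    # Assumption: truncate to the 1024-token maximum used during training.
    return clf(text, truncation=True, max_length=1024)
```

Calling `top_label` on a model's output logits gives the predicted index together with its name from the table, which is useful if the pipeline reports generic labels rather than sentiment names.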

Datasets Used

This model was fine-tuned on a combination of three datasets:

  1. GoEmotions by Google
    → Converted 27 emotion labels into a 5-point sentiment scale.

  2. Amazon Reviews (fine-grained)
    → Large-scale consumer review dataset with fine-grained sentiment labels.

  3. Kaggle: Twitter and Reddit Sentimental Analysis Dataset
    → Adapted first into a 3-class format and then into the 5-class format for compatibility with the other datasets.
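
The exact emotion-to-sentiment conversion is not given in this card. The sketch below only illustrates what such a heuristic could look like: the emotion names are real GoEmotions categories, but every bucket assignment and the averaging rule in `emotions_to_label` are hypothetical, not the author's mapping.

```python
import math

# Hypothetical emotion -> 5-point label assignments, for illustration only.
# Only a subset of the GoEmotions categories is shown.
EMOTION_TO_SENTIMENT = {
    "grief": 0, "disgust": 0,
    "annoyance": 1, "disappointment": 1,
    "neutral": 2, "curiosity": 2,
    "approval": 3, "relief": 3,
    "joy": 4, "love": 4,
}


def emotions_to_label(emotions):
    """Collapse a multi-label emotion annotation into one 5-point label.

    Averages the mapped scores and rounds half up. Returns None when none
    of the emotions appear in the (hypothetical) mapping.
    """
    scores = [EMOTION_TO_SENTIMENT[e] for e in emotions if e in EMOTION_TO_SENTIMENT]
    if not scores:
        return None
    mean = sum(scores) / len(scores)
    return math.floor(mean + 0.5)
```

Because GoEmotions examples can carry more than one emotion, some rule for combining them is needed; averaging is one simple choice, and it is also one way mixed annotations could end up in neighbouring classes.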


Training Configuration

| Setting               | Value                      |
|-----------------------|----------------------------|
| Model Base            | Longformer (4096)          |
| Max Sequence Length   | 1024 tokens                |
| Epochs                | 4                          |
| Batch Size            | 8                          |
| Gradient Accumulation | 4                          |
| Optimizer             | adamw_torch                |
| Learning Rate         | 2e-5                       |
| Scheduler             | Linear                     |
| Mixed Precision       | FP16                       |
| Weight Decay          | 0.01                       |
| Warmup Proportion     | 0.1                        |
| Early Stopping        | patience=5, threshold=0.01 |
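
To make the interaction between these settings concrete, here is a small sketch of the effective batch size and of a linear schedule with warmup. It assumes the batch size of 8 is per device and that training ran on a single device; `total_steps` is a hypothetical input, since the card does not state the dataset size or step count.

```python
# Values taken from the Training Configuration table.
BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 4
LEARNING_RATE = 2e-5
WARMUP_PROPORTION = 0.1


def effective_batch_size(num_devices=1):
    """Examples contributing to each optimizer update (num_devices is an assumption)."""
    return BATCH_SIZE * GRAD_ACCUM_STEPS * num_devices


def lr_at_step(step, total_steps):
    """Learning rate under linear warmup followed by linear decay to zero."""
    warmup_steps = int(total_steps * WARMUP_PROPORTION)
    if step < warmup_steps:
        # Ramp up linearly from 0 to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Decay linearly from the peak down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions each optimizer update sees 8 × 4 = 32 examples, and the learning rate reaches its peak of 2e-5 after the first 10% of steps.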

Final Evaluation Metrics

| Metric              | Score |
|---------------------|-------|
| Accuracy            | 0.671 |
| F1 Score (Macro)    | 0.642 |
| F1 Score (Weighted) | 0.673 |
| Precision (Macro)   | 0.642 |
| Recall (Macro)      | 0.646 |
| Loss                | 0.882 |
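
For readers unfamiliar with the difference between the macro and weighted F1 scores reported above, the sketch below computes both from scratch. The labels in the usage note are toy data for illustration, not this model's predictions.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, labels):
    """F1 for each label, computed from true/false positives and false negatives."""
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 weighted by each class's share of the true labels."""
    scores = per_class_f1(y_true, y_pred, labels)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in labels) / len(y_true)
```

With toy inputs such as `y_true = [0, 0, 0, 0, 1, 1]` and `y_pred = [0, 0, 0, 0, 0, 1]`, the weighted score comes out higher than the macro score because the larger class is predicted better. A weighted F1 above the macro F1, as in the table, is consistent with that kind of pattern, though the card does not report per-class scores.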

Use Cases

  • Tracking sentiment across Reddit posts, especially for news or trending headlines.
  • Analyzing long-form product reviews.
  • Building a sentiment dashboard for user forums or blogs.

Limitations

  • The model is trained on English text only.
  • Sentiment is subjective, especially in edge cases such as sarcasm or dark humor.
  • The 5-class mapping from GoEmotions is heuristic, so adjacent classes may overlap.

Acknowledgements

Special thanks to the original dataset creators:

  • Google (GoEmotions)
  • Yassir Acharki (Amazon Reviews fine-grained)
  • Charan Gowda et al. (Kaggle Reddit/Twitter Sentiment Dataset)

License

This model is available under the same license as the base model (Longformer) and is intended for research and educational use.


✅ Created and maintained by spacesedan