metadata

library_name: transformers
license: mit
base_model: roberta-base
tags:
  - text-classification
  - multi-label-classification
  - emotion-detection
  - tensorflow
  - keras
  - generated_from_keras_callback
model-index:
  - name: fp-ai-modul-6
    results:
      - task:
          type: text-classification
          name: text_classification
        dataset:
          name: GoEmotions
          type: go_emotions
          config: go_emotions_original
          split: validation
        metrics:
          - type: f1
            name: Macro F1-Score
            value: 0.5123
            verified: false

roberta-finetuned-emotion-multilabel-tf

This model is a fine-tuned version of roberta-base on the GoEmotions dataset. It performs multi-label text classification, detecting one or more of 14 emotions in a given text.

This model was trained as part of a final project for an AI module, demonstrating the end-to-end process of data analysis, preprocessing, model fine-tuning with TensorFlow/Keras, evaluation, and deployment on the Hugging Face Hub.

It achieves the following results on the evaluation set:

Macro F1-Score: 0.5123

Model description

This is a roberta-base model fine-tuned for multi-label emotion classification. The model takes a text as input and outputs a probability score for each of the following 14 emotions:

amusement
anger
annoyance
caring
confusion
disappointment
disgust
embarrassment
excitement
fear
gratitude
joy
love
sadness

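Because this is a multi-label task, each label is scored with its own sigmoid activation rather than a softmax over all labels, so several emotions can receive high scores for the same text. The model is trained with binary cross-entropy loss, computed from logits (see the training hyperparameters).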
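To make the sigmoid and binary cross-entropy setup concrete, here is a minimal NumPy sketch. It is not the training code for this model: the logits and targets are invented, and the function names are hypothetical. It only illustrates that per-label sigmoid scores need not sum to 1 and how the loss is averaged over labels.

```python
import numpy as np

def sigmoid(logits):
    """Map raw logits to independent per-label probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=np.float64)))

def binary_cross_entropy_from_logits(labels, logits):
    """Mean binary cross-entropy over all labels, computed from raw logits.

    Uses the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)).
    """
    labels = np.asarray(labels, dtype=np.float64)
    logits = np.asarray(logits, dtype=np.float64)
    per_label = (
        np.maximum(logits, 0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )
    return float(per_label.mean())

# Invented logits for one text over three of the labels: joy, excitement, anger.
logits = np.array([2.0, 1.0, -3.0])
labels = np.array([1.0, 1.0, 0.0])  # multi-hot target: joy and excitement both apply

probs = sigmoid(logits)
loss = binary_cross_entropy_from_logits(labels, logits)
print(probs.round(3), round(loss, 4))
```

With a softmax, raising the score for joy would necessarily lower the score for excitement; with per-label sigmoids the two are independent.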

Intended uses & limitations

How to use

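You can use this model with the text-classification pipeline. Because it is a multi-label model, pass top_k=None so the pipeline returns a score for every label instead of only the highest-scoring one. The pipeline applies a sigmoid when the model config sets problem_type to multi_label_classification; if the returned scores sum to 1 across labels, a softmax is being applied instead, and passing function_to_apply="sigmoid" should restore independent per-label scores.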

from transformers import pipeline

# Replace "your-username/fp-ai-modul-6" with your actual model repo
classifier = pipeline("text-classification", model="your-username/fp-ai-modul-6", top_k=None)

text = "I can't believe I won the lottery! This is the best day of my life!"
predictions = classifier(text)

# Apply a threshold to filter relevant emotions
threshold = 0.35  # This threshold was tuned on the validation set
for pred in predictions[0]:
    if pred['score'] > threshold:
        print(f"Label: {pred['label']}, Score: {pred['score']:.4f}")

# Example output (scores are illustrative):
# Label: joy, Score: 0.9876
# Label: excitement, Score: 0.9754
# Label: amusement, Score: 0.4532
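For downstream evaluation it is often handy to turn the thresholded scores into a fixed-length 0/1 vector. The sketch below assumes the pipeline returns label names matching the 14 emotions listed in the model description (the actual names depend on the id2label mapping saved with the model) and uses hand-written scores rather than real model output. The helper name to_multi_hot is hypothetical.

```python
# Assumed label order: the alphabetical list from the model description.
LABELS = [
    "amusement", "anger", "annoyance", "caring", "confusion",
    "disappointment", "disgust", "embarrassment", "excitement",
    "fear", "gratitude", "joy", "love", "sadness",
]

def to_multi_hot(scored_labels, threshold=0.35, labels=LABELS):
    """Turn [{'label': ..., 'score': ...}, ...] into a 0/1 vector ordered like `labels`."""
    scores = {item["label"]: item["score"] for item in scored_labels}
    # Labels missing from the input are treated as score 0.0.
    return [1 if scores.get(name, 0.0) > threshold else 0 for name in labels]

# Hand-written stand-in for one pipeline result; not real model output.
example = [
    {"label": "joy", "score": 0.98},
    {"label": "excitement", "score": 0.97},
    {"label": "amusement", "score": 0.45},
    {"label": "anger", "score": 0.02},
]

vector = to_multi_hot(example)
active = [name for name, flag in zip(LABELS, vector) if flag]
print(active)
```

The resulting vectors can be stacked into a matrix and compared against multi-hot ground truth to compute per-label or macro metrics.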

Limitations

The model was trained on the GoEmotions dataset, which primarily consists of English text from Reddit comments. Its performance on other domains (e.g., formal text, poetry, other languages) may be suboptimal.
The dataset has a significant class imbalance. The model performs better on common emotions like joy and amusement and may struggle with rare emotions like embarrassment or disgust.
roberta-base is smaller and faster than larger models such as roberta-large, but may be less accurate.

Training and evaluation data

The model was fine-tuned on the GoEmotions dataset, a human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories. For this project, a subset of 14 primary emotions was used.

The data was split into:

Training set: 37,164 samples
Validation set: 9,291 samples

Preprocessing steps included lowercasing and tokenization with a max length of 128.
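The preprocessing described above could look roughly like the following. This is a sketch, not the original notebook code: the function name preprocess is hypothetical, and padding every example to the full 128 tokens is an assumption, since the card only states the maximum length.

```python
def preprocess(texts, tokenizer, max_length=128):
    """Lowercase each text, then tokenize with truncation and padding to `max_length`.

    `tokenizer` is any callable with the Hugging Face tokenizer call signature.
    """
    lowered = [text.lower() for text in texts]
    return tokenizer(
        lowered,
        truncation=True,        # cut off anything beyond max_length tokens
        padding="max_length",   # assumption: pad every example to max_length
        max_length=max_length,
    )
```

In practice the tokenizer argument would be something like AutoTokenizer.from_pretrained("roberta-base"). Note that roberta-base uses a case-sensitive vocabulary, so the same lowercasing should be applied to inputs at inference time.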

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

optimizer: {'name': 'AdamW', 'weight_decay': 0.0, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': 2e-05, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}
training_precision: float32
epochs: 4
batch_size: 32
loss_function: BinaryCrossentropy (from_logits=True)

An EarlyStopping callback was used to monitor val_loss with a patience of 2, restoring the best weights at the end of training.
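In Keras this corresponds to tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2, restore_best_weights=True). The pure-Python sketch below only imitates the stopping rule so that its effect on a 4-epoch run is easy to see; the function name and the val_loss values are invented for illustration and are not this model's training history.

```python
def simulate_early_stopping(val_losses, patience=2):
    """Return (best_epoch, last_epoch_run), both 0-indexed, for a val_loss history."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                # Training stops here; restore_best_weights would reload best_epoch.
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1  # ran every scheduled epoch

# Invented validation losses for a 4-epoch run.
history = [0.30, 0.25, 0.27, 0.28]
best_epoch, last_epoch = simulate_early_stopping(history, patience=2)
print(best_epoch, last_epoch)
```

With only 4 epochs and a patience of 2, early stopping can trigger no earlier than the third epoch, so its main practical effect in a run like this is selecting which epoch's weights are kept.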

Training results

The final model achieved a Macro F1-Score of 0.5123 on the validation set after tuning the prediction threshold to 0.35.
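Threshold tuning of this kind can be sketched as a sweep over candidate thresholds, keeping the one with the highest macro F1 on the validation set. The code below is an illustration under assumptions: the function names are hypothetical, the data is a toy example invented for the sketch, and the card does not state which F1 implementation or candidate grid was actually used.

```python
import numpy as np

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1; a label with no positives and no predictions scores 0."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = ((y_true == 1) & (y_pred == 1)).sum(axis=0)
    fp = ((y_true == 0) & (y_pred == 1)).sum(axis=0)
    fn = ((y_true == 1) & (y_pred == 0)).sum(axis=0)
    denom = 2 * tp + fp + fn
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())

def best_threshold(y_true, probs, candidates):
    """Return the (threshold, macro_f1) pair with the highest macro F1 among `candidates`."""
    probs = np.asarray(probs)
    scored = [(t, macro_f1(y_true, (probs > t).astype(int))) for t in candidates]
    return max(scored, key=lambda pair: pair[1])

# Toy data: 4 texts x 3 labels, invented for illustration only.
y_true = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
probs = np.array([
    [0.9, 0.2, 0.1],
    [0.3, 0.4, 0.1],
    [0.6, 0.45, 0.2],
    [0.1, 0.3, 0.4],
])

threshold, score = best_threshold(y_true, probs, candidates=[0.25, 0.35, 0.5])
print(threshold, round(score, 4))
```

Because the threshold is chosen on the same validation split that the score is reported on, the reported macro F1 is likely to be somewhat optimistic compared with a held-out test set.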

Framework versions

Transformers 4.41.2
TensorFlow 2.16.1
Datasets 2.19.0
Tokenizers 0.19.1