---
library_name: transformers
license: mit
base_model: roberta-base
tags:
- text-classification
- multi-label-classification
- emotion-detection
- tensorflow
- keras
- generated_from_keras_callback
model-index:
- name: fp-ai-modul-6
  results:
  - task:
      type: text-classification
      name: text_classification
    dataset:
      name: GoEmotions
      type: go_emotions
      config: go_emotions_original
      split: validation
    metrics:
    - type: f1
      name: Macro F1-Score
      value: 0.5123
      verified: false
---

# roberta-finetuned-emotion-multilabel-tf

This model is a fine-tuned version of [roberta-base](https://huggingface.co/roberta-base) on the GoEmotions dataset. It has been trained for multi-label text classification, detecting one or more of 14 emotions in a given text.

This model was trained as part of a final project for an AI module, demonstrating the end-to-end process of data analysis, preprocessing, model fine-tuning with TensorFlow/Keras, evaluation, and deployment on the Hugging Face Hub.

It achieves the following results on the evaluation set:
- **Macro F1-Score**: **0.5123** (at a prediction threshold of 0.35)

## Model description

This is a `roberta-base` model fine-tuned for multi-label emotion classification. Given an input text, it outputs an independent probability score for each of the following 14 emotions:

- `amusement`
- `anger`
- `annoyance`
- `caring`
- `confusion`
- `disappointment`
- `disgust`
- `embarrassment`
- `excitement`
- `fear`
- `gratitude`
- `joy`
- `love`
- `sadness`

Because this is a multi-label task, the labels are scored independently: the classification head produces one logit per label and is trained with binary cross-entropy loss (`from_logits=True`); a sigmoid is applied to the logits at inference time to obtain per-label probabilities.
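
For reference, here is a minimal sketch of how such a head can be set up in TensorFlow/Keras. The label count and loss match this card; everything else is illustrative, not the exact training script:

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

NUM_LABELS = 14  # the 14 emotions listed above

# One logit per label; no softmax, because labels are not mutually exclusive.
model = TFAutoModelForSequenceClassification.from_pretrained(
    "roberta-base",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # assumed config choice
)

# Loss on raw logits, matching the hyperparameters below; sigmoid only at inference.
model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=2e-5),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)
```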

## Intended uses & limitations

### How to use

You can use this model with the `text-classification` pipeline. Since it is a multi-label model, pass `top_k=None` so that scores for all labels are returned.

```python
from transformers import pipeline

# Replace "your-username/fp-ai-modul-6" with your actual model repo
classifier = pipeline("text-classification", model="your-username/fp-ai-modul-6", top_k=None)

text = "I can't believe I won the lottery! This is the best day of my life!"
predictions = classifier(text)

# Apply a threshold to filter relevant emotions
threshold = 0.35 # This threshold was tuned on the validation set
for pred in predictions[0]:
    if pred['score'] > threshold:
        print(f"Label: {pred['label']}, Score: {pred['score']:.4f}")

# Example output (scores are illustrative):
# Label: joy, Score: 0.9876
# Label: excitement, Score: 0.9754
# Label: amusement, Score: 0.4532
```

### Limitations

- The model was trained on the GoEmotions dataset, which primarily consists of English text from Reddit comments. Its performance on other domains (e.g., formal text, poetry, other languages) may be suboptimal.
- The dataset has a significant class imbalance. The model performs better on common emotions like `joy` and `amusement` and may struggle with rare emotions like `embarrassment` or `disgust`.
- The `roberta-base` architecture is smaller and faster but may be less accurate than larger models like `roberta-large`.

## Training and evaluation data

The model was fine-tuned on the **GoEmotions** dataset, a human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories. For this project, a subset of 14 primary emotions was used.

The data was split into:
- **Training set:** 37,164 samples
- **Validation set:** 9,291 samples

Preprocessing steps included lowercasing and tokenization with a max length of 128.
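
A sketch of that preprocessing with the `datasets` library and the RoBERTa tokenizer (the `text` column name comes from the Hub's `go_emotions` dataset; the project's 14-label filtering step is not shown and is assumed to happen separately):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def preprocess(batch):
    # Lowercase first, then tokenize with truncation/padding to 128 tokens.
    lowered = [text.lower() for text in batch["text"]]
    return tokenizer(lowered, truncation=True, padding="max_length", max_length=128)

dataset = load_dataset("go_emotions", "simplified")
tokenized = dataset.map(preprocess, batched=True)
```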

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- optimizer: AdamW
  - learning_rate: 2e-05
  - beta_1: 0.9, beta_2: 0.999, epsilon: 1e-07
  - weight_decay: 0.0
  - jit_compile: True
  - all other optimizer settings at their Keras defaults
- training_precision: float32
- epochs: 4
- batch_size: 32
- loss_function: BinaryCrossentropy (from_logits=True)

An `EarlyStopping` callback was used to monitor `val_loss` with a patience of 2, restoring the best weights at the end of training.
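
Put together, these settings correspond to a fit call along these lines (`train_ds` and `val_ds` are assumed `tf.data.Dataset` objects batched at 32; a sketch, not the verbatim training script):

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # as described above
    patience=2,
    restore_best_weights=True,
)

model.fit(
    train_ds,                    # (tokenized inputs, multi-hot labels), batch_size=32
    validation_data=val_ds,
    epochs=4,
    callbacks=[early_stop],
)
```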

### Training results

The final model achieved a **Macro F1-Score of 0.5123** on the validation set after tuning the prediction threshold to **0.35**.
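
The card does not document how the 0.35 threshold was found; a common approach is a simple grid search over candidate thresholds on the validation predictions, for example with scikit-learn:

```python
import numpy as np
from sklearn.metrics import f1_score

def best_threshold(y_true, probs):
    """Grid-search the cutoff that maximizes macro F1.

    y_true: (n_samples, n_labels) multi-hot ground truth
    probs:  (n_samples, n_labels) sigmoid probabilities
    """
    candidates = np.arange(0.05, 0.95, 0.05)
    scores = [
        f1_score(y_true, (probs >= t).astype(int), average="macro", zero_division=0)
        for t in candidates
    ]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]
```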

### Framework versions

- Transformers 4.41.2
- TensorFlow 2.16.1
- Datasets 2.19.0
- Tokenizers 0.19.1