---
library_name: transformers
license: mit
datasets:
- hblim/customer-complaints
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-uncased
tags:
- bert
- transformers
- customer-complaints
- text-classification
- multiclass
- huggingface
- fine-tuned
- wandb
---

# BERT Base (Uncased) Fine-Tuned on Customer Complaint Classification (3 Classes)

## 🧾 Model Description

This model is a fine-tuned version of [`bert-base-uncased`](https://huggingface.co/bert-base-uncased) using Hugging Face Transformers on a custom dataset of customer complaints. The task is **multi-class text classification**, where each complaint is categorized into one of **three classes**.

The model is intended to support downstream tasks like complaint triage, issue type prediction, or support ticket classification.

Training and evaluation were tracked using [Weights & Biases](https://wandb.ai/), and all hyperparameters are reproducible and logged below.

---

## 🧠 Intended Use

- 🏷 Classify customer complaint text into 3 predefined categories
- 📊 Analyze complaint trends over time
- 💬 Serve as a backend model for customer service applications

---

## 📚 Dataset

- Dataset Name: [hblim/customer-complaints](https://huggingface.co/datasets/hblim/customer-complaints)
- Dataset Type: Multiclass text classification
- Classes: billing, product, delivery
- Preprocessing: Standard BERT tokenization
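With three classes, the classification head maps each complaint to one of three integer ids. A possible label mapping is sketched below; the id order shown is an assumption for illustration, and the authoritative mapping lives in the model's `config.json` (`id2label`/`label2id`):

```python
# Assumed id order -- check the model's config.json (id2label) for the real mapping
label2id = {"billing": 0, "product": 1, "delivery": 2}
id2label = {i: name for name, i in label2id.items()}

print(id2label[0])  # "billing" under the assumed ordering above
```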

---

## βš™οΈ Training Details

- Base Model: `bert-base-uncased`
- Epochs: **10**
- Batch Size: **1**
- Learning Rate: **1e-5**
- Weight Decay: **0.05**
- Warmup Ratio: **0.20**
- LR Scheduler: `linear`
- Optimizer: `AdamW`
- Evaluation Strategy: every **100 steps**
- Logging: every **100 steps**
- Trainer: Hugging Face `Trainer`
- Hardware: Single NVIDIA GeForce RTX 3080 GPU
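The `linear` scheduler with a 0.20 warmup ratio ramps the learning rate from 0 to the peak over the first 20% of training steps, then decays it linearly back to 0. A minimal sketch of that shape (plain Python, not the Transformers API; the 1,500-step total is taken from the evaluation log in this card):

```python
def linear_schedule_lr(step, total_steps=1500, base_lr=1e-5, warmup_ratio=0.20):
    """Linear warmup to base_lr over the first warmup_ratio of steps,
    then linear decay back to zero (the shape of the `linear` scheduler)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# With these settings the peak LR (1e-5) is reached at step 300
# (20% of 1,500) and decays to 0 by step 1,500.
```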

---

## 📈 Metrics

Evaluation was tracked using:
- **Accuracy**

To reproduce metrics and training logs, refer to the corresponding W&B run:
[Weights & Biases Run - `baseline-hf-hub`](https://wandb.ai/notslahify/customer%20complaints%20fine%20tuning/runs/c75ddclr)


| Step | Training Loss | Validation Loss | Accuracy   |
|------|---------------|-----------------|------------|
| 100  | 1.106100      | 1.040519        | 0.523810   |
| 200  | 0.944800      | 0.744273        | 0.738095   |
| 300  | 0.660000      | 0.385309        | 0.900000   |
| 400  | 0.412400      | 0.273423        | 0.904762   |
| 500  | 0.220800      | 0.185636        | 0.923810   |
| 600  | 0.163400      | 0.245850        | 0.919048   |
| 700  | 0.116100      | 0.180523        | 0.942857   |
| 800  | 0.097200      | 0.254475        | 0.928571   |
| 900  | 0.052200      | 0.233583        | 0.942857   |
| 1000 | 0.050700      | 0.223150        | 0.928571   |
| 1100 | 0.035100      | 0.271416        | 0.919048   |
| 1200 | 0.027700      | 0.226478        | 0.933333   |
| 1300 | 0.009000      | 0.218807        | 0.938095   |
| 1400 | 0.013600      | 0.246330        | 0.928571   |
| 1500 | 0.014500      | 0.226987        | 0.933333   |
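Accuracy here is simply the fraction of complaints assigned the correct class. A minimal sketch of the computation (a stand-in for a `compute_metrics` callback, not the exact code used for this run):

```python
def accuracy(pred_ids, label_ids):
    """Fraction of predicted class ids that match the gold labels."""
    assert len(pred_ids) == len(label_ids) > 0
    return sum(p == y for p, y in zip(pred_ids, label_ids)) / len(label_ids)

# e.g. 3 of 4 predictions correct
print(accuracy([0, 1, 2, 1], [0, 1, 2, 2]))  # 0.75
```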

---

## 🚀 How to Use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Replace with the actual repository id of this model on the Hub
model_id = "your-username/baseline-hf-hub"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

inputs = tokenizer("I want to report an issue with my account", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

# Map the highest-scoring logit to its class id and human-readable label
predicted_class = outputs.logits.argmax(dim=-1).item()
print(predicted_class, model.config.id2label[predicted_class])
```