## Running the API

Install the dependencies and start the API server:

```bash
pip install -r requirements.txt
python run_api.py
```

Then send a sample request:

```bash
curl -X POST http://127.0.0.1:5000/classify -H "Content-Type: application/json" -d '{"purpose_text": "paid rent"}'
```
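For reference, here is a minimal sketch of what the `/classify` endpoint could look like, assuming a Flask app serving a pickled scikit-learn pipeline; the `model.pkl` path and the response shape are assumptions, not the actual `run_api.py`:

```python
# Hypothetical sketch of run_api.py, not the actual implementation.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Assumed artifact: a scikit-learn Pipeline (TF-IDF + classifier) pickled to disk.
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/classify", methods=["POST"])
def classify():
    payload = request.get_json(force=True)
    text = payload.get("purpose_text", "")
    # The pipeline vectorizes internally; str() keeps the label JSON-serializable.
    label = str(model.predict([text])[0])
    return jsonify({"purpose_text": text, "predicted_category": label})

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```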


## LLM/Transformer Conceptual Plan

To adapt a transformer-based model like BERT to this classification task, I would:
- Use a pre-trained model like `bert-base-uncased` from Hugging Face Transformers.
- Tokenize the `purpose_text` field using the BERT tokenizer.
- Add a classification head (dense layer) on top of the [CLS] token representation.
- Fine-tune the model on the labeled dataset using cross-entropy loss.

Due to hardware limitations, I have not implemented this, but a minimal prototype could be built with the `Trainer` API from Hugging Face Transformers, as sketched below.
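A minimal fine-tuning sketch under those assumptions; the file name `transactions.csv`, the column names (`purpose_text`, integer `label`), and the number of categories are all placeholders:

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)  # num_labels: assumed category count

# Hypothetical CSV with "purpose_text" and integer "label" columns.
dataset = load_dataset("csv", data_files="transactions.csv")["train"]

def tokenize(batch):
    return tokenizer(batch["purpose_text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

# Trainer applies cross-entropy internally for sequence classification.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-purpose-clf", num_train_epochs=3),
    train_dataset=dataset,
)
trainer.train()
```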


## How the Data Was Prepared

Before training, each `purpose_text` value is cleaned (lowercased, with punctuation and status words stripped), for example:

- Raw: "Monthly apartment payment - paid"
- Cleaned: "monthly apartment payment"
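A cleaning function consistent with that example might look like this; the exact rules are assumptions inferred from the Raw/Cleaned pair above:

```python
import re

def clean_purpose_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, strip punctuation, drop status words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)            # punctuation/digits -> space
    text = re.sub(r"\b(paid|pending)\b", " ", text)  # assumed status tokens
    return " ".join(text.split())                    # collapse whitespace

print(clean_purpose_text("Monthly apartment payment - paid"))
# monthly apartment payment
```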


# Transformer-Based Classification Notes

Instead of traditional models, we could use a transformer like BERT for this task.

## Approach
1. Load a pre-trained model like `bert-base-uncased`
2. Tokenize `purpose_text` using the Hugging Face tokenizer (see the sketch after this list)
3. Add a classification head to the model
4. Fine-tune the model on your labeled dataset
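To make step 2 concrete, here is what the tokenizer produces for one cleaned purpose text (exact tokens depend on the vocabulary; the output comment is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("monthly apartment payment", return_tensors="pt")

# BERT wraps the input in special tokens; the [CLS] position feeds the
# classification head added in step 3.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"][0]))
# e.g. ['[CLS]', 'monthly', 'apartment', 'payment', '[SEP]']
```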

## Benefits
- Stronger semantic understanding: contextual embeddings capture word meaning that bag-of-words TF-IDF features miss
- No need for manual preprocessing or hand-built TF-IDF features

## Tools
- `transformers` from HuggingFace
- `datasets` for loading and preparing the labeled data
- `torch` for training

## Reason for Not Using It
Due to hardware limitations and time constraints, traditional models were preferred.