bank_transaction / ReadMe.md

Quickstart

```bash
pip install -r requirements.txt
python run_api.py
curl -X POST http://127.0.0.1:5000/classify \
  -H "Content-Type: application/json" \
  -d '{"purpose_text": "paid rent"}'
```

LLM/Transformer Conceptual Plan

To adapt a transformer-based model like BERT to this classification task, I would:

  • Use a pre-trained model like bert-base-uncased from Hugging Face Transformers.
  • Tokenize the purpose_text field using the BERT tokenizer.
  • Add a classification head (dense layer) on top of the [CLS] token representation.
  • Fine-tune the model on the labeled dataset using cross-entropy loss.

Due to hardware limitations, I am not implementing this here, but a minimal prototype could be built with the Hugging Face Trainer API.
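The classification-head idea above can be sketched in plain PyTorch. This is a minimal illustration, not the repo's implementation: the encoder below is a random stand-in for BERT (in practice you would use `AutoModel.from_pretrained("bert-base-uncased")`), and `hidden_size = 768` / `num_classes = 5` are illustrative assumptions.

```python
import torch
import torch.nn as nn

hidden_size, num_classes = 768, 5  # 768 matches bert-base; 5 is a placeholder class count

class DummyEncoder(nn.Module):
    """Stand-in for a pre-trained BERT encoder.

    A real encoder returns last_hidden_state of shape (batch, seq, hidden);
    position 0 along the sequence axis is the [CLS] token.
    """
    def forward(self, input_ids):
        batch, seq_len = input_ids.shape
        return torch.randn(batch, seq_len, hidden_size)

class TransactionClassifier(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(hidden_size, num_classes)  # dense layer on [CLS]

    def forward(self, input_ids):
        hidden = self.encoder(input_ids)  # (batch, seq, hidden)
        cls = hidden[:, 0, :]             # [CLS] token representation
        return self.head(cls)             # (batch, num_classes) logits

model = TransactionClassifier(DummyEncoder())
input_ids = torch.randint(0, 30522, (4, 16))  # fake token ids (BERT vocab size)
logits = model(input_ids)
labels = torch.tensor([0, 1, 2, 3])
loss = nn.CrossEntropyLoss()(logits, labels)  # fine-tuning objective
print(logits.shape, loss.item())
```

Fine-tuning would then just be the usual loop: backpropagate this cross-entropy loss through both the head and the (unfrozen) encoder.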

How the training data was cleaned

Raw: "Monthly apartment payment - paid"
Cleaned: "monthly apartment payment"
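A minimal cleaning step consistent with the example above might look like the following. The repo does not state its exact rules, so the regex and the list of status words here are assumptions:

```python
import re

def clean_text(text: str) -> str:
    """Lowercase and strip a trailing status marker such as ' - paid'."""
    text = text.lower().strip()
    # Assumed status vocabulary; extend as needed for the real data.
    text = re.sub(r"\s*-\s*(paid|pending|failed)\s*$", "", text)
    return text

print(clean_text("Monthly apartment payment - paid"))  # → monthly apartment payment
```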

Transformer-Based Classification Notes

Instead of traditional models, we could use a transformer like BERT for this task.

Approach

  1. Load a pre-trained model like bert-base-uncased
  2. Tokenize purpose_text using the Hugging Face tokenizer
  3. Add a classification head to the model
  4. Fine-tune the model on your labeled dataset

Benefits

  • Better semantic understanding of context
  • No need for manual preprocessing or TF-IDF

Tools

  • transformers from HuggingFace
  • datasets for handling input
  • torch for training
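For the transformer approach, the tools above would be pinned in a requirements file. This is a plausible sketch, not the repo's actual requirements.txt (flask is inferred from the port-5000 endpoint served by run_api.py):

```
transformers
datasets
torch
flask
```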

Reason for Not Using It

Due to hardware limitations and time constraints, traditional models (e.g. TF-IDF features with a classical classifier) were preferred.