---
license: apache-2.0
language: en
tags:
- text-classification
- llama3
- qlora
- finance
- transaction-categorization
pipeline_tag: text-generation
datasets:
- karthiksagarn/bank-statement-categorization
metrics:
- accuracy
base_model:
- meta-llama/Llama-3.2-3B-Instruct
new_version: karthiksagarn/llama3-3.2b-finetuned-financial
library_name: adapter-transformers
---

# Llama 3 3B Financial Transaction Classifier

This repository contains a **Llama 3.2 3B Instruct** model fine-tuned with **QLoRA** to classify bank transaction descriptions into 12 financial categories. It is intended as a lightweight component for personal finance management applications, automated bookkeeping, and spending analysis.

The model was trained on a custom synthetic dataset of transaction descriptions. It takes a raw transaction string as input and outputs the most likely spending category.

## Model Description

- **Base Model:** `meta-llama/Llama-3.2-3B-Instruct` (as declared in this card's metadata). The adapters must be loaded onto this base model; they will not fit a model of a different size such as `meta-llama/Meta-Llama-3-8B-Instruct`.
- **Fine-tuning Method:** QLoRA (low-rank adapters trained on top of a 4-bit quantized base model) for memory-efficient training.
- **Task:** Text Classification (formatted as a Causal Language Modeling task).
- **Categories (12):** `Education`, `Travel & Transport`, `Groceries`, `Miscellaneous`, `Bills & Utilities`, `Health & Fitness`, `Shopping`, `Entertainment`, `Investments`, `Income`, `Food & Drinks`, `Withdrawals`.

## How to Use

To use this model, load the base Llama model and then apply the fine-tuned LoRA adapters from this repository. Make sure you have the `transformers`, `peft`, `accelerate`, and `bitsandbytes` libraries installed.

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)
from peft import PeftModel

# Base model as declared in this card's metadata (base_model), plus the adapter repo
base_model_id = "meta-llama/Llama-3.2-3B-Instruct"
adapter_id = "karthiksagarn/llama3-3.2b-finetuned-financial"  # Replace with your HF repo ID

# Load the base model with 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
    # token="YOUR_HUGGINGFACE_TOKEN"  # Add your token if needed
)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Attach the LoRA adapter to the base model (the weights are not merged)
model = PeftModel.from_pretrained(base_model, adapter_id)

# Create a text-generation pipeline
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)


# --- Inference ---
def classify_transaction(description):
    # The list of labels the model was trained on
    labels = [
        "Education", "Travel & Transport", "Groceries", "Miscellaneous",
        "Bills & Utilities", "Health & Fitness", "Shopping", "Entertainment",
        "Investments", "Income", "Food & Drinks", "Withdrawals",
    ]

    prompt = (
        "Classify the following bank transaction into one of these categories:\n"
        f"{', '.join(labels)}\n\n"
        f"Description: {description}\n\nCategory:"
    )

    # Generate the output (greedy decoding)
    output = pipe(
        prompt,
        max_new_tokens=20,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )

    # The generated text includes the prompt, so keep only the first line
    # after the final "Category:" marker
    generated_text = output[0]["generated_text"]
    category = generated_text.split("Category:")[-1].strip().split("\n")[0].strip()

    # Fall back to "Miscellaneous" if the parsed text is not a known label
    if category in labels:
        return category
    return "Miscellaneous"


# --- Example Usage ---
transaction1 = "Sent Rs.510.00 From ABCD Bank A/C **** To Zomato Limited On 10/01/29"
transaction2 = "UPI Payment to Amazon for new headphones"
transaction3 = "Salary credited from Awesome Tech Inc."
transaction4 = "Recharge of Airtel mobile number"

print(f"'{transaction1}' -> Category: {classify_transaction(transaction1)}")
print(f"'{transaction2}' -> Category: {classify_transaction(transaction2)}")
print(f"'{transaction3}' -> Category: {classify_transaction(transaction3)}")
print(f"'{transaction4}' -> Category: {classify_transaction(transaction4)}")

# Expected Output:
# 'Sent Rs.510.00...' -> Category: Food & Drinks
# 'UPI Payment to Amazon...' -> Category: Shopping
# 'Salary credited from...' -> Category: Income
# 'Recharge of Airtel...' -> Category: Bills & Utilities
```

## Training & Evaluation

### Training Procedure

The model was fine-tuned on a private, balanced dataset of synthetic bank transactions.

- **Quantization:** 4-bit (`nf4`) with double quantization.
- **LoRA Configuration:**
  - Rank (`r`): **16**
  - Alpha (`lora_alpha`): **32**
  - Target Modules: `q_proj`, `v_proj`
  - Dropout: `0.05`
- **Training Hyperparameters:**
  - Epochs: **3**
  - Learning Rate: `2e-4`
  - Optimizer: `adamw_torch`
  - Effective Batch Size: **16** (2 per device × 8 gradient accumulation steps)
  - Scheduler: Linear warmup

### Training and Validation Loss

Both losses fell over the 3 epochs, with most of the improvement between epochs 1 and 2. The final validation loss is within about 0.005 of the training loss, which suggests little overfitting.

| Epoch | Training Loss | Validation Loss |
|:-----:|:-------------:|:---------------:|
| 1     | 0.464400      | 0.448071        |
| 2     | 0.396200      | 0.402424        |
| 3     | 0.394500      | 0.399548        |

### Evaluation Results

The model achieves an overall accuracy of **86.03%** on the held-out test set.

**Classification Report:**

| Category               | Precision | Recall | F1-Score | Support |
|:-----------------------|:---------:|:------:|:--------:|:-------:|
| **Education**          | 1.00      | 0.35   | 0.52     | 125     |
| **Travel & Transport** | 1.00      | 1.00   | 1.00     | 125     |
| **Groceries**          | 0.95      | 1.00   | 0.98     | 125     |
| **Miscellaneous**      | 0.42      | 0.90   | 0.58     | 125     |
| **Bills & Utilities**  | 1.00      | 1.00   | 1.00     | 125     |
| **Health & Fitness**   | 1.00      | 1.00   | 1.00     | 125     |
| **Shopping**           | 0.93      | 0.62   | 0.74     | 125     |
| **Entertainment**      | 0.83      | 0.77   | 0.80     | 125     |
| **Investments**        | 0.85      | 0.75   | 0.80     | 126     |
| **Income**             | 1.00      | 0.97   | 0.98     | 126     |
| **Food & Drinks**      | 0.99      | 0.97   | 0.98     | 126     |
| **Withdrawals**        | 0.95      | 1.00   | 0.97     | 125     |
| **Macro Avg**          | 0.91      | 0.86   | 0.86     | 1503    |
| **Weighted Avg**       | 0.91      | 0.86   | 0.86     | 1503    |

**Confusion Matrix:**

Rows are true labels and columns are predicted labels, both in the category order of the table above.

```
[[ 44   0   6  54   0   0   3  16   1   0   1   0]   -> Education
 [  0 125   0   0   0   0   0   0   0   0   0   0]   -> Travel & Transport
 [  0   0 125   0   0   0   0   0   0   0   0   0]   -> Groceries
 [  0   0   0 113   0   0   0   2   3   0   0   7]   -> Miscellaneous
 [  0   0   0   0 125   0   0   0   0   0   0   0]   -> Bills & Utilities
 [  0   0   0   0   0 125   0   0   0   0   0   0]   -> Health & Fitness
 [  0   0   0  42   0   0  77   2   4   0   0   0]   -> Shopping
 [  0   0   0  18   0   0   3  96   8   0   0   0]   -> Entertainment
 [  0   0   0  32   0   0   0   0  94   0   0   0]   -> Investments
 [  0   0   0   4   0   0   0   0   0 122   0   0]   -> Income
 [  0   0   0   4   0   0   0   0   0   0 122   0]   -> Food & Drinks
 [  0   0   0   0   0   0   0   0   0   0   0 125]]  -> Withdrawals
```

### Performance Analysis & Limitations

- **High-Performing Categories:** The model is highly reliable for unambiguous categories such as `Travel & Transport`, `Groceries`, `Bills & Utilities`, and `Health & Fitness`, with perfect or near-perfect precision and recall.
- **Areas for Improvement:**
  - **`Education` vs. `Miscellaneous`:** The model's biggest weakness is recognizing `Education` transactions. Recall is very low (0.35): the confusion matrix shows that **54 out of 125** education-related transactions were classified as `Miscellaneous` and another 16 as `Entertainment`. This suggests the descriptions for these categories may be too similar or that the model needs more distinct examples.
  - **`Miscellaneous` Precision:** Precision for `Miscellaneous` is low (0.42) because other classes are "dumped" into it when the model is uncertain. Of the 267 transactions predicted as `Miscellaneous`, 154 belong to another class.
  - **Ambiguous Commercial Categories:** `Shopping`, `Entertainment`, and `Investments` lose most of their recall to `Miscellaneous` (42, 18, and 32 transactions respectively), with a smaller amount of confusion among the three. This is expected, as transaction descriptions in these areas can be vague (e.g., a payment to a large conglomerate like Amazon could be for goods, services, or media).

This model provides a strong baseline for transaction classification. It could be improved with a more diverse dataset, especially with clearer examples that separate the weaker categories from `Miscellaneous`.
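
The figures in the classification report can be re-derived from the confusion matrix. The sketch below is a minimal, dependency-free check and not the evaluation script used for this model. It assumes rows are true labels and columns are predicted labels; the names `LABELS`, `CONFUSION`, `per_class_metrics`, and `accuracy` are illustrative.

```python
# Category order follows the classification report in this card.
LABELS = [
    "Education", "Travel & Transport", "Groceries", "Miscellaneous",
    "Bills & Utilities", "Health & Fitness", "Shopping", "Entertainment",
    "Investments", "Income", "Food & Drinks", "Withdrawals",
]

# Confusion matrix copied from this card.
# Assumption: rows = true class, columns = predicted class.
CONFUSION = [
    [44, 0, 6, 54, 0, 0, 3, 16, 1, 0, 1, 0],
    [0, 125, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 125, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 113, 0, 0, 0, 2, 3, 0, 0, 7],
    [0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 125, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 42, 0, 0, 77, 2, 4, 0, 0, 0],
    [0, 0, 0, 18, 0, 0, 3, 96, 8, 0, 0, 0],
    [0, 0, 0, 32, 0, 0, 0, 0, 94, 0, 0, 0],
    [0, 0, 0, 4, 0, 0, 0, 0, 0, 122, 0, 0],
    [0, 0, 0, 4, 0, 0, 0, 0, 0, 0, 122, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 125],
]


def per_class_metrics(matrix):
    """Return a list of (precision, recall) tuples, one per class."""
    n = len(matrix)
    # Column total = number of times each class was predicted
    predicted_totals = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    metrics = []
    for i in range(n):
        true_positives = matrix[i][i]
        actual_total = sum(matrix[i])  # row total = support of the class
        precision = true_positives / predicted_totals[i] if predicted_totals[i] else 0.0
        recall = true_positives / actual_total if actual_total else 0.0
        metrics.append((precision, recall))
    return metrics


def accuracy(matrix):
    """Share of all samples that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(f"Accuracy: {accuracy(CONFUSION):.2%}")
for label, (precision, recall) in zip(LABELS, per_class_metrics(CONFUSION)):
    print(f"{label:<20} precision={precision:.2f} recall={recall:.2f}")
```

Sorting the printed rows by recall or precision gives a quick view of which categories leak into `Miscellaneous` and is a convenient way to compare a retrained adapter against this baseline.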