Finance Entity Extractor (FinEE) v1.0
Production-grade Finance NER for Indian Banks
Hybrid Regex + Phi-3 LLM • 94.5% accuracy (with LLM) • <1ms latency (Regex mode)
Hybrid Architecture
Runs fully offline in the default Regex mode. The optional 3.8B-parameter LLM is downloaded only when LLM mode is enabled, and it is called only for complex edge cases.
| Mode | Latency | Accuracy | Model Download |
|---|---|---|---|
| Regex (Default) | <1ms | 87.5% | None |
| Regex + LLM | ~50ms | 94.5% | 7GB (one-time) |
Install in 10 Seconds
pip install finee
from finee import extract
r = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")
print(r.amount) # 2500.0
print(r.merchant) # "Swiggy"
print(r.category) # "food"
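To illustrate how a UPI handle such as `swiggy@ybl` could be turned into a normalized merchant and category, here is a minimal standalone sketch. It does not use the `finee` package; the `VPA_MERCHANTS` table, the `merchant_from_text` helper, and the prefix-matching rule are all assumptions made for illustration, not the library's actual implementation.

```python
import re

# Hypothetical lookup table (assumption). The README describes 200+ real mappings.
VPA_MERCHANTS = {
    "swiggy": ("Swiggy", "food"),
    "zomato": ("Zomato", "food"),
    "uber": ("Uber", "transport"),
    "amazon": ("Amazon", "shopping"),
}

# A UPI virtual payment address looks like handle@provider, e.g. swiggy@ybl.
VPA_PATTERN = re.compile(r"\b([A-Za-z0-9._-]+)@([A-Za-z]+)\b")


def merchant_from_text(text):
    """Return (merchant, category) for the first known VPA handle, else (None, None)."""
    for handle, _provider in VPA_PATTERN.findall(text):
        key = handle.lower()
        for prefix, info in VPA_MERCHANTS.items():
            if key.startswith(prefix):
                return info
    return (None, None)


print(merchant_from_text("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025"))
# ('Swiggy', 'food')
```

A dictionary lookup keeps this step deterministic and fast, which fits the offline default mode described above.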
Enable LLM Mode (For Edge Cases)
from finee import FinEE
from finee.schema import ExtractionConfig
# Downloads 7GB model once, then runs locally
extractor = FinEE(ExtractionConfig(use_llm=True))
result = extractor.extract("Your complex bank message...")
Supported Backends:
- Apple Silicon → MLX (fastest)
- NVIDIA GPU → PyTorch/CUDA
- CPU → llama.cpp (GGUF)
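The preference order in the list above could be expressed as a small selection function. This is a sketch under stated assumptions: `pick_backend` and `detect_cuda` are hypothetical helpers, and how FinEE actually detects hardware is not documented here.

```python
import platform
from importlib.util import find_spec


def pick_backend(system, machine, has_mlx, has_cuda):
    """Choose a backend name following the order listed above (assumed logic)."""
    if system == "Darwin" and machine == "arm64" and has_mlx:
        return "mlx"        # Apple Silicon
    if has_cuda:
        return "cuda"       # NVIDIA GPU via PyTorch
    return "llama.cpp"      # CPU fallback with a GGUF model


def detect_cuda():
    """True only if PyTorch is importable and reports a CUDA device."""
    if find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    backend = pick_backend(
        platform.system(),
        platform.machine(),
        find_spec("mlx") is not None,
        detect_cuda(),
    )
    print(backend)
```

Keeping `pick_backend` free of side effects makes the decision easy to test without the optional dependencies installed.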
Output Schema Contract
Every extraction returns this guaranteed JSON structure:
{
"amount": 2500.0, // float - Always numeric
"currency": "INR", // string - ISO 4217
"type": "debit", // "debit" | "credit"
"account": "3545", // string - Last 4 digits
"date": "28-12-2025", // string - DD-MM-YYYY
"reference": "534567891234", // string - UPI/NEFT ref
"merchant": "Swiggy", // string - Normalized name
"category": "food", // string - food|shopping|transport|...
"confidence": 0.95 // float - 0.0 to 1.0
}
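A consumer of this output may want to validate records against the contract before storing them. The sketch below mirrors the fields shown above in a dataclass; `TransactionRecord` is a hypothetical name, and allowing `None` for fields that a message may not contain is an assumption, not something the schema states.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class TransactionRecord:
    """Hypothetical mirror of the documented output schema."""
    amount: float
    currency: str = "INR"
    type: Optional[str] = None        # "debit" | "credit"
    account: Optional[str] = None     # last 4 digits
    date: Optional[str] = None        # DD-MM-YYYY
    reference: Optional[str] = None
    merchant: Optional[str] = None
    category: Optional[str] = None
    confidence: float = 0.0

    def __post_init__(self):
        self.amount = float(self.amount)  # the contract says amount is always numeric
        if self.type not in (None, "debit", "credit"):
            raise ValueError(f"unexpected type: {self.type!r}")
        if self.account is not None and not (
            len(self.account) == 4 and self.account.isdigit()
        ):
            raise ValueError(f"account must be 4 digits: {self.account!r}")
        if self.date is not None:
            datetime.strptime(self.date, "%d-%m-%Y")  # raises ValueError if malformed
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence!r}")


record = TransactionRecord(
    amount=2500, type="debit", account="3545",
    date="28-12-2025", merchant="Swiggy", category="food", confidence=0.95,
)
print(asdict(record))
```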
Verify Accuracy Yourself
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install finee
python benchmark.py --all
Edge Case Handling
| Input | Result |
|---|---|
| Rs.500.00debited from A/c1234 (no spaces) | amount=500.0 |
| ₹2,500 debited (Unicode) | amount=2500.0 |
| 1.5 Lakh credited (Lakhs) | amount=150000.0 |
| Rs.500 debited. Bal: Rs.15,000 (multiple) | amount=500.0 |
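The four cases in the table can be handled with a single regular expression plus two small rules. The following is an illustrative sketch, not FinEE's actual pattern set: `AMOUNT`, `BALANCE_BEFORE`, and `extract_amount` are hypothetical names, and the "skip numbers that directly follow Bal/Balance" rule is an assumed heuristic.

```python
import re

# Optional currency marker, a number with optional commas/decimals,
# and an optional Indian unit word (lakh/lac/crore).
AMOUNT = re.compile(
    r"(?P<cur>(?<![A-Za-z])(?:Rs\.?|INR|₹)\s*)?"
    r"(?P<num>\d+(?:,\d+)*(?:\.\d+)?)"
    r"(?:\s*(?P<unit>lakhs?|lacs?|crores?)\b)?",
    re.IGNORECASE,
)
# Matches when the text right before a number ends with "Bal" or "Balance".
BALANCE_BEFORE = re.compile(r"\bbal(?:ance)?\W*$", re.IGNORECASE)


def extract_amount(text):
    """Return the first transaction amount as a float, or None if none is found."""
    for m in AMOUNT.finditer(text):
        # Ignore bare numbers such as account numbers and dates.
        if not (m.group("cur") or m.group("unit")):
            continue
        # Ignore the running balance.
        if BALANCE_BEFORE.search(text[:m.start()]):
            continue
        value = float(m.group("num").replace(",", ""))
        unit = (m.group("unit") or "").lower()
        if unit.startswith("la"):
            value *= 100_000
        elif unit.startswith("crore"):
            value *= 10_000_000
        return value
    return None


print(extract_amount("Rs.500.00debited from A/c1234"))   # 500.0
print(extract_amount("₹2,500 debited"))                  # 2500.0
print(extract_amount("1.5 Lakh credited"))               # 150000.0
print(extract_amount("Rs.500 debited. Bal: Rs.15,000"))  # 500.0
```

Requiring either a currency marker or a unit word is what keeps account numbers and dates from being read as amounts.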
Supported Banks
| Bank | Status |
|---|---|
| HDFC | ✅ |
| ICICI | ✅ |
| SBI | ✅ |
| Axis | ✅ |
| Kotak | ✅ |
Benchmark
| Metric | Value |
|---|---|
| Field Accuracy | 94.5% (with LLM) |
| Regex-only Accuracy | 87.5% |
| Latency (Regex) | <1ms |
| Throughput | 50,000+ msg/sec |
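For readers who want to reproduce numbers like these on their own data, here is one plausible way to compute field accuracy and throughput. It is a sketch under assumptions: `field_accuracy` and `throughput` are hypothetical helpers, and the repository's `benchmark.py` may define or measure these metrics differently.

```python
import time


def field_accuracy(predictions, labels, fields):
    """Fraction of (example, field) pairs where the prediction equals the label."""
    correct = 0
    total = 0
    for pred, gold in zip(predictions, labels):
        for field in fields:
            total += 1
            if pred.get(field) == gold.get(field):
                correct += 1
    return correct / total if total else 0.0


def throughput(extract_fn, messages, repeats=100):
    """Messages processed per second by extract_fn over repeated passes."""
    start = time.perf_counter()
    for _ in range(repeats):
        for message in messages:
            extract_fn(message)
    elapsed = time.perf_counter() - start
    processed = repeats * len(messages)
    return processed / elapsed if elapsed > 0 else float("inf")


predictions = [{"amount": 500.0, "type": "debit"}, {"amount": 100.0, "type": "credit"}]
labels = [{"amount": 500.0, "type": "debit"}, {"amount": 150.0, "type": "credit"}]
print(field_accuracy(predictions, labels, ["amount", "type"]))  # 0.75
```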
Architecture
Input Text
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 0: Hash Cache (<1ms if seen before)                   │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 1: Regex Engine (50+ patterns)                        │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 2: Rule-Based Mapping (200+ VPA → merchant)           │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│  TIER 3: Phi-3 LLM (Optional - downloads 7GB model)         │
│          Only called for edge cases                         │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
ExtractionResult (Guaranteed Schema)
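The tiered flow in the diagram can be sketched as a small class: check a hash cache, run regexes, apply rule-based mapping, and call the LLM only when the cheaper tiers leave gaps. Everything named here (`TieredExtractor`, `KNOWN_VPAS`, the `llm_fallback` callable, and the "fraction of fields found" confidence heuristic with a 0.8 threshold) is an assumption for illustration; the README does not specify how FinEE decides that a message is an edge case.

```python
import hashlib
import re

AMOUNT_RE = re.compile(r"(?:Rs\.?|INR|₹)\s*(\d+(?:,\d+)*(?:\.\d+)?)", re.IGNORECASE)
TYPE_RE = re.compile(r"\b(debited|credited)\b", re.IGNORECASE)
VPA_RE = re.compile(r"\b([a-z0-9._-]+)@[a-z]+\b", re.IGNORECASE)
KNOWN_VPAS = {"swiggy": "Swiggy", "zomato": "Zomato"}  # hypothetical subset


class TieredExtractor:
    def __init__(self, llm_fallback=None, min_confidence=0.8):
        self._cache = {}
        self._llm_fallback = llm_fallback      # callable(text, partial) -> dict
        self._min_confidence = min_confidence  # assumed threshold

    def extract(self, text):
        # Tier 0: return a cached result for text seen before.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self._cache:
            return self._cache[key]

        fields = {"amount": None, "type": None, "merchant": None}

        # Tier 1: regex extraction.
        amount = AMOUNT_RE.search(text)
        if amount:
            fields["amount"] = float(amount.group(1).replace(",", ""))
        kind = TYPE_RE.search(text)
        if kind:
            fields["type"] = "debit" if kind.group(1).lower() == "debited" else "credit"

        # Tier 2: rule-based VPA to merchant mapping.
        vpa = VPA_RE.search(text)
        if vpa:
            fields["merchant"] = KNOWN_VPAS.get(vpa.group(1).lower())

        # Assumed heuristic: confidence is the share of fields that were filled.
        confidence = sum(v is not None for v in fields.values()) / len(fields)
        result = dict(fields, confidence=confidence)

        # Tier 3: optional LLM, only when the cheap tiers were not enough.
        if confidence < self._min_confidence and self._llm_fallback is not None:
            result = self._llm_fallback(text, result)

        self._cache[key] = result
        return result


extractor = TieredExtractor()
print(extractor.extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025"))
```

Ordering the tiers from cheapest to most expensive means most messages never reach the model, which is consistent with the latency gap between the two modes reported above.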
Repository Structure
Finance-Entity-Extractor/
├── src/finee/           # Core package
├── tests/               # 88 unit tests
├── examples/demo.ipynb  # Try in Colab!
├── benchmark.py         # Verify accuracy
├── CHANGELOG.md         # Release history
└── CONTRIBUTING.md      # How to contribute
Contributing
See CONTRIBUTING.md for:
- Git Flow branching strategy
- How to run tests
- Release process
License
MIT License
Made with ❤️ by Ranjit Behera
PyPI • GitHub • Hugging Face
Model: Ranjit0034/finance-entity-extractor
Base model: microsoft/Phi-3-mini-4k-instruct