Finance Entity Extractor (FinEE) v1.0

Production-grade Finance NER for Indian Banks
Hybrid Regex + Phi-3 LLM • 94.5% accuracy (with LLM) • <1ms latency (regex-only)


🔥 Hybrid Architecture

Runs 100% offline using regex by default. An optional 3.8B-parameter Phi-3 LLM can be enabled for complex edge cases; its model is downloaded automatically, once.

| Mode | Latency | Accuracy | Model Download |
|---|---|---|---|
| Regex (Default) | <1ms | 87.5% | ❌ None |
| Regex + LLM | ~50ms | 94.5% | ✅ 7GB (one-time) |

⚡ Install in 10 Seconds

pip install finee

from finee import extract

r = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")

print(r.amount)    # 2500.0
print(r.merchant)  # "Swiggy"
print(r.category)  # "food"

Try it now: Open In Colab


🧠 Enable LLM Mode (For Edge Cases)

from finee import FinEE
from finee.schema import ExtractionConfig

# Downloads 7GB model once, then runs locally
extractor = FinEE(ExtractionConfig(use_llm=True))
result = extractor.extract("Your complex bank message...")

Supported Backends:

  • Apple Silicon → MLX (fastest)
  • NVIDIA GPU → PyTorch/CUDA
  • CPU → llama.cpp (GGUF)
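
How FinEE actually chooses among these backends is not described here. The sketch below shows one plausible way to apply the preference order in the list above; the function name `pick_backend` and its parameters are hypothetical and are not part of the `finee` package.

```python
import importlib.util
import platform


def _cuda_available():
    """Report CUDA support without failing when PyTorch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def pick_backend(system=None, machine=None, has_mlx=None, has_cuda=None):
    """Return "mlx", "cuda", or "llama.cpp" (hypothetical helper).

    Every argument can be passed explicitly for testing; anything left as
    None is detected from the machine the code is running on.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if has_mlx is None:
        has_mlx = importlib.util.find_spec("mlx") is not None

    # Apple Silicon reports Darwin/arm64; MLX must also be installed.
    if system == "Darwin" and machine == "arm64" and has_mlx:
        return "mlx"

    # NVIDIA GPU path: only probe PyTorch when the caller gave no answer.
    if has_cuda is None:
        has_cuda = _cuda_available()
    if has_cuda:
        return "cuda"

    # Everything else falls back to CPU inference with a GGUF model.
    return "llama.cpp"
```

Calling `pick_backend()` with no arguments probes the current machine; passing the arguments explicitly makes the decision reproducible in tests.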

📋 Output Schema Contract

Every extraction returns this guaranteed JSON structure:

{
  "amount": 2500.0,           // float - Always numeric
  "currency": "INR",          // string - ISO 4217
  "type": "debit",            // "debit" | "credit"
  "account": "3545",          // string - Last 4 digits
  "date": "28-12-2025",       // string - DD-MM-YYYY
  "reference": "534567891234",// string - UPI/NEFT ref
  "merchant": "Swiggy",       // string - Normalized name
  "category": "food",         // string - food|shopping|transport|...
  "confidence": 0.95          // float - 0.0 to 1.0
}
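
A consumer of this output may want to check records against the contract before storing them. The following is a minimal sketch of such a check, assuming every field is always present and non-null (the section does not say whether fields can be missing); `validate_record` is a hypothetical helper, not a function exported by `finee`.

```python
from datetime import datetime

EXPECTED_KEYS = {
    "amount", "currency", "type", "account", "date",
    "reference", "merchant", "category", "confidence",
}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_record(record):
    """Return a list of contract violations; an empty list means the record passes."""
    missing = EXPECTED_KEYS - set(record)
    if missing:
        return ["missing keys: " + ", ".join(sorted(missing))]

    problems = []
    if not _is_number(record["amount"]):
        problems.append("amount must be numeric")
    currency = record["currency"]
    if not (isinstance(currency, str) and len(currency) == 3
            and currency.isalpha() and currency.isupper()):
        problems.append("currency must be a 3-letter ISO 4217 code")
    if record["type"] not in ("debit", "credit"):
        problems.append("type must be 'debit' or 'credit'")
    account = record["account"]
    if not (isinstance(account, str) and len(account) == 4 and account.isdigit()):
        problems.append("account must be the last 4 digits as a string")
    try:
        datetime.strptime(record["date"], "%d-%m-%Y")
    except (TypeError, ValueError):
        problems.append("date must be DD-MM-YYYY")
    confidence = record["confidence"]
    if not (_is_number(confidence) and 0.0 <= confidence <= 1.0):
        problems.append("confidence must be between 0.0 and 1.0")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to log everything that is wrong with a record in a single pass.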

🔬 Verify Accuracy Yourself

git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install finee
python benchmark.py --all

💀 Edge Case Handling

| Input | Case | Result |
|---|---|---|
| Rs.500.00debited from A/c1234 | No spaces | ✅ amount=500.0 |
| ₹2,500 debited | Unicode | ✅ amount=2500.0 |
| 1.5 Lakh credited | Lakhs | ✅ amount=150000.0 |
| Rs.500 debited. Bal: Rs.15,000 | Multiple amounts | ✅ amount=500.0 |
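
To illustrate how a single regex can cover the four cases above, here is a small standalone sketch. It is not FinEE's actual pattern set; `AMOUNT_RE` and `parse_amount` are hypothetical names, and the rule "take the first amount that carries a currency prefix or a lakh suffix" is an assumption that happens to fit these examples, where the transaction amount precedes the balance.

```python
import re

# Optional currency prefix, a number with optional commas and decimals,
# and an optional "lakh"/"lakhs" suffix.
AMOUNT_RE = re.compile(
    r"(?P<prefix>\b(?:Rs\.?|INR)|₹)?\s*"
    r"(?P<number>\d[\d,]*(?:\.\d+)?)"
    r"\s*(?P<lakh>lakhs?\b)?",
    re.IGNORECASE,
)


def parse_amount(text):
    """Return the first amount in `text` as a float, or None if there is none."""
    for match in AMOUNT_RE.finditer(text):
        # A bare number (account digits, date parts, reference IDs) is skipped:
        # it needs a currency prefix or a lakh suffix to count as an amount.
        if not (match.group("prefix") or match.group("lakh")):
            continue
        value = float(match.group("number").replace(",", ""))
        if match.group("lakh"):
            value *= 100_000  # 1 lakh = 100,000
        return value
    return None


print(parse_amount("Rs.500.00debited from A/c1234"))
print(parse_amount("₹2,500 debited"))
print(parse_amount("1.5 Lakh credited"))
print(parse_amount("Rs.500 debited. Bal: Rs.15,000"))
```

Messages that state the balance before the transaction amount would need an extra rule, such as preferring the amount nearest to "debited" or "credited".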

🏦 Supported Banks

Bank Status
HDFC βœ…
ICICI βœ…
SBI βœ…
Axis βœ…
Kotak βœ…

📊 Benchmark

| Metric | Value |
|---|---|
| Field Accuracy | 94.5% (with LLM) |
| Regex-only Accuracy | 87.5% |
| Latency (Regex) | <1ms |
| Throughput | 50,000+ msg/sec |
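
The table does not define "Field Accuracy". One common reading is the share of (message, field) pairs where the extracted value equals the labeled value. The sketch below computes that quantity under this assumption; `field_accuracy` is a hypothetical helper and may differ from what `benchmark.py` measures.

```python
def field_accuracy(predictions, expected, fields):
    """Fraction of (record, field) pairs where prediction equals the label.

    `predictions` and `expected` are parallel lists of dicts; a field that is
    absent from a prediction counts as wrong unless the label is also absent.
    """
    correct = 0
    total = 0
    for predicted, labeled in zip(predictions, expected):
        for field in fields:
            total += 1
            if predicted.get(field) == labeled.get(field):
                correct += 1
    return correct / total if total else 0.0


predictions = [
    {"amount": 500.0, "merchant": "Swiggy"},
    {"amount": 2500.0, "merchant": None},
]
expected = [
    {"amount": 500.0, "merchant": "Swiggy"},
    {"amount": 2500.0, "merchant": "Amazon"},
]
print(field_accuracy(predictions, expected, ["amount", "merchant"]))  # 0.75
```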

🏗️ Architecture

Input Text
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 0: Hash Cache (<1ms if seen before)                    │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 1: Regex Engine (50+ patterns)                         │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 2: Rule-Based Mapping (200+ VPA → merchant)            │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────────────────────────┐
│ TIER 3: Phi-3 LLM (Optional - downloads 7GB model)          │
│         Only called for edge cases                          │
└─────────────────────────────────────────────────────────────┘
    │
    ▼
ExtractionResult (Guaranteed Schema)
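
The tiered flow can be sketched in a few lines. This is an illustration of the cache → regex → rule mapping → optional LLM idea, not FinEE's source code: the names `extract_tiered`, `CACHE`, and `VPA_TO_MERCHANT`, the single regex, the one-entry mapping, and the `llm` callable are all assumptions made for the example.

```python
import hashlib
import re

CACHE = {}                                           # Tier 0: results keyed by text hash
VPA_TO_MERCHANT = {"swiggy": ("Swiggy", "food")}     # Tier 2: illustrative single entry
AMOUNT = re.compile(r"(?:Rs\.?|INR|₹)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)
VPA = re.compile(r"\b([\w.\-]+)@\w+")


def extract_tiered(text, llm=None):
    """Run the tiers in order; `llm` is an optional callable returning a dict."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in CACHE:                                 # Tier 0: seen before
        return CACHE[key]

    result = {"amount": None, "merchant": None, "category": None}

    amount = AMOUNT.search(text)                     # Tier 1: regex extraction
    if amount:
        result["amount"] = float(amount.group(1).replace(",", ""))

    vpa = VPA.search(text)                           # Tier 2: VPA handle -> merchant
    if vpa:
        handle = vpa.group(1).lower()
        if handle in VPA_TO_MERCHANT:
            result["merchant"], result["category"] = VPA_TO_MERCHANT[handle]

    # Tier 3: call the LLM only when earlier tiers left a field empty.
    if llm is not None and None in result.values():
        for field, value in llm(text).items():
            if field in result and result[field] is None:
                result[field] = value

    CACHE[key] = result
    return result


message = "Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025"
print(extract_tiered(message))
```

Because the LLM is consulted only when a field is still empty, messages that the regex and mapping tiers fully resolve never pay the model's latency.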

📁 Repository Structure

Finance-Entity-Extractor/
├── src/finee/              # Core package
├── tests/                  # 88 unit tests
├── examples/demo.ipynb     # 👈 Try in Colab!
├── benchmark.py            # Verify accuracy
├── CHANGELOG.md            # Release history
└── CONTRIBUTING.md         # How to contribute

🀝 Contributing

See CONTRIBUTING.md for:

  • Git Flow branching strategy
  • How to run tests
  • Release process

📄 License

MIT License


Made with ❤️ by Ranjit Behera

PyPI • GitHub • Hugging Face
