---
language:
- en
license: mit
library_name: transformers
tags:
- finance
- entity-extraction
- ner
- phi-3
- production
- indian-banking
base_model: microsoft/Phi-3-mini-4k-instruct
pipeline_tag: text-generation
---
<div align="center">

# Finance Entity Extractor (FinEE) v1.0

[PyPI](https://pypi.org/project/finee/) •
[Tests](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/actions/workflows/tests.yml) •
[License: MIT](https://opensource.org/licenses/MIT) •
[Open in Colab](https://colab.research.google.com/github/Ranjitbehera0034/Finance-Entity-Extractor/blob/main/examples/demo.ipynb)

**Production-grade Finance NER for Indian Banks**
<br>
*Hybrid Regex + Phi-3 LLM • 94.5% field accuracy (with LLM) • <1ms latency (regex only)*

</div>
---
## Hybrid Architecture
> **Runs 100% offline using regex by default.**
> **The optional 3.8B-parameter Phi-3 LLM is downloaded automatically (once) and is only called for complex edge cases.**

| Mode | Latency | Accuracy | Model Download |
|------|---------|----------|----------------|
| **Regex (Default)** | <1ms | 87.5% | None |
| **Regex + LLM** | ~50ms | 94.5% | 7GB (one-time) |

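
The roughly 7GB download is consistent with Phi-3-mini's approximately 3.8 billion parameters stored as 16-bit weights. The precision is an assumption here, since the README does not state it:

$$
3.8 \times 10^{9}\ \text{parameters} \times 2\ \text{bytes/parameter} = 7.6 \times 10^{9}\ \text{bytes} \approx 7.1\ \text{GiB}
$$
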
---
## Install in 10 Seconds
```bash
pip install finee
```
```python
from finee import extract
r = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")
print(r.amount) # 2500.0
print(r.merchant) # "Swiggy"
print(r.category) # "food"
```
**Try it now:** [Open in Colab](https://colab.research.google.com/github/Ranjitbehera0034/Finance-Entity-Extractor/blob/main/examples/demo.ipynb)

---
## Enable LLM Mode (For Edge Cases)
```python
from finee import FinEE
from finee.schema import ExtractionConfig
# Downloads 7GB model once, then runs locally
extractor = FinEE(ExtractionConfig(use_llm=True))
result = extractor.extract("Your complex bank message...")
```
**Supported Backends:**
- Apple Silicon → MLX (fastest)
- NVIDIA GPU → PyTorch/CUDA
- CPU → llama.cpp (GGUF)
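
The list above reads as a priority order. A minimal sketch of that selection follows, assuming hypothetical backend labels and function names (FinEE's real detection logic is not shown in this README). It uses `platform` to recognise Apple Silicon and checks `torch.cuda.is_available()` only when PyTorch is installed:

```python
import importlib.util
import platform


def choose_backend(system: str, machine: str, has_cuda: bool) -> str:
    """Pick an LLM backend in the priority order listed above (hypothetical labels)."""
    if system == "Darwin" and machine == "arm64":
        return "mlx"         # Apple Silicon
    if has_cuda:
        return "torch-cuda"  # NVIDIA GPU
    return "llama.cpp"       # CPU fallback using GGUF weights


def cuda_available() -> bool:
    """True only if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()


def detect_backend() -> str:
    """Inspect the current machine and return a backend label."""
    return choose_backend(platform.system(), platform.machine(), cuda_available())


if __name__ == "__main__":
    print(detect_backend())
```

Keeping the decision in a pure function (`choose_backend`) makes the priority order easy to test without the hardware.
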
---
## Output Schema Contract
Every extraction returns an `ExtractionResult` with this **guaranteed structure**, shown here as JSON (the `//` comments are annotations, not part of the output):
```jsonc
{
  "amount": 2500.0,              // float - Always numeric
  "currency": "INR",             // string - ISO 4217
  "type": "debit",               // "debit" | "credit"
  "account": "3545",             // string - Last 4 digits
  "date": "28-12-2025",          // string - DD-MM-YYYY
  "reference": "534567891234",   // string - UPI/NEFT ref
  "merchant": "Swiggy",          // string - Normalized name
  "category": "food",            // string - food|shopping|transport|...
  "confidence": 0.95             // float - 0.0 to 1.0
}
```
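
If you consume these records downstream, the constraints in the comments can be checked mechanically. The sketch below is a hypothetical helper, not part of FinEE. It treats every field as required, which may be stricter than FinEE's real output, and it checks only the shape of the ISO 4217 currency code, not membership in the official list:

```python
from datetime import datetime
from typing import Any

ALLOWED_TYPES = {"debit", "credit"}
STRING_FIELDS = ("currency", "account", "date", "reference", "merchant", "category")


def validate_record(rec: dict[str, Any]) -> list[str]:
    """Return contract violations for one record; an empty list means it passes."""
    errors: list[str] = []
    if not isinstance(rec.get("amount"), float):
        errors.append("amount must be a float")
    if rec.get("type") not in ALLOWED_TYPES:
        errors.append("type must be 'debit' or 'credit'")
    for field in STRING_FIELDS:
        if not isinstance(rec.get(field), str):
            errors.append(f"{field} must be a string")
    currency = rec.get("currency")
    if isinstance(currency, str) and not (len(currency) == 3 and currency.isalpha() and currency.isupper()):
        errors.append("currency must be a 3-letter ISO 4217 code")
    account = rec.get("account")
    if isinstance(account, str) and not (len(account) == 4 and account.isdigit()):
        errors.append("account must be the last 4 digits")
    date = rec.get("date")
    if isinstance(date, str):
        try:
            datetime.strptime(date, "%d-%m-%Y")  # DD-MM-YYYY
        except ValueError:
            errors.append("date must be DD-MM-YYYY")
    confidence = rec.get("confidence")
    if not isinstance(confidence, float) or not 0.0 <= confidence <= 1.0:
        errors.append("confidence must be a float between 0.0 and 1.0")
    return errors
```
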
---
## Verify Accuracy Yourself
```bash
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
cd Finance-Entity-Extractor
pip install finee
python benchmark.py --all
```
---
## Edge Case Handling
| Input | Result |
|-------|--------|
| `Rs.500.00debited from A/c1234` (no spaces) | ✅ amount=500.0 |
| `₹2,500 debited` (Unicode ₹ sign) | ✅ amount=2500.0 |
| `1.5 Lakh credited` (lakh notation) | ✅ amount=150000.0 |
| `Rs.500 debited. Bal: Rs.15,000` (multiple amounts) | ✅ amount=500.0 |

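
The cases above boil down to a few parsing rules: tolerate missing spaces, accept `Rs.` and `₹` prefixes, strip thousands separators, scale lakh amounts by 100,000, and skip amounts labelled as a balance. A minimal regex sketch of those rules follows; `parse_amount` and its patterns are hypothetical and are not FinEE's actual pattern set. The `INR` prefix is an added assumption, since none of the examples use it:

```python
import re
from typing import Optional

# Hypothetical, illustrative patterns; not FinEE's real regex set.
_NUM = r"\d[\d,]*(?:\.\d+)?"  # digits with optional thousands commas and decimals
AMOUNT_RE = re.compile(
    # Either a currency prefix (not glued to a preceding letter) followed by a number...
    rf"(?<![A-Za-z])(?:Rs\.?|INR|₹)\s*(?P<num>{_NUM})(?:\s*(?P<unit>lakhs?)\b)?"
    # ...or a bare number followed by "lakh"/"lakhs".
    rf"|(?P<num2>{_NUM})\s*(?P<unit2>lakhs?)\b",
    re.IGNORECASE,
)
BALANCE_RE = re.compile(r"\b(?:bal|balance)\b", re.IGNORECASE)


def parse_amount(text: str) -> Optional[float]:
    """Return the first amount in `text` that is not labelled as a balance."""
    for match in AMOUNT_RE.finditer(text):
        # Look a few characters back for a "Bal"/"Balance" label.
        window = text[max(0, match.start() - 12):match.start()]
        if BALANCE_RE.search(window):
            continue
        num = match.group("num") or match.group("num2")
        unit = match.group("unit") or match.group("unit2")
        value = float(num.replace(",", ""))
        if unit:
            value *= 100_000  # 1 lakh = 100,000
        return value
    return None
```

Requiring either a currency prefix or a lakh suffix is what keeps the account number in `A/c1234` from being read as an amount. The 12-character look-back for "Bal" is a crude heuristic; real messages may need more context.
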
---
## Supported Banks
| Bank | Status |
|------|--------|
| HDFC | ✅ |
| ICICI | ✅ |
| SBI | ✅ |
| Axis | ✅ |
| Kotak | ✅ |

---
## Benchmark
| Metric | Value |
|--------|-------|
| **Field Accuracy** | 94.5% (with LLM) |
| **Regex-only Accuracy** | 87.5% |
| **Latency (Regex)** | <1ms |
| **Throughput (Regex)** | 50,000+ msg/sec |

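
The table reports **Field Accuracy** without defining it. One common definition, used below purely as an assumption (it is not taken from `benchmark.py`), pools every labelled field across all messages and counts exact matches. `field_accuracy` is a hypothetical helper:

```python
from typing import Any


def field_accuracy(predictions: list[dict[str, Any]], gold: list[dict[str, Any]]) -> float:
    """Share of gold-labelled fields whose predicted value matches exactly, pooled over messages."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must be aligned one-to-one")
    correct = total = 0
    for pred, ref in zip(predictions, gold):
        for field, expected in ref.items():
            total += 1
            if pred.get(field) == expected:
                correct += 1
    return correct / total if total else 0.0
```

Under this definition a wrong amount in one message costs one field rather than the whole message, so it reads higher than a per-message exact-match score would.
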
---
## Architecture
```
                          Input Text
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ TIER 0: Hash Cache (<1ms if seen before)                   │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ TIER 1: Regex Engine (50+ patterns)                        │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ TIER 2: Rule-Based Mapping (200+ VPA → merchant)           │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│ TIER 3: Phi-3 LLM (Optional - downloads 7GB model)         │
│         Only called for edge cases                         │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
             ExtractionResult (Guaranteed Schema)
```
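
The tiers above form a cascade in which each stage runs only when earlier ones leave work undone. The sketch below is hypothetical (none of these names come from FinEE): a SHA-256-keyed dict stands in for the Tier 0 hash cache, one regex per field stands in for the 50+ patterns, a two-entry dict stands in for the 200+ VPA mappings, and the optional LLM is any callable you pass in. The exact trigger for calling the LLM is not documented here; this sketch assumes it runs only when a required field is still missing:

```python
import hashlib
import re
from typing import Callable, Optional

Result = dict[str, object]

# Tier 1 stand-ins: one illustrative pattern per field.
PATTERNS = {
    "amount": re.compile(r"(?<![A-Za-z])(?:Rs\.?|INR|₹)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE),
    "type": re.compile(r"\b(debited|credited)\b", re.IGNORECASE),
    "vpa": re.compile(r"\b([\w.-]+@[a-z]+)\b", re.IGNORECASE),
}
# Tier 2 stand-in: VPA -> (merchant, category).
VPA_MAP = {"swiggy@ybl": ("Swiggy", "food"), "uber@paytm": ("Uber", "transport")}
REQUIRED = ("amount", "type", "merchant")


class TieredExtractor:
    def __init__(self, llm: Optional[Callable[[str, Result], Result]] = None):
        self.llm = llm
        self.cache: dict[str, Result] = {}

    def extract(self, text: str) -> Result:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key in self.cache:  # Tier 0: returns the cached dict itself (copy before mutating)
            return self.cache[key]
        result: Result = {}
        if m := PATTERNS["amount"].search(text):  # Tier 1: regex fields
            result["amount"] = float(m.group(1).replace(",", ""))
        if m := PATTERNS["type"].search(text):
            result["type"] = "debit" if m.group(1).lower() == "debited" else "credit"
        if m := PATTERNS["vpa"].search(text):  # Tier 2: VPA -> merchant lookup
            mapped = VPA_MAP.get(m.group(1).lower())
            if mapped:
                result["merchant"], result["category"] = mapped
        missing = [f for f in REQUIRED if f not in result]
        if missing and self.llm is not None:  # Tier 3: optional LLM fills the gaps
            result = {**result, **self.llm(text, result)}
        self.cache[key] = result
        return result
```

Passing the LLM in as a callable keeps the cheap tiers usable offline and makes Tier 3 easy to stub in tests.
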
---
## Repository Structure
```
Finance-Entity-Extractor/
├── src/finee/            # Core package
├── tests/                # 88 unit tests
├── examples/demo.ipynb   # Try in Colab!
├── benchmark.py          # Verify accuracy
├── CHANGELOG.md          # Release history
└── CONTRIBUTING.md       # How to contribute
```
---
## Contributing
See [CONTRIBUTING.md](CONTRIBUTING.md) for:
- Git Flow branching strategy
- How to run tests
- Release process
---
## License
MIT License

---
<div align="center">

**Made with ❤️ by Ranjit Behera**

[PyPI](https://pypi.org/project/finee/) • [GitHub](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor) • [Hugging Face](https://huggingface.co/Ranjit0034/finance-entity-extractor)

</div>