Ranjit Behera committed on
Commit ·
810c162
1
Parent(s): 2b1ff82
docs: clarify hybrid architecture (Regex default + optional LLM)
README.md
CHANGED
|
@@ -21,17 +21,30 @@ pipeline_tag: text-generation
|
|
| 21 |
[](https://pypi.org/project/finee/)
|
| 22 |
[](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/actions/workflows/tests.yml)
|
| 23 |
[](https://opensource.org/licenses/MIT)
|
|
|
|
| 24 |
[](https://colab.research.google.com/github/Ranjitbehera0034/Finance-Entity-Extractor/blob/main/examples/demo.ipynb)
|
| 25 |
|
| 26 |
-
**
|
| 27 |
<br>
|
| 28 |
-
*94.5%
|
| 29 |
|
| 30 |
</div>
|
| 31 |
|
| 32 |
---
|
| 33 |
|
| 34 |
-
##
|
| 35 |
|
| 36 |
```bash
|
| 37 |
pip install finee
|
|
@@ -47,7 +60,25 @@ print(r.merchant) # "Swiggy"
|
|
| 47 |
print(r.category) # "food"
|
| 48 |
```
|
| 49 |
|
| 50 |
-
**
|
| 51 |
|
| 52 |
---
|
| 53 |
|
|
@@ -65,9 +96,7 @@ Every extraction returns this **guaranteed JSON structure**:
|
|
| 65 |
"reference": "534567891234",// string - UPI/NEFT ref
|
| 66 |
"merchant": "Swiggy", // string - Normalized name
|
| 67 |
"category": "food", // string - food|shopping|transport|...
|
| 68 |
-
"
|
| 69 |
-
"confidence": 0.95, // float - 0.0 to 1.0
|
| 70 |
-
"confidence_level": "HIGH" // "LOW" | "MEDIUM" | "HIGH"
|
| 71 |
}
|
| 72 |
```
|
| 73 |
|
|
@@ -75,55 +104,46 @@ Every extraction returns this **guaranteed JSON structure**:
|
|
| 75 |
|
| 76 |
## 🔬 Verify Accuracy Yourself
|
| 77 |
|
| 78 |
-
Don't trust "99% accuracy" claims. **Run the benchmark:**
|
| 79 |
-
|
| 80 |
```bash
|
| 81 |
-
# Clone and test
|
| 82 |
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
|
| 83 |
cd Finance-Entity-Extractor
|
| 84 |
pip install finee
|
| 85 |
-
|
| 86 |
-
# Run benchmark
|
| 87 |
python benchmark.py --all
|
| 88 |
```
|
| 89 |
|
| 90 |
-
**Test on YOUR data:**
|
| 91 |
-
```bash
|
| 92 |
-
python benchmark.py --file your_transactions.jsonl
|
| 93 |
-
```
|
| 94 |
-
|
| 95 |
---
|
| 96 |
|
| 97 |
-
## π
|
| 98 |
|
| 99 |
-
|
| 100 |
|
| 101 |
-
|
| 102 |
-|-----------|-------|--------|
-| **Missing spaces** | `Rs.500.00debited from A/c1234` | ✅ amount=500.0 |
-| **Weird formatting** | `Rs 2,500/-debited dt:28/12/25` | ✅ amount=2500.0 |
-| **Mixed case** | `RS. 1500 DEBITED from ACCT` | ✅ amount=1500.0, type=debit |
-| **Unicode symbols** | `₹2,500 debited from •••• 3545` | ✅ amount=2500.0 |
-| **Multiple amounts** | `Rs.500 debited. Bal: Rs.15,000` | ✅ amount=500.0 (first) |
-| **Truncated SMS** | `Rs.2500 debited from A/c...3545 to swi...` | ✅ amount=2500.0 |
-| **Extra noise** | `ALERT! Dear Customer, Rs.500 debited... Ignore if done by you.` | ✅ amount=500.0 |
|
| 110 |
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
|
| 116 |
---
|
| 117 |
|
| 118 |
-
##
|
| 119 |
|
| 120 |
-
|
|
| 121 |
-
|------
|
| 122 |
-
|
|
| 123 |
-
|
|
| 124 |
-
|
|
| 125 |
-
|
|
| 126 |
-| Kotak | ✅ | ✅ | ✅ | ✅ |
|
| 127 |
|
| 128 |
---
|
| 129 |
|
|
@@ -139,20 +159,18 @@ Input Text
|
|
| 139 |
β
|
| 140 |
βΌ
|
| 141 |
ββββββββββββββββββββββββββοΏ½οΏ½ββββββββββββββββββββββββββββββββββββ
|
| 142 |
-
β TIER 1: Regex Engine (50+
|
| 143 |
-
β Extract: amount, date, reference, account, vpa, type β
|
| 144 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 145 |
β
|
| 146 |
βΌ
|
| 147 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 148 |
β TIER 2: Rule-Based Mapping (200+ VPA β merchant) β
|
| 149 |
-
β Map: vpa β merchant, merchant β category β
|
| 150 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 151 |
β
|
| 152 |
βΌ
|
| 153 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 154 |
-
β TIER 3: LLM (Optional
|
| 155 |
-
β
|
| 156 |
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
|
| 157 |
β
|
| 158 |
βΌ
|
|
@@ -161,66 +179,32 @@ ExtractionResult (Guaranteed Schema)
|
|
| 161 |
|
| 162 |
---
|
| 163 |
|
| 164 |
-## Benchmark Results
-
-| Metric | Value |
-|--------|-------|
-| **Field Accuracy** | 94.5% |
-| **Latency (Regex)** | <1ms |
-| **Latency (LLM)** | ~50ms |
-| **Throughput** | 50,000+ msg/sec |
-| **Banks Tested** | 5 (HDFC, ICICI, SBI, Axis, Kotak) |
-
----
-
-## 💻 CLI Usage
-
-```bash
-# Extract from text
-finee extract "Rs.500 debited from A/c 1234"
-
-# Show version
-finee --version
-
-# Check available backends
-finee backends
-```
-
----
-
|
| 191 |
## Repository Structure
|
| 192 |
|
| 193 |
```
|
| 194 |
Finance-Entity-Extractor/
|
| 195 |
-
βββ src/finee/ # Core package
|
| 196 |
-
β βββ extractor.py # Pipeline orchestrator
|
| 197 |
-
β βββ regex_engine.py # 50+ regex patterns
|
| 198 |
-
β βββ merchants.py # 200+ VPA mappings
|
| 199 |
-
β βββ backends/ # MLX, PyTorch, GGUF
|
| 200 |
βββ tests/ # 88 unit tests
|
| 201 |
-
βββ examples/
|
| 202 |
-
βββ
|
| 203 |
-
βββ
|
| 204 |
-
|
| 205 |
-
βββ README.md
|
| 206 |
```
|
| 207 |
|
| 208 |
---
|
| 209 |
|
| 210 |
## 🤝 Contributing
|
| 211 |
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
pytest tests/
|
| 217 |
-
```
|
| 218 |
|
| 219 |
---
|
| 220 |
|
| 221 |
## License
|
| 222 |
|
| 223 |
-
MIT License
|
| 224 |
|
| 225 |
---
|
| 226 |
|
|
@@ -228,6 +212,6 @@ MIT License - see [LICENSE](LICENSE)
|
|
| 228 |
|
| 229 |
**Made with ❤️ by Ranjit Behera**
|
| 230 |
|
| 231 |
-
[PyPI](https://pypi.org/project/finee/)
|
| 232 |
|
| 233 |
</div>
|
|
|
|
| 21 |
[](https://pypi.org/project/finee/)
|
| 22 |
[](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/actions/workflows/tests.yml)
|
| 23 |
[](https://opensource.org/licenses/MIT)
|
| 24 |
+
|
| 25 |
[](https://colab.research.google.com/github/Ranjitbehera0034/Finance-Entity-Extractor/blob/main/examples/demo.ipynb)
|
| 26 |
|
| 27 |
+
**Production-grade Finance NER for Indian Banks**
|
| 28 |
<br>
|
| 29 |
+
*Hybrid Regex + Phi-3 LLM • 94.5% accuracy • <1ms latency*
|
| 30 |
|
| 31 |
</div>
|
| 32 |
|
| 33 |
---
|
| 34 |
|
| 35 |
+## Hybrid Architecture
+
+> **Runs 100% offline using Regex by default.**
+> **Optional 3.8B LLM auto-downloads only for complex edge cases.**
+
+| Mode | Latency | Accuracy | Model Download |
+|------|---------|----------|----------------|
+| **Regex (Default)** | <1ms | 87% | ❌ None |
+| **Regex + LLM** | ~50ms | 94.5% | ✅ 7GB (one-time) |
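
The two modes in the table can be read as a regex-first pipeline with an opt-in fallback. The sketch below only illustrates that control flow; `extract_with_regex`, `extract`, the `llm_fallback` callable, and the rule "call the fallback when a field is still missing" are assumptions made for this example and are not taken from finee's source.

```python
import re
from typing import Callable, Dict, Optional

# Simplified, hypothetical Tier 1 patterns (the real engine reportedly has 50+).
AMOUNT_RE = re.compile(r"(?:Rs\.?|INR|₹)\s*(\d[\d,]*(?:\.\d+)?)", re.IGNORECASE)
TYPE_RE = re.compile(r"(debited|credited)", re.IGNORECASE)


def extract_with_regex(text: str) -> Dict[str, Optional[object]]:
    """Fast path: fill whatever the regexes can find, leave the rest as None."""
    result: Dict[str, Optional[object]] = {"amount": None, "type": None}
    amount = AMOUNT_RE.search(text)
    if amount:
        result["amount"] = float(amount.group(1).replace(",", ""))
    kind = TYPE_RE.search(text)
    if kind:
        result["type"] = "debit" if kind.group(1).lower() == "debited" else "credit"
    return result


def extract(text: str,
            llm_fallback: Optional[Callable[[str], dict]] = None) -> dict:
    """Regex by default; consult the optional fallback only for missing fields."""
    result = extract_with_regex(text)
    missing = [name for name, value in result.items() if value is None]
    if missing and llm_fallback is not None:
        for name, value in llm_fallback(text).items():
            # Never overwrite a value the regex tier already produced.
            if result.get(name) is None:
                result[name] = value
    return result
```

With no fallback supplied, the function never leaves the regex path, which is the behaviour the "Regex (Default)" row describes.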
+
+---
+
+## ⚡ Install in 10 Seconds
|
| 48 |
|
| 49 |
```bash
|
| 50 |
pip install finee
|
|
|
|
| 60 |
print(r.category) # "food"
|
| 61 |
```
|
| 62 |
|
| 63 |
+**Try it now:** [](https://colab.research.google.com/github/Ranjitbehera0034/Finance-Entity-Extractor/blob/main/examples/demo.ipynb)
+
+---
+
+## 🧠 Enable LLM Mode (For Edge Cases)
+
+```python
+from finee import FinEE
+from finee.schema import ExtractionConfig
+
+# Downloads 7GB model once, then runs locally
+extractor = FinEE(ExtractionConfig(use_llm=True))
+result = extractor.extract("Your complex bank message...")
+```
+
+**Supported Backends:**
+- Apple Silicon → MLX (fastest)
+- NVIDIA GPU → PyTorch/CUDA
+- CPU → llama.cpp (GGUF)
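
The backend list above amounts to a small decision rule. A minimal sketch of such a rule follows; `pick_backend`, `detect_backend`, and the returned labels are hypothetical names for this example, and the README does not say how finee actually chooses a backend.

```python
import platform


def pick_backend(system: str, machine: str, has_cuda: bool) -> str:
    """Hypothetical rule mirroring the list: MLX, then CUDA, then CPU."""
    if system == "Darwin" and machine == "arm64":
        return "mlx"       # Apple Silicon
    if has_cuda:
        return "pytorch"   # NVIDIA GPU
    return "gguf"          # CPU via llama.cpp


def detect_backend() -> str:
    """Probe the current machine; PyTorch is treated as optional."""
    try:
        import torch
        has_cuda = torch.cuda.is_available()
    except ImportError:
        has_cuda = False
    return pick_backend(platform.system(), platform.machine(), has_cuda)
```

Keeping the rule in a pure function (`pick_backend`) separate from the probing code makes it easy to test without the hardware in question.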
|
| 82 |
|
| 83 |
---
|
| 84 |
|
|
|
|
| 96 |
"reference": "534567891234",// string - UPI/NEFT ref
|
| 97 |
"merchant": "Swiggy", // string - Normalized name
|
| 98 |
"category": "food", // string - food|shopping|transport|...
|
| 99 |
+
"confidence": 0.95 // float - 0.0 to 1.0
|
|
|
|
|
|
|
| 100 |
}
|
| 101 |
```
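
One way to get a "guaranteed" structure is to serialize from a typed record so that every key is always present, even when its value is unknown. The sketch below covers only the four fields visible in this hunk; `PartialResult` and `to_json` are hypothetical names, and the real schema has additional fields that are not shown here.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class PartialResult:
    """Hypothetical mirror of the four fields shown above, not the full schema."""
    reference: Optional[str] = None   # UPI/NEFT reference
    merchant: Optional[str] = None    # normalized merchant name
    category: Optional[str] = None    # food | shopping | transport | ...
    confidence: float = 0.0           # 0.0 to 1.0

    def __post_init__(self) -> None:
        # Reject values outside the documented range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


def to_json(result: PartialResult) -> str:
    """Every key is emitted, with null for fields that were not extracted."""
    return json.dumps(asdict(result))
```

An empty `PartialResult()` still serializes all four keys, which is what lets downstream code index into the output without checking for missing keys.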
|
| 102 |
|
|
|
|
| 104 |
|
| 105 |
## 🔬 Verify Accuracy Yourself
|
| 106 |
|
|
|
|
|
|
|
| 107 |
```bash
|
|
|
|
| 108 |
git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
|
| 109 |
cd Finance-Entity-Extractor
|
| 110 |
pip install finee
|
|
|
|
|
|
|
| 111 |
python benchmark.py --all
|
| 112 |
```
|
| 113 |
|
| 114 |
---
|
| 115 |
|
| 116 |
+## Edge Case Handling

+| Input | Result |
+|-------|--------|
+| `Rs.500.00debited from A/c1234` (no spaces) | ✅ amount=500.0 |
+| `₹2,500 debited` (Unicode) | ✅ amount=2500.0 |
+| `1.5 Lakh credited` (Lakhs) | ✅ amount=150000.0 |
+| `Rs.500 debited. Bal: Rs.15,000` (multiple) | ✅ amount=500.0 |
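
The four rows above can be reproduced with a single regular expression. This is a minimal sketch, assuming the first amount in the message is the transaction amount; `AMOUNT_RE` and `parse_amount` are names invented for this example and are not finee's actual patterns.

```python
import re
from typing import Optional

# Two alternatives, tried left to right at each position:
#   1. a currency marker (Rs, Rs., INR, ₹) and a number, optionally followed by "lakh(s)"
#   2. a bare number explicitly followed by "lakh(s)"
AMOUNT_RE = re.compile(
    r"(?<![A-Za-z])(?:Rs\.?|INR|₹)\s*(?P<amount>\d[\d,]*(?:\.\d+)?)(?P<lakh>\s*lakhs?\b)?"
    r"|(?P<bare>\d+(?:\.\d+)?)\s*lakhs?\b",
    re.IGNORECASE,
)


def parse_amount(text: str) -> Optional[float]:
    """Return the first amount found in the message, or None."""
    match = AMOUNT_RE.search(text)  # search() yields the leftmost match
    if match is None:
        return None
    if match.group("bare") is not None:
        return float(match.group("bare")) * 100_000
    value = float(match.group("amount").replace(",", ""))
    # 1 lakh = 100,000
    return value * 100_000 if match.group("lakh") else value


for sms in [
    "Rs.500.00debited from A/c1234",
    "₹2,500 debited",
    "1.5 Lakh credited",
    "Rs.500 debited. Bal: Rs.15,000",
]:
    print(sms, "->", parse_amount(sms))
```

No whitespace is required between the marker, the number, and the following word, which is why the "no spaces" row parses; taking the leftmost match is what selects 500 rather than the balance in the last row.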
|
| 124 |
|
| 125 |
+
---
|
| 126 |
|
| 127 |
+## 🏦 Supported Banks
+
+| Bank | Status |
+|------|--------|
+| HDFC | ✅ |
+| ICICI | ✅ |
+| SBI | ✅ |
+| Axis | ✅ |
+| Kotak | ✅ |
|
| 136 |
|
| 137 |
---
|
| 138 |
|
| 139 |
+## Benchmark

+| Metric | Value |
+|--------|-------|
+| **Field Accuracy** | 94.5% (with LLM) |
+| **Regex-only Accuracy** | 87.5% |
+| **Latency (Regex)** | <1ms |
+| **Throughput** | 50,000+ msg/sec |
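
The README does not define "field accuracy". One common reading is the fraction of labelled fields whose extracted value exactly matches the gold value; the sketch below computes that under this assumption. `field_accuracy` is a hypothetical helper and is not part of `benchmark.py`.

```python
from typing import Dict, List


def field_accuracy(predictions: List[Dict], gold: List[Dict]) -> float:
    """Share of gold fields reproduced exactly, pooled over all messages."""
    correct = 0
    total = 0
    for predicted, expected in zip(predictions, gold):
        for field, expected_value in expected.items():
            total += 1
            # A missing field counts as wrong.
            if predicted.get(field) == expected_value:
                correct += 1
    return correct / total if total else 0.0


# Made-up illustration data, not benchmark results.
gold = [
    {"amount": 500.0, "type": "debit"},
    {"amount": 2500.0, "type": "credit"},
]
predictions = [
    {"amount": 500.0, "type": "debit"},
    {"amount": 2500.0, "type": "debit"},   # one wrong field
]
print(field_accuracy(predictions, gold))
```

Pooling over fields rather than averaging per message means messages with more labelled fields weigh more; a per-message average would be an equally defensible definition.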
|
|
|
|
| 147 |
|
| 148 |
---
|
| 149 |
|
|
|
|
                               │
                               ▼
 ┌─────────────────────────────────────────────────────────────┐
+│  TIER 1: Regex Engine (50+ patterns)                        │
 └─────────────────────────────────────────────────────────────┘
                               │
                               ▼
 ┌─────────────────────────────────────────────────────────────┐
 │  TIER 2: Rule-Based Mapping (200+ VPA → merchant)           │
 └─────────────────────────────────────────────────────────────┘
                               │
                               ▼
 ┌─────────────────────────────────────────────────────────────┐
+│  TIER 3: Phi-3 LLM (Optional - downloads 7GB model)         │
+│  Only called for edge cases                                 │
 └─────────────────────────────────────────────────────────────┘
                               │
                               ▼
|
|
|
|
| 179 |
|
| 180 |
---
|
| 181 |
|
| 182 |
## Repository Structure
|
| 183 |
|
| 184 |
```
|
| 185 |
Finance-Entity-Extractor/
|
| 186 |
+├── src/finee/              # Core package
 ├── tests/                  # 88 unit tests
+├── examples/demo.ipynb     # Try in Colab!
+├── benchmark.py            # Verify accuracy
+├── CHANGELOG.md            # Release history
+└── CONTRIBUTING.md         # How to contribute
|
|
|
|
| 192 |
```
|
| 193 |
|
| 194 |
---
|
| 195 |
|
| 196 |
## 🤝 Contributing
|
| 197 |
|
| 198 |
+See [CONTRIBUTING.md](CONTRIBUTING.md) for:
+- Git Flow branching strategy
+- How to run tests
+- Release process
|
|
|
|
|
|
|
| 202 |
|
| 203 |
---
|
| 204 |
|
| 205 |
## License
|
| 206 |
|
| 207 |
+
MIT License
|
| 208 |
|
| 209 |
---
|
| 210 |
|
|
|
|
| 212 |
|
| 213 |
**Made with ❤️ by Ranjit Behera**
|
| 214 |
|
| 215 |
+[PyPI](https://pypi.org/project/finee/) • [GitHub](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor) • [Hugging Face](https://huggingface.co/Ranjit0034/finance-entity-extractor)
|
| 216 |
|
| 217 |
</div>
|