Ranjit Behera
committed on
Commit · 354e581
1 Parent(s): dcc24f8
Add PyPI package, Colab demo, and Schema Contract
- Package built and ready for PyPI (dist/)
- Interactive Colab notebook (examples/demo.ipynb)
- Documented output JSON schema as contract
- Updated badges and installation instructions
- README.md +163 -134
- examples/demo.ipynb +202 -0
README.md
CHANGED
@@ -20,190 +20,219 @@ pipeline_tag: text-generation

 # Finance Entity Extractor (FinEE) v1.0

-<a href="https://
-<img src="https://img.shields.io/
 </a>
 <a href="https://opensource.org/licenses/MIT">
 <img src="https://img.shields.io/badge/License-MIT-green?style=for-the-badge" alt="License">
 </a>
-<a href="https://
-<img src="https://
-</a>
-<a href="https://github.com/ggerganov/llama.cpp">
-<img src="https://img.shields.io/badge/GGUF-Compatible-purple?style=for-the-badge" alt="GGUF">
-</a>
-<a href="https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/actions/workflows/tests.yml">
-<img src="https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/actions/workflows/tests.yml/badge.svg" alt="Tests">
 </a>

 <br>

-**
 <br>
-*
-
-[ [Model Card](https://huggingface.co/Ranjit0034/finance-entity-extractor) ] · [ [GitHub](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor) ] · [ [Quick Start](#quick-start-with-finee-library) ]

 </div>

 ---

-##

-

-
-|-------|------------|-------------------------|---------------|------|
-| **FinEE-3.8B (Ours)** | 3.8B | **94.5%** | **45ms** | Free |
-| Llama-3-8B-Instruct | 8B | 89.4% | 120ms | Free |
-| GPT-3.5-Turbo | ~175B | 94.1% | ~500ms | $0.002/1K |
-| GPT-4 | ~1.7T | 96.8% | ~800ms | $0.03/1K |

-

-
-
-| macOS Apple Silicon | MLX | ✅ Full Support |
-| Linux + NVIDIA GPU | PyTorch/Transformers | ✅ Full Support |
-| Linux + CPU | PyTorch/GGUF | ✅ Full Support |
-| Windows | GGUF/llama.cpp | ✅ Full Support |

-

-

-

-
-
-

-
-
-
-
-
-
 ```

-

-
-

-
-
-
-
-
-
-
-
-
-
-
-
-#
-
-
-#
-
-
 ```

-

 ```bash
-#
 finee extract "Rs.500 debited from A/c 1234"

 # Check available backends
 finee backends
 ```

 ---

-##
-
-This project demonstrates how to:
-1. **Parse** 40K+ emails from a Gmail MBOX export
-2. **Classify** emails into categories using Phi-3 Mini
-3. **Discover** patterns in financial emails (transactions, amounts, dates)
-4. **Fine-tune** a local LLM using LoRA for entity extraction
-5. **Extract** structured data: amount, transaction type, account, date, reference
-
-## Project Structure

 ```
-
-
-
-
-│
-
-│
-
-
-│
-│
-
-
-
-
-
-
 ```

-
-
-| Entity | Description | Example |
-|--------|-------------|---------|
-| `amount` | Transaction amount | "2500.00" |
-| `type` | Debit or Credit | "debit" |
-| `account` | Account identifier | "3545" |
-| `date` | Transaction date | "28-12-25" |
-| `reference` | UPI/NEFT reference | "534567891234" |
-| `merchant` | Merchant name | "swiggy" |
-| `category` | Transaction category | "food" |
-| `confidence` | Extraction confidence | "HIGH" |
-
-## Benchmark Results
-
-### Multi-Bank Validation (v8)
-
-| Bank | Field Accuracy | Status |
-|------|----------------|--------|
-| ICICI | 96.2% | ✅ |
-| HDFC | 95.0% | ✅ |
-| SBI | 93.3% | ✅ |
-| Axis | 93.3% | ✅ |
-| Kotak | 92.0% | ✅ |
-| **Overall** | **94.5%** | ✅ |
-
-### Field-Level Accuracy
-
-| Field | Accuracy |
-|-------|----------|
-| Amount | 98.5% |
-| Type | 99.2% |
-| Date | 97.8% |
-| Account | 96.1% |
-| Reference | 72.7% |

 ## 🤝 Contributing

-
-
-

-

-##

-- [
-- [MLX team](https://github.com/ml-explore) for the amazing framework
-- [Hugging Face](https://huggingface.co/) for model hosting

 ---

 **Made with ❤️ by Ranjit Behera**

 # Finance Entity Extractor (FinEE) v1.0

+<a href="https://pypi.org/project/finee/">
+<img src="https://img.shields.io/pypi/v/finee?style=for-the-badge&logo=pypi&logoColor=white" alt="PyPI">
+</a>
+<a href="https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/actions/workflows/tests.yml">
+<img src="https://github.com/Ranjitbehera0034/Finance-Entity-Extractor/actions/workflows/tests.yml/badge.svg" alt="Tests">
 </a>
 <a href="https://opensource.org/licenses/MIT">
 <img src="https://img.shields.io/badge/License-MIT-green?style=for-the-badge" alt="License">
 </a>
+<a href="https://colab.research.google.com/github/Ranjitbehera0034/Finance-Entity-Extractor/blob/main/examples/demo.ipynb">
+<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab">
 </a>

 <br>

+**Extract structured financial data from Indian banking messages in one command.**
 <br>
+*94.5% field accuracy across HDFC, ICICI, SBI, Axis, Kotak.*

 </div>

 ---

+## ⚡ One-Command Installation

+```bash
+pip install finee
+```

+That's it. No cloning, no setup.

+---
+
+## 30-Second Quick Start
+
+```python
+from finee import extract

+# Parse any Indian bank message
+result = extract("Rs.2500 debited from A/c XX3545 to swiggy@ybl on 28-12-2025")

+print(result.amount)      # 2500.0
+print(result.merchant)    # "Swiggy"
+print(result.category)    # "food"
+print(result.confidence)  # Confidence.HIGH
+```

+**Try it live:** [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Ranjitbehera0034/Finance-Entity-Extractor/blob/main/examples/demo.ipynb)

+---

+## Output Schema Contract
+
+Every extraction returns a guaranteed JSON structure:
+
+```json
+{
+  "amount": 2500.0,            // float  - Always numeric, never "Rs. 2,500"
+  "currency": "INR",           // string - ISO 4217 code
+  "type": "debit",             // string - "debit" | "credit"
+  "account": "3545",           // string - Last 4 digits only
+  "date": "28-12-2025",        // string - DD-MM-YYYY format
+  "reference": "534567891234", // string - UPI/NEFT reference
+  "merchant": "Swiggy",        // string - Normalized name (not "VPA-SWIGGY-BLR")
+  "category": "food",          // string - Enum: food|shopping|transport|bills|...
+  "vpa": "swiggy@ybl",         // string - Raw VPA
+  "confidence": 0.95,          // float  - 0.0 to 1.0
+  "confidence_level": "HIGH"   // string - "LOW" | "MEDIUM" | "HIGH"
+}
+```
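The schema comments above promise a numeric `amount` (never `"Rs. 2,500"`) and a `DD-MM-YYYY` date. The following is a minimal sketch of what that kind of normalization could look like. It is not taken from the `finee` source: the helper names `normalize_amount` and `normalize_date` and the regular expression are assumptions made for illustration only.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern: a rupee marker followed by digits, commas and an optional decimal part.
_AMOUNT_RE = re.compile(r"(?:\bRs\.?|\bINR|₹)\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE)


def normalize_amount(text: str) -> Optional[float]:
    """Return the first rupee amount found in `text` as a float, or None."""
    match = _AMOUNT_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting: "2,500.00" -> 2500.0
    return float(match.group(1).replace(",", ""))


def normalize_date(raw: str) -> Optional[str]:
    """Convert DD-MM-YYYY or DD-MM-YY into DD-MM-YYYY, or return None."""
    # Try the four-digit year first; %Y rejects a two-digit year, so the
    # second format is only reached for inputs such as "10-01-25".
    for fmt in ("%d-%m-%Y", "%d-%m-%y"):
        try:
            return datetime.strptime(raw, fmt).strftime("%d-%m-%Y")
        except ValueError:
            continue
    return None
```

With these helpers, `normalize_amount("Rs. 2,500")` yields a float and `normalize_date("28-12-25")` yields a four-digit-year string, which matches the shape of the `amount` and `date` fields in the contract. Real bank messages use more date and currency variants than this sketch handles.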

+### Type Definitions (TypeScript-style)
+
+```typescript
+interface ExtractionResult {
+  amount: number | null;
+  currency: "INR";
+  type: "debit" | "credit" | null;
+  account: string | null;
+  date: string | null; // DD-MM-YYYY
+  reference: string | null;
+  merchant: string | null;
+  category: Category | null;
+  vpa: string | null;
+  confidence: number; // 0.0 - 1.0
+  confidence_level: "LOW" | "MEDIUM" | "HIGH";
+}
+
+type Category =
+  | "food" | "shopping" | "transport" | "bills"
+  | "entertainment" | "travel" | "grocery" | "fuel"
+  | "healthcare" | "education" | "investment" | "transfer" | "other";
 ```
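For consumers working in Python rather than TypeScript, the same contract can be mirrored as a dataclass with a small checker. This is a sketch under stated assumptions: `ExtractionRecord` and `contract_violations` are hypothetical names and are not part of the `finee` package. Only the field names, the category enum and the value ranges come from the type definitions above.

```python
import re
from dataclasses import dataclass, replace
from typing import List, Optional

# Enum values copied from the Category type above.
CATEGORIES = {
    "food", "shopping", "transport", "bills", "entertainment", "travel",
    "grocery", "fuel", "healthcare", "education", "investment", "transfer", "other",
}
CONFIDENCE_LEVELS = {"LOW", "MEDIUM", "HIGH"}
_DATE_RE = re.compile(r"\d{2}-\d{2}-\d{4}")  # checks the DD-MM-YYYY shape only


@dataclass(frozen=True)
class ExtractionRecord:  # hypothetical mirror of the ExtractionResult interface
    amount: Optional[float]
    currency: str
    type: Optional[str]
    account: Optional[str]
    date: Optional[str]
    reference: Optional[str]
    merchant: Optional[str]
    category: Optional[str]
    vpa: Optional[str]
    confidence: float
    confidence_level: str


def contract_violations(rec: ExtractionRecord) -> List[str]:
    """Return a list of human-readable contract violations (empty if none)."""
    problems: List[str] = []
    if rec.amount is not None and not isinstance(rec.amount, (int, float)):
        problems.append("amount must be numeric or null")
    if rec.currency != "INR":
        problems.append("currency must be 'INR'")
    if rec.type not in (None, "debit", "credit"):
        problems.append("type must be 'debit', 'credit' or null")
    if rec.date is not None and not _DATE_RE.fullmatch(rec.date):
        problems.append("date must look like DD-MM-YYYY")
    if rec.category is not None and rec.category not in CATEGORIES:
        problems.append("category is not in the Category enum")
    if not 0.0 <= rec.confidence <= 1.0:
        problems.append("confidence must be within 0.0 and 1.0")
    if rec.confidence_level not in CONFIDENCE_LEVELS:
        problems.append("confidence_level must be LOW, MEDIUM or HIGH")
    return problems


# The sample payload from the schema section, and a deliberately broken copy.
sample = ExtractionRecord(
    amount=2500.0, currency="INR", type="debit", account="3545",
    date="28-12-2025", reference="534567891234", merchant="Swiggy",
    category="food", vpa="swiggy@ybl", confidence=0.95, confidence_level="HIGH",
)
broken = replace(sample, date="2025-12-28", confidence=1.5)
```

Calling `contract_violations(sample)` returns an empty list, while `contract_violations(broken)` reports the malformed date and the out-of-range confidence.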

+---

+## 🏦 Supported Banks
+
+| Bank | Debit | Credit | UPI | NEFT/IMPS |
+|------|:-----:|:------:|:---:|:---------:|
+| HDFC | ✅ | ✅ | ✅ | ✅ |
+| ICICI | ✅ | ✅ | ✅ | ✅ |
+| SBI | ✅ | ✅ | ✅ | ✅ |
+| Axis | ✅ | ✅ | ✅ | ✅ |
+| Kotak | ✅ | ✅ | ✅ | ✅ |
+
+---
+
+## Benchmark

+| Metric | Value |
+|--------|-------|
+| Field Accuracy | 94.5% |
+| Latency (Regex mode) | <1ms |
+| Latency (LLM mode) | ~50ms |
+| Throughput | 50,000+ msg/sec |
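Latency and throughput in the table are two views of the same measurement: at 50,000 messages per second on a single thread, the mean time per message is 1/50,000 s, or 0.02 ms. The sketch below shows one way to measure both for any callable. The `measure` helper and `stand_in_extract` are hypothetical and exist only so the example runs without the package installed; substitute `finee.extract` to benchmark the real function.

```python
import time
from typing import Callable, Dict


def measure(fn: Callable[[str], object], message: str, runs: int = 1000) -> Dict[str, float]:
    """Call `fn(message)` `runs` times and report mean latency and throughput."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    for _ in range(runs):
        fn(message)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    return {
        "mean_latency_ms": elapsed / runs * 1000.0,
        "throughput_per_sec": runs / elapsed,
    }


def stand_in_extract(message: str) -> list:
    """Placeholder workload, NOT the finee extractor."""
    return message.split()


stats = measure(stand_in_extract, "Rs.500 debited from A/c 1234 to paytm@ybl on 01-01-2025")
print(f"mean latency: {stats['mean_latency_ms']:.4f} ms")
print(f"throughput:   {stats['throughput_per_sec']:.0f} msg/sec")
```

Numbers from a run like this depend on hardware and on whether repeated inputs are served from a cache, so they are only comparable with the table when measured under the same conditions.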
+
+---
+
+## 🔧 Installation Options
+
+```bash
+# Core (Regex + Rules only, no ML)
+pip install finee
+
+# With Apple Silicon backend
+pip install "finee[metal]"
+
+# With NVIDIA GPU backend
+pip install "finee[cuda]"
+
+# With CPU backend (llama.cpp)
+pip install "finee[cpu]"
 ```

+---
+
+## 💻 CLI Usage

 ```bash
+# Extract from text
 finee extract "Rs.500 debited from A/c 1234"

 # Check available backends
 finee backends
+
+# Show version
+finee --version
 ```

 ---

+## Architecture

 ```
+Input Text
+    │
+    ▼
+┌─────────────────────────────────────────────────────────────┐
+│  TIER 0: Hash Cache (<1ms if seen before)                   │
+└─────────────────────────────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────────────────────────────┐
+│  TIER 1: Regex Engine                                       │
+│  Extract: amount, date, reference, account, vpa, type       │
+└─────────────────────────────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────────────────────────────┐
+│  TIER 2: Rule-Based Mapping                                 │
+│  Map: vpa → merchant, merchant → category                   │
+└─────────────────────────────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────────────────────────────┐
+│  TIER 3: LLM (Optional, for missing fields)                 │
+│  Targeted prompts for: merchant, category only              │
+└─────────────────────────────────────────────────────────────┘
+    │
+    ▼
+┌─────────────────────────────────────────────────────────────┐
+│  TIER 4: Validation + Normalization                         │
+│  JSON repair, date normalization, confidence scoring        │
+└─────────────────────────────────────────────────────────────┘
+    │
+    ▼
+ExtractionResult (Guaranteed Schema)
 ```
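The tiers in the diagram can be sketched as one function that checks a cache, runs regexes, applies lookup tables and then scores the result. This is an illustration of the control flow only, not the `finee` implementation. The function name `extract_sketch`, the regexes, the two lookup tables and the "fraction of filled fields" confidence heuristic are all assumptions, and the optional LLM tier is left out.

```python
import hashlib
import re
from typing import Dict

_CACHE: Dict[str, dict] = {}  # Tier 0: results keyed by a hash of the input text

# Tier 1 patterns (hypothetical, far simpler than a production rule set)
_AMOUNT_RE = re.compile(r"(?:\bRs\.?|\bINR|₹)\s*([\d,]+(?:\.\d+)?)", re.IGNORECASE)
_VPA_RE = re.compile(r"\b([\w.\-]+@[a-zA-Z]+)\b")
_TYPE_RE = re.compile(r"\b(debited|credited)\b", re.IGNORECASE)

# Tier 2 lookup tables (hypothetical sample entries)
VPA_TO_MERCHANT = {"swiggy": "Swiggy", "zomato": "Zomato"}
MERCHANT_TO_CATEGORY = {"Swiggy": "food", "Zomato": "food"}


def extract_sketch(text: str) -> dict:
    # Tier 0: return the cached result if this exact text was seen before.
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key in _CACHE:
        return _CACHE[key]

    # Tier 1: pull raw fields out with regular expressions.
    amount = _AMOUNT_RE.search(text)
    kind = _TYPE_RE.search(text)
    vpa = _VPA_RE.search(text)
    result = {
        "amount": float(amount.group(1).replace(",", "")) if amount else None,
        "type": kind.group(1).lower()[:-2] if kind else None,  # "debited" -> "debit"
        "vpa": vpa.group(1) if vpa else None,
        "merchant": None,
        "category": None,
    }

    # Tier 2: map vpa -> merchant, then merchant -> category.
    if result["vpa"]:
        handle = result["vpa"].split("@")[0].lower()
        result["merchant"] = VPA_TO_MERCHANT.get(handle)
        result["category"] = MERCHANT_TO_CATEGORY.get(result["merchant"])

    # Tier 3 (optional LLM for missing fields) is intentionally omitted here.

    # Tier 4: toy confidence score, the fraction of fields that were filled.
    total = len(result)
    filled = sum(value is not None for value in result.values())
    result["confidence"] = round(filled / total, 2)

    _CACHE[key] = result
    return result
```

Ordering the tiers from cheapest to most expensive means most messages never reach the slowest step, which is consistent with the diagram sending only `merchant` and `category` to the LLM when the earlier tiers leave them empty.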

+---

 ## 🤝 Contributing

+```bash
+git clone https://github.com/Ranjitbehera0034/Finance-Entity-Extractor.git
+cd Finance-Entity-Extractor
+pip install -e ".[dev]"
+pytest tests/
+```

+---

+## License

+MIT License - see [LICENSE](LICENSE)

 ---

+<div align="center">
+
 **Made with ❤️ by Ranjit Behera**
+
+[GitHub](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor) · [PyPI](https://pypi.org/project/finee/) · [Hugging Face](https://huggingface.co/Ranjit0034/finance-entity-extractor)
+
+</div>
examples/demo.ipynb
ADDED
@@ -0,0 +1,202 @@
+{
+  "nbformat": 4,
+  "nbformat_minor": 0,
+  "metadata": {
+    "colab": {
+      "provenance": [],
+      "gpuType": "T4"
+    },
+    "kernelspec": {
+      "name": "python3",
+      "display_name": "Python 3"
+    },
+    "language_info": {
+      "name": "python"
+    }
+  },
+  "cells": [
+    {
+      "cell_type": "markdown",
+      "source": [
+        "# 🏦 FinEE - Finance Entity Extractor\n",
+        "\n",
+        "**Extract structured financial data from Indian banking messages in seconds.**\n",
+        "\n",
+        "This notebook demonstrates the `finee` Python package - a production-ready tool for parsing bank transaction messages."
+      ],
+      "metadata": {
+        "id": "intro"
+      }
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 📦 Installation\n",
+        "\n",
+        "Install the package directly from PyPI:"
+      ],
+      "metadata": {
+        "id": "install_header"
+      }
+    },
+    {
+      "cell_type": "code",
+      "execution_count": null,
+      "metadata": {
+        "id": "install"
+      },
+      "outputs": [],
+      "source": [
+        "!pip install finee -q"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Quick Demo\n",
+        "\n",
+        "Let's extract entities from a real HDFC Bank UPI transaction message:"
+      ],
+      "metadata": {
+        "id": "demo_header"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "from finee import extract\n",
+        "\n",
+        "# Sample HDFC Bank UPI transaction\n",
+        "message = \"\"\"\n",
+        "HDFC Bank: Rs.2,500.00 debited from A/c XX3545 on 28-12-2025.\n",
+        "VPA: swiggy@ybl. UPI Ref: 534567891234.\n",
+        "Not you? Call 18002586161\n",
+        "\"\"\"\n",
+        "\n",
+        "# Extract entities (uses Regex + Rules, no GPU needed)\n",
+        "result = extract(message)\n",
+        "\n",
+        "# Print structured output\n",
+        "print(\"Extracted Entities:\")\n",
+        "print(f\" Amount: ₹{result.amount}\")\n",
+        "print(f\" Type: {result.type}\")\n",
+        "print(f\" Account: ****{result.account}\")\n",
+        "print(f\" Date: {result.date}\")\n",
+        "print(f\" Reference: {result.reference}\")\n",
+        "print(f\" Merchant: {result.merchant}\")\n",
+        "print(f\" Category: {result.category}\")\n",
+        "print(f\" Confidence: {result.confidence.value}\")"
+      ],
+      "metadata": {
+        "id": "demo_hdfc"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## JSON Output\n",
+        "\n",
+        "Get the result as a clean JSON object:"
+      ],
+      "metadata": {
+        "id": "json_header"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import json\n",
+        "\n",
+        "# Export as JSON\n",
+        "json_output = result.to_dict()\n",
+        "print(json.dumps(json_output, indent=2))"
+      ],
+      "metadata": {
+        "id": "json_output"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## 🏦 Multi-Bank Support\n",
+        "\n",
+        "FinEE works across all major Indian banks:"
+      ],
+      "metadata": {
+        "id": "multibank_header"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "banks = {\n",
+        "    \"ICICI\": \"Dear Customer, Rs.1500 debited from Acct XX9876 on 15-01-2025 to amazon@apl. Ref: 987654321012\",\n",
+        "    \"SBI\": \"SBI: Rs.350 debited from a/c XX1234 on 10-01-25. UPI txn to zomato@paytm. Ref: 456789012345\",\n",
+        "    \"Axis\": \"Axis Bank: INR 800 debited from A/c 5678 on 05-01-2025. Info: UPI-UBER. Bal: Rs.12,500\",\n",
+        "    \"Kotak\": \"Rs.2000 credited to Kotak A/c XX4321 on 20-01-2025 from rahul.sharma@okicici. Ref: 321654987012\"\n",
+        "}\n",
+        "\n",
+        "print(\"🏦 Multi-Bank Extraction Results:\\n\")\n",
+        "for bank, msg in banks.items():\n",
+        "    r = extract(msg)\n",
+        "    print(f\"{bank:6} | ₹{str(r.amount):>8} | {r.type:6} | {(r.merchant or 'N/A'):12} | {r.confidence.value}\")"
+      ],
+      "metadata": {
+        "id": "multibank_demo"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## ⚡ Performance\n",
+        "\n",
+        "The Regex+Rules pipeline is blazing fast:"
+      ],
+      "metadata": {
+        "id": "perf_header"
+      }
+    },
+    {
+      "cell_type": "code",
+      "source": [
+        "import time\n",
+        "\n",
+        "# Benchmark\n",
+        "test_msg = \"Rs.500 debited from A/c 1234 to paytm@ybl on 01-01-2025\"\n",
+        "\n",
+        "start = time.time()\n",
+        "for _ in range(1000):\n",
+        "    extract(test_msg)\n",
+        "elapsed = (time.time() - start) * 1000 # ms\n",
+        "\n",
+        "print(f\"⚡ 1000 extractions in {elapsed:.1f}ms\")\n",
+        "print(f\" Average: {elapsed/1000:.3f}ms per message\")\n",
+        "print(f\" Throughput: {1000000/elapsed:.0f} messages/second\")"
+      ],
+      "metadata": {
+        "id": "benchmark"
+      },
+      "execution_count": null,
+      "outputs": []
+    },
+    {
+      "cell_type": "markdown",
+      "source": [
+        "## Learn More\n",
+        "\n",
+        "- 📦 **PyPI**: `pip install finee`\n",
+        "- **GitHub**: [Ranjitbehera0034/Finance-Entity-Extractor](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor)\n",
+        "- 🤗 **Model**: [Ranjit0034/finance-entity-extractor](https://huggingface.co/Ranjit0034/finance-entity-extractor)"
+      ],
+      "metadata": {
+        "id": "links"
+      }
+    }
+  ]
+}