Ranjit0034 committed · Commit 082dc5c · verified · 1 Parent(s): a60c3fc

Upload docs/model_cards/finee-llama-8b-README.md with huggingface_hub

docs/model_cards/finee-llama-8b-README.md ADDED
@@ -0,0 +1,140 @@
---
license: apache-2.0
language:
- en
- hi
- ta
- te
- bn
- kn
tags:
- finance
- entity-extraction
- indian-banking
- llama
- finee
base_model: meta-llama/Llama-3.1-8B-Instruct
datasets:
- Ranjit0034/finee-dataset
pipeline_tag: text-generation
---

# FinEE Llama 8B - Indian Financial Entity Extractor

<p align="center">
  <img src="https://img.shields.io/badge/Model-Llama_3.1_8B-blue" alt="Model">
  <img src="https://img.shields.io/badge/Task-Entity_Extraction-green" alt="Task">
  <img src="https://img.shields.io/badge/Languages-6-orange" alt="Languages">
  <img src="https://img.shields.io/badge/Accuracy-95%2B-brightgreen" alt="Accuracy">
</p>

## Model Description

FinEE Llama 8B is a fine-tuned version of Llama 3.1 8B Instruct, specialized for extracting financial entities from Indian banking messages (SMS, emails, statements).

### Key Features

- 🏦 **Multi-Bank Support**: HDFC, ICICI, SBI, Axis, Kotak, and 20+ other Indian banks
- 💳 **Transaction Types**: UPI, NEFT, IMPS, credit card, EMI, and refunds
- 🌐 **Multilingual**: English, Hindi, Tamil, Telugu, Bengali, Kannada
- 📊 **Structured Output**: Clean JSON containing every extracted entity
- ⚡ **Fast**: Under 100 ms per extraction with the quantized model

## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model; dtype and device placement are chosen automatically.
model = AutoModelForCausalLM.from_pretrained(
    "Ranjit0034/finee-llama-8b",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Ranjit0034/finee-llama-8b")

message = "HDFC Bank: Rs.2,500 debited from A/c XX1234 on 12-Jan-26. UPI:swiggy@ybl. Ref:123456789012"

prompt = f"""Extract financial entities from this message:

{message}

JSON:"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# The decoded text contains the prompt followed by the model's answer.
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
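
The decoded text is free-form model output, so the JSON object may be surrounded by other text such as the echoed prompt. A minimal sketch of pulling out the first JSON object with the standard library follows; the helper name `extract_first_json` is hypothetical and not part of the FinEE package.

```python
import json
from typing import Optional


def extract_first_json(text: str) -> Optional[dict]:
    """Return the first JSON object found in `text`, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and ignores whatever text follows it.
            value, _end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            value = None
        if isinstance(value, dict):
            return value
        # Not valid JSON at this brace; try the next one.
        start = text.find("{", start + 1)
    return None


sample = 'JSON: {"amount": 2500.0, "type": "debit"} trailing text'
print(extract_first_json(sample))  # {'amount': 2500.0, 'type': 'debit'}
```

Scanning brace by brace keeps the helper tolerant of stray `{` characters that appear before the actual object.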
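
### With MLX (Apple Silicon)

```python
from mlx_lm import load, generate

model, tokenizer = load("Ranjit0034/finee-llama-8b")

# Build the prompt in the same format as the Transformers example.
message = "HDFC Bank: Rs.2,500 debited from A/c XX1234 on 12-Jan-26. UPI:swiggy@ybl. Ref:123456789012"
prompt = f"Extract financial entities from this message:\n\n{message}\n\nJSON:"

output = generate(model, tokenizer, prompt, max_tokens=256)
print(output)
```
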
### With FinEE Package

```python
from finee import FinancialExtractor

extractor = FinancialExtractor(model="Ranjit0034/finee-llama-8b")
result = extractor.extract("HDFC Bank: Rs.2,500 debited...")
print(result)
# {'amount': 2500.0, 'type': 'debit', 'merchant': 'Swiggy', 'category': 'food'}
```

## Output Schema

```json
{
  "amount": 2500.0,
  "type": "debit",
  "account": "1234",
  "bank": "HDFC",
  "date": "2026-01-12",
  "reference": "123456789012",
  "merchant": "Swiggy",
  "vpa": "swiggy@ybl",
  "category": "food",
  "is_p2m": true
}
```
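
To consume this output in typed code, the fields can be mapped onto a dataclass. The sketch below assumes the field names and value types shown in the example above. Which fields are optional is an assumption (here, everything except `amount` and `type`), and the `Transaction` class is hypothetical rather than part of the FinEE package.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Transaction:
    amount: float
    type: str
    account: Optional[str] = None
    bank: Optional[str] = None
    date: Optional[str] = None  # date string as in the example, e.g. "2026-01-12"
    reference: Optional[str] = None
    merchant: Optional[str] = None
    vpa: Optional[str] = None
    category: Optional[str] = None
    is_p2m: Optional[bool] = None

    @classmethod
    def from_dict(cls, data: dict) -> "Transaction":
        """Build a Transaction from parsed JSON; raises KeyError if `amount` is missing."""
        known = {f.name for f in fields(cls)}
        # Drop keys outside the schema so unexpected model output does not raise.
        kwargs = {k: v for k, v in data.items() if k in known}
        kwargs["amount"] = float(kwargs["amount"])
        return cls(**kwargs)


tx = Transaction.from_dict({"amount": 2500, "type": "debit", "vpa": "swiggy@ybl", "extra": 1})
print(tx.amount, tx.type, tx.vpa)  # 2500.0 debit swiggy@ybl
```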

## Training

- **Base Model**: meta-llama/Llama-3.1-8B-Instruct
- **Training Data**: 152K+ samples ([finee-dataset](https://huggingface.co/datasets/Ranjit0034/finee-dataset))
- **Method**: LoRA fine-tuning (rank=16)
- **Hardware**: Apple M2 Ultra (MLX)
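
For intuition on what rank 16 means: LoRA freezes a weight matrix of shape `d_out x d_in` and trains two small factors of shapes `d_out x r` and `r x d_in` instead. The sketch below counts trainable parameters for a single 4096 x 4096 projection (4096 is the hidden size of Llama 3.1 8B). Which layers were actually adapted is not stated here, so the single-matrix example is an illustrative assumption.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two LoRA factors: B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_in + d_out)


hidden = 4096  # hidden size of Llama 3.1 8B
full = hidden * hidden  # parameters in the frozen full matrix
adapter = lora_trainable_params(hidden, hidden, rank=16)
print(adapter)         # 131072
print(adapter / full)  # 0.0078125
```

Under these assumptions the adapter trains under 1% of the parameters of the matrix it modifies.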

## Benchmarks

| Metric | Score |
|--------|-------|
| Amount Accuracy | 99.2% |
| Type Accuracy | 98.5% |
| Merchant Detection | 92.3% |
| Category Accuracy | 88.7% |
| Overall F1 | 94.8% |
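
The evaluation procedure behind these numbers is not described here. As one plausible reading, per-field accuracy can be computed as the share of examples where the predicted value exactly matches the reference. The function and toy data below are hypothetical illustrations, not the author's evaluation code.

```python
def field_accuracy(predictions: list, references: list, field: str) -> float:
    """Fraction of examples whose predicted `field` equals the reference value."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred.get(field) == ref.get(field)
    )
    return correct / len(references)


# Made-up toy data, for illustration only.
preds = [{"amount": 2500.0, "type": "debit"}, {"amount": 100.0, "type": "debit"}]
refs = [{"amount": 2500.0, "type": "debit"}, {"amount": 100.0, "type": "credit"}]
print(field_accuracy(preds, refs, "amount"))  # 1.0
print(field_accuracy(preds, refs, "type"))    # 0.5
```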

## Limitations

- Optimized for Indian banking messages
- May perform poorly on non-Indian message formats
- Requires structured text input (handwritten content is not supported)

## Related

- 📦 [FinEE Package](https://pypi.org/project/finee/) - Python library
- 📊 [Training Dataset](https://huggingface.co/datasets/Ranjit0034/finee-dataset)
- 💻 [GitHub](https://github.com/Ranjitbehera0034/Finance-Entity-Extractor)

## License

Apache 2.0