x committed on
Commit
31d170a
·
verified ·
1 Parent(s): bea79cc

Add model card

Files changed (1)
  1. README.md +99 -0
README.md ADDED
@@ -0,0 +1,99 @@
+ ---
+ license: mit
+ language:
+   - en
+   - hi
+ tags:
+   - ner
+   - address-parsing
+   - indian-addresses
+   - bert
+   - crf
+ datasets:
+   - custom
+ metrics:
+   - f1
+   - precision
+   - recall
+ model-index:
+   - name: indian-address-parser-model
+     results:
+       - task:
+           type: token-classification
+           name: Named Entity Recognition
+         metrics:
+           - type: f1
+             value: 0.80
+             name: F1 (micro)
+           - type: precision
+             value: 0.79
+             name: Precision (micro)
+           - type: recall
+             value: 0.81
+             name: Recall (micro)
+ ---
+
+ # Indian Address Parser Model
+
+ A fine-tuned **IndicBERTv2-SS + CRF** model for parsing unstructured Indian addresses into structured components.
+
+ ## Model Description
+
+ - **Base Model**: [ai4bharat/IndicBERTv2-SS](https://huggingface.co/ai4bharat/IndicBERTv2-SS)
+ - **Architecture**: BERT + Conditional Random Field (CRF) layer
+ - **Languages**: English, Hindi (Latin and Devanagari scripts)
+ - **Training Data**: 600+ annotated Delhi addresses
+
+ ## Performance
+
+ | Entity Type   | Precision | Recall   | F1-Score |
+ |---------------|-----------|----------|----------|
+ | AREA          | 0.87      | 0.87     | 0.87     |
+ | CITY          | 1.00      | 1.00     | 1.00     |
+ | FLOOR         | 0.85      | 0.85     | 0.85     |
+ | GALI          | 0.75      | 0.67     | 0.71     |
+ | HOUSE_NUMBER  | 0.79      | 0.79     | 0.79     |
+ | KHASRA        | 0.75      | 0.82     | 0.78     |
+ | PINCODE       | 1.00      | 1.00     | 1.00     |
+ | **Overall**   | **0.79**  | **0.81** | **0.80** |
+
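As a quick consistency check on the Performance table, each F1 score should equal the harmonic mean of its precision and recall. The sketch below recomputes F1 from the table's rounded values. The `f1_score` and `check_rows` helpers and the `rows` dictionary are illustrative names, not part of the model's code. Because the inputs are already rounded to two decimals, a small tolerance is assumed.

```python
# Illustrative check (not part of the model repository): recompute F1
# from the rounded precision/recall values reported in the table.

def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) copied from the Performance table.
rows = {
    "AREA": (0.87, 0.87, 0.87),
    "CITY": (1.00, 1.00, 1.00),
    "FLOOR": (0.85, 0.85, 0.85),
    "GALI": (0.75, 0.67, 0.71),
    "HOUSE_NUMBER": (0.79, 0.79, 0.79),
    "KHASRA": (0.75, 0.82, 0.78),
    "PINCODE": (1.00, 1.00, 1.00),
    "Overall": (0.79, 0.81, 0.80),
}

def check_rows(table, tolerance=0.01):
    """Return the row names whose recomputed F1 is off by more than tolerance."""
    return [
        name
        for name, (p, r, reported) in table.items()
        if abs(f1_score(p, r) - reported) > tolerance
    ]

if __name__ == "__main__":
    for name, (p, r, reported) in rows.items():
        print(f"{name:<13} reported={reported:.2f} recomputed={f1_score(p, r):.2f}")
    print("mismatches:", check_rows(rows))
```

An empty mismatch list means every reported F1 agrees with its precision and recall to within rounding.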
+ ## Supported Entity Types
+
+ - `HOUSE_NUMBER` - House/Plot/Flat numbers
+ - `FLOOR` - Floor indicators (Ground, First, etc.)
+ - `BLOCK` - Block identifiers
+ - `SECTOR` - Sector numbers
+ - `GALI` - Gali (lane) numbers
+ - `COLONY` - Colony/Society names
+ - `AREA` - Area/Locality names
+ - `SUBAREA` - Sub-area names
+ - `KHASRA` - Khasra (land record) numbers
+ - `PINCODE` - 6-digit postal codes
+ - `CITY` - City names
+ - `STATE` - State names
+
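The `PINCODE` description ("6-digit postal codes") lends itself to a cheap post-processing check on parser output. This is a minimal sketch, assuming the parsed value arrives as a string. `is_valid_pincode` is a hypothetical helper, not part of the `address_parser` package. The rule that a PIN code never starts with 0 comes from India Post's numbering scheme, not from this model card.

```python
import re

# Six ASCII digits, first digit 1-9 (assumption: Indian PIN codes never
# start with 0). Devanagari numerals would need normalizing first.
_PINCODE_RE = re.compile(r"[1-9][0-9]{5}")

def is_valid_pincode(value: str) -> bool:
    """Return True if value looks like a 6-digit Indian PIN code."""
    # Tolerate surrounding whitespace and an internal space ("110 041").
    compact = value.strip().replace(" ", "")
    return _PINCODE_RE.fullmatch(compact) is not None
```

A value that fails this check could be flagged for manual review rather than accepted silently.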
+ ## Usage
+
+ ```python
+ from address_parser import AddressParser
+
+ # Load model
+ parser = AddressParser.from_pretrained("YOUR_USERNAME/indian-address-parser-model")
+
+ # Parse address
+ result = parser.parse("PLOT NO752 FIRST FLOOR, BLOCK H-3, NEW DELHI, 110041")
+
+ # Access structured output
+ print(result.house_number)  # "PLOT NO752"
+ print(result.floor)         # "FIRST FLOOR"
+ print(result.city)          # "NEW DELHI"
+ print(result.pincode)       # "110041"
+ ```
+
+ ## Demo
+
+ Try the live demo: [HuggingFace Space](https://huggingface.co/spaces/YOUR_USERNAME/indian-address-parser)
+
+ ## License
+
+ MIT License