---
license: mit
language:
  - en
  - hi
tags:
  - ner
  - address-parsing
  - indian-addresses
  - bert
  - crf
datasets:
  - custom
metrics:
  - f1
  - precision
  - recall
model-index:
  - name: indian-address-parser-model
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        metrics:
          - type: f1
            value: 0.80
            name: F1 (micro)
          - type: precision
            value: 0.79
            name: Precision (micro)
          - type: recall
            value: 0.81
            name: Recall (micro)
---

# Indian Address Parser Model

A fine-tuned **IndicBERTv2-SS + CRF** model for parsing unstructured Indian addresses into structured components.

## Model Description

- **Base Model**: [ai4bharat/IndicBERTv2-SS](https://huggingface.co/ai4bharat/IndicBERTv2-SS)
- **Architecture**: BERT encoder with a Conditional Random Field (CRF) output layer
- **Languages**: English and Hindi (Latin and Devanagari scripts)
- **Training Data**: 600+ annotated Delhi addresses
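A CRF layer scores whole tag sequences rather than individual tokens, so decoding selects the best *path* through the tags. The code below is a minimal plain-Python sketch of standard Viterbi decoding for a linear-chain CRF. It is not this model's implementation. The BIO tag names (`O`, `B-CITY`, `I-CITY`) and every emission and transition score are made-up values chosen for illustration.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
    end: Sequence[float],
) -> List[int]:
    """Return the highest-scoring tag-index sequence for one sentence.

    emissions[t][j]   : score of tag j at token t (e.g. from the BERT encoder)
    transitions[i][j] : score of moving from tag i to tag j
    start[j], end[j]  : scores for starting / ending the sequence with tag j
    """
    n_tags = len(start)
    # score[j] = best total score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # best previous tag i to transition from into tag j
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # add end scores, then follow the back-pointers from the best final tag
    best_last = max(range(n_tags), key=lambda j: score[j] + end[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative example with hypothetical BIO tags and hand-picked scores.
TAGS = ["O", "B-CITY", "I-CITY"]
NEG = -1e4  # effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-CITY is invalid in BIO
    [0.0, 0.0, 2.0],  # from B-CITY: favor continuing into I-CITY
    [0.0, 0.0, 1.0],  # from I-CITY
]
start = [0.0, 0.0, NEG]  # a sequence cannot start inside an entity
end = [0.0, 0.0, 0.0]
emissions = [
    [0.5, 2.0, 0.1],  # "NEW"
    [1.0, 0.2, 0.8],  # "DELHI": on its own, O scores highest
]
print([TAGS[i] for i in viterbi_decode(emissions, transitions, start, end)])
# ['B-CITY', 'I-CITY']
```

A per-token argmax over these emissions would tag `DELHI` as `O`. The high `B-CITY` to `I-CITY` transition score lets the CRF keep `NEW DELHI` together as a single CITY span, which is the kind of consistency the CRF layer is added for.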

## Performance

| Entity Type   | Precision | Recall | F1-Score |
|---------------|-----------|--------|----------|
| AREA          | 0.87      | 0.87   | 0.87     |
| CITY          | 1.00      | 1.00   | 1.00     |
| FLOOR         | 0.85      | 0.85   | 0.85     |
| GALI          | 0.75      | 0.67   | 0.71     |
| HOUSE_NUMBER  | 0.79      | 0.79   | 0.79     |
| KHASRA        | 0.75      | 0.82   | 0.78     |
| PINCODE       | 1.00      | 1.00   | 1.00     |
| **Overall (micro)** | **0.79** | **0.81** | **0.80** |
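Each F1 value in the table is the harmonic mean of its precision and recall. According to the `model-index` metadata, the **Overall** row is micro-averaged: true positives, false positives and false negatives are pooled across entity types before the ratios are taken. Frequent entity types therefore carry more weight, and the Overall row need not equal the mean of the rows above it. The sketch below shows both calculations. The counts in the final example are invented for illustration and are not this model's evaluation data.

```python
from typing import Dict, Tuple


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1.

    counts maps entity type -> (true_positives, false_positives, false_negatives);
    the counts are pooled over all entity types before computing the ratios.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1(precision, recall)


# Reproduce F1 values from the table, rounded to two decimals
print(round(f1(0.75, 0.67), 2))  # GALI row: 0.71
print(round(f1(0.79, 0.81), 2))  # Overall row: prints 0.8, i.e. 0.80

# Hypothetical counts: a frequent, accurate type dominates the pooled score
p, r, score = micro_prf({"CITY": (90, 0, 0), "GALI": (6, 2, 3)})
print(round(score, 3))  # 0.975
```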

## Supported Entity Types

- `HOUSE_NUMBER` - House/Plot/Flat numbers
- `FLOOR` - Floor indicators (Ground, First, etc.)
- `BLOCK` - Block identifiers
- `SECTOR` - Sector numbers
- `GALI` - Gali (lane) numbers
- `COLONY` - Colony/Society names
- `AREA` - Area/Locality names
- `SUBAREA` - Sub-area names
- `KHASRA` - Khasra (land record) numbers
- `PINCODE` - 6-digit postal codes
- `CITY` - City names
- `STATE` - State names
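A token-classification model like this one predicts a tag per token, and runs of tags are then grouped into the entity types listed above. The sketch below assumes a BIO scheme (`B-` begins an entity, `I-` continues it, `O` is outside) and word-level tags. This card does not specify either. The example tags are hand-written for illustration and are not actual model output.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (ENTITY_TYPE, text) spans."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # emit the open entity, if any
        if current_type:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # a new entity starts (a stray I- tag is treated like a B- tag)
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open entity
        else:  # "O": close any open entity
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Hypothetical word-level tags for the address used in the Usage section
tokens = ["PLOT", "NO752", "FIRST", "FLOOR", ",", "BLOCK", "H-3", ",",
          "NEW", "DELHI", ",", "110041"]
tags = ["B-HOUSE_NUMBER", "I-HOUSE_NUMBER", "B-FLOOR", "I-FLOOR", "O",
        "B-BLOCK", "I-BLOCK", "O", "B-CITY", "I-CITY", "O", "B-PINCODE"]
for label, text in bio_to_entities(tokens, tags):
    print(label, "->", text)
```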

## Usage

```python
from address_parser import AddressParser

# Load the fine-tuned model (replace YOUR_USERNAME with the repository owner)
parser = AddressParser.from_pretrained("YOUR_USERNAME/indian-address-parser-model")

# Parse address
result = parser.parse("PLOT NO752 FIRST FLOOR, BLOCK H-3, NEW DELHI, 110041")

# Access structured output
print(result.house_number)  # "PLOT NO752"
print(result.floor)         # "FIRST FLOOR"
print(result.city)          # "NEW DELHI"
print(result.pincode)       # "110041"
```
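The `result` object exposes one attribute for each entity type. Below is a hypothetical stand-in for such a container. It is not the `address_parser` package's actual result class. It is a dataclass with one optional field per supported entity type, built from `(LABEL, text)` pairs, along with a sanity check that a parsed PIN code has six ASCII digits and does not begin with 0, which holds for all Indian PIN codes. Joining repeated labels with `", "` and ignoring unknown labels are design assumptions made for this sketch.

```python
import re
from dataclasses import dataclass, fields
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class ParsedAddress:
    """Hypothetical container: one optional field per supported entity type."""
    house_number: Optional[str] = None
    floor: Optional[str] = None
    block: Optional[str] = None
    sector: Optional[str] = None
    gali: Optional[str] = None
    colony: Optional[str] = None
    area: Optional[str] = None
    subarea: Optional[str] = None
    khasra: Optional[str] = None
    pincode: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None

    @classmethod
    def from_entities(cls, entities: Iterable[Tuple[str, str]]) -> "ParsedAddress":
        """Build from (LABEL, text) pairs; repeated labels are joined with ', '."""
        valid = {f.name for f in fields(cls)}
        values: Dict[str, str] = {}
        for label, text in entities:
            key = label.lower()
            if key not in valid:
                continue  # ignore labels outside the supported set
            values[key] = f"{values[key]}, {text}" if key in values else text
        return cls(**values)

    def has_valid_pincode(self) -> bool:
        # Six ASCII digits, first digit 1-9 ([0-9] avoids matching other scripts' digits)
        return self.pincode is not None and re.fullmatch(r"[1-9][0-9]{5}", self.pincode) is not None


addr = ParsedAddress.from_entities([
    ("HOUSE_NUMBER", "PLOT NO752"),
    ("FLOOR", "FIRST FLOOR"),
    ("CITY", "NEW DELHI"),
    ("PINCODE", "110041"),
])
print(addr.city)                 # NEW DELHI
print(addr.has_valid_pincode())  # True
```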

## Demo

Try the live demo: [HuggingFace Space](https://huggingface.co/spaces/YOUR_USERNAME/indian-address-parser)

## License

MIT License