---
license: mit
language:
- en
- hi
tags:
- ner
- address-parsing
- indian-addresses
- bert
- crf
datasets:
- custom
metrics:
- f1
- precision
- recall
model-index:
- name: indian-address-parser-model
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.80
      name: F1 (micro)
    - type: precision
      value: 0.79
      name: Precision (micro)
    - type: recall
      value: 0.81
      name: Recall (micro)
---

# Indian Address Parser Model

A fine-tuned **IndicBERTv2-SS + CRF** model for parsing unstructured Indian addresses into structured components.

## Model Description

- **Base Model**: [ai4bharat/IndicBERTv2-SS](https://huggingface.co/ai4bharat/IndicBERTv2-SS)
- **Architecture**: BERT + Conditional Random Field (CRF) layer
- **Languages**: English, Hindi (Latin and Devanagari scripts)
- **Training Data**: 600+ annotated Delhi addresses
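
A CRF layer scores whole tag sequences instead of picking the best tag for each token in isolation, which lets it penalize invalid transitions. The sketch below illustrates that idea with a plain-Python Viterbi decoder. It is not this model's code: the three-tag set, the emission scores, and the transition scores are all made up for the example.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag path.

    emissions[t][k]   : score of tag k at position t (e.g. from the encoder)
    transitions[i][j] : score of moving from tag i to tag j (learned by the CRF)
    """
    num_tags = len(emissions[0])
    scores = list(emissions[0])      # best score of any path ending in each tag
    backpointers = []                # best previous tag for each (step, tag)

    for step in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_tags):
            best_prev = max(range(num_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + step[j])
        scores = new_scores
        backpointers.append(pointers)

    # Walk back from the best final tag to recover the full path.
    path = [max(range(num_tags), key=lambda k: scores[k])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag ids: 0 = O, 1 = B-AREA, 2 = I-AREA
transitions = [
    [0.0, 0.0, -100.0],  # O -> I-AREA is heavily penalized
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 1.0, 0.0],     # token 1: per-token argmax would pick O
    [0.0, 0.0, 3.0],     # token 2: per-token argmax would pick I-AREA
]
print(viterbi_decode(emissions, transitions))  # [1, 2], i.e. B-AREA, I-AREA
```

Per-token argmax would output `O, I-AREA` here, which is not a valid sequence; the transition penalty steers the decoder to `B-AREA, I-AREA` instead.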

## Performance

| Entity Type         | Precision | Recall   | F1-Score |
|---------------------|-----------|----------|----------|
| AREA                | 0.87      | 0.87     | 0.87     |
| CITY                | 1.00      | 1.00     | 1.00     |
| FLOOR               | 0.85      | 0.85     | 0.85     |
| GALI                | 0.75      | 0.67     | 0.71     |
| HOUSE_NUMBER        | 0.79      | 0.79     | 0.79     |
| KHASRA              | 0.75      | 0.82     | 0.78     |
| PINCODE             | 1.00      | 1.00     | 1.00     |
| **Overall (micro)** | **0.79**  | **0.81** | **0.80** |
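
For readers who want to check the table, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it for a few rows. It works from the rounded values shown in the table, so small differences from F1 scores computed on the raw counts are possible in general.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above
rows = {
    "GALI": (0.75, 0.67),
    "KHASRA": (0.75, 0.82),
    "Overall": (0.79, 0.81),
}
for name, (p, r) in rows.items():
    print(f"{name}: {f1_score(p, r):.2f}")
# GALI: 0.71
# KHASRA: 0.78
# Overall: 0.80
```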

## Supported Entity Types

- `HOUSE_NUMBER` - House/Plot/Flat numbers
- `FLOOR` - Floor indicators (Ground, First, etc.)
- `BLOCK` - Block identifiers
- `SECTOR` - Sector numbers
- `GALI` - Gali (lane) numbers
- `COLONY` - Colony/Society names
- `AREA` - Area/Locality names
- `SUBAREA` - Sub-area names
- `KHASRA` - Khasra (land record) numbers
- `PINCODE` - 6-digit postal codes
- `CITY` - City names
- `STATE` - State names
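
Token-classification models usually emit one tag per token, and a post-processing step merges those tags into entity spans. The sketch below shows one common way to do that, assuming a BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside). The card does not state which scheme this model uses, and the tokens and tags in the example are hand-written for illustration, not actual model output.

```python
def group_entities(tokens, tags):
    """Merge BIO-tagged tokens into a list of (entity_type, text) spans."""
    spans = []
    label, words = None, []

    def flush():
        # Close the span currently being built, if any.
        if label is not None:
            spans.append((label, " ".join(words)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            words.append(token)               # continue the current span
        elif tag.startswith(("B-", "I-")):
            flush()                           # a stray I- is treated as a new span
            label, words = tag[2:], [token]
        else:                                 # "O": outside any entity
            flush()
            label, words = None, []
    flush()
    return spans


tokens = ["PLOT", "NO752", "FIRST", "FLOOR", ",", "NEW", "DELHI", ",", "110041"]
tags = ["B-HOUSE_NUMBER", "I-HOUSE_NUMBER", "B-FLOOR", "I-FLOOR", "O",
        "B-CITY", "I-CITY", "O", "B-PINCODE"]
print(group_entities(tokens, tags))
# [('HOUSE_NUMBER', 'PLOT NO752'), ('FLOOR', 'FIRST FLOOR'),
#  ('CITY', 'NEW DELHI'), ('PINCODE', '110041')]
```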

## Usage

```python
from address_parser import AddressParser

# Load the model (replace YOUR_USERNAME with the account that hosts it)
parser = AddressParser.from_pretrained("YOUR_USERNAME/indian-address-parser-model")

# Parse a raw address string
result = parser.parse("PLOT NO752 FIRST FLOOR, BLOCK H-3, NEW DELHI, 110041")

# Access the structured output
print(result.house_number)  # "PLOT NO752"
print(result.floor)         # "FIRST FLOOR"
print(result.city)          # "NEW DELHI"
print(result.pincode)       # "110041"
```

## Demo

Try the live demo: [HuggingFace Space](https://huggingface.co/spaces/YOUR_USERNAME/indian-address-parser)

## License

MIT License