---
license: mit
language:
- en
- hi
tags:
- ner
- address-parsing
- indian-addresses
- bert
- crf
datasets:
- custom
metrics:
- f1
- precision
- recall
model-index:
- name: indian-address-parser-model
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    metrics:
    - type: f1
      value: 0.80
      name: F1 (micro)
    - type: precision
      value: 0.79
      name: Precision (micro)
    - type: recall
      value: 0.81
      name: Recall (micro)
---

# Indian Address Parser Model

A fine-tuned **IndicBERTv2-SS + CRF** model that parses unstructured Indian addresses into structured components.

## Model Description

- **Base Model**: [ai4bharat/IndicBERTv2-SS](https://huggingface.co/ai4bharat/IndicBERTv2-SS)
- **Architecture**: BERT + Conditional Random Field (CRF) layer
- **Languages**: English, Hindi (Latin and Devanagari scripts)
- **Training Data**: 600+ annotated Delhi addresses

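The CRF layer scores whole label sequences rather than picking each token's label independently, which lets it rule out invalid transitions. The following is a minimal pure-Python sketch of Viterbi decoding under that idea. The function name `viterbi_decode`, the three-label toy set, and all scores are illustrative assumptions, not taken from this model's code or weights.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label index sequence.

    emissions:   per-token score lists, shape [seq_len][num_labels]
    transitions: transitions[i][j] is the score of moving from label i to j
    """
    num_labels = len(transitions)
    # Best score of any path ending in each label at the first token.
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_labels):
            # Best previous label for landing on label j.
            best_i = max(range(num_labels),
                         key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final label.
    best = max(range(num_labels), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return path[::-1]


# Toy label set (assumed for illustration only).
labels = ["O", "B-CITY", "I-CITY"]
# All transitions are neutral except O -> I-CITY, which is heavily penalised.
transitions = [[0.0, 0.0, -100.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
# Hand-written emission scores for three tokens.
emissions = [[2.0, 0.5, 0.1],
             [0.2, 1.0, 1.2],
             [0.1, 0.3, 2.0]]

decoded = [labels[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

In this toy input, taking the per-token maximum would label the second token `I-CITY` directly after `O`. The transition penalty makes the decoder prefer `B-CITY` there instead.
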
## Performance

| Entity Type  | Precision | Recall   | F1-Score |
|--------------|-----------|----------|----------|
| AREA         | 0.87      | 0.87     | 0.87     |
| CITY         | 1.00      | 1.00     | 1.00     |
| FLOOR        | 0.85      | 0.85     | 0.85     |
| GALI         | 0.75      | 0.67     | 0.71     |
| HOUSE_NUMBER | 0.79      | 0.79     | 0.79     |
| KHASRA       | 0.75      | 0.82     | 0.78     |
| PINCODE      | 1.00      | 1.00     | 1.00     |
| **Overall**  | **0.79**  | **0.81** | **0.80** |

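F1 is the harmonic mean of precision and recall, so the F1 column can be cross-checked against the other two. The sketch below recomputes it for a few rows of the table. The helper name `f1_score` is an assumption for illustration. Because the table values are already rounded to two decimals, a recomputed value could in general differ from a reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows copied from the table above: (precision, recall, reported F1).
rows = {
    "GALI": (0.75, 0.67, 0.71),
    "KHASRA": (0.75, 0.82, 0.78),
    "Overall": (0.79, 0.81, 0.80),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed={f1_score(p, r):.2f} reported={reported:.2f}")
```
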
## Supported Entity Types

- `HOUSE_NUMBER` - House/Plot/Flat numbers
- `FLOOR` - Floor indicators (Ground, First, etc.)
- `BLOCK` - Block identifiers
- `SECTOR` - Sector numbers
- `GALI` - Gali (lane) numbers
- `COLONY` - Colony/Society names
- `AREA` - Area/Locality names
- `SUBAREA` - Sub-area names
- `KHASRA` - Khasra (land record) numbers
- `PINCODE` - 6-digit postal codes
- `CITY` - City names
- `STATE` - State names

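Token-classification models usually predict one tag per token rather than one entity type per address. This card does not state which tagging scheme the model uses. The sketch below assumes the common BIO scheme (`B-` for the first token of an entity, `I-` for continuation tokens, `O` for everything else) and shows how a label vocabulary could be derived from the entity types listed above. The names `build_bio_labels`, `label2id`, and `id2label` are illustrative.

```python
# Entity types copied from the list above.
ENTITY_TYPES = [
    "HOUSE_NUMBER", "FLOOR", "BLOCK", "SECTOR", "GALI", "COLONY",
    "AREA", "SUBAREA", "KHASRA", "PINCODE", "CITY", "STATE",
]


def build_bio_labels(entity_types):
    """Return ["O", "B-X", "I-X", ...] for every entity type X (BIO assumed)."""
    labels = ["O"]
    for entity in entity_types:
        labels += [f"B-{entity}", f"I-{entity}"]
    return labels


LABELS = build_bio_labels(ENTITY_TYPES)
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

print(len(LABELS))  # one "O" tag plus two tags per entity type
```
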
## Usage

```python
from address_parser import AddressParser

# Load model
parser = AddressParser.from_pretrained("YOUR_USERNAME/indian-address-parser-model")

# Parse address
result = parser.parse("PLOT NO752 FIRST FLOOR, BLOCK H-3, NEW DELHI, 110041")

# Access structured output
print(result.house_number)  # "PLOT NO752"
print(result.floor)         # "FIRST FLOOR"
print(result.city)          # "NEW DELHI"
print(result.pincode)       # "110041"
```

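The internals of `AddressParser` are not shown in this card. As a rough illustration of how per-token tags can be collapsed into fields like the ones above, here is a minimal sketch that groups BIO-tagged tokens into spans. The function `group_entities`, the BIO scheme, and the hand-written tokens and tags are all assumptions for illustration, not output from the model.

```python
def group_entities(tokens, tags):
    """Collapse BIO-tagged tokens into a list of (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, entity = tag.partition("-")
        # Continuation of the span that is currently open.
        if prefix == "I" and entity == current_type:
            current_tokens.append(token)
            continue
        # Anything else closes the open span, if there is one.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        if prefix in ("B", "I"):
            # A stray I- tag is treated leniently as the start of a new span.
            current_type, current_tokens = entity, [token]
        else:
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hand-written example; a real model would produce the tags.
tokens = ["PLOT", "NO752", "FIRST", "FLOOR", "NEW", "DELHI", "110041"]
tags = ["B-HOUSE_NUMBER", "I-HOUSE_NUMBER", "B-FLOOR", "I-FLOOR",
        "B-CITY", "I-CITY", "B-PINCODE"]

spans = group_entities(tokens, tags)
for entity_type, text in spans:
    print(entity_type, "->", text)
```
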
## Demo

Try the live demo: [HuggingFace Space](https://huggingface.co/spaces/YOUR_USERNAME/indian-address-parser)

## License

MIT License