Indian Address Parser Model
A fine-tuned IndicBERTv2-SS + CRF model for parsing unstructured Indian addresses into structured components.
Model Description
- Base Model: ai4bharat/IndicBERTv2-SS
- Architecture: BERT + Conditional Random Field (CRF) layer
- Languages: English, Hindi (Latin and Devanagari scripts)
- Training Data: 600+ annotated Delhi addresses
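The CRF layer on top of BERT scores whole tag sequences instead of individual tokens, so it can penalise label transitions that make no sense. The card does not describe the model's internals, so the following is only a minimal pure-Python sketch of the Viterbi decoding that a linear-chain CRF typically performs at inference time. The BIO tag scheme, tag indices, and all scores are illustrative assumptions, and `viterbi_decode` is a hypothetical helper, not part of the model's code.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]: score of tag j at token t (e.g. from a classifier head over BERT).
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF).
    Assumes at least one token.
    """
    num_tags = len(transitions)
    # best[j]: score of the best path ending in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(num_tags):
            # Choose the previous tag i that maximises best[i] + transitions[i][j]
            i_star = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_star] + transitions[i_star][j] + emissions[t][j])
            pointers.append(i_star)
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical tags: 0 = O, 1 = B-GALI, 2 = I-GALI
emissions = [[2.0, 1.5, 0.0], [0.0, 0.0, 1.0]]
# Strongly penalise O -> I-GALI, which is illegal under BIO
transitions = [[0.0, 0.0, -10.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
path, score = viterbi_decode(emissions, transitions)
print(path, score)
```

With these scores a per-token argmax would pick O followed by I-GALI, an invalid sequence, while the Viterbi path is B-GALI followed by I-GALI. Real implementations vectorise the same recurrence over batches; the explicit loops are kept here for readability.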
Performance
| Entity Type | Precision | Recall | F1-Score |
|---|---|---|---|
| AREA | 0.87 | 0.87 | 0.87 |
| CITY | 1.00 | 1.00 | 1.00 |
| FLOOR | 0.85 | 0.85 | 0.85 |
| GALI | 0.75 | 0.67 | 0.71 |
| HOUSE_NUMBER | 0.79 | 0.79 | 0.79 |
| KHASRA | 0.75 | 0.82 | 0.78 |
| PINCODE | 1.00 | 1.00 | 1.00 |
| Overall (micro avg.) | 0.79 | 0.81 | 0.80 |
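Each F1-Score in the table is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes a few rows from the table; because the reported precision and recall are themselves rounded, the recomputed values are only expected to agree to two decimals. `f1_score` is a hypothetical helper written for this illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rows copied from the performance table: (precision, recall, reported F1)
rows = {
    "GALI": (0.75, 0.67, 0.71),
    "KHASRA": (0.75, 0.82, 0.78),
    "Overall": (0.79, 0.81, 0.80),
}
for name, (p, r, reported) in rows.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.2f}, reported = {reported:.2f}")
```

The harmonic mean pulls F1 toward the weaker of the two metrics, which is why GALI's F1 (0.71) sits closer to its recall (0.67) than to its precision (0.75).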
Supported Entity Types
- HOUSE_NUMBER: House/Plot/Flat numbers
- FLOOR: Floor indicators (Ground, First, etc.)
- BLOCK: Block identifiers
- SECTOR: Sector numbers
- GALI: Gali (lane) numbers
- COLONY: Colony/Society names
- AREA: Area/Locality names
- SUBAREA: Sub-area names
- KHASRA: Khasra (land record) numbers
- PINCODE: 6-digit postal codes
- CITY: City names
- STATE: State names
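Token-level taggers usually emit one label per token, and adjacent tokens with the same entity type are then merged into a component such as "PLOT NO752". The card does not state which tagging scheme the model uses, so this sketch assumes BIO tags (`B-HOUSE_NUMBER`, `I-HOUSE_NUMBER`, `O`, ...) over whole words; the real model works on subword tokens. `group_bio_spans` and the example tags are hypothetical, not the parser's actual output.

```python
from typing import Dict, List, Optional, Sequence


def group_bio_spans(tokens: Sequence[str], tags: Sequence[str]) -> Dict[str, List[str]]:
    """Merge BIO-tagged tokens into entity strings, keyed by entity type."""
    entities: Dict[str, List[str]] = {}
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the entity being built (if any) and reset the buffer
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.setdefault(current_type, []).append(" ".join(current_tokens))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity begins (a stray I- tag is treated like a B- tag)
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continue the current entity
            current_tokens.append(token)
        else:
            # "O": token is outside any entity
            flush()
    flush()
    return entities


# Hypothetical word-level tags for the address used in the Usage section
tokens = ["PLOT", "NO752", "FIRST", "FLOOR", ",", "BLOCK", "H-3", ",", "NEW", "DELHI", ",", "110041"]
tags = [
    "B-HOUSE_NUMBER", "I-HOUSE_NUMBER", "B-FLOOR", "I-FLOOR", "O",
    "B-BLOCK", "I-BLOCK", "O", "B-CITY", "I-CITY", "O", "B-PINCODE",
]
print(group_bio_spans(tokens, tags))
```

Values are stored as lists because an address can, in principle, contain more than one span of the same type.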
Usage
```python
from address_parser import AddressParser

# Load model
parser = AddressParser.from_pretrained("YOUR_USERNAME/indian-address-parser-model")

# Parse address
result = parser.parse("PLOT NO752 FIRST FLOOR, BLOCK H-3, NEW DELHI, 110041")

# Access structured output
print(result.house_number)  # "PLOT NO752"
print(result.floor)         # "FIRST FLOOR"
print(result.city)          # "NEW DELHI"
print(result.pincode)       # "110041"
```
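A tagger can still return a malformed span, so downstream code may want a cheap sanity check on fields such as `result.pincode`, which the entity list describes as a 6-digit postal code. This is a minimal sketch, assuming the field is a string (or `None` when no pincode is found); `is_plausible_pincode` is a hypothetical helper, not part of `address_parser`. It relies on the fact that Indian PIN codes are six digits and never start with 0.

```python
import re
from typing import Optional

# Six ASCII digits, first digit 1-9 ([0-9] is used instead of \d, which also matches e.g. Devanagari digits)
_PINCODE_RE = re.compile(r"[1-9][0-9]{5}")


def is_plausible_pincode(value: Optional[str]) -> bool:
    """Return True if value looks like an Indian 6-digit PIN code."""
    if value is None:
        return False
    # fullmatch rejects extra characters such as "110041X"; outer whitespace is stripped first
    return _PINCODE_RE.fullmatch(value.strip()) is not None


for candidate in ["110041", "011004", "11004", None]:
    print(candidate, is_plausible_pincode(candidate))
```

This only checks the format; it cannot tell whether the code actually exists or matches the parsed city.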
Demo
Try the live demo: Hugging Face Space
License
MIT License
Spaces using x2aqq/indian-address-parser-model: 1
Evaluation results
- F1 (micro): 0.800 (self-reported)
- Precision (micro): 0.790 (self-reported)
- Recall (micro): 0.810 (self-reported)
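Micro-averaged scores pool true positives, false positives, and false negatives across all entity types before computing precision and recall, so frequent entity types weigh more than rare ones. The card does not publish raw counts, so the per-type counts below are hypothetical placeholders used only to show the computation; `micro_prf` is likewise a hypothetical helper.

```python
from typing import Dict, Tuple


def micro_prf(counts: Dict[str, Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """counts maps entity type -> (true_positives, false_positives, false_negatives)."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only (not the model's evaluation data)
example = {"CITY": (50, 0, 0), "GALI": (6, 2, 3)}
p, r, f1 = micro_prf(example)
print(f"micro P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Here the error-free but frequent CITY type lifts the pooled scores well above GALI's own precision and recall, which is the usual effect of micro averaging.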