Indian Address Parser Model

A fine-tuned IndicBERTv2-SS + CRF model for parsing unstructured Indian addresses into structured components.

Model Description

  • Base Model: ai4bharat/IndicBERTv2-SS
  • Architecture: BERT + Conditional Random Field (CRF) layer
  • Languages: English, Hindi (Latin and Devanagari scripts)
  • Training Data: 600+ annotated Delhi addresses
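
The CRF layer named above scores whole tag sequences rather than picking each token's tag independently, and decoding is normally done with the Viterbi algorithm. The following is a minimal, library-free sketch of that decoding step, not the model's actual implementation. The function name `viterbi`, the three-tag set (O, B, I), and all scores are hypothetical toy values chosen for illustration.

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j at token t (e.g. from the encoder)
    transitions[i][j] : score of moving from tag i to tag j (learned by the CRF)
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []

    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for arriving at tag j.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag.
    path = [max(range(n_tags), key=lambda i: score[i])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy example (assumed values): tags 0 = O, 1 = B, 2 = I.
# A per-token argmax would pick O then I, which is an invalid sequence.
emissions = [[2.0, 0.5, 0.1], [0.1, 1.0, 1.5]]
transitions = [
    [0.0, 0.0, -10.0],  # O -> I is heavily penalised
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
best_path = viterbi(emissions, transitions)
```

With these toy scores the transition penalty makes the decoder choose O then B instead of the invalid O then I, which is the kind of sequence-level consistency a CRF layer is meant to add on top of per-token predictions.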

Performance

Entity Type     Precision   Recall   F1-Score
AREA            0.87        0.87     0.87
CITY            1.00        1.00     1.00
FLOOR           0.85        0.85     0.85
GALI            0.75        0.67     0.71
HOUSE_NUMBER    0.79        0.79     0.79
KHASRA          0.75        0.82     0.78
PINCODE         1.00        1.00     1.00
Overall         0.79        0.81     0.80
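
F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below does this for a few rows of the table. The helper name `f1_score` is illustrative, and because the inputs are already rounded to two decimals, a recomputed value could in general differ from a reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "GALI": (0.75, 0.67, 0.71),
    "KHASRA": (0.75, 0.82, 0.78),
    "Overall": (0.79, 0.81, 0.80),
}

# Recompute F1 and round to the table's two-decimal precision.
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r, _) in reported.items()}
```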

Supported Entity Types

  • HOUSE_NUMBER - House/Plot/Flat numbers
  • FLOOR - Floor indicators (Ground, First, etc.)
  • BLOCK - Block identifiers
  • SECTOR - Sector numbers
  • GALI - Gali (lane) numbers
  • COLONY - Colony/Society names
  • AREA - Area/Locality names
  • SUBAREA - Sub-area names
  • KHASRA - Khasra (land record) numbers
  • PINCODE - 6-digit postal codes
  • CITY - City names
  • STATE - State names
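
Token-classification models of this kind usually emit one tag per token, and those tags are then merged into the entity types listed above. The sketch below assumes a BIO tagging scheme (for example `B-FLOOR`, `I-FLOOR`, `O`); this model card does not state the scheme actually used, so treat both the scheme and the helper name `group_entities` as assumptions. The tags in the example are hand-written for illustration and are not model output.

```python
from typing import Dict, List


def group_entities(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Merge BIO-tagged tokens into {entity_type: [span text, ...]}."""
    entities: Dict[str, List[str]] = {}
    open_label = None  # label of the span currently being built, if any

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != open_label):
            # Start a new span (a stray I- tag is treated as a span start).
            entities.setdefault(label, []).append(token)
            open_label = label
        elif prefix == "I":
            # Continue the span that is currently open.
            entities[label][-1] += " " + token
        else:
            # "O" (outside any entity) closes the open span.
            open_label = None
    return entities


# Hand-labelled example (assumed tags, not produced by the model).
tokens = ["PLOT", "NO752", "FIRST", "FLOOR", ",", "NEW", "DELHI"]
tags = ["B-HOUSE_NUMBER", "I-HOUSE_NUMBER", "B-FLOOR", "I-FLOOR", "O", "B-CITY", "I-CITY"]
grouped = group_entities(tokens, tags)
```

Each entity type maps to a list because an address can contain more than one span of the same type.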

Usage

from address_parser import AddressParser

# Load model
parser = AddressParser.from_pretrained("YOUR_USERNAME/indian-address-parser-model")

# Parse address
result = parser.parse("PLOT NO752 FIRST FLOOR, BLOCK H-3, NEW DELHI, 110041")

# Access structured output
print(result.house_number)  # "PLOT NO752"
print(result.floor)         # "FIRST FLOOR"
print(result.city)          # "NEW DELHI"
print(result.pincode)       # "110041"

Demo

Try the live demo on the Hugging Face Space.

License

MIT License
