metadata
license: mit
language:
  - en
  - hi
tags:
  - ner
  - address-parsing
  - indian-addresses
  - bert
  - crf
datasets:
  - custom
metrics:
  - f1
  - precision
  - recall
model-index:
  - name: indian-address-parser-model
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        metrics:
          - type: f1
            value: 0.8
            name: F1 (micro)
          - type: precision
            value: 0.79
            name: Precision (micro)
          - type: recall
            value: 0.81
            name: Recall (micro)

Indian Address Parser Model

A fine-tuned IndicBERTv2-SS + CRF model for parsing unstructured Indian addresses into structured components.

Model Description

  • Base Model: ai4bharat/IndicBERTv2-SS
  • Architecture: BERT + Conditional Random Field (CRF) layer
  • Languages: English, Hindi (Latin and Devanagari scripts)
  • Training Data: 600+ annotated Delhi addresses
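To illustrate what the CRF layer adds on top of the encoder, here is a minimal sketch of Viterbi decoding, the standard way a linear-chain CRF picks the best label sequence from per-token (emission) scores and label-to-label (transition) scores. This is not the model's actual implementation: the `B-`/`I-`/`O` label names and all scores below are made-up assumptions for illustration.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring label sequence.

    emissions[t][label]    : per-token score (from the encoder in a real model)
    transitions[prev][cur] : score for moving from label `prev` to label `cur`
    """
    if not emissions:
        return []
    labels = list(emissions[0])
    # best[label] = best total score of any path that ends in `label`
    best = dict(emissions[0])
    backpointers: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[p][cur])
            new_best[cur] = best[prev] + transitions[prev][cur] + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Walk back from the best final label to recover the full path
    last = max(labels, key=lambda label: best[label])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy, hand-picked scores for the two tokens "FIRST" and "FLOOR" (assumed values)
labels = ["O", "B-FLOOR", "I-FLOOR"]
emissions = [
    {"O": 1.0, "B-FLOOR": 0.8, "I-FLOOR": 0.1},  # "FIRST"
    {"O": 0.2, "B-FLOOR": 0.3, "I-FLOOR": 1.5},  # "FLOOR"
]
transitions = {prev: {cur: 0.0 for cur in labels} for prev in labels}
transitions["O"]["I-FLOOR"] = -5.0        # an I- tag should not follow O
transitions["B-FLOOR"]["I-FLOOR"] = 1.0   # B- followed by I- is encouraged

greedy = [max(scores, key=scores.get) for scores in emissions]
print(greedy)                                  # ['O', 'I-FLOOR']
print(viterbi_decode(emissions, transitions))  # ['B-FLOOR', 'I-FLOOR']
```

Picking the best label per token independently yields an inconsistent sequence here, while decoding with transition scores recovers a well-formed span. That is the usual motivation for pairing a BERT-style encoder with a CRF.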

Performance

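| Entity Type  | Precision | Recall | F1-Score |
|--------------|-----------|--------|----------|
| AREA         | 0.87      | 0.87   | 0.87     |
| CITY         | 1.00      | 1.00   | 1.00     |
| FLOOR        | 0.85      | 0.85   | 0.85     |
| GALI         | 0.75      | 0.67   | 0.71     |
| HOUSE_NUMBER | 0.79      | 0.79   | 0.79     |
| KHASRA       | 0.75      | 0.82   | 0.78     |
| PINCODE      | 1.00      | 1.00   | 1.00     |
| Overall      | 0.79      | 0.81   | 0.80     |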
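F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The short check below uses only the figures reported above. Because those figures are already rounded to two decimals, a recomputed value could in general differ by 0.01; for this table they all agree.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above
reported = {
    "AREA": (0.87, 0.87, 0.87),
    "CITY": (1.00, 1.00, 1.00),
    "FLOOR": (0.85, 0.85, 0.85),
    "GALI": (0.75, 0.67, 0.71),
    "HOUSE_NUMBER": (0.79, 0.79, 0.79),
    "KHASRA": (0.75, 0.82, 0.78),
    "PINCODE": (1.00, 1.00, 1.00),
    "Overall": (0.79, 0.81, 0.80),
}

for name, (p, r, f1) in reported.items():
    computed = round(f1_score(p, r), 2)
    print(f"{name:<13} reported={f1:.2f} computed={computed:.2f}")
```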

Supported Entity Types

  • HOUSE_NUMBER - House/Plot/Flat numbers
  • FLOOR - Floor indicators (Ground, First, etc.)
  • BLOCK - Block identifiers
  • SECTOR - Sector numbers
  • GALI - Gali (lane) numbers
  • COLONY - Colony/Society names
  • AREA - Area/Locality names
  • SUBAREA - Sub-area names
  • KHASRA - Khasra (land record) numbers
  • PINCODE - 6-digit postal codes
  • CITY - City names
  • STATE - State names
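As a rough illustration of how token-level predictions map onto these entity types, the sketch below groups tagged tokens into spans. It assumes a BIO tagging scheme (`B-` opens a span, `I-` continues it, `O` is outside any entity), which this card does not confirm. The tokenization and tags are hand-written for the example, and `group_entities` is a hypothetical helper, not part of the `address_parser` package.

```python
from typing import Dict, List


def group_entities(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Collect BIO-tagged tokens into text spans keyed by entity type."""
    entities: Dict[str, List[str]] = {}
    open_type = None  # entity type of the span currently being extended
    for token, tag in zip(tokens, tags):
        prefix, _, entity_type = tag.partition("-")
        if prefix == "B":
            # Start a new span for this entity type
            entities.setdefault(entity_type, []).append(token)
            open_type = entity_type
        elif prefix == "I" and entity_type == open_type:
            # Extend the most recent span of the same type
            entities[entity_type][-1] += " " + token
        else:
            # "O" or a stray I- tag closes the current span
            open_type = None
    return entities


# Hand-labeled example (assumed tokenization and tags)
tokens = ["PLOT", "NO752", "FIRST", "FLOOR", ",", "BLOCK", "H-3", ",",
          "NEW", "DELHI", ",", "110041"]
tags = ["B-HOUSE_NUMBER", "I-HOUSE_NUMBER", "B-FLOOR", "I-FLOOR", "O",
        "B-BLOCK", "I-BLOCK", "O", "B-CITY", "I-CITY", "O", "B-PINCODE"]

for entity_type, spans in group_entities(tokens, tags).items():
    print(entity_type, spans)
```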

Usage

from address_parser import AddressParser

# Load model
parser = AddressParser.from_pretrained("YOUR_USERNAME/indian-address-parser-model")

# Parse address
result = parser.parse("PLOT NO752 FIRST FLOOR, BLOCK H-3, NEW DELHI, 110041")

# Access structured output
print(result.house_number)  # "PLOT NO752"
print(result.floor)         # "FIRST FLOOR"
print(result.city)          # "NEW DELHI"
print(result.pincode)       # "110041"

Demo

Try the live demo: Hugging Face Space

License

MIT License