---
license: apache-2.0
language:
- en
- kn
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- legal
- code-mixed
---
# IndicLaw-Class: Code-Mixed Legal Intent Classifier
`IndicLaw-Class` is a lightweight multilingual transformer-based classifier that identifies legal intent from code-mixed Indian queries (e.g., Kannada-English, Hinglish). It is fine-tuned on citizen-style queries for real-world legal triage applications.
---
## Model Overview
- **Architecture**: [`distilbert-base-multilingual-cased`](https://huggingface.co/distilbert-base-multilingual-cased)
- **Task**: Multi-class text classification (6 legal categories)
- **Input Style**: Informal, code-mixed queries such as:
  - `divorce file maadbeku without husband consent`
  - `builder flat delay case haakbeku`
  - `rent refund maadbeku, owner refusing`
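
As a rough illustration of what multi-class classification means here: a classification head produces one raw score (logit) per category, and a softmax typically turns those six scores into probabilities. The highest probability gives the predicted category and serves as the confidence. The sketch below uses made-up logits and plain Python rather than the model itself, so the numbers are purely illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for the six categories (illustrative values only,
# not produced by IndicLaw-Class).
example_logits = [0.2, -1.0, 0.1, 2.5, 0.4, -0.3]

probs = softmax(example_logits)

# The predicted class is the index with the highest probability.
predicted_index = max(range(len(probs)), key=probs.__getitem__)
confidence = probs[predicted_index]

print(predicted_index, round(confidence, 2))
```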
---
## Legal Categories
The model classifies input into one of the following categories:
| Label | Description |
|------------------|------------------------------------|
| Family Law | Divorce, custody, alimony, marriage |
| Property Law | Inheritance, land disputes, transfer |
| Criminal Law | FIRs, police misconduct, assault |
| Consumer Complaints | E-commerce, refund issues, builders |
| Rent & Tenancy | Eviction, deposit disputes, lease |
| Public Services | Certificates, ID updates, ration |
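
The usage snippet in this README reads a tab-separated `labels.txt` (index, then label name). A minimal sketch of how such a file could be written and read back for the six categories above follows. It assumes the indices follow the table order, which this card does not confirm, so the `labels.txt` shipped with the model should be treated as the source of truth. The helper names `write_labels` and `read_labels` are hypothetical.

```python
import os
import tempfile

# Category names taken from the table above.
# ASSUMPTION: the index order mirrors the table; the real mapping may differ.
CATEGORIES = [
    "Family Law",
    "Property Law",
    "Criminal Law",
    "Consumer Complaints",
    "Rent & Tenancy",
    "Public Services",
]


def write_labels(path, categories):
    """Write one '<index>\t<label>' pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for idx, name in enumerate(categories):
            f.write(f"{idx}\t{name}\n")


def read_labels(path):
    """Read the tab-separated file back into an {index: label} dict."""
    label_map = {}
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            idx, name = line.split("\t", 1)
            label_map[int(idx)] = name
    return label_map


# Round-trip through a temporary file to show the format.
with tempfile.TemporaryDirectory() as tmp:
    labels_path = os.path.join(tmp, "labels.txt")
    write_labels(labels_path, CATEGORIES)
    label_map = read_labels(labels_path)

print(label_map)
```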
---
## Environmental Impact
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact/#compute) presented in Lacoste et al. (2019).
- **Hardware Type**: More information needed
- **Hours used**: More information needed
- **Cloud Provider**: More information needed
- **Compute Region**: More information needed
- **Carbon Emitted**: More information needed
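
Once these fields are known, the estimate is roughly the power draw multiplied by the runtime and by the carbon intensity of the local grid. The hardware type determines the power draw, and the cloud provider and compute region determine the carbon intensity. A minimal sketch of that arithmetic follows. The function and parameter names are hypothetical, and no values for this model are assumed.

```python
def estimate_emissions_kg(power_watts, hours, grid_kg_co2_per_kwh):
    """Rough CO2-equivalent estimate in kilograms.

    power_watts: average power draw of the training hardware (assumption: constant)
    hours: total training time
    grid_kg_co2_per_kwh: carbon intensity of the compute region's grid
    """
    energy_kwh = (power_watts / 1000.0) * hours  # watts -> kilowatts, times hours
    return energy_kwh * grid_kg_co2_per_kwh
```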
---
## Citation
```bibtex
@misc{nishanth_prakash_2025,
author = { nishanth prakash },
title = { IndicLaw-Class (Revision 87ae96e) },
year = 2025,
url = { https://huggingface.co/nprak26/IndicLaw-Class },
doi = { 10.57967/hf/5964 },
publisher = { Hugging Face }
}
```
---
## How to Get Started With the Model
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the model and tokenizer from a local folder
model_dir = "./indiclaw-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

# Load the label map from labels.txt (one tab-separated "index, label" pair per line)
label_map = {}
with open(f"{model_dir}/labels.txt", "r", encoding="utf-8") as f:
    for line in f:
        idx, label = line.strip().split("\t")
        label_map[int(idx)] = label

# Create the classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Test inputs
examples = [
    "wife divorce file maadbeku",
    "flat possession delay aadmele builder case file madbeku",
    "tenant evict maadbeku no notice",
]

# Run predictions
for text in examples:
    result = classifier(text)[0]
    label_str = result["label"]
    # Labels may come back as "LABEL_<n>" or as a bare index string
    if "label" in label_str.lower():
        label_id = int(label_str.split("_")[-1])
    else:
        label_id = int(label_str)
    label_name = label_map[label_id]
    print(f"Input: {text}\nPredicted: {label_name} (confidence: {result['score']:.2f})\n")
```
---