---
license: apache-2.0
language:
- en
- kn
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-multilingual-cased
pipeline_tag: text-classification
tags:
- legal
- code-mixed
---

# IndicLaw-Class: Code-Mixed Legal Intent Classifier

`IndicLaw-Class` is a lightweight multilingual transformer-based classifier that identifies legal intent from code-mixed Indian queries (e.g., Kannada-English, Hinglish). It is fine-tuned on citizen-style queries for real-world legal triage applications.

---

## Model Overview

- **Architecture**: [`distilbert-base-multilingual-cased`](https://huggingface.co/distilbert-base-multilingual-cased)
- **Task**: Multi-class text classification (6 legal categories)
- **Input Style**: Informal, code-mixed queries like:
  - `divorce file maadbeku without husband consent`
  - `builder flat delay case haakbeku`
  - `rent refund maadbeku, owner refusing`

---

## Legal Categories

The model classifies input into one of the following categories:

| Label               | Description                          |
|---------------------|--------------------------------------|
| Family Law          | Divorce, custody, alimony, marriage  |
| Property Law        | Inheritance, land disputes, transfer |
| Criminal Law        | FIRs, police misconduct, assault     |
| Consumer Complaints | E-commerce, refund issues, builders  |
| Rent & Tenancy      | Eviction, deposit disputes, lease    |
| Public Services     | Certificates, ID updates, ration     |
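
The six categories in the table can be represented as a pair of lookup dictionaries, the same shape Transformers stores in a model config as `id2label` and `label2id`. The following is a minimal sketch, not the released mapping: the index order is an assumption (this card does not state it), and `normalize_label` is a hypothetical helper. Check the model's own `config.json` or `labels.txt` for the real indices.

```python
# ASSUMPTION: this index order is illustrative only. The model card does not
# say which class index belongs to which category.
CATEGORIES = [
    "Family Law",
    "Property Law",
    "Criminal Law",
    "Consumer Complaints",
    "Rent & Tenancy",
    "Public Services",
]

# Index -> name and name -> index lookups
id2label = dict(enumerate(CATEGORIES))
label2id = {label: idx for idx, label in id2label.items()}


def normalize_label(raw: str) -> str:
    """Map a raw label such as "LABEL_3", "3", or "Family Law" to a category name."""
    if raw.upper().startswith("LABEL_"):
        return id2label[int(raw.split("_")[-1])]
    if raw.isdigit():
        return id2label[int(raw)]
    if raw in label2id:
        return raw
    raise ValueError(f"Unknown label: {raw!r}")


# Under the assumed order, index 0 is the first row of the table
print(normalize_label("LABEL_0"))
```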

---

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact/#compute) presented in Lacoste et al. (2019).

- **Hardware Type**: More information needed
- **Hours used**: More information needed
- **Cloud Provider**: More information needed
- **Compute Region**: More information needed
- **Carbon Emitted**: More information needed

---

## Citation

```bibtex
@misc{nishanth_prakash_2025,
    author    = { nishanth prakash },
    title     = { IndicLaw-Class (Revision 87ae96e) },
    year      = 2025,
    url       = { https://huggingface.co/nprak26/IndicLaw-Class },
    doi       = { 10.57967/hf/5964 },
    publisher = { Hugging Face }
}
```

---

## How to Get Started With the Model

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model and tokenizer from your local folder
model_dir = "./indiclaw-classifier"

tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

# Load label map (from the labels.txt you saved earlier).
# Each non-empty line is expected to be "<index>\t<label>".
label_map = {}
with open(f"{model_dir}/labels.txt", "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        idx, label = line.split("\t")
        label_map[int(idx)] = label

# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Test inputs
examples = [
    "wife divorce file maadbeku",
    "flat possession delay aadmele builder case file madbeku",
    "tenant evict maadbeku no notice",
]

# Run predictions
for text in examples:
    result = classifier(text)[0]
    label_str = result["label"]
    # The pipeline reports generic names such as "LABEL_3" when the model
    # config has no readable id2label mapping.
    if label_str.upper().startswith("LABEL_"):
        label_name = label_map[int(label_str.split("_")[-1])]
    elif label_str.isdigit():
        label_name = label_map[int(label_str)]
    else:
        label_name = label_str  # config already stores a readable name
    print(f"Input: {text}\nPredicted: {label_name} (confidence: {result['score']:.2f})\n")
```

---
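
Since the classifier is intended for legal triage, low-confidence predictions may be better sent to a human reviewer than acted on automatically. The following is a minimal sketch under stated assumptions: `route_query`, the `0.6` threshold, and the `"Needs Review"` bucket are hypothetical and not part of the released model. The input dictionary mirrors the `{"label": ..., "score": ...}` shape that the text-classification pipeline returns, and the sample values are hand-written, not real model output.

```python
from typing import Dict, Union

REVIEW_BUCKET = "Needs Review"  # hypothetical fallback bucket


def route_query(
    prediction: Dict[str, Union[str, float]],
    threshold: float = 0.6,  # hypothetical cut-off; tune on validation data
) -> str:
    """Return the predicted category, or the review bucket if confidence is low."""
    score = float(prediction["score"])
    if score < threshold:
        return REVIEW_BUCKET
    return str(prediction["label"])


# Hand-written predictions in the pipeline's output shape (not real model output)
samples = [
    {"label": "Family Law", "score": 0.93},
    {"label": "Rent & Tenancy", "score": 0.41},
]
for sample in samples:
    print(route_query(sample))
```

A suitable threshold depends on how costly a misrouted query is, so it is best chosen from held-out data rather than fixed in advance.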