---
license: apache-2.0
language:
- ne
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
- hybrid-model
- logistic-regression
- sentence-transformers
- sbert
- ne-en
- rule-based
- text-priority
- low-resource-nlp
- multilingual
- civictech
- complaint-triage
- emergency-detection
eval_results:
- task:
    type: text-classification
    name: Priority Detection (Nepali + English)
  dataset:
    name: priority_clean.csv (custom)
    type: csv
    size: 266 samples
  metrics:
    accuracy: 0.725
    f1_macro: 0.72
    precision_macro: 0.73
    recall_macro: 0.73
    per_class:
      HIGH:
        precision: 0.73
        recall: 0.66
        f1: 0.69
      MEDIUM:
        precision: 0.74
        recall: 0.8
        f1: 0.76
      LOW:
        precision: 0.71
        recall: 0.72
        f1: 0.71
---

# Priority Classification Model (Nepali + English Hybrid)

## Model Overview

This model classifies citizen complaints and service requests into one of three **priority levels** (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.

It supports **both Nepali and English** inputs and uses a **hybrid ML + rule-based approach**, which keeps predictions robust when training data is scarce.

---

## Model Architecture

| Component | Description |
|------------|-------------|
| **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
| **Classifier** | Logistic Regression (multiclass, balanced weights) |
| **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
| **Features** | SBERT embeddings + priority keyword preservation |
| **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |
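
The hybrid inference row can be illustrated with a minimal sketch. This is not the model's actual rule layer: the keyword list, the `hybrid_priority` function, its return format, and the `0.6` confidence threshold are all assumptions made for illustration. The idea shown is that a confident ML prediction is kept, while a low-confidence prediction falls back to a keyword check for urgency terms.

```python
# Hypothetical urgency terms in English and Nepali; the real rule set is not
# published in this card, so treat this list as a placeholder.
URGENT_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल")


def hybrid_priority(text, ml_label, ml_confidence, threshold=0.6):
    """Combine an ML prediction with a keyword fallback.

    Returns a (label, source) tuple. The threshold value is an assumption.
    """
    lowered = text.lower()
    has_urgent_term = any(keyword in lowered for keyword in URGENT_KEYWORDS)

    # A confident ML prediction is returned unchanged.
    if ml_confidence >= threshold:
        return ml_label, "ml"

    # Low confidence: escalate to HIGH if an urgency keyword is present.
    if has_urgent_term:
        return "HIGH", "rule"

    # Low confidence and no keyword: keep the ML label, but flag it.
    return ml_label, "ml_low_confidence"


print(hybrid_priority("URGENT: water pipe burst near the school", "MEDIUM", 0.45))
print(hybrid_priority("The street light is a bit dim", "LOW", 0.90))
```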

---

## Training Summary

| Metric | Value |
|---------|-------|
| **Total raw samples** | 266 |
| **After preprocessing & augmentation** | 594 |
| **Train/Test Split** | 445 / 149 |
| **Embedding Dimension** | 384 |
| **Classes** | `HIGH`, `MEDIUM`, `LOW` |
| **Test Accuracy** | **72.5%** |
| **Macro F1-score** | **0.72** |

### Label Distribution (After Normalization)

| Label | Count |
|--------|-------|
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |

### Label Distribution (After Augmentation)

| Label | Count |
|--------|-------|
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |
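
To see what augmentation changes for the classifier, the sketch below computes per-class weights for both distributions. It assumes that "balanced weights" refers to scikit-learn's `class_weight="balanced"` heuristic, `n_samples / (n_classes * class_count)`, and it uses the full label counts from the tables above rather than the training split alone. The helper name `balanced_class_weights` is illustrative.

```python
def balanced_class_weights(counts):
    """Weight per class as n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Label counts copied from the two distribution tables.
before = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}
after = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}

weights_before = balanced_class_weights(before)
weights_after = balanced_class_weights(after)

for label in before:
    print(f"{label}: {weights_before[label]:.2f} before, {weights_after[label]:.2f} after")
# HIGH: 0.44 before, 0.99 after
# MEDIUM: 3.06 before, 1.02 after
# LOW: 2.61 before, 0.99 after
```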

---

## Classification Report

| Class | Precision | Recall | F1 | Support |
|--------|------------|--------|----|----------|
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| **Overall Accuracy** | | | **0.725** | 149 |

**Performance is acceptable (≥70% accuracy)** given the small dataset size.

The model performs best on clearly marked “urgent/emergency” cases and slightly worse on borderline MEDIUM cases.
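
As a consistency check, the macro averages and overall accuracy can be recomputed from the per-class rows. The sketch below uses only the numbers in the table above. Recovering the count of correct predictions by rounding `recall * support` is an approximation, since the reported recalls are themselves rounded to two decimals.

```python
# Per-class values copied from the classification report table.
report = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro_average(rows, metric):
    """Unweighted mean of a metric across classes."""
    return sum(row[metric] for row in rows.values()) / len(rows)


macro_precision = macro_average(report, "precision")
macro_recall = macro_average(report, "recall")
macro_f1 = macro_average(report, "f1")

# Recall times support approximates the correctly classified samples per class.
correct = sum(round(row["recall"] * row["support"]) for row in report.values())
total = sum(row["support"] for row in report.values())
accuracy = correct / total

print(f"macro precision={macro_precision:.2f}, recall={macro_recall:.2f}, f1={macro_f1:.2f}")
print(f"accuracy={correct}/{total}={accuracy:.3f}")
# macro precision=0.73, recall=0.73, f1=0.72
# accuracy=108/149=0.725
```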

---

## Inference (Usage)

### Using the model directly (ML only or Hybrid)

```python
from huggingface_hub import hf_hub_download
import joblib

# Embedder and predict_priority come from this project's priority_det module,
# which must be importable in your environment.
from priority_det import Embedder, predict_priority

# Download the model (replace the placeholder repo_id with the actual repository)
model_path = hf_hub_download(
    repo_id="your-username/priority-classifier",
    filename="classifier.joblib",
)

# Load the classifier bundle
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]

# Initialize the embedder
embedder = Embedder()

# Predict (the Nepali text means "Water supply is cut off. An immediate fix is needed.")
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
print(result)