---
license: apache-2.0
language:
- ne
- en
metrics:
- accuracy
- f1
- precision
- recall
base_model: sentence-transformers/all-MiniLM-L6-v2
new_version: 1.0.0
pipeline_tag: text-classification
library_name: scikit-learn
tags:
- hybrid-model
- logistic-regression
- sentence-transformers
- sbert
- ne-en
- rule-based
- text-priority
- low-resource-nlp
- multilingual
- civictech
- complaint-triage
- emergency-detection
eval_results:
- task:
    type: text-classification
    name: Priority Detection (Nepali + English)
  dataset:
    name: priority_clean.csv (custom)
    type: csv
    size: 266 samples
  metrics:
    accuracy: 0.725
    f1_macro: 0.72
    precision_macro: 0.73
    recall_macro: 0.73
    per_class:
      HIGH:
        precision: 0.73
        recall: 0.66
        f1: 0.69
      MEDIUM:
        precision: 0.74
        recall: 0.8
        f1: 0.76
      LOW:
        precision: 0.71
        recall: 0.72
        f1: 0.71
---
# Priority Classification Model (Nepali + English Hybrid)
## Model Overview
This model automatically classifies citizen complaints and service requests into one of three **priority levels** (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.
It accepts **both Nepali and English** input and uses a **hybrid ML + rule-based approach** to stay robust when trained on a small dataset.

---
## Model Architecture
| Component | Description |
|------------|-------------|
| **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
| **Classifier** | Logistic Regression (multiclass, balanced weights) |
| **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
| **Features** | SBERT embeddings + priority keyword preservation |
| **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |
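
The table does not spell out how the rule layer and the classifier are combined, so the following is only a minimal sketch of one plausible scheme: trust the ML label when its confidence is high, and otherwise let an urgency keyword escalate the complaint to `HIGH`. The names `URGENT_KEYWORDS`, `CONFIDENCE_THRESHOLD`, `Decision`, and `hybrid_decision`, the keyword list, and the 0.6 cut-off are all assumptions made for illustration, not the model's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical urgency terms (English and Nepali); the real keyword lists may differ.
URGENT_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल")

# Hypothetical cut-off below which the ML prediction is treated as uncertain.
CONFIDENCE_THRESHOLD = 0.6


@dataclass
class Decision:
    label: str
    source: str  # "ml" or "rule"


def has_urgent_keyword(text: str) -> bool:
    """Return True if any urgency keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in URGENT_KEYWORDS)


def hybrid_decision(text: str, ml_label: str, ml_confidence: float) -> Decision:
    """Keep a confident ML label; otherwise let an urgency keyword escalate to HIGH."""
    if ml_confidence >= CONFIDENCE_THRESHOLD:
        return Decision(ml_label, "ml")
    if has_urgent_keyword(text):
        return Decision("HIGH", "rule")
    return Decision(ml_label, "ml")


# Usage: an uncertain MEDIUM prediction on a complaint that contains "तत्काल".
complaint = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
print(hybrid_decision(complaint, ml_label="MEDIUM", ml_confidence=0.45))
```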
---
## Training Summary
| Metric | Value |
|---------|-------|
| **Total raw samples** | 266 |
| **After preprocessing & augmentation** | 594 |
| **Train/Test Split** | 445 / 149 |
| **Embedding Dimension** | 384 |
| **Classes** | `HIGH`, `MEDIUM`, `LOW` |
| **Test Accuracy** | **72.5%** |
| **Macro F1-score** | **0.72** |
### Label Distribution (After Normalization)
| Label | Count |
|--------|-------|
| HIGH | 203 |
| MEDIUM | 29 |
| LOW | 34 |
### Label Distribution (After Augmentation)
| Label | Count |
|--------|-------|
| HIGH | 200 |
| MEDIUM | 194 |
| LOW | 200 |
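
To make the effect of augmentation concrete, the sketch below applies the "balanced" class-weight heuristic that scikit-learn documents, `n_samples / (n_classes * class_count)`, to the two label distributions above. Which of the two sets the classifier was actually fitted on is an assumption here; the helper `balanced_class_weights` is a hypothetical name used only for this illustration.

```python
def balanced_class_weights(counts: dict[str, int]) -> dict[str, float]:
    """Weight per class: n_samples / (n_classes * class_count)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Label counts taken from the two tables above.
after_normalization = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}
after_augmentation = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}

weights_before = balanced_class_weights(after_normalization)
weights_after = balanced_class_weights(after_augmentation)

for label in after_normalization:
    print(f"{label}: {weights_before[label]:.2f} -> {weights_after[label]:.2f}")
```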
---
## Classification Report
| Class | Precision | Recall | F1 | Support |
|--------|------------|--------|----|----------|
| HIGH | 0.73 | 0.66 | 0.69 | 50 |
| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
| LOW | 0.71 | 0.72 | 0.71 | 50 |
| **Overall Accuracy** | | | **0.725** | 149 |
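**Performance is acceptable (≥70% accuracy)** given the small dataset (266 raw samples).
The model performs best on clearly marked “urgent/emergency” cases and slightly worse on borderline MEDIUM cases. In the per-class figures, HIGH has the lowest recall (0.66), so roughly one in three HIGH-priority test complaints is assigned a lower priority.
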
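
As a quick consistency check, the macro averages and the overall accuracy can be recomputed from the per-class rows in the table. This is a sketch that works from the rounded values shown, so the results only approximate the reported figures; `report` and `macro` are names chosen for this example.

```python
# Per-class rows copied from the classification report table.
report = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro(metric: str) -> float:
    """Macro average: unweighted mean of a metric over the classes."""
    return sum(row[metric] for row in report.values()) / len(report)


total_support = sum(row["support"] for row in report.values())

# Accuracy equals the support-weighted mean of per-class recall.
accuracy_estimate = sum(row["recall"] * row["support"] for row in report.values()) / total_support

print(f"macro precision: {macro('precision'):.2f}")
print(f"macro recall:    {macro('recall'):.2f}")
print(f"macro F1:        {macro('f1'):.2f}")
print(f"accuracy (from rounded recalls): {accuracy_estimate:.3f}")
```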
---
## Inference (Usage)
### Using the model directly (ML only or Hybrid)
```python
from huggingface_hub import hf_hub_download
import joblib
from priority_det import Embedder, predict_priority
# Download the model
model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")
# Load the classifier
bundle = joblib.load(model_path)
clf = bundle["clf"]
label_map = bundle["label_map"]
# Initialize the embedder
embedder = Embedder()
# Predict
text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
print(result)