Updated README.md

added model metadata and README.md updates

Files changed (1)

README.md +137 -3

@@ -1,6 +1,140 @@
 ---
-license: mit
 language:
-- en
 - ne
----

 ---
+license: apache-2.0
 language:
 - ne
+- en
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+base_model: sentence-transformers/all-MiniLM-L6-v2
+new_version: 1.0.0
+pipeline_tag: text-classification
+library_name: scikit-learn
+tags:
+- hybrid-model
+- logistic-regression
+- sentence-transformers
+- sbert
+- ne-en
+- rule-based
+- text-priority
+- low-resource-nlp
+- multilingual
+- civictech
+- complaint-triage
+- emergency-detection
+eval_results:
+- task:
+    type: text-classification
+    name: Priority Detection (Nepali + English)
+  dataset:
+    name: priority_clean.csv (custom)
+    type: csv
+    size: 266 samples
+  metrics:
+    accuracy: 0.725
+    f1_macro: 0.72
+    precision_macro: 0.73
+    recall_macro: 0.73
+    per_class:
+      HIGH:
+        precision: 0.73
+        recall: 0.66
+        f1: 0.69
+      MEDIUM:
+        precision: 0.74
+        recall: 0.8
+        f1: 0.76
+      LOW:
+        precision: 0.71
+        recall: 0.72
+        f1: 0.71
+---
+# Priority Classification Model (Nepali + English Hybrid)
+## Model Overview
+This model automatically classifies citizen complaints or service requests into **priority levels** — `HIGH`, `MEDIUM`, or `LOW` — based on the urgency and nature of the text.
+It accepts **both Nepali and English** input and uses a **hybrid ML + rule-based approach**: keyword rules back up the classifier, which was trained on a small dataset (266 raw samples).
+---
+## Model Architecture
+| Component | Description |
+|------------|-------------|
+| **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
+| **Classifier** | Logistic Regression (multiclass, balanced weights) |
+| **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
+| **Features** | SBERT embeddings + priority keyword preservation |
+| **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |
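
The hybrid inference row above can be sketched roughly as follows. This is a minimal illustration, not the code in `priority_det`: the keyword list, the `0.6` confidence threshold, and the function name `hybrid_priority` are all assumptions, since the model card does not specify them.

```python
# Hypothetical urgency keywords; the real lists are not given in the model card.
HIGH_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल")


def hybrid_priority(text, ml_label, ml_confidence, threshold=0.6):
    """Combine an ML prediction with a keyword fallback.

    Returns (label, source). The ML label is kept when its confidence
    reaches the threshold; otherwise an urgency keyword escalates the
    text to HIGH, and without a keyword the ML label is kept but flagged.
    """
    if ml_confidence >= threshold:
        return ml_label, "ml"
    lowered = text.lower()
    if any(keyword in lowered for keyword in HIGH_KEYWORDS):
        return "HIGH", "rule"
    return ml_label, "ml_low_confidence"


# Low-confidence prediction with an urgency keyword falls back to the rule.
print(hybrid_priority("Road is flooded, urgent help needed", "LOW", 0.41))
# Confident prediction is returned unchanged.
print(hybrid_priority("Street light is dim", "LOW", 0.85))
```

Treating the rules as a fallback rather than an override keeps the classifier in charge whenever it is confident; a stricter deployment might instead escalate on any keyword match.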
+---
+## Training Summary
+| Metric | Value |
+|---------|-------|
+| **Total raw samples** | 266 |
+| **After preprocessing & augmentation** | 594 |
+| **Train/Test Split** | 445 / 149 |
+| **Embedding Dimension** | 384 |
+| **Classes** | `HIGH`, `MEDIUM`, `LOW` |
+| **Test Accuracy** | **72.5%** |
+| **Macro F1-score** | **0.72** |
+### Label Distribution (After Normalization)
+| Label | Count |
+|--------|-------|
+| HIGH | 203 |
+| MEDIUM | 29 |
+| LOW | 34 |
+### Label Distribution (After Augmentation)
+| Label | Count |
+|--------|-------|
+| HIGH | 200 |
+| MEDIUM | 194 |
+| LOW | 200 |
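
To make the effect of augmentation concrete, the sketch below computes the per-class weights that scikit-learn's `class_weight="balanced"` setting would assign, `n_samples / (n_classes * class_count)`, for both distributions above. Which set of counts the classifier actually saw is an assumption here; the card only states "balanced weights", and the real weights would come from the training split rather than the full set.

```python
def balanced_class_weights(counts):
    """Weights as in scikit-learn's 'balanced' mode: n / (k * count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Counts copied from the two label-distribution tables.
raw = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}
augmented = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}

for name, counts in (("raw", raw), ("augmented", augmented)):
    weights = balanced_class_weights(counts)
    print(name, {label: round(w, 2) for label, w in weights.items()})
```

On the raw counts the minority classes would be weighted roughly six to seven times more than HIGH; after augmentation all three weights are close to 1, so class weighting has little left to correct.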
+---
+## Classification Report
+| Class | Precision | Recall | F1 | Support |
+|--------|------------|--------|----|----------|
+| HIGH | 0.73 | 0.66 | 0.69 | 50 |
+| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
+| LOW | 0.71 | 0.72 | 0.71 | 50 |
+| **Overall Accuracy** | | | **0.725** | 149 |
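+**Test accuracy of 72.5% is acceptable (≥70%)** for a dataset of this size (266 raw samples, 149 test samples).
+The model is most reliable on text with explicit “urgent/emergency” wording and less reliable on borderline cases; in the per-class scores above, HIGH has the lowest recall (0.66) and MEDIUM the highest F1 (0.76).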
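
As a quick consistency check, the macro averages and overall accuracy can be recomputed from the per-class rows of the report. The sketch below uses only the numbers in the table; the helper names are illustrative, and recovering correct counts from rounded recall values is an approximation.

```python
# Per-class values copied from the classification report table.
report = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro_average(metric):
    """Unweighted mean of a metric across classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def accuracy_from_recall():
    """Recover correct counts per class as recall * support, then divide."""
    correct = sum(round(row["recall"] * row["support"]) for row in report.values())
    total = sum(row["support"] for row in report.values())
    return correct / total


print("macro precision:", round(macro_average("precision"), 2))
print("macro recall:", round(macro_average("recall"), 2))
print("macro F1:", round(macro_average("f1"), 2))
print("accuracy:", round(accuracy_from_recall(), 3))
```

The recomputed values (0.73, 0.73, 0.72, and 108 of 149 correct, about 0.725) agree with the figures reported in the metadata and the table.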
+---
+## Inference (Usage)
+### Using the model directly (ML only or Hybrid)
+```python
+from huggingface_hub import hf_hub_download
+import joblib
+from priority_det import Embedder, predict_priority
+# Download the model
+model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")
+# Load the classifier
+bundle = joblib.load(model_path)
+clf = bundle["clf"]
+label_map = bundle["label_map"]
+# Initialize the embedder
+embedder = Embedder()
+# Predict
+text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
+result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
+print(result)