HighOnCaffiene committed
Commit 8d379f5 · verified · 1 Parent(s): 6b0db41

Updated README.md

added model metadata and README.md updates

Files changed (1)
  1. README.md +137 -3
README.md CHANGED
@@ -1,6 +1,140 @@
  ---
- license: mit
+ license: apache-2.0
  language:
- - en
  - ne
- ---
+ - en
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ base_model: sentence-transformers/all-MiniLM-L6-v2
+ new_version: 1.0.0
+ pipeline_tag: text-classification
+ library_name: scikit-learn
+ tags:
+ - hybrid-model
+ - logistic-regression
+ - sentence-transformers
+ - sbert
+ - ne-en
+ - rule-based
+ - text-priority
+ - low-resource-nlp
+ - multilingual
+ - civictech
+ - complaint-triage
+ - emergency-detection
+ eval_results:
+ - task:
+     type: text-classification
+     name: Priority Detection (Nepali + English)
+   dataset:
+     name: priority_clean.csv (custom)
+     type: csv
+     size: 266 samples
+   metrics:
+     accuracy: 0.725
+     f1_macro: 0.72
+     precision_macro: 0.73
+     recall_macro: 0.73
+     per_class:
+       HIGH:
+         precision: 0.73
+         recall: 0.66
+         f1: 0.69
+       MEDIUM:
+         precision: 0.74
+         recall: 0.8
+         f1: 0.76
+       LOW:
+         precision: 0.71
+         recall: 0.72
+         f1: 0.71
+ ---
+
+ # Priority Classification Model (Nepali + English Hybrid)
+
+ ## Model Overview
+ This model classifies citizen complaints and service requests into one of three **priority levels** (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.
+ It accepts **both Nepali and English** input and uses a **hybrid ML + rule-based approach** to stay robust when training data is scarce.
+
+ ---
+
+ ## Model Architecture
+
+ | Component | Description |
+ |------------|-------------|
+ | **Embedder** | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
+ | **Classifier** | Logistic Regression (multiclass, balanced weights) |
+ | **Rule-based Layer** | Keyword-based fallback for urgency terms in Nepali and English |
+ | **Features** | SBERT embeddings + priority keyword preservation |
+ | **Hybrid Inference** | Combines ML prediction confidence with rules for safer decisions |
+
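The "Hybrid Inference" row could work roughly as sketched below. This is an illustration only, not the repository's actual logic: the function name `hybrid_priority`, the keyword list, the `source` field, and the 0.6 confidence threshold are all assumptions made for the example.

```python
# Hypothetical urgency keywords; the project's real lists may differ.
URGENT_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल", "आपतकालीन")


def hybrid_priority(text, ml_label, ml_confidence, threshold=0.6):
    """Combine an ML prediction with a keyword fallback (illustrative only).

    ml_label and ml_confidence stand in for the classifier's top class and
    its predicted probability; threshold is an assumed cut-off.
    """
    lowered = text.lower()
    has_urgent_term = any(keyword in lowered for keyword in URGENT_KEYWORDS)

    # Confident ML predictions are kept as they are.
    if ml_confidence >= threshold:
        return {"priority": ml_label, "source": "ml"}

    # Low confidence plus an explicit urgency term: escalate to HIGH.
    if has_urgent_term:
        return {"priority": "HIGH", "source": "rule"}

    # Low confidence and no urgency term: keep the ML label, but flag it.
    return {"priority": ml_label, "source": "ml-low-confidence"}
```

Escalating only when the classifier is unsure keeps the rules from overriding confident predictions, while still erring toward `HIGH` when an explicit urgency term appears.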
+ ---
+
+ ## Training Summary
+
+ | Metric | Value |
+ |---------|-------|
+ | **Total raw samples** | 266 |
+ | **After preprocessing & augmentation** | 594 |
+ | **Train/Test Split** | 445 / 149 |
+ | **Embedding Dimension** | 384 |
+ | **Classes** | `HIGH`, `MEDIUM`, `LOW` |
+ | **Test Accuracy** | **72.5%** |
+ | **Macro F1-score** | **0.72** |
+
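The split sizes in the table are consistent with a 75/25 split of the 594 augmented samples, with the test size rounded up. The check below is a sketch under that assumption; the source does not state the split ratio, whether the split was stratified, or how rounding was handled.

```python
import math

total = 594            # samples after preprocessing and augmentation (from the table)
test_fraction = 0.25   # assumed ratio; not stated in the source

n_test = math.ceil(total * test_fraction)  # 148.5 rounds up to 149
n_train = total - n_test                   # the remaining samples

# Per-class test counts if the split were stratified on the augmented labels.
augmented_counts = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}
test_support = {
    label: math.ceil(count * test_fraction)
    for label, count in augmented_counts.items()
}

print(n_train, n_test)   # 445 149
print(test_support)      # {'HIGH': 50, 'MEDIUM': 49, 'LOW': 50}
```

Under these assumptions the per-class counts also match the support column of the classification report (50 / 49 / 50).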
+ ### Label Distribution (After Normalization)
+ | Label | Count |
+ |--------|-------|
+ | HIGH | 203 |
+ | MEDIUM | 29 |
+ | LOW | 34 |
+
+ ### Label Distribution (After Augmentation)
+ | Label | Count |
+ |--------|-------|
+ | HIGH | 200 |
+ | MEDIUM | 194 |
+ | LOW | 200 |
+
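The classifier is described as using balanced weights. In scikit-learn, `class_weight="balanced"` weights each class by `n_samples / (n_classes * class_count)`. The sketch below applies that formula to both label distributions in plain Python; the helper name `balanced_weights` is made up for the example, and the source does not say which distribution the weights were actually computed on.

```python
def balanced_weights(counts):
    """Return n_samples / (n_classes * count) for each class label."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in counts.items()
    }


before = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}    # after normalization
after = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}   # after augmentation

for name, counts in (("before", before), ("after", after)):
    weights = balanced_weights(counts)
    print(name, {label: round(w, 2) for label, w in weights.items()})
```

On the raw distribution the weights come out at roughly 0.44 (HIGH), 3.06 (MEDIUM) and 2.61 (LOW). After augmentation they are all close to 1, so class weighting has little effect once the classes are balanced.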
+ ---
+
+ ## Classification Report
+
+ | Class | Precision | Recall | F1 | Support |
+ |--------|------------|--------|----|----------|
+ | HIGH | 0.73 | 0.66 | 0.69 | 50 |
+ | MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
+ | LOW | 0.71 | 0.72 | 0.71 | 50 |
+ | **Overall Accuracy** | | | **0.725** | 149 |
+
+ **Test accuracy clears the 70% mark (72.5%)**, which is acceptable for a dataset of this size (266 raw samples, 594 after augmentation).
+ The model performs best on texts that are clearly marked as urgent or emergency cases and slightly worse on borderline MEDIUM cases.
+
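The macro scores reported in the metadata can be reproduced from the per-class rows by taking unweighted means. The snippet below is a small consistency check on the rounded values in the table, not the evaluation code used for the model; the variable names are chosen for the example.

```python
# Per-class values copied from the classification report.
per_class = {
    "HIGH":   {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW":    {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro(metric):
    """Unweighted mean of a metric over the classes."""
    return sum(row[metric] for row in per_class.values()) / len(per_class)


# Accuracy equals the support-weighted mean of per-class recall.
total_support = sum(row["support"] for row in per_class.values())
approx_accuracy = (
    sum(row["recall"] * row["support"] for row in per_class.values())
    / total_support
)

print(round(macro("precision"), 2))  # 0.73
print(round(macro("recall"), 2))     # 0.73
print(round(macro("f1"), 2))         # 0.72
print(round(approx_accuracy, 3))     # 0.726
```

The recomputed accuracy (about 0.726) differs slightly from the reported 0.725 because the recall values in the table are already rounded to two decimals.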
+ ---
+
+ ## Inference (Usage)
+
+ ### Using the model directly (ML only or Hybrid)
+ ```python
+ from huggingface_hub import hf_hub_download
+ import joblib
+ from priority_det import Embedder, predict_priority
+
+ # Download the model
+ model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")
+
+ # Load the classifier
+ bundle = joblib.load(model_path)
+ clf = bundle["clf"]
+ label_map = bundle["label_map"]
+
+ # Initialize the embedder
+ embedder = Embedder()
+
+ # Predict
+ text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
+ result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)
+ print(result)