	---
	license: apache-2.0
	language:
	- ne
	- en
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	base_model: sentence-transformers/all-MiniLM-L6-v2
	new_version: 1.0.0
	pipeline_tag: text-classification
	library_name: scikit-learn
	tags:
	- hybrid-model
	- logistic-regression
	- sentence-transformers
	- sbert
	- ne-en
	- rule-based
	- text-priority
	- low-resource-nlp
	- multilingual
	- civictech
	- complaint-triage
	- emergency-detection
	eval_results:
	- task:
	    type: text-classification
	    name: Priority Detection (Nepali + English)
	  dataset:
	    name: priority_clean.csv (custom)
	    type: csv
	    size: 266 samples
	  metrics:
	    accuracy: 0.725
	    f1_macro: 0.72
	    precision_macro: 0.73
	    recall_macro: 0.73
	    per_class:
	      HIGH:
	        precision: 0.73
	        recall: 0.66
	        f1: 0.69
	      MEDIUM:
	        precision: 0.74
	        recall: 0.8
	        f1: 0.76
	      LOW:
	        precision: 0.71
	        recall: 0.72
	        f1: 0.71
	---

	# Priority Classification Model (Nepali + English Hybrid)

	## Model Overview
	This model classifies citizen complaints and service requests into three priority levels (`HIGH`, `MEDIUM`, or `LOW`) based on the urgency and nature of the text.
	It accepts both Nepali and English input and combines a machine-learning classifier with a rule-based keyword layer, so that predictions stay robust even though the training set is small.

	---

	## Model Architecture

	| Component | Description |
	|------------|-------------|
	| Embedder | [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) |
	| Classifier | Logistic Regression (multiclass, balanced weights) |
	| Rule-based Layer | Keyword-based fallback for urgency terms in Nepali and English |
	| Features | SBERT embeddings + priority keyword preservation |
	| Hybrid Inference | Combines ML prediction confidence with rules for safer decisions |
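
The hybrid inference row above can be illustrated with a minimal sketch. It assumes the rule layer only acts when the classifier's top probability is below a threshold, which is one reading of "keyword-based fallback". The function name `hybrid_priority`, the `min_confidence` value, and the keyword list are all hypothetical, since this card does not list the actual rules.

```python
from typing import Dict, Tuple

# Hypothetical urgency keywords (English and Nepali); the real list is not given in this card.
URGENT_KEYWORDS = ("urgent", "emergency", "immediately", "तत्काल")


def hybrid_priority(
    text: str,
    ml_probs: Dict[str, float],
    min_confidence: float = 0.6,  # assumed threshold, not taken from the model
) -> Tuple[str, str]:
    """Return (label, source), where source is "ml" or "rule"."""
    # Start from the classifier's most probable label.
    ml_label = max(ml_probs, key=ml_probs.get)

    # Trust the classifier when it is confident enough.
    if ml_probs[ml_label] >= min_confidence:
        return ml_label, "ml"

    # Low confidence: fall back to the keyword rule and escalate to HIGH on a match.
    lowered = text.lower()
    if any(keyword in lowered for keyword in URGENT_KEYWORDS):
        return "HIGH", "rule"

    # No keyword matched, so keep the classifier's best guess.
    return ml_label, "ml"
```

In this sketch the probabilities would come from the logistic regression's `predict_proba` output mapped to label names. Escalating only toward `HIGH` is a deliberate bias: a missed emergency is assumed to be costlier than an over-prioritised routine complaint.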

	---

	## Training Summary

	| Metric | Value |
	|---------|-------|
	| Total raw samples | 266 |
	| After preprocessing & augmentation | 594 |
	| Train/Test Split | 445 / 149 |
	| Embedding Dimension | 384 |
	| Classes | `HIGH`, `MEDIUM`, `LOW` |
	| Test Accuracy | 72.5% |
	| Macro F1-score | 0.72 |

	### Label Distribution (After Normalization)
	| Label | Count |
	|--------|-------|
	| HIGH | 203 |
	| MEDIUM | 29 |
	| LOW | 34 |
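
To make the imbalance in these pre-augmentation counts concrete, the sketch below applies the "balanced" class-weight heuristic, `n_samples / (n_classes * class_count)`, which is the formula scikit-learn documents for `class_weight="balanced"`. This card does not say whether the released classifier was fitted on these counts or on the augmented data, so the numbers are illustrative only. The helper name `balanced_class_weights` is hypothetical.

```python
from typing import Dict

# Label counts after normalization, copied from the table above.
NORMALIZED_COUNTS = {"HIGH": 203, "MEDIUM": 29, "LOW": 34}


def balanced_class_weights(class_counts: Dict[str, int]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in class_counts.items()
    }


weights = balanced_class_weights(NORMALIZED_COUNTS)
for label, weight in weights.items():
    print(f"{label}: {weight:.2f}")
```

Under this heuristic a MEDIUM sample would count roughly seven times as much as a HIGH sample, which shows why some form of rebalancing is needed before training.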

	### Label Distribution (After Augmentation)
	| Label | Count |
	|--------|-------|
	| HIGH | 200 |
	| MEDIUM | 194 |
	| LOW | 200 |
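
As a consistency check on the reported numbers, the sketch below shows that the 445 / 149 split and the per-class test supports in the classification report are what a 25% stratified hold-out of these augmented counts would produce. The 25% fraction, the rounding-up of the test count, and the largest-remainder allocation are assumptions made for illustration; the card does not state how the split was actually performed.

```python
import math
from typing import Dict, Tuple

# Label counts after augmentation, copied from the table above.
AUGMENTED_COUNTS = {"HIGH": 200, "MEDIUM": 194, "LOW": 200}


def split_sizes(n_samples: int, test_fraction: float) -> Tuple[int, int]:
    """Return (n_train, n_test), rounding the test count up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


def stratified_test_counts(class_counts: Dict[str, int], n_test: int) -> Dict[str, int]:
    """Allocate n_test across classes proportionally (largest-remainder method)."""
    total = sum(class_counts.values())
    exact = {label: n_test * count / total for label, count in class_counts.items()}
    counts = {label: math.floor(value) for label, value in exact.items()}
    # Hand out the seats lost to flooring, largest fractional part first.
    leftover = n_test - sum(counts.values())
    by_remainder = sorted(exact, key=lambda lbl: exact[lbl] - counts[lbl], reverse=True)
    for label in by_remainder[:leftover]:
        counts[label] += 1
    return counts


n_train, n_test = split_sizes(sum(AUGMENTED_COUNTS.values()), 0.25)
print(n_train, n_test)
print(stratified_test_counts(AUGMENTED_COUNTS, n_test))
```

Under these assumptions the sketch gives 445 training and 149 test samples, with 50 / 49 / 50 test samples for HIGH / MEDIUM / LOW, matching the figures reported in this card.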

	---

	## Classification Report

	| Class | Precision | Recall | F1 | Support |
	|--------|------------|--------|----|----------|
	| HIGH | 0.73 | 0.66 | 0.69 | 50 |
	| MEDIUM | 0.74 | 0.80 | 0.76 | 49 |
	| LOW | 0.71 | 0.72 | 0.71 | 50 |
	| Overall Accuracy | | | 0.725 | 149 |

	Test accuracy is 72.5%, which clears the 70% mark and is acceptable for a dataset of only 266 raw samples.
	The model performs best on clearly marked “urgent/emergency” cases and slightly worse on borderline MEDIUM cases.
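
The macro figures in the metadata can be reproduced from the per-class rows of the report. The sketch below does that arithmetic and also estimates accuracy from recall and support. Because it starts from rounded table values, it is an approximation, not the original evaluation code. The helper names are hypothetical.

```python
from typing import Dict

# Per-class rows copied from the classification report above.
REPORT = {
    "HIGH": {"precision": 0.73, "recall": 0.66, "f1": 0.69, "support": 50},
    "MEDIUM": {"precision": 0.74, "recall": 0.80, "f1": 0.76, "support": 49},
    "LOW": {"precision": 0.71, "recall": 0.72, "f1": 0.71, "support": 50},
}


def macro_average(report: Dict[str, Dict[str, float]], metric: str) -> float:
    """Unweighted mean of a metric over classes."""
    return sum(row[metric] for row in report.values()) / len(report)


def approx_accuracy(report: Dict[str, Dict[str, float]]) -> float:
    """Estimate accuracy as total correct / total samples, with correct = recall * support."""
    correct = sum(round(row["recall"] * row["support"]) for row in report.values())
    total = sum(row["support"] for row in report.values())
    return correct / total


for metric in ("precision", "recall", "f1"):
    print(metric, round(macro_average(REPORT, metric), 2))
print("accuracy", round(approx_accuracy(REPORT), 3))
```

Macro averaging weights each class equally, which is a reasonable choice here because the test supports are nearly identical (50 / 49 / 50).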

	---

	## Inference (Usage)

	### Using the model directly (ML only or Hybrid)
	```python
	from huggingface_hub import hf_hub_download
	import joblib
	from priority_det import Embedder, predict_priority

	# Download the model
	model_path = hf_hub_download(repo_id="your-username/priority-classifier", filename="classifier.joblib")

	# Load the classifier
	bundle = joblib.load(model_path)
	clf = bundle["clf"]
	label_map = bundle["label_map"]

	# Initialize the embedder
	embedder = Embedder()

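	# Predict on a Nepali complaint.
	# English gloss: "The water supply is cut off. An immediate solution is needed."
	text = "पानी आपूर्ति बन्द छ। तत्काल समाधान चाहिन्छ।"
	# use_hybrid=True requests the hybrid (ML + rules) path
	result = predict_priority(text, embedder, clf, label_map, use_hybrid=True)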
	print(result)