PhoBERT Vietnamese Cau Cuu Classifier
PhoBERT-based classifier for Vietnamese Facebook comments that detects "cầu cứu" (call for rescue) comments posted during natural disasters.
Labels
- 0: khong_cau_cuu
- 1: cau_cuu
Intended use
This model is designed to prioritize high recall for emergency rescue requests in Vietnamese social-media comments, especially when comments may contain distress language, location hints, phone numbers, or SOS markers.
Training setup
- Base model: vinai/phobert-base
- Fine-tuning method: LoRA / PEFT
- Evaluation checkpoint source: /content/phobert-cau-cuu/saved_model/checkpoint-171
- Decision threshold for deployment: 0.4941
- Threshold selection policy: target_recall with validation target recall 0.88
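The card does not spell out how the target_recall policy picks a threshold. The sketch below is one plausible reading, not the author's procedure: among candidate thresholds whose validation recall for cau_cuu meets the target, keep the one with the best F1. The function names and the toy data are hypothetical.

```python
from typing import List, Tuple


def recall_precision_f1(y_true: List[int], probs: List[float], thr: float) -> Tuple[float, float, float]:
    """Recall, precision and F1 for the positive class (label 1) at a given threshold."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= thr)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= thr)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < thr)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def select_threshold(y_true: List[int], probs: List[float], target_recall: float = 0.88) -> float:
    """Assumed rule: among thresholds that reach target_recall, return the one with the best F1."""
    best = None  # (f1, threshold); ties go to the higher threshold
    for thr in sorted(set(probs)):
        recall, _, f1 = recall_precision_f1(y_true, probs, thr)
        if recall >= target_recall and (best is None or (f1, thr) > best):
            best = (f1, thr)
    if best is None:
        raise ValueError("no threshold reaches the target recall")
    return best[1]


# Toy validation data (illustrative only, not from the real dataset).
y_val = [0, 0, 1, 0, 1, 1, 0, 1]
p_val = [0.10, 0.35, 0.45, 0.55, 0.60, 0.70, 0.20, 0.90]
print(select_threshold(y_val, p_val, target_recall=0.75))
```

Constraining recall first and optimizing F1 second matches the stated goal of keeping recall high while still limiting false alarms, but other tie-breaking rules (for example, maximizing precision) would be equally consistent with the card.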
Validation metrics at selected threshold
- Accuracy: 0.8469
- F1 macro: 0.8380
- F1 (cau_cuu): 0.8000
- Recall (cau_cuu): 0.8955
- Precision (cau_cuu): 0.7229
Test metrics
- Accuracy: 0.8520
- F1 macro: 0.8430
- F1 (cau_cuu): 0.8054
- Recall (cau_cuu): 0.9091
- Precision (cau_cuu): 0.7229
Confusion matrix on test set
Rows are true labels and columns are predicted labels, both in the order khong_cau_cuu, cau_cuu.
107 23
6 60
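As a consistency check, the reported test metrics can be recomputed from this matrix. The sketch below reads the rows as true labels and the columns as predictions (order khong_cau_cuu, cau_cuu), which is the reading that reproduces the numbers listed under Test metrics.

```python
# Test-set confusion matrix from this card.
# Layout assumed here: rows = true label, columns = predicted label.
tn, fp = 107, 23  # true khong_cau_cuu
fn, tp = 6, 60    # true cau_cuu

total = tn + fp + fn + tp
accuracy = (tp + tn) / total

# Metrics for the positive class (cau_cuu).
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_pos = 2 * precision * recall / (precision + recall)

# Metrics for the negative class (khong_cau_cuu), needed for macro F1.
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)
f1_neg = 2 * precision_neg * recall_neg / (precision_neg + recall_neg)
f1_macro = (f1_pos + f1_neg) / 2

print(f"accuracy={accuracy:.4f} f1_macro={f1_macro:.4f} "
      f"f1={f1_pos:.4f} recall={recall:.4f} precision={precision:.4f}")
```

The 23 false positives against 6 false negatives reflect the recall-first threshold: the model accepts more false alarms in exchange for missing fewer rescue requests.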
Recommended inference rule
Convert logits to probabilities and classify as cau_cuu when:
prob_cau_cuu >= 0.4941
This threshold was chosen on the validation set to preserve strong recall while improving F1(cau_cuu) and overall accuracy.
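For pipelines that already have raw logits (for example, exported from a batch job), the rule can be applied without any deep-learning dependency. This is a minimal sketch; the helper names are hypothetical, and it assumes logits are ordered [khong_cau_cuu, cau_cuu] as in the Labels section.

```python
import math
from typing import Sequence

THRESHOLD = 0.4941  # deployment threshold from this card


def prob_cau_cuu(logits: Sequence[float]) -> float:
    """Softmax probability of class 1 (cau_cuu) from a pair of logits [logit_0, logit_1]."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def classify(logits: Sequence[float], threshold: float = THRESHOLD) -> str:
    """Apply the recommended rule: cau_cuu when the class-1 probability reaches the threshold."""
    return "cau_cuu" if prob_cau_cuu(logits) >= threshold else "khong_cau_cuu"


print(classify([0.0, 0.0]))  # equal logits give probability 0.5, which is above 0.4941
print(classify([0.3, 0.2]))  # probability is about 0.475, which is below 0.4941
```

With two classes, the softmax probability of class 1 equals the sigmoid of the logit difference, so a threshold slightly below 0.5 means a comment is flagged even when the cau_cuu logit is marginally lower than the khong_cau_cuu logit.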
Example loading code
import torch
from peft import AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer

repo_id = "dat201204/phobert-vi-caucu-classifier"
threshold = 0.4941  # deployment threshold selected on the validation set

# Load the tokenizer and the LoRA adapter together with its base model.
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=False)
model = AutoPeftModelForSequenceClassification.from_pretrained(repo_id)
model.eval()

text = "Cuu voi, nha em dang ngap va co nguoi gia bi ket"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

# Run the forward pass without tracking gradients.
with torch.no_grad():
    logits = model(**inputs).logits

# Probability of class 1 (cau_cuu), then apply the threshold.
prob_cau_cuu = torch.softmax(logits, dim=-1)[0, 1].item()
label = "cau_cuu" if prob_cau_cuu >= threshold else "khong_cau_cuu"
print({"label": label, "prob_cau_cuu": prob_cau_cuu})
Limitations
- The first labeling stage used weak supervision, so the dataset may contain residual label noise.
- The model is optimized for disaster-response triage, not for general sentiment or topic classification.
- Human verification is still recommended for high-stakes rescue coordination.