PhoBERT Vietnamese Cau Cuu Classifier

PhoBERT-based classifier for Vietnamese Facebook comments that detects "cầu cứu" (plea for rescue) comments posted during natural disasters.

Labels

0: khong_cau_cuu (not a rescue request)
1: cau_cuu (rescue request)

Intended use

This model is designed to prioritize high recall for emergency rescue requests in Vietnamese social-media comments, especially when comments may contain distress language, location hints, phone numbers, or SOS markers.

Training setup

Base model: vinai/phobert-base
Fine-tuning method: LoRA / PEFT
Evaluation checkpoint source: /content/phobert-cau-cuu/saved_model/checkpoint-171
Decision threshold for deployment: 0.4941
Threshold selection policy: target_recall with validation target recall 0.88
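
The card does not spell out how the target_recall policy arrives at its threshold. One plausible reading, sketched here as an assumption rather than the actual training code, is to take the highest threshold whose validation recall for cau_cuu still meets the target. The function names and the toy validation scores are hypothetical.

```python
from typing import Sequence


def recall_at(threshold: float, probs: Sequence[float], labels: Sequence[int]) -> float:
    """Recall for label 1 when predicting 1 whenever prob >= threshold."""
    positives = [p for p, y in zip(probs, labels) if y == 1]
    if not positives:
        raise ValueError("labels contain no positive examples")
    return sum(p >= threshold for p in positives) / len(positives)


def pick_threshold_for_recall(
    probs: Sequence[float], labels: Sequence[int], target_recall: float = 0.88
) -> float:
    """Return the highest observed probability that still meets the recall target.

    Recall can only fall as the threshold rises, so scanning candidates from
    high to low and stopping at the first one that satisfies the target gives
    the largest admissible threshold.
    """
    for candidate in sorted(set(probs), reverse=True):
        if recall_at(candidate, probs, labels) >= target_recall:
            return candidate
    return 0.0  # only reached if target_recall is greater than 1


# Hypothetical validation scores (prob_cau_cuu) and gold labels, for illustration only.
val_probs = [0.95, 0.80, 0.62, 0.55, 0.45, 0.30, 0.20, 0.10]
val_labels = [1, 1, 0, 1, 1, 0, 1, 0]

chosen = pick_threshold_for_recall(val_probs, val_labels, target_recall=0.75)
print(chosen, recall_at(chosen, val_probs, val_labels))
```

With real validation outputs, the same call with target_recall=0.88 would be the analogue of the policy named above. Whether the original selection also broke ties by F1 or accuracy is not stated in this card.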

Validation metrics at selected threshold

Accuracy: 0.8469
F1 macro: 0.8380
F1 (cau_cuu): 0.8000
Recall (cau_cuu): 0.8955
Precision (cau_cuu): 0.7229

Test metrics

Accuracy: 0.8520
F1 macro: 0.8430
F1 (cau_cuu): 0.8054
Recall (cau_cuu): 0.9091
Precision (cau_cuu): 0.7229

Confusion matrix on test set

                     Predicted khong_cau_cuu   Predicted cau_cuu
True khong_cau_cuu   107                       23
True cau_cuu         6                         60
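
As a sanity check, the reported test metrics can be recomputed from the confusion matrix. The short sketch below assumes rows are true labels and columns are predicted labels, in the order khong_cau_cuu, cau_cuu. That reading is the one consistent with the recall and precision reported above.

```python
# Test-set confusion matrix from this card.
# Assumed layout: rows = true label, columns = predicted label.
tn, fp = 107, 23  # true khong_cau_cuu
fn, tp = 6, 60    # true cau_cuu


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 expressed directly in counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_cau_cuu = f1_score(tp, fp, fn)
# For the negative class the roles swap: its true positives are tn,
# its false positives are fn, and its false negatives are fp.
f1_khong_cau_cuu = f1_score(tn, fn, fp)
f1_macro = (f1_cau_cuu + f1_khong_cau_cuu) / 2

for name, value in [
    ("Accuracy", accuracy),
    ("F1 macro", f1_macro),
    ("F1 (cau_cuu)", f1_cau_cuu),
    ("Recall (cau_cuu)", recall),
    ("Precision (cau_cuu)", precision),
]:
    print(f"{name}: {value:.4f}")
```

The printed values match the test metrics listed above to four decimal places. In triage terms, 6 of the 66 true rescue requests were missed and 23 of the 83 flagged comments were false alarms.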

Recommended inference rule

Convert logits to probabilities and classify as cau_cuu when:

prob_cau_cuu >= 0.4941

This threshold was chosen on the validation set under the target_recall policy (validation recall of at least 0.88 for cau_cuu) to preserve strong recall while improving F1 (cau_cuu) and overall accuracy.
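
To make the rule concrete without loading the model, the following minimal sketch applies it to a raw pair of logits using only the standard library. The helper names are hypothetical. It assumes the logits are ordered [khong_cau_cuu, cau_cuu], matching the label ids above.

```python
import math

THRESHOLD = 0.4941  # deployment threshold from this card


def prob_cau_cuu(logits: list[float]) -> float:
    """Softmax probability of class 1 given [logit_khong_cau_cuu, logit_cau_cuu]."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def classify(logits: list[float], threshold: float = THRESHOLD) -> str:
    """Apply the inference rule: cau_cuu when prob_cau_cuu >= threshold."""
    return "cau_cuu" if prob_cau_cuu(logits) >= threshold else "khong_cau_cuu"


# With two classes, softmax reduces to a sigmoid of the logit difference,
# so the probability threshold is equivalent to a margin on
# logit_cau_cuu - logit_khong_cau_cuu.
logit_margin = math.log(THRESHOLD / (1 - THRESHOLD))

print(classify([0.0, 0.0]))   # equal logits give probability 0.5
print(classify([1.0, 0.0]))   # strong evidence for khong_cau_cuu
print(round(logit_margin, 4))
```

Because the threshold sits just below 0.5, the margin is slightly negative: a comment is still flagged as cau_cuu when the two logits tie or when the cau_cuu logit trails by a very small amount.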

Example loading code

import torch
from peft import AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer

repo_id = "dat201204/phobert-vi-caucu-classifier"
threshold = 0.4941  # deployment threshold selected on the validation set

# Load the tokenizer and the LoRA adapter together with its base model.
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=False)
model = AutoPeftModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # disable dropout for inference

# "Help, my house is flooding and an elderly person is trapped"
text = "Cuu voi, nha em dang ngap va co nguoi gia bi ket"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
    # Index 1 is the cau_cuu class (see Labels).
    prob_cau_cuu = torch.softmax(logits, dim=-1)[0, 1].item()

label = "cau_cuu" if prob_cau_cuu >= threshold else "khong_cau_cuu"
print({"label": label, "prob_cau_cuu": prob_cau_cuu})

Limitations

The dataset was labeled with weak supervision in the first labeling stage and may contain residual label noise.
The model is optimized for disaster-response triage, not for general sentiment or topic classification.
Human verification is still recommended for high-stakes rescue coordination.
