PhoBERT Vietnamese Cau Cuu Classifier

PhoBERT-based classifier for Vietnamese Facebook comments that detects "cầu cứu" (call for rescue) comments posted during natural disasters.

Labels

  • 0: khong_cau_cuu (not a rescue request)
  • 1: cau_cuu (rescue request)

Intended use

This model is designed to prioritize high recall for emergency rescue requests in Vietnamese social-media comments, especially when comments may contain distress language, location hints, phone numbers, or SOS markers.

Training setup

  • Base model: vinai/phobert-base
  • Fine-tuning method: LoRA / PEFT
  • Evaluation checkpoint source: /content/phobert-cau-cuu/saved_model/checkpoint-171
  • Decision threshold for deployment: 0.4941
  • Threshold selection policy: target_recall, with a target recall of 0.88 on the validation set
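
The card does not spell out how the target_recall policy turns a recall target into a threshold. The sketch below shows one plausible reading: scan the validation probabilities from highest to lowest and keep the highest threshold whose recall on cau_cuu still meets the target. The function name pick_threshold_for_recall and the toy data are hypothetical and are not taken from the training code.

```python
def pick_threshold_for_recall(probs, labels, target_recall):
    """Return the highest threshold whose positive-class recall is >= target_recall.

    probs  : predicted probability of class 1 (cau_cuu) for each validation example
    labels : true labels, 0 or 1, in the same order
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    positives = sum(labels)
    if positives == 0:
        raise ValueError("need at least one positive example")

    # Candidate thresholds are the distinct predicted probabilities, highest first.
    for t in sorted(set(probs), reverse=True):
        true_pos = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= t)
        if true_pos / positives >= target_recall:
            return t
    # Not reached: the lowest candidate classifies everything as positive.
    return min(probs)


# Toy validation set (made-up numbers, for illustration only).
probs = [0.95, 0.80, 0.62, 0.49, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1]
print(pick_threshold_for_recall(probs, labels, target_recall=0.75))  # 0.49
```

Taking the highest qualifying threshold keeps as few negatives above the cut as possible while still meeting the recall target, which is why the achieved validation recall can land slightly above the target.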

Validation metrics at selected threshold

  • Accuracy: 0.8469
  • F1 macro: 0.8380
  • F1 (cau_cuu): 0.8000
  • Recall (cau_cuu): 0.8955
  • Precision (cau_cuu): 0.7229

Test metrics

  • Accuracy: 0.8520
  • F1 macro: 0.8430
  • F1 (cau_cuu): 0.8054
  • Recall (cau_cuu): 0.9091
  • Precision (cau_cuu): 0.7229

Confusion matrix on test set

                      Predicted khong_cau_cuu   Predicted cau_cuu
True khong_cau_cuu                        107                  23
True cau_cuu                                6                  60
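
As a consistency check, the reported test metrics can be recomputed from these four counts. The sketch below reads the matrix with rows as true labels and columns as predicted labels, which is the reading that reproduces the reported recall of 0.9091; the variable names are illustrative.

```python
# Rows are true labels, columns are predicted labels
# (0 = khong_cau_cuu, 1 = cau_cuu).
tn, fp = 107, 23
fn, tp = 6, 60

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_pos = 2 * precision * recall / (precision + recall)

# Same quantities for the negative class, needed for the macro average.
precision_neg = tn / (tn + fn)
recall_neg = tn / (tn + fp)
f1_neg = 2 * precision_neg * recall_neg / (precision_neg + recall_neg)
f1_macro = (f1_pos + f1_neg) / 2

for name, value in [
    ("Accuracy", accuracy),
    ("F1 macro", f1_macro),
    ("F1 (cau_cuu)", f1_pos),
    ("Recall (cau_cuu)", recall),
    ("Precision (cau_cuu)", precision),
]:
    print(f"{name}: {value:.4f}")
```

Rounded to four decimals, these values match the test metrics listed above: 6 of 66 true rescue requests are missed, and 23 of 83 flagged comments are false alarms.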

Recommended inference rule

Convert logits to probabilities and classify as cau_cuu when:

prob_cau_cuu >= 0.4941

This threshold was chosen on the validation set to preserve strong recall while improving F1(cau_cuu) and overall accuracy.
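
The rule itself needs no deep-learning framework once the two logits are available. The following is a minimal dependency-free sketch, assuming the logits arrive as a pair ordered (khong_cau_cuu, cau_cuu); the helper names prob_cau_cuu_from_logits and classify are hypothetical.

```python
import math

THRESHOLD = 0.4941  # deployment threshold from this card


def prob_cau_cuu_from_logits(logits):
    """Softmax over two logits, returning the probability of class 1 (cau_cuu)."""
    z0, z1 = logits
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e1 / (e0 + e1)


def classify(logits, threshold=THRESHOLD):
    """Apply the thresholded decision rule and return (label, probability)."""
    p = prob_cau_cuu_from_logits(logits)
    label = "cau_cuu" if p >= threshold else "khong_cau_cuu"
    return label, p


print(classify([0.0, 0.0]))   # equal logits: probability 0.5, above the threshold
print(classify([2.0, -1.0]))  # class 0 clearly ahead: below the threshold
```

Because the threshold sits just under 0.5, a comment is still flagged when the khong_cau_cuu logit leads by a small margin (roughly 0.024), which is consistent with the card's preference for recall over precision.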

Example loading code

import torch
from peft import AutoPeftModelForSequenceClassification
from transformers import AutoTokenizer

repo_id = "dat201204/phobert-vi-caucu-classifier"
threshold = 0.4941  # deployment threshold selected on the validation set

# Load the slow (Python) tokenizer and the base model with the LoRA adapter applied.
tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=False)
model = AutoPeftModelForSequenceClassification.from_pretrained(repo_id)
model.eval()  # disable dropout for inference

text = "Cuu voi, nha em dang ngap va co nguoi gia bi ket"
# Truncate long comments to at most 256 tokens.
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    logits = model(**inputs).logits
    # Index 1 is the cau_cuu class.
    prob_cau_cuu = torch.softmax(logits, dim=-1)[0, 1].item()

label = "cau_cuu" if prob_cau_cuu >= threshold else "khong_cau_cuu"
print({"label": label, "prob_cau_cuu": prob_cau_cuu})

Limitations

  • The dataset was weakly supervised in the first labeling stage and may contain residual noise.
  • The model is optimized for disaster-response triage, not for general sentiment or topic classification.
  • Human verification is still recommended for high-stakes rescue coordination.