# ESG Actionability Classifier for Vietnamese Banking Reports

## Model description
This is a Vietnamese text classification model fine-tuned from PhoBERT-large to classify ESG-related sentences in banking annual reports by their actionability level.
The model is designed as Module 3 (Actionability Classification) in a multi-stage ESG-washing analysis framework. It does not assess factual correctness or ESG performance, but focuses on identifying whether a disclosure describes concrete actions, future plans, or vague commitments.
The model predicts one of three labels:
- Implemented: concrete actions or achieved results (often with time references or quantitative indicators)
- Planning: stated plans, targets, or future-oriented commitments
- Indeterminate: general or symbolic statements without specific actions or evidence
## Intended use

### Primary intended use
- Analyzing ESG disclosure quality in Vietnamese banking annual reports.
- Supporting ESG-washing risk analysis by distinguishing substantive actions from symbolic language.
### Example downstream usage
- Measuring the proportion of Implemented vs. Indeterminate ESG statements at the bank-year level.
- Serving as an intermediate module before evidence linking and ESG-washing risk scoring.
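The bank-year aggregation described above can be sketched with pandas; the column names and example rows here are illustrative, not part of the released artifact:

```python
import pandas as pd

# Hypothetical per-sentence predictions (column names are illustrative).
df = pd.DataFrame({
    "bank": ["A", "A", "A", "B", "B"],
    "year": [2023, 2023, 2023, 2023, 2023],
    "label": ["Implemented", "Indeterminate", "Implemented",
              "Planning", "Indeterminate"],
})

# Share of each label per bank-year; missing labels get a 0.0 share.
shares = (
    df.groupby(["bank", "year"])["label"]
      .value_counts(normalize=True)
      .unstack(fill_value=0.0)
)
print(shares)
```

Comparing the `Implemented` and `Indeterminate` columns of `shares` then gives the bank-year signal used downstream.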
### Out-of-scope use
- Determining the factual truthfulness of ESG claims.
- Legal, regulatory, or investment decision-making without human review.
- Application to non-banking or non-Vietnamese text without re-validation.
## Training data
The model was trained using a hybrid labeling strategy:
- LLM-generated labels as a semantic teacher for actionability
- Weak labeling rules based on linguistic and domain-specific patterns (e.g., time references, quantitative indicators)
- A pseudo-gold set sampled from high-confidence LLM labels for calibration and evaluation
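The actual weak-labeling rules are not published; the following is only a minimal sketch of the idea, using assumed regexes for time references and quantitative indicators plus a few assumed Vietnamese future-oriented cue words:

```python
import re

# Assumed patterns -- illustrative only, not the rules used in training.
YEAR = re.compile(r"\b(19|20)\d{2}\b")          # time reference, e.g. "2023"
QUANTITY = re.compile(r"\b\d+([.,]\d+)?\s*%")   # quantitative indicator, e.g. "15%"
# "will", "plan", "target", "expected" (assumed cue list)
FUTURE_CUES = ("sẽ", "kế hoạch", "mục tiêu", "dự kiến")

def weak_label(sentence: str) -> str:
    """Heuristic label for one sentence; future cues take priority over evidence cues."""
    s = sentence.lower()
    if any(cue in s for cue in FUTURE_CUES):
        return "Planning"
    if YEAR.search(s) or QUANTITY.search(s):
        return "Implemented"
    return "Indeterminate"
```

In a hybrid setup like the one described, such rule outputs would be reconciled with the LLM-generated labels rather than used alone.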
Training/validation data:
- Total labeled samples: 5,997
- Train set: 5,097
- Validation set: 900
Label distribution (train):
- Implemented: ~37%
- Planning: ~3%
- Indeterminate: ~60%
Class imbalance was handled using class-weighted loss during training.
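The exact weighting scheme is not specified; a common choice is inverse-frequency weighting, sketched below with counts reconstructed (as an assumption) from the reported percentages of the 5,097 training samples:

```python
from collections import Counter

# Approximate counts derived from ~37% / ~3% / ~60% of 5,097 (assumption:
# exact counts are not published in the card).
counts = Counter({"Implemented": 1886, "Planning": 153, "Indeterminate": 3058})
labels = ["Implemented", "Planning", "Indeterminate"]

n = sum(counts.values())
k = len(labels)
# Inverse-frequency weights, scaled so a perfectly balanced set gives 1.0 each.
weights = [n / (k * counts[l]) for l in labels]

# These would then be passed to the loss, e.g.:
# loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))
print({l: round(w, 2) for l, w in zip(labels, weights)})
```

Under this scheme the rare Planning class gets a much larger weight than the two majority classes, which is the intended effect of class-weighted cross-entropy.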
## Training procedure
- Base model: PhoBERT-large
- Task: 3-class sentence-level classification
- Loss: Cross-entropy with class weights
- Evaluation metric: Macro-F1
- Input representation:
  - Narrative text: sentence with local context (previous + next sentence)
  - Tables/KPI-like text: sentence only
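The two input formats above can be sketched as follows; the separator token and the handling of document boundaries are assumptions, since the exact preprocessing is not documented here:

```python
def build_input(sentences, i, is_table_row, sep=" </s> "):
    """Return the model input for sentence i (assumed context construction).

    Narrative text gets the previous and next sentence as local context;
    table/KPI-like rows are passed through unchanged.
    """
    if is_table_row:
        return sentences[i]  # tables/KPI-like text: sentence only
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
    # Drop empty neighbors at document boundaries before joining.
    return sep.join(s for s in (prev_s, sentences[i], next_s) if s)
```

The resulting string would then be tokenized exactly like the single-sentence case in the usage example below.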
## Evaluation results

### Validation set (900 samples)
- Accuracy: 0.839
- Macro-F1: 0.734
Per-class (validation):
| Label | Precision | Recall | F1 |
|---|---|---|---|
| Implemented | 0.79 | 0.82 | 0.81 |
| Planning | 0.48 | 0.55 | 0.52 |
| Indeterminate | 0.89 | 0.86 | 0.88 |
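Macro-F1 is the unweighted mean of the per-class F1 scores, so the rounded values in the table reproduce the reported 0.734 to within rounding:

```python
# Per-class F1 scores as reported in the table above (rounded to 2 decimals).
per_class_f1 = {"Implemented": 0.81, "Planning": 0.52, "Indeterminate": 0.88}

# Macro-F1: simple average, each class counted equally regardless of support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
```

The average of the rounded values is ~0.737; the small gap to the reported 0.734 comes from rounding the per-class scores.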
### Pseudo-gold test set (498 samples, balanced)
- Accuracy: 0.916
- Macro-F1: 0.916
Note: The pseudo-gold set is derived from high-confidence LLM labels and is balanced across classes. It may not fully reflect real-world class distributions.
## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "huypham71/esg-action"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Label order assumed to match the model's id2label mapping; verify with
# model.config.id2label if in doubt.
labels = ["Implemented", "Planning", "Indeterminate"]

# "In 2023, the bank reduced CO2 emissions by 15%."
text = "Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
pred = labels[int(probs.argmax())]
print(pred, float(probs.max()))
```
## Limitations

- The model captures linguistic actionability, not actual ESG performance.
- Planning statements are relatively rare, which may affect robustness on unseen corpora.
- Performance may degrade on domains outside Vietnamese banking reports.
## Ethical considerations

- Outputs should be interpreted as analytical signals, not definitive judgments.
- Automated classification may reflect biases present in disclosure styles or training data.