# ESG Actionability Classifier for Vietnamese Banking Reports

## Model description
This is a Vietnamese text classification model fine-tuned from PhoBERT-large to classify ESG-related sentences in banking annual reports by their actionability level.
The model is designed as Module 3 (Actionability Classification) in a multi-stage ESG-washing analysis framework. It does not assess factual correctness or ESG performance, but focuses on identifying whether a disclosure describes concrete actions, future plans, or vague commitments.
The model predicts one of three labels:
- Implemented: concrete actions or achieved results (often with time references or quantitative indicators)
- Planning: stated plans, targets, or future-oriented commitments
- Indeterminate: general or symbolic statements without specific actions or evidence
## Intended use

### Primary intended use
- Analyzing ESG disclosure quality in Vietnamese banking annual reports.
- Supporting ESG-washing risk analysis by distinguishing substantive actions from symbolic language.
### Example downstream usage
- Measuring the proportion of Implemented vs. Indeterminate ESG statements at the bank-year level.
- Serving as an intermediate module before evidence linking and ESG-washing risk scoring.
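The bank-year aggregation described above can be sketched with pandas; the column names and example rows here are illustrative, not part of the released artifact:

```python
import pandas as pd

# Hypothetical per-sentence predictions (column names are illustrative).
df = pd.DataFrame({
    "bank": ["A", "A", "A", "B", "B"],
    "year": [2023, 2023, 2023, 2023, 2023],
    "label": ["Implemented", "Indeterminate", "Implemented",
              "Planning", "Indeterminate"],
})

# Share of each label per bank-year; missing labels get a 0.0 share.
shares = (
    df.groupby(["bank", "year"])["label"]
      .value_counts(normalize=True)
      .unstack(fill_value=0.0)
)
print(shares)
```

Comparing the `Implemented` and `Indeterminate` columns of `shares` then gives the bank-year signal used downstream.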
### Out-of-scope use
- Determining the factual truthfulness of ESG claims.
- Legal, regulatory, or investment decision-making without human review.
- Application to non-banking or non-Vietnamese text without re-validation.
## Training data
The model was trained using a hybrid labeling strategy:
- LLM-generated labels as a semantic teacher for actionability
- Weak labeling rules based on linguistic and domain-specific patterns (e.g., time references, quantitative indicators)
- A pseudo-gold set sampled from high-confidence LLM labels for calibration and evaluation
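The actual weak-labeling rules are not published; the following is only a minimal sketch of the idea, using assumed regexes for time references and quantitative indicators plus a few assumed Vietnamese future-oriented cue words:

```python
import re

# Assumed patterns -- illustrative only, not the rules used in training.
YEAR = re.compile(r"\b(19|20)\d{2}\b")          # time reference, e.g. "2023"
QUANTITY = re.compile(r"\b\d+([.,]\d+)?\s*%")   # quantitative indicator, e.g. "15%"
# "will", "plan", "target", "expected" (assumed cue list)
FUTURE_CUES = ("sẽ", "kế hoạch", "mục tiêu", "dự kiến")

def weak_label(sentence: str) -> str:
    """Heuristic label for one sentence; future cues take priority over evidence cues."""
    s = sentence.lower()
    if any(cue in s for cue in FUTURE_CUES):
        return "Planning"
    if YEAR.search(s) or QUANTITY.search(s):
        return "Implemented"
    return "Indeterminate"
```

In a hybrid setup like the one described, such rule outputs would be reconciled with the LLM-generated labels rather than used alone.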
Training/validation data:
- Total labeled samples: 5,997
- Train set: 5,097
- Validation set: 900
Label distribution (train):
- Implemented: ~37%
- Planning: ~3%
- Indeterminate: ~60%
Class imbalance was handled using class-weighted loss during training.
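The exact weighting scheme is not specified; a common choice is inverse-frequency weighting, sketched below with counts reconstructed (as an assumption) from the reported percentages of the 5,097 training samples:

```python
from collections import Counter

# Approximate counts derived from ~37% / ~3% / ~60% of 5,097 (assumption:
# exact counts are not published in the card).
counts = Counter({"Implemented": 1886, "Planning": 153, "Indeterminate": 3058})
labels = ["Implemented", "Planning", "Indeterminate"]

n = sum(counts.values())
k = len(labels)
# Inverse-frequency weights, scaled so a perfectly balanced set gives 1.0 each.
weights = [n / (k * counts[l]) for l in labels]

# These would then be passed to the loss, e.g.:
# loss_fn = torch.nn.CrossEntropyLoss(weight=torch.tensor(weights))
print({l: round(w, 2) for l, w in zip(labels, weights)})
```

Under this scheme the rare Planning class gets a much larger weight than the two majority classes, which is the intended effect of class-weighted cross-entropy.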
## Training procedure
- Base model: PhoBERT-large
- Task: 3-class sentence-level classification
- Loss: Cross-entropy with class weights
- Evaluation metric: Macro-F1
- Input representation:
  - Narrative text: sentence with local context (previous + next sentence)
  - Tables/KPI-like text: sentence only
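The two input formats above can be sketched as follows; the separator token and the handling of document boundaries are assumptions, since the exact preprocessing is not documented here:

```python
def build_input(sentences, i, is_table_row, sep=" </s> "):
    """Return the model input for sentence i (assumed context construction).

    Narrative text gets the previous and next sentence as local context;
    table/KPI-like rows are passed through unchanged.
    """
    if is_table_row:
        return sentences[i]  # tables/KPI-like text: sentence only
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
    # Drop empty neighbors at document boundaries before joining.
    return sep.join(s for s in (prev_s, sentences[i], next_s) if s)
```

The resulting string would then be tokenized exactly like the single-sentence case in the usage example below.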
## Evaluation results

### Validation set (900 samples)
- Accuracy: 0.839
- Macro-F1: 0.734
Per-class (validation):
| Label | Precision | Recall | F1 |
|---|---|---|---|
| Implemented | 0.79 | 0.82 | 0.81 |
| Planning | 0.48 | 0.55 | 0.52 |
| Indeterminate | 0.89 | 0.86 | 0.88 |
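Macro-F1 is the unweighted mean of the per-class F1 scores, so the rounded values in the table reproduce the reported 0.734 to within rounding:

```python
# Per-class F1 scores as reported in the table above (rounded to 2 decimals).
per_class_f1 = {"Implemented": 0.81, "Planning": 0.52, "Indeterminate": 0.88}

# Macro-F1: simple average, each class counted equally regardless of support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
```

The average of the rounded values is ~0.737; the small gap to the reported 0.734 comes from rounding the per-class scores.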
### Pseudo-gold test set (498 samples, balanced)
- Accuracy: 0.916
- Macro-F1: 0.916
Note: The pseudo-gold set is derived from high-confidence LLM labels and is balanced across classes. It may not fully reflect real-world class distributions.
## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "huypham71/esg-action"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Label order assumed to match the model's id2label mapping; verify with
# model.config.id2label if in doubt.
labels = ["Implemented", "Planning", "Indeterminate"]

# "In 2023, the bank reduced CO2 emissions by 15%."
text = "Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
pred = labels[int(probs.argmax())]
print(pred, float(probs.max()))
```
## Limitations

- The model captures linguistic actionability, not actual ESG performance.
- Planning statements are relatively rare, which may affect robustness on unseen corpora.
- Performance may degrade on domains outside Vietnamese banking reports.
## Ethical considerations

- Outputs should be interpreted as analytical signals, not definitive judgments.
- Automated classification may reflect biases present in disclosure styles or training data.