ESG Actionability Classifier for Vietnamese Banking Reports

Model description

This model is a Vietnamese text classification model fine-tuned from PhoBERT-large to classify ESG-related sentences in banking annual reports according to their actionability level.

The model is designed as Module 3 (Actionability Classification) in a multi-stage ESG-washing analysis framework. It does not assess factual correctness or ESG performance, but focuses on identifying whether a disclosure describes concrete actions, future plans, or vague commitments.

The model predicts one of three labels:

  • Implemented: concrete actions or achieved results (often with time references or quantitative indicators)
  • Planning: stated plans, targets, or future-oriented commitments
  • Indeterminate: general or symbolic statements without specific actions or evidence

Intended use

Primary intended use

  • Analyzing ESG disclosure quality in Vietnamese banking annual reports.
  • Supporting ESG-washing risk analysis by distinguishing substantive actions from symbolic language.

Example downstream usage

  • Measuring the proportion of Implemented vs. Indeterminate ESG statements at the bank-year level.
  • Serving as an intermediate module before evidence linking and ESG-washing risk scoring.
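The bank-year aggregation described above can be sketched in a few lines. The function name and the `(bank, year, label)` input format are illustrative assumptions, not part of the model's API:

```python
from collections import Counter

def actionability_profile(labeled_sentences):
    """Aggregate per-sentence predictions into bank-year label proportions.

    labeled_sentences: iterable of (bank, year, label) tuples, where label is
    one of "Implemented", "Planning", "Indeterminate".
    """
    counts = {}
    for bank, year, label in labeled_sentences:
        counts.setdefault((bank, year), Counter())[label] += 1
    profiles = {}
    for key, c in counts.items():
        total = sum(c.values())
        profiles[key] = {
            lab: c[lab] / total
            for lab in ("Implemented", "Planning", "Indeterminate")
        }
    return profiles

sample = [
    ("BankA", 2023, "Implemented"),
    ("BankA", 2023, "Indeterminate"),
    ("BankA", 2023, "Implemented"),
    ("BankA", 2023, "Planning"),
]
print(actionability_profile(sample)[("BankA", 2023)])
# {'Implemented': 0.5, 'Planning': 0.25, 'Indeterminate': 0.25}
```

The resulting proportions (e.g. share of Implemented vs. Indeterminate statements) can then feed a downstream ESG-washing risk score.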

Out-of-scope use

  • Determining the factual truthfulness of ESG claims.
  • Legal, regulatory, or investment decision-making without human review.
  • Application to non-banking or non-Vietnamese text without re-validation.

Training data

The model was trained using a hybrid labeling strategy:

  • LLM-generated labels as a semantic teacher for actionability
  • Weak labeling rules based on linguistic and domain-specific patterns (e.g., time references, quantitative indicators)
  • A pseudo-gold set sampled from high-confidence LLM labels for calibration and evaluation
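Weak-labeling rules of this kind can be sketched with simple regular expressions. The patterns below are illustrative assumptions (the actual rule set is not published); they only show the idea of keying on time references, quantitative indicators, and future-oriented cues:

```python
import re

# Hypothetical weak-labeling heuristics, for illustration only.
YEAR_PAT = re.compile(r"\b(19|20)\d{2}\b")        # time reference, e.g. "2023"
QUANT_PAT = re.compile(r"\d+([.,]\d+)?\s*%")      # quantitative indicator, e.g. "15%"
# Future-oriented Vietnamese cues: "will", "plan", "target", "by the year ..."
FUTURE_PAT = re.compile(r"\b(sẽ|kế hoạch|mục tiêu|đến năm)\b", re.IGNORECASE)

def weak_label(sentence: str) -> str:
    if FUTURE_PAT.search(sentence):
        return "Planning"
    if YEAR_PAT.search(sentence) or QUANT_PAT.search(sentence):
        return "Implemented"
    return "Indeterminate"

# "In 2023, the bank reduced CO2 emissions by 15%."
print(weak_label("Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2."))
# Implemented
```

In the hybrid scheme, such rules would be combined with the LLM labels rather than used alone.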

Training/validation data:

  • Total labeled samples: 5,997
  • Train set: 5,097
  • Validation set: 900

Label distribution (train):

  • Implemented: ~37%
  • Planning: ~3%
  • Indeterminate: ~60%

Class imbalance was handled using class-weighted loss during training.
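The exact class weights are not published; a common inverse-frequency scheme, using the approximate train-set counts implied by the distribution above, would look like this sketch:

```python
# Approximate train counts from ~37% / 3% / 60% of 5,097 samples (assumption).
counts = {"Implemented": 1886, "Planning": 153, "Indeterminate": 3058}

total = sum(counts.values())
num_classes = len(counts)

# Inverse-frequency weighting: rarer classes get larger weights.
weights = {lab: total / (num_classes * n) for lab, n in counts.items()}
print(weights)

# During training these would typically be passed to the loss, e.g.:
# torch.nn.CrossEntropyLoss(weight=torch.tensor(list(weights.values())))
```

Under this scheme the rare Planning class receives a weight roughly 20x that of Indeterminate, counteracting the imbalance in the loss.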


Training procedure

  • Base model: PhoBERT-large
  • Task: 3-class sentence-level classification
  • Loss: Cross-entropy with class weights
  • Evaluation metric: Macro-F1
  • Input representation:
    • Narrative text: sentence with local context (previous + next sentence)
    • Tables/KPI-like text: sentence only
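The context-window construction above can be sketched as follows. The function name and the `</s>` separator are assumptions (PhoBERT uses `</s>` as its sentence separator, but the exact joining scheme used in training is not published):

```python
def build_input(sentences, i, is_table_row=False, sep=" </s> "):
    """Build the classifier input for sentence i.

    Narrative text: previous + target + next sentence, joined by a separator.
    Table/KPI-like text: the sentence alone, with no context.
    """
    if is_table_row:
        return sentences[i]
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i < len(sentences) - 1 else ""
    return sep.join(part for part in (prev_s, sentences[i], next_s) if part)

sents = ["A.", "B.", "C."]
print(build_input(sents, 1))                    # A. </s> B. </s> C.
print(build_input(sents, 0))                    # A. </s> B.
print(build_input(sents, 1, is_table_row=True)) # B.
```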

Evaluation results

Validation set (900 samples)

  • Accuracy: 0.839
  • Macro-F1: 0.734

Per-class (validation):

Label          Precision  Recall  F1
Implemented    0.79       0.82    0.81
Planning       0.48       0.55    0.52
Indeterminate  0.89       0.86    0.88
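As a sanity check, macro-F1 is the unweighted mean of the per-class F1 scores; recomputing it from the rounded values in the table gives a figure consistent with the reported 0.734 (the small gap comes from rounding the per-class scores):

```python
# Per-class F1 scores from the validation table above.
f1 = {"Implemented": 0.81, "Planning": 0.52, "Indeterminate": 0.88}

# Macro-F1: unweighted mean across classes, so the rare Planning class
# counts as much as the two frequent ones.
macro_f1 = sum(f1.values()) / len(f1)
print(round(macro_f1, 3))  # 0.737
```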

Pseudo-gold test set (498 samples, balanced)

  • Accuracy: 0.916
  • Macro-F1: 0.916

Note: The pseudo-gold set is derived from high-confidence LLM labels and is balanced across classes. It may not fully reflect real-world class distributions.


How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "huypham71/esg-action"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# Label order must match the model's training configuration.
labels = ["Implemented", "Planning", "Indeterminate"]

# "In 2023, the bank reduced CO2 emissions by 15%."
text = "Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze()

# Print the top label and its probability.
pred = labels[int(probs.argmax())]
print(pred, float(probs.max()))

Limitations

  • The model captures linguistic actionability, not actual ESG performance.
  • Planning statements are rare in the training data (~3% of the train set), which may affect robustness on unseen corpora.
  • Performance may degrade on domains outside Vietnamese banking reports.

Ethical considerations

  • Outputs should be interpreted as analytical signals, not definitive judgments.
  • Automated classification may reflect biases present in disclosure styles or training data.
