---
language: vi
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- esg
- esg-washing
- actionability
- banking
- vietnamese
- nlp
- sustainability
---
# ESG Actionability Classifier for Vietnamese Banking Reports
## Model description
This model is a Vietnamese text classification model fine-tuned from **PhoBERT-large** to classify **ESG-related sentences** in banking annual reports according to their **actionability level**.
The model is designed as **Module 3 (Actionability Classification)** in a multi-stage ESG-washing analysis framework. It does **not** assess factual correctness or ESG performance, but focuses on identifying whether a disclosure describes concrete actions, future plans, or vague commitments.
The model predicts one of three labels:
- **Implemented**: concrete actions or achieved results (often with time references or quantitative indicators)
- **Planning**: stated plans, targets, or future-oriented commitments
- **Indeterminate**: general or symbolic statements without specific actions or evidence
---
## Intended use
### Primary intended use
- Analyzing ESG disclosure quality in Vietnamese banking annual reports.
- Supporting ESG-washing risk analysis by distinguishing substantive actions from symbolic language.
### Example downstream usage
- Measuring the proportion of *Implemented* vs. *Indeterminate* ESG statements at the bank-year level.
- Serving as an intermediate module before evidence linking and ESG-washing risk scoring.
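
A minimal sketch of the bank-year aggregation mentioned above, in plain Python. The record fields (`bank`, `year`, `label`) and the helper `label_shares` are hypothetical. They assume the model's per-sentence predictions have already been collected. Shares rather than raw counts keep banks with reports of different lengths comparable.

```python
from collections import Counter, defaultdict

# Hypothetical input: one record per classified ESG sentence.
predictions = [
    {"bank": "Bank A", "year": 2023, "label": "Implemented"},
    {"bank": "Bank A", "year": 2023, "label": "Indeterminate"},
    {"bank": "Bank A", "year": 2023, "label": "Indeterminate"},
    {"bank": "Bank A", "year": 2023, "label": "Planning"},
    {"bank": "Bank B", "year": 2023, "label": "Implemented"},
]

def label_shares(records):
    """Return {(bank, year): {label: share of that bank-year's sentences}}."""
    counts = defaultdict(Counter)
    for r in records:
        counts[(r["bank"], r["year"])][r["label"]] += 1
    shares = {}
    for key, c in counts.items():
        total = sum(c.values())
        shares[key] = {label: n / total for label, n in c.items()}
    return shares

print(label_shares(predictions)[("Bank A", 2023)])
# {'Implemented': 0.25, 'Indeterminate': 0.5, 'Planning': 0.25}
```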
### Out-of-scope use
- Determining the factual truthfulness of ESG claims.
- Legal, regulatory, or investment decision-making without human review.
- Application to non-banking or non-Vietnamese text without re-validation.
---
## Training data
The model was trained using a **hybrid labeling strategy**:
- **LLM-generated labels** as a semantic teacher for actionability
- **Weak labeling rules** based on linguistic and domain-specific patterns (e.g., time references, quantitative indicators)
- A **pseudo-gold set** sampled from high-confidence LLM labels for calibration and evaluation
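
The card does not publish the weak-labeling rules. The sketch below only illustrates the kind of pattern-based rule described, using hypothetical Vietnamese markers: future-oriented words (*sẽ* "will", *kế hoạch* "plan", *mục tiêu* "target", *đến năm* "by the year") and the completed-action marker *đã* combined with a year or a quantity. In this sketch a rule that does not fire abstains (`None`), so that another signal, such as the LLM label, can decide. The patterns, the precedence of future markers, and `weak_label` are all assumptions.

```python
import re
import unicodedata

def _nfc(s):
    # Normalise to precomposed Unicode so Vietnamese diacritics match reliably.
    return unicodedata.normalize("NFC", s)

# Hypothetical rule patterns, not the ones used to build the training data.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
QUANTITY = re.compile(_nfc(r"\d+(?:[.,]\d+)?\s*(?:%|tỷ|triệu|tấn)"))
FUTURE = re.compile(_nfc(r"\b(?:sẽ|kế hoạch|mục tiêu|đến năm)\b"), re.IGNORECASE)
PAST = re.compile(_nfc(r"\bđã\b"), re.IGNORECASE)  # word boundary: skips "ưu đãi"

def weak_label(sentence):
    """Return "Planning" or "Implemented" when a rule fires, else None (abstain)."""
    s = _nfc(sentence)
    if FUTURE.search(s):  # assumed precedence: future markers win
        return "Planning"
    if PAST.search(s) and (YEAR.search(s) or QUANTITY.search(s)):
        return "Implemented"
    return None

print(weak_label("Ngân hàng sẽ giảm phát thải đến năm 2030."))  # Planning
```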
Training/validation data:
- Total labeled samples: **5,997**
- Train set: **5,097**
- Validation set: **900**
Label distribution (train):
- Implemented: ~37%
- Planning: ~3%
- Indeterminate: ~60%
Class imbalance was handled using **class-weighted loss** during training.
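
The card does not state how the class weights were computed. A common choice, assumed here, is inverse-frequency ("balanced") weighting, w_c = 1 / (K · p_c) for K classes where class c has share p_c. With the approximate shares above, Planning receives roughly 12 times the weight of Implemented. The plain-Python loss below mirrors what `torch.nn.CrossEntropyLoss(weight=...)` computes with its default mean reduction, except that it takes softmax probabilities instead of logits; in PyTorch the weights would be passed as a tensor ordered by class index. `balanced_weights` and `weighted_cross_entropy` are hypothetical helpers.

```python
import math

# Approximate training-label shares reported in this card.
SHARES = {"Implemented": 0.37, "Planning": 0.03, "Indeterminate": 0.60}

def balanced_weights(shares):
    """Inverse-frequency weights w_c = 1 / (K * p_c) (assumed scheme)."""
    k = len(shares)
    return {label: 1.0 / (k * p) for label, p in shares.items()}

def weighted_cross_entropy(batch_probs, batch_labels, weights, label_order):
    """Weighted mean of -log p(true class), normalised by the summed weights
    of the true classes, as CrossEntropyLoss(weight=w, reduction="mean") does."""
    num = den = 0.0
    for probs, label in zip(batch_probs, batch_labels):
        w = weights[label]
        num += w * -math.log(probs[label_order.index(label)])
        den += w
    return num / den

weights = balanced_weights(SHARES)
print({k: round(v, 2) for k, v in weights.items()})
# {'Implemented': 0.9, 'Planning': 11.11, 'Indeterminate': 0.56}
```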
---
## Training procedure
- Base model: **PhoBERT-large**
- Task: 3-class sentence-level classification
- Loss: Cross-entropy with class weights
- Evaluation metric: **Macro-F1**
- Input representation:
  - Narrative text: sentence with local context (previous and next sentence)
  - Tables/KPI-like text: sentence only
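
How the neighbouring sentences were combined with the target sentence is not documented. This sketch assumes the target sentence forms the first segment and its previous and next sentences are joined into a second segment, so each pair could be encoded with `tokenizer(text, context, truncation=True, max_length=256)`. `build_inputs` and the `is_table` flags are hypothetical, and whether the released model expects this format at inference time is an assumption (the usage example below passes a single sentence).

```python
def build_inputs(sentences, is_table):
    """Pair each sentence with its local context (hypothetical format).

    Returns (text, context) tuples; context is None for table/KPI-like rows
    and for a narrative sentence with no neighbours.
    """
    pairs = []
    for i, sent in enumerate(sentences):
        if is_table[i]:
            pairs.append((sent, None))  # tables/KPIs: sentence only
            continue
        prev_s = sentences[i - 1] if i > 0 else ""
        next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
        context = " ".join(s for s in (prev_s, next_s) if s)
        pairs.append((sent, context or None))
    return pairs

doc = [
    "Ngân hàng cam kết phát triển bền vững.",
    "Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2.",
    "Tổng tài sản | 1.000 tỷ đồng",
]
for text, context in build_inputs(doc, [False, False, True]):
    print(text, "||", context)
```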
---
## Evaluation results
### Validation set (900 samples)
- Accuracy: **0.839**
- Macro-F1: **0.734**
Per-class (validation):
| Label | Precision | Recall | F1 |
|------|-----------|--------|----|
| Implemented | 0.79 | 0.82 | 0.81 |
| Planning | 0.48 | 0.55 | 0.52 |
| Indeterminate | 0.89 | 0.86 | 0.88 |
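
Macro-F1 is the unweighted mean of the per-class F1 scores, so the rare Planning class counts as much as the two frequent classes. This is why the validation macro-F1 (0.734) sits well below accuracy (0.839): averaging the rounded per-class F1 values in the table gives about 0.74. The function below is a plain-Python sketch of the metric; scikit-learn's `f1_score(..., average="macro")` computes the same quantity.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; an undefined precision, recall, or F1 counts as 0."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

labels = ["Implemented", "Planning", "Indeterminate"]
y_true = ["Implemented", "Implemented", "Planning", "Indeterminate", "Indeterminate", "Indeterminate"]
y_pred = ["Implemented", "Indeterminate", "Planning", "Indeterminate", "Indeterminate", "Implemented"]
print(round(macro_f1(y_true, y_pred, labels), 3))  # 0.722 (accuracy is 4/6)
```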
### Pseudo-gold test set (498 samples, balanced)
- Accuracy: **0.916**
- Macro-F1: **0.916**
> Note: The pseudo-gold set is derived from high-confidence LLM labels and is balanced across classes. It may not fully reflect real-world class distributions.
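
The card describes the pseudo-gold selection only as "high-confidence" and "balanced". A minimal sketch under those two constraints: keep LLM-labeled items at or above a confidence threshold, then draw the same number per class with a fixed seed. The `(text, label, confidence)` schema, the threshold, and `sample_pseudo_gold` are hypothetical; an exactly balanced 498-sample set would hold 166 sentences per class.

```python
import random
from collections import defaultdict

def sample_pseudo_gold(items, per_class, min_conf, seed=0):
    """items: (text, llm_label, llm_confidence) triples (hypothetical schema).

    Returns up to `per_class` (text, label) pairs per label, drawn only from
    items whose LLM confidence is at least `min_conf`.
    """
    by_label = defaultdict(list)
    for text, label, conf in items:
        if conf >= min_conf:
            by_label[label].append((text, label))
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    sample = []
    for label in sorted(by_label):
        pool = by_label[label]
        sample.extend(rng.sample(pool, min(per_class, len(pool))))
    return sample
```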
---
## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "huypham71/esg-action"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout)

# Must match the label order of the classifier head (output indices 0, 1, 2).
labels = ["Implemented", "Planning", "Indeterminate"]

# "In 2023, the bank reduced its CO2 emissions by 15%."
text = "Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 3)

probs = torch.softmax(logits, dim=-1).squeeze()
pred = labels[int(probs.argmax())]
print(pred, float(probs.max()))
```
---
## Limitations
- The model captures linguistic actionability, not actual ESG performance.
- Planning statements are rare (~3% of the training data) and have the lowest validation F1 (0.52), so Planning predictions are the least reliable, particularly on unseen corpora.
- Performance may degrade on text outside the domain of Vietnamese banking reports.
## Ethical considerations
Outputs should be interpreted as analytical signals, not definitive judgments.
Automated classification may reflect biases present in disclosure styles or training data.