---
language: vi
license: mit
library_name: transformers
pipeline_tag: text-classification
tags:
- esg
- esg-washing
- actionability
- banking
- vietnamese
- nlp
- sustainability
---

# ESG Actionability Classifier for Vietnamese Banking Reports

## Model description
This model is a Vietnamese text classification model fine-tuned from **PhoBERT-large** to classify **ESG-related sentences** in banking annual reports according to their **actionability level**.

The model is designed as **Module 3 (Actionability Classification)** in a multi-stage ESG-washing analysis framework. It does **not** assess factual correctness or ESG performance, but focuses on identifying whether a disclosure describes concrete actions, future plans, or vague commitments.

The model predicts one of three labels:
- **Implemented**: concrete actions or achieved results (often with time references or quantitative indicators)
- **Planning**: stated plans, targets, or future-oriented commitments
- **Indeterminate**: general or symbolic statements without specific actions or evidence

---

## Intended use

### Primary intended use
- Analyzing ESG disclosure quality in Vietnamese banking annual reports.
- Supporting ESG-washing risk analysis by distinguishing substantive actions from symbolic language.

### Example downstream usage
- Measuring the proportion of *Implemented* vs. *Indeterminate* ESG statements at the bank-year level.
- Serving as an intermediate module before evidence linking and ESG-washing risk scoring.
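As an illustration of the bank-year aggregation, assuming per-sentence predictions have been collected in a pandas DataFrame (the column names here are hypothetical, not part of this model's output format):

```python
import pandas as pd

# Hypothetical per-sentence predictions; "bank", "year", "label" are
# illustrative column names, not defined by the model.
preds = pd.DataFrame({
    "bank": ["A", "A", "A", "B", "B"],
    "year": [2023, 2023, 2023, 2023, 2023],
    "label": ["Implemented", "Indeterminate", "Indeterminate",
              "Implemented", "Planning"],
})

# Share of each actionability label per bank-year.
shares = (
    preds.groupby(["bank", "year"])["label"]
         .value_counts(normalize=True)
         .unstack(fill_value=0.0)
)
print(shares)
```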

### Out-of-scope use
- Determining the factual truthfulness of ESG claims.
- Legal, regulatory, or investment decision-making without human review.
- Application to non-banking or non-Vietnamese text without re-validation.

---

## Training data

The model was trained using a **hybrid labeling strategy**:
- **LLM-generated labels** as a semantic teacher for actionability
- **Weak labeling rules** based on linguistic and domain-specific patterns (e.g., time references, quantitative indicators)
- A **pseudo-gold set** sampled from high-confidence LLM labels for calibration and evaluation
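The concrete rule set is not published; the sketch below only illustrates the kinds of surface cues mentioned above (time references, quantitative indicators), with `weak_cues` as a hypothetical helper:

```python
import re

# Illustrative weak-labeling cues only; the actual rules used in training
# are not published. Patterns target Vietnamese time references
# ("năm 2023", "quý 2") and quantitative indicators ("15%", "1.000 tỷ").
TIME_REF = re.compile(r"(năm\s+20\d{2}|quý\s+[1-4])", re.IGNORECASE)
QUANT = re.compile(r"\d+([.,]\d+)?\s*(%|tỷ|triệu)", re.IGNORECASE)

def weak_cues(sentence: str) -> dict:
    """Report which surface cues fire for a sentence."""
    return {
        "has_time_ref": bool(TIME_REF.search(sentence)),
        "has_quantity": bool(QUANT.search(sentence)),
    }

print(weak_cues("Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2."))
```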

Training/validation data:
- Total labeled samples: **5,997**
- Train set: **5,097**
- Validation set: **900**

Label distribution (train):
- Implemented: ~37%
- Planning: ~3%
- Indeterminate: ~60%

Class imbalance was handled using **class-weighted loss** during training.
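The card does not state the exact weighting scheme; inverse-frequency weights are a common choice and are sketched below in PyTorch, using the approximate train-set shares given above:

```python
import torch

# Assumed scheme: inverse-frequency class weights, rescaled to mean 1.
# The actual weighting used in training is not specified in this card.
counts = torch.tensor([0.37, 0.03, 0.60])  # Implemented, Planning, Indeterminate

weights = 1.0 / counts
weights = weights / weights.sum() * len(counts)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)

# Dummy batch: 2 examples, 3 classes; the rare Planning class (index 1)
# contributes more per mistake than the frequent classes.
logits = torch.tensor([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1]])
targets = torch.tensor([0, 1])
loss = loss_fn(logits, targets)
print(float(loss))
```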

---

## Training procedure
- Base model: **PhoBERT-large**
- Task: 3-class sentence-level classification
- Loss: Cross-entropy with class weights
- Evaluation metric: **Macro-F1**
- Input representation:
  - Narrative text: sentence with local context (previous + next sentence)
  - Tables/KPI-like text: sentence only
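The exact way context is concatenated is not specified in this card; the sketch below shows one plausible construction, with `build_input` as a hypothetical helper:

```python
# How neighbouring sentences are joined is not specified in the card;
# this sketch simply concatenates them with spaces as one plausible choice.
def build_input(sentences: list[str], i: int, is_table_like: bool) -> str:
    """Sentence plus local context for narrative text; sentence only for tables."""
    if is_table_like:
        return sentences[i]
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
    return " ".join(part for part in (prev_s, sentences[i], next_s) if part)

sents = ["Câu A.", "Câu B.", "Câu C."]
print(build_input(sents, 1, is_table_like=False))  # narrative: with context
print(build_input(sents, 1, is_table_like=True))   # table/KPI: sentence only
```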

---

## Evaluation results

### Validation set (900 samples)
- Accuracy: **0.839**
- Macro-F1: **0.734**

Per-class (validation):

| Label | Precision | Recall | F1 |
|---------------|-----------|--------|------|
| Implemented   | 0.79      | 0.82   | 0.81 |
| Planning      | 0.48      | 0.55   | 0.52 |
| Indeterminate | 0.89      | 0.86   | 0.88 |
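Macro-F1 is the unweighted mean of the per-class F1 scores, which can be checked directly from the table (the small gap to the reported 0.734 comes from rounding in the table):

```python
# Macro-F1 = unweighted mean of per-class F1 (values from the table above).
f1_per_class = {"Implemented": 0.81, "Planning": 0.52, "Indeterminate": 0.88}
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
print(round(macro_f1, 3))  # → 0.737; reported 0.734 uses unrounded F1 values
```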

### Pseudo-gold test set (498 samples, balanced)
- Accuracy: **0.916**
- Macro-F1: **0.916**

> Note: The pseudo-gold set is derived from high-confidence LLM labels and is balanced across classes. It may not fully reflect real-world class distributions.

---

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "huypham71/esg-action"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

labels = ["Implemented", "Planning", "Indeterminate"]

# Vietnamese: "In 2023, the bank reduced CO2 emissions by 15%."
text = "Năm 2023, ngân hàng đã giảm 15% lượng khí thải CO2."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1).squeeze()

pred = labels[int(probs.argmax())]
print(pred, float(probs.max()))
```

---

## Limitations

- The model captures linguistic actionability, not actual ESG performance.
- *Planning* statements are relatively rare, which may affect robustness on unseen corpora.
- Performance may degrade on domains outside Vietnamese banking reports.

## Ethical considerations

- Outputs should be interpreted as analytical signals, not definitive judgments.
- Automated classification may reflect biases present in disclosure styles or training data.