---
language: vi
tags:
- nlp
- text-classification
- vietnamese
- esg
- sustainability
- banking
library_name: transformers
pipeline_tag: text-classification
license: mit
---
# PhoBERT ESG Topic Classifier for Vietnamese Banking Annual Reports
## Model description
This is a Vietnamese text-classification model fine-tuned from **PhoBERT** to classify sentences from **banking annual reports** into ESG-related topics. It serves as **Module 2 (ESG Topic Classification)** in an ESG-washing analysis pipeline, where downstream modules assess actionability, evidence support, and report-level ESG-washing risk.
The model predicts one of six labels:
- `E` (Environmental)
- `S_labor` (Social – labor/workforce)
- `S_community` (Social – community/CSR)
- `S_product` (Social – product/customer)
- `G` (Governance)
- `Non_ESG` (not ESG-related)
> Note: The model focuses on **textual disclosure topic classification**, not factual verification of ESG claims.
---
## Intended use
### Primary intended use
- Filtering and categorizing ESG-related sentences in Vietnamese banking annual reports.
- Supporting ESG-washing analysis pipelines (e.g., actionability classification and evidence linking).
### Example downstream usage
- Keep only ESG sentences (`E`, `S_*`, `G`) and discard `Non_ESG` for later actionability/evidence modules.
- Aggregate predicted topics by bank-year to analyze disclosure patterns across ESG pillars.
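The two downstream steps above can be sketched in a few lines. This is an illustrative helper, not part of the released model: `predictions` is assumed to be a list of `(sentence, predicted_label)` pairs produced by the classifier, and the example sentences are made up.

```python
from collections import Counter

# ESG labels from this card; Non_ESG is discarded before downstream modules.
ESG_LABELS = {"E", "S_labor", "S_community", "S_product", "G"}

def keep_esg(predictions):
    """Keep only ESG-labelled sentences for actionability/evidence modules."""
    return [(sent, label) for sent, label in predictions if label in ESG_LABELS]

def topic_counts(predictions):
    """Aggregate predicted topics (e.g. per bank-year) across ESG pillars."""
    return Counter(label for _, label in keep_esg(predictions))

preds = [
    ("Chuong trinh giam phat thai nam 2024.", "E"),
    ("Bao cao tai chinh hop nhat.", "Non_ESG"),
    ("Dao tao va phat trien nhan vien.", "S_labor"),
]
print(keep_esg(preds))      # the two ESG sentences remain
print(topic_counts(preds))  # Counter({'E': 1, 'S_labor': 1})
```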
### Out-of-scope use
- Determining whether a bank is actually “greenwashing/ESG-washing” in the real world.
- Use on domains far from banking annual reports (e.g., social media) without re-validation.
- Legal, compliance, or investment decision-making without human review.
---
## Training data
The model was trained using a **hybrid labeling strategy**:
- **LLM pre-labels** (teacher) to bootstrap semantic topic boundaries
- **Weak labeling rules** (filter) to override trivial non-ESG content with high precision
- A **manually annotated gold set** used for calibration and evaluation
Hybrid label sources:
- `llm`: 2,897 samples (LLM-only)
- `llm_weak_agree`: 2,083 samples (LLM + weak labels agree, higher confidence)
Total labeled samples for training/validation: **4,980**
- Train: **4,233**
- Validation: **747**
Gold set (manual) for final test: **500** samples, balanced across labels.
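The hybrid labeling strategy can be summarized as a small merge rule. This sketch is illustrative only: the actual weak rules and LLM prompts are not part of this card, and the `weak_override` bucket is a hypothetical name for sentences where a weak rule overrides the LLM (the card lists only `llm` and `llm_weak_agree` as training sources).

```python
def merge_labels(llm_label, weak_label):
    """Combine an LLM pre-label with an optional high-precision weak label.

    - No weak rule fired: keep the LLM pre-label (source 'llm').
    - Weak rule agrees with the LLM: same label, higher confidence
      ('llm_weak_agree').
    - Weak rule flags trivial non-ESG content: it overrides the LLM
      ('weak_override', hypothetical name).
    """
    if weak_label is None:
        return llm_label, "llm"
    if weak_label == llm_label:
        return llm_label, "llm_weak_agree"
    if weak_label == "Non_ESG":
        return "Non_ESG", "weak_override"
    return llm_label, "llm"

print(merge_labels("E", "E"))        # ('E', 'llm_weak_agree')
print(merge_labels("G", None))       # ('G', 'llm')
print(merge_labels("E", "Non_ESG"))  # ('Non_ESG', 'weak_override')
```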
---
## Training procedure
- Base model: PhoBERT, fine-tuned with a 6-class classification head.
- Objective: cross-entropy loss with a class-balancing strategy.
- Context-aware input: sentence-level classification using the local context window available in the corpus (`prev + sent + next`), depending on block type.
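A minimal sketch of the `prev + sent + next` context window described above. The separator token and the exact windowing policy (e.g. how block boundaries are handled) are assumptions, not taken from this card:

```python
def build_context_input(sentences, i, sep=" </s> "):
    """Concatenate the previous, current, and next sentence for sentence i,
    skipping missing neighbours at document boundaries."""
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i < len(sentences) - 1 else ""
    return sep.join(part for part in (prev_s, sentences[i], next_s) if part)

sents = ["Cau A.", "Cau B.", "Cau C."]
print(build_context_input(sents, 1))  # "Cau A. </s> Cau B. </s> Cau C."
print(build_context_input(sents, 0))  # "Cau A. </s> Cau B."
```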
---
## Evaluation results
### Validation set (747 samples)
- Macro-F1: **0.8598**
- Micro-F1: **0.8635**
- Weighted-F1: **0.8628**
Per-class (validation):
| Label | Precision | Recall | F1 | Support |
|---|---:|---:|---:|---:|
| E | 0.8310 | 0.8806 | 0.8551 | 67 |
| S_labor | 0.9000 | 0.8675 | 0.8834 | 83 |
| S_community | 0.8732 | 0.8611 | 0.8671 | 72 |
| S_product | 0.8426 | 0.8922 | 0.8667 | 102 |
| G | 0.8372 | 0.7606 | 0.7970 | 142 |
| Non_ESG | 0.8785 | 0.9004 | 0.8893 | 281 |
### Gold test set (500 samples)
- Macro-F1: **0.9665**
- Micro-F1: **0.9660**
Per-class (gold):
| Label | Precision | Recall | F1 | Support |
|---|---:|---:|---:|---:|
| E | 0.9872 | 0.9625 | 0.9747 | 80 |
| S_labor | 0.9873 | 0.9750 | 0.9811 | 80 |
| S_community | 0.9634 | 0.9875 | 0.9753 | 80 |
| S_product | 0.9506 | 0.9625 | 0.9565 | 80 |
| G | 0.9659 | 0.9444 | 0.9551 | 90 |
| Non_ESG | 0.9457 | 0.9667 | 0.9560 | 90 |
> Note: The gold test set is balanced and may not reflect real-world class frequencies in annual reports. Always validate on your target corpus.
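As a sanity check, the reported macro-F1 is the unweighted mean of the per-class F1 scores in the gold-test table above:

```python
# Per-class F1 scores copied from the gold test set table in this card.
gold_f1 = {
    "E": 0.9747, "S_labor": 0.9811, "S_community": 0.9753,
    "S_product": 0.9565, "G": 0.9551, "Non_ESG": 0.9560,
}
macro_f1 = sum(gold_f1.values()) / len(gold_f1)
print(macro_f1)  # ~0.9665, matching the reported macro-F1
```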
---
## How to use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"  # replace with this model's repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

labels = ["E", "S_labor", "S_community", "S_product", "G", "Non_ESG"]

text = "Ngân hàng đã triển khai chương trình giảm phát thải và tiết kiệm năng lượng trong năm 2024."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
pred_idx = int(probs.argmax())
print(labels[pred_idx], float(probs[pred_idx]))
```
---
## Limitations
- The model is trained on the language and structure of Vietnamese banking annual reports; performance may degrade on other domains.
- ESG boundaries can be ambiguous; some governance-related financial-risk text may be misclassified without domain adaptation.
- The model does not verify the truthfulness of ESG claims; it only categorizes topics based on text.