---
language: vi
tags:
- nlp
- text-classification
- vietnamese
- esg
- sustainability
- banking
library_name: transformers
pipeline_tag: text-classification
license: mit
---

# PhoBERT ESG Topic Classifier for Vietnamese Banking Annual Reports

## Model description
This model is a Vietnamese text classification model fine-tuned from **PhoBERT** to classify sentences from **banking annual reports** into ESG-related topics. It is designed as **Module 2 (ESG Topic Classification)** in an ESG-washing analysis pipeline, where downstream modules assess actionability, evidence support, and report-level ESG-washing risk.

The model predicts one of six labels:
- `E` (Environmental)
- `S_labor` (Social – labor/workforce)
- `S_community` (Social – community/CSR)
- `S_product` (Social – product/customer)
- `G` (Governance)
- `Non_ESG` (not ESG-related)

> Note: The model focuses on **textual disclosure topic classification**, not factual verification of ESG claims.

---

## Intended use
### Primary intended use
- Filtering and categorizing ESG-related sentences in Vietnamese banking annual reports.
- Supporting ESG-washing analysis pipelines (e.g., actionability classification and evidence linking).

### Example downstream usage
- Keep only ESG sentences (`E`, `S_*`, `G`) and discard `Non_ESG` for later actionability/evidence modules.
- Aggregate predicted topics by bank-year to analyze disclosure patterns across ESG pillars.
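The two downstream steps above can be sketched in plain Python; the per-sentence record layout here is a hypothetical example, not the pipeline's actual schema:

```python
from collections import Counter

# Hypothetical per-sentence predictions: (bank, year, sentence, predicted_label)
predictions = [
    ("BankA", 2024, "…", "E"),
    ("BankA", 2024, "…", "Non_ESG"),
    ("BankA", 2024, "…", "S_labor"),
    ("BankB", 2024, "…", "G"),
]

# 1) Keep only ESG sentences for later actionability/evidence modules.
esg_only = [p for p in predictions if p[3] != "Non_ESG"]

# 2) Aggregate predicted topics per bank-year to compare disclosure patterns.
by_bank_year = Counter((bank, year, label) for bank, year, _, label in esg_only)
print(by_bank_year[("BankA", 2024, "E")])  # → 1
```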

### Out-of-scope use
- Determining whether a bank is actually “greenwashing/ESG-washing” in the real world.
- Use on domains far from banking annual reports (e.g., social media) without re-validation.
- Legal, compliance, or investment decision-making without human review.

---

## Training data
The model was trained using a **hybrid labeling strategy**:
- **LLM pre-labels** (teacher) to bootstrap semantic topic boundaries
- **Weak labeling rules** (filter) to override trivial non-ESG content with high precision
- A **manually annotated gold set** used for calibration and evaluation

Hybrid label sources:
- `llm`: 2,897 samples (LLM-only)
- `llm_weak_agree`: 2,083 samples (LLM + weak labels agree, higher confidence)

Total labeled samples for training/validation: **4,980**
- Train: **4,233**
- Validation: **747**

Gold set (manual) for final test: **500** samples, balanced across labels.
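One way to realize the hybrid labeling strategy described above is sketched below. The function name and the handling of disagreements (weak-rule override for `Non_ESG`) are assumptions; the card only documents the `llm` and `llm_weak_agree` sources:

```python
def merge_labels(llm_label, weak_label=None):
    """Combine an LLM pre-label (teacher) with an optional weak-rule label
    (filter). Returns (label, source); agreement marks higher confidence."""
    if weak_label is None:
        return llm_label, "llm"
    if weak_label == llm_label:
        return llm_label, "llm_weak_agree"
    # Assumed behavior: weak rules are precision-oriented filters that
    # override trivial non-ESG content.
    if weak_label == "Non_ESG":
        return "Non_ESG", "weak_override"
    return llm_label, "llm"

print(merge_labels("E"))             # → ('E', 'llm')
print(merge_labels("E", "E"))        # → ('E', 'llm_weak_agree')
print(merge_labels("E", "Non_ESG"))  # → ('Non_ESG', 'weak_override')
```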

---

## Training procedure
- Base model: PhoBERT fine-tuning with a 6-class classification head.
- Objective: Cross-entropy loss (with class-balancing strategy).
- Context-aware input: sentence-level classification with a local context window taken from the corpus (`prev + sent + next`), applied depending on block type.
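The context window could be constructed roughly as follows; the separator token and exact concatenation scheme used during training are assumptions:

```python
def build_input(sentences, i, use_context=True, sep=" </s> "):
    """Return the classifier input for sentence i, optionally joined with
    its local context window (prev + sent + next)."""
    if not use_context:
        return sentences[i]
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i < len(sentences) - 1 else ""
    return sep.join(part for part in (prev_s, sentences[i], next_s) if part)

sents = ["A.", "B.", "C."]
print(build_input(sents, 1))  # → 'A. </s> B. </s> C.'
print(build_input(sents, 0))  # → 'A. </s> B.'
```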

---

## Evaluation results

### Validation set (747 samples)
- Macro-F1: **0.8598**
- Micro-F1: **0.8635**
- Weighted-F1: **0.8628**

Per-class (validation):
| Label | Precision | Recall | F1 | Support |
|---|---:|---:|---:|---:|
| E | 0.8310 | 0.8806 | 0.8551 | 67 |
| S_labor | 0.9000 | 0.8675 | 0.8834 | 83 |
| S_community | 0.8732 | 0.8611 | 0.8671 | 72 |
| S_product | 0.8426 | 0.8922 | 0.8667 | 102 |
| G | 0.8372 | 0.7606 | 0.7970 | 142 |
| Non_ESG | 0.8785 | 0.9004 | 0.8893 | 281 |

### Gold test set (500 samples)
- Macro-F1: **0.9665**
- Micro-F1: **0.9660**

Per-class (gold):
| Label | Precision | Recall | F1 | Support |
|---|---:|---:|---:|---:|
| E | 0.9872 | 0.9625 | 0.9747 | 80 |
| S_labor | 0.9873 | 0.9750 | 0.9811 | 80 |
| S_community | 0.9634 | 0.9875 | 0.9753 | 80 |
| S_product | 0.9506 | 0.9625 | 0.9565 | 80 |
| G | 0.9659 | 0.9444 | 0.9551 | 90 |
| Non_ESG | 0.9457 | 0.9667 | 0.9560 | 90 |

> Note: The gold test set is balanced and may not reflect real-world class frequencies in annual reports. Always validate on your target corpus.
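For reference, the reported macro- and micro-F1 metrics can be recomputed from predictions without extra dependencies. A minimal sketch (for single-label classification, micro-F1 equals accuracy):

```python
from collections import Counter

def f1_scores(y_true, y_pred, labels):
    """Per-class F1 plus macro-F1 (unweighted mean over classes) and
    micro-F1 (global TP / total, i.e. accuracy for single-label tasks)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    per_class = {}
    for lab in labels:
        prec = tp[lab] / (tp[lab] + fp[lab]) if tp[lab] + fp[lab] else 0.0
        rec = tp[lab] / (tp[lab] + fn[lab]) if tp[lab] + fn[lab] else 0.0
        per_class[lab] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    macro = sum(per_class.values()) / len(labels)
    micro = sum(tp.values()) / len(y_true)
    return per_class, macro, micro

y_true = ["E", "E", "G", "Non_ESG"]
y_pred = ["E", "G", "G", "Non_ESG"]
_, macro, micro = f1_scores(y_true, y_pred, ["E", "G", "Non_ESG"])
print(round(micro, 2))  # → 0.75
```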

---

## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"  # replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

labels = ["E", "S_labor", "S_community", "S_product", "G", "Non_ESG"]

# PhoBERT checkpoints typically expect word-segmented Vietnamese input;
# apply the same preprocessing that was used during training.
# "The bank rolled out emission-reduction and energy-saving programs in 2024."
text = "Ngân hàng đã triển khai chương trình giảm phát thải và tiết kiệm năng lượng trong năm 2024."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze(0)
pred_idx = int(probs.argmax())
print(labels[pred_idx], float(probs[pred_idx]))
```

---

## Limitations

- The model is trained on the language and structure of Vietnamese banking annual reports; performance may degrade on other domains.
- ESG boundaries can be ambiguous; some governance-related financial-risk text may be misclassified without domain adaptation.
- The model does not verify the truthfulness of ESG claims; it only categorizes topics based on text.
