---
language: vi
tags:
- nlp
- text-classification
- vietnamese
- esg
- sustainability
- banking
library_name: transformers
pipeline_tag: text-classification
license: mit
---

# PhoBERT ESG Topic Classifier for Vietnamese Banking Annual Reports
## Model description

This model is a Vietnamese text classification model fine-tuned from **PhoBERT** to classify sentences from **banking annual reports** into ESG-related topics. It is designed as **Module 2 (ESG Topic Classification)** in an ESG-washing analysis pipeline, where downstream modules assess actionability, evidence support, and report-level ESG-washing risk.

The model predicts one of six labels:
- `E` (Environmental)
- `S_labor` (Social – labor/workforce)
- `S_community` (Social – community/CSR)
- `S_product` (Social – product/customer)
- `G` (Governance)
- `Non_ESG` (not ESG-related)

> Note: The model focuses on **textual disclosure topic classification**, not factual verification of ESG claims.

---

## Intended use

### Primary intended use
- Filtering and categorizing ESG-related sentences in Vietnamese banking annual reports.
- Supporting ESG-washing analysis pipelines (e.g., actionability classification and evidence linking).

### Example downstream usage
- Keep only ESG sentences (`E`, `S_*`, `G`) and discard `Non_ESG` before the later actionability/evidence modules.
- Aggregate predicted topics by bank-year to analyze disclosure patterns across ESG pillars.
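The two downstream steps can be sketched in plain Python. The prediction tuples, bank names, and aggregation structure below are illustrative assumptions; in practice the labels come from running the classifier over report sentences:

```python
from collections import Counter

# Hypothetical per-sentence predictions: (bank, year, predicted_label).
predictions = [
    ("BankA", 2024, "E"),
    ("BankA", 2024, "Non_ESG"),
    ("BankA", 2024, "S_labor"),
    ("BankB", 2024, "G"),
    ("BankB", 2024, "Non_ESG"),
]

ESG_LABELS = {"E", "S_labor", "S_community", "S_product", "G"}

# Step 1: keep only ESG sentences for the actionability/evidence modules.
esg_only = [p for p in predictions if p[2] in ESG_LABELS]

# Step 2: aggregate predicted topics per bank-year to compare ESG pillars.
by_bank_year = {}
for bank, year, label in esg_only:
    by_bank_year.setdefault((bank, year), Counter())[label] += 1

print(by_bank_year)
```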
### Out-of-scope use
- Determining whether a bank is actually “greenwashing/ESG-washing” in the real world.
- Use on domains far from banking annual reports (e.g., social media) without re-validation.
- Legal, compliance, or investment decision-making without human review.

---

## Training data

The model was trained using a **hybrid labeling strategy**:
- **LLM pre-labels** (teacher) to bootstrap semantic topic boundaries
- **Weak labeling rules** (filter) to override trivial non-ESG content with high precision
- A **manually annotated gold set** used for calibration and evaluation

Hybrid label sources:
- `llm`: 2,897 samples (LLM-only)
- `llm_weak_agree`: 2,083 samples (LLM and weak labels agree; higher confidence)

Total labeled samples for training/validation: **4,980**
- Train: **4,233**
- Validation: **747**

Gold set (manual) for final test: **500** samples, balanced across labels.
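A minimal sketch of how the two automatic label sources could be combined. The exact rules are not published here; treating `weak_label=None` as "no rule fired" and resolving disagreements in favour of the high-precision weak rule are assumptions:

```python
def combine_labels(llm_label, weak_label):
    """Combine an LLM pre-label with a weak-rule label (illustrative sketch).

    Assumed convention: weak rules fire only on high-precision patterns
    (mostly trivial Non_ESG content), and weak_label is None when no rule
    matches.
    """
    if weak_label is None:
        return llm_label, "llm"
    if weak_label == llm_label:
        return llm_label, "llm_weak_agree"
    # On disagreement, the high-precision weak rule overrides the LLM.
    return weak_label, "weak_override"

print(combine_labels("E", None))       # ("E", "llm")
print(combine_labels("G", "G"))        # ("G", "llm_weak_agree")
print(combine_labels("E", "Non_ESG"))  # ("Non_ESG", "weak_override")
```

Only the first two source tags (`llm`, `llm_weak_agree`) appear in the released training split above.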
---

## Training procedure
- Base model: PhoBERT, fine-tuned with a 6-class classification head.
- Objective: cross-entropy loss with a class-balancing strategy.
- Context-aware input: sentence-level classification with a local context window from the corpus (`prev + sent + next`), depending on block type.
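The context-aware input can be built by concatenating each sentence with its immediate neighbours. The whitespace join and the boundary handling in this sketch are assumptions; adapt them to how your corpus stores sentence blocks:

```python
def build_context_input(sentences, i, use_context=True):
    """Return the classifier input for sentence i, optionally with its
    immediate neighbours (prev + sent + next). Sentences at block
    boundaries simply omit the missing neighbour."""
    if not use_context:
        return sentences[i]
    prev_s = sentences[i - 1] if i > 0 else ""
    next_s = sentences[i + 1] if i + 1 < len(sentences) else ""
    return " ".join(part for part in (prev_s, sentences[i], next_s) if part)

sents = ["Câu trước.", "Câu cần phân loại.", "Câu sau."]
print(build_context_input(sents, 1))
# → "Câu trước. Câu cần phân loại. Câu sau."
```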
---

## Evaluation results

### Validation set (747 samples)
- Macro-F1: **0.8598**
- Micro-F1: **0.8635**
- Weighted-F1: **0.8628**

Per-class (validation):

| Label | Precision | Recall | F1 | Support |
|---|---:|---:|---:|---:|
| E | 0.8310 | 0.8806 | 0.8551 | 67 |
| S_labor | 0.9000 | 0.8675 | 0.8834 | 83 |
| S_community | 0.8732 | 0.8611 | 0.8671 | 72 |
| S_product | 0.8426 | 0.8922 | 0.8667 | 102 |
| G | 0.8372 | 0.7606 | 0.7970 | 142 |
| Non_ESG | 0.8785 | 0.9004 | 0.8893 | 281 |

### Gold test set (500 samples)
- Macro-F1: **0.9665**
- Micro-F1: **0.9660**

Per-class (gold):

| Label | Precision | Recall | F1 | Support |
|---|---:|---:|---:|---:|
| E | 0.9872 | 0.9625 | 0.9747 | 80 |
| S_labor | 0.9873 | 0.9750 | 0.9811 | 80 |
| S_community | 0.9634 | 0.9875 | 0.9753 | 80 |
| S_product | 0.9506 | 0.9625 | 0.9565 | 80 |
| G | 0.9659 | 0.9444 | 0.9551 | 90 |
| Non_ESG | 0.9457 | 0.9667 | 0.9560 | 90 |

> Note: The gold test set is balanced and may not reflect real-world class frequencies in annual reports. Always validate on your target corpus.

---
## How to use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "YOUR_USERNAME/YOUR_MODEL_REPO"  # replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

labels = ["E", "S_labor", "S_community", "S_product", "G", "Non_ESG"]

# "The bank implemented an emissions-reduction and energy-saving program in 2024."
text = "Ngân hàng đã triển khai chương trình giảm phát thải và tiết kiệm năng lượng trong năm 2024."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1).squeeze()
pred_id = int(probs.argmax())
print(labels[pred_id], float(probs[pred_id]))
```

---
|
| | ## Limitations |
| |
|
| | The model is trained on Vietnamese banking annual report language and structure; performance may degrade on other domains. |
| |
|
| | ESG boundaries can be ambiguous; some governance-related financial-risk text may be misclassified without domain adaptation. |
| |
|
| | The model does not verify the truthfulness of ESG claims; it only categorizes topics based on text. |
| |
|
| | ``` |
| | |
| | |