Jidi1997/ClimateBERT_GPROP_Detector

@@ -13,65 +13,116 @@ metrics:
   - accuracy
 ---
-# Green Shareholder Proposal Classifier
-## Model Summary
-This model is a fine-tuned version of [`climatebert/distilroberta-base-climate-detector`](https://huggingface.co/climatebert/distilroberta-base-climate-detector), specifically designed to classify **shareholder proposals** into binary categories: green (climate/environmental) or non-green.
 It was trained on a highly curated dataset of Institutional Shareholder Services (ISS) proposals, achieving an **F1 score of 0.981** on the validation set.
-## Model Details
-- **Base Model:** `climatebert/distilroberta-base-climate-detector`
-- **Task:** Binary Sequence Classification
-  - `Label 1`: Green / Climate-related proposal
-  - `Label 0`: Non-green proposal
-- **Language:** English
-- **License:** Apache 2.0 (Model weights). *Note: The dataset used for fine-tuning contains derived data subject to ISS licensing terms.*
-## Uses
-### Direct Use
-The model takes a structured text input describing a shareholder proposal and predicts whether it is conceptually focused on climate change or environmental sustainability.
-**Recommended Input Format:**
 To achieve optimal performance, input text should mirror the structure of the training data:
-> "A {sponsor_type}-type sponsor has filed a shareholder proposal to a(an) {sic2_des}-sector company. This proposal requests: {resolution}. [It falls under a broader agenda class that may include items not directly relevant to this specific proposal: {AgendaCodeInformation}]"
-### Out-of-Scope Use
-- Applying the model to non-English texts.
-- Using the model for automated legal or compliance decision-making without human oversight.
-- Generalizing to broad ESG topics outside of strict environmental/climate scopes (e.g., social or governance issues like gender equality or animal welfare are explicitly trained as negative classes).
-## Training Data
-The model was fine-tuned on a custom stratified dataset of 1,500 manually curated ISS shareholder proposals. The dataset underwent rigorous rule-based correction to exclude tangentially environmental or purely social/governance proposals.
-For full details on data sampling, text construction, and labeling rules, please refer to the **[Dataset Card](insert your dataset link here)**.
-- **Train split:** 1,200 examples
-- **Validation split:** 300 examples
-## Training Procedure
-### Hyperparameters
-The model was trained using the Hugging Face `Trainer` API with the following hyperparameters:
-- **Learning rate:** 2e-05
-- **Train batch size:** 16
-- **Eval batch size:** 16
-- **Seed:** 42
-- **Weight decay:** 0.05
-- **Optimizer:** AdamW
-- **Number of epochs:** 10
-### Training Results
-The model weights from **Epoch 8 (`checkpoint-600`)** were selected as the best performing based on the validation F1 score.
-| Epoch | Training Loss | Validation Loss | Accuracy | F1 (Binary) |
 |:---:|:---:|:---:|:---:|:---:|
 | 1 | 0.3060 | 0.0968 | 0.9667 | 0.9675 |
 | 2 | 0.0954 | 0.0898 | 0.9733 | 0.9740 |
@@ -80,15 +131,39 @@ The model weights from **Epoch 8 (`checkpoint-600`)** were selected as the best
 | 5 | 0.0395 | 0.1026 | 0.9800 | 0.9803 |
 | 6 | 0.0350 | 0.1308 | 0.9733 | 0.9744 |
 | 7 | 0.0094 | 0.1108 | 0.9767 | 0.9772 |
-| **8** | **0.0003** | **0.1182** | **0.9800** | **0.9806** |
 | 9 | 0.0004 | 0.1154 | 0.9767 | 0.9773 |
 | 10 | 0.0002 | 0.1229 | 0.9767 | 0.9773 |
-## Limitations and Bias
-While the model achieves high accuracy on the validation set, its performance is tightly coupled with the specific linguistic patterns and taxonomy of the ISS database (e.g., SIC-2 sector descriptions, ISS agenda codes). It may exhibit lower confidence or accuracy when processing unstructured news articles, raw corporate filings, or proposals from different jurisdictional contexts outside the US/global norm represented in the training set.
-## Citation
 If you use this model in your research, please cite the associated working paper:
-*(Citation details forthcoming)*

   - accuracy
 ---
+<div align="center">
+# 🌿 Green Shareholder Proposal Classifier
+<p align="center">
+  <img src="https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge&logo=apache" alt="License"/>
+  <img src="https://img.shields.io/badge/Language-English-blue?style=for-the-badge&logo=googletranslate&logoColor=white" alt="Language"/>
+  <img src="https://img.shields.io/badge/F1%20Score-0.981-brightgreen?style=for-the-badge&logo=checkmarx&logoColor=white" alt="F1 Score"/>
+  <img src="https://img.shields.io/badge/Task-Text%20Classification-orange?style=for-the-badge&logo=openai&logoColor=white" alt="Task"/>
+  <img src="https://img.shields.io/badge/Domain-ESG%20%7C%20Climate%20Finance-teal?style=for-the-badge&logo=leaflet&logoColor=white" alt="Domain"/>
+</p>
+*A fine-tuned ClimateBERT model that classifies shareholder proposals as green (climate/environmental) or non-green.*
+</div>
+---
+## 📋 Model Summary
+This model is a fine-tuned version of [`climatebert/distilroberta-base-climate-detector`](https://huggingface.co/climatebert/distilroberta-base-climate-detector), specifically designed to classify **shareholder proposals** into binary categories: **green** (climate/environmental) or **non-green**.
 It was trained on a highly curated dataset of Institutional Shareholder Services (ISS) proposals, achieving an **F1 score of 0.981** on the validation set.
+> 💡 **Designed for researchers and practitioners** in sustainable finance, ESG analysis, and corporate governance.
+---
+## 🔍 Model Details
+| Property | Value |
+|:---|:---|
+| 🧠 **Base Model** | `climatebert/distilroberta-base-climate-detector` |
+| 🎯 **Task** | Binary Sequence Classification |
+| 🌐 **Language** | English |
+| 📄 **License** | Apache 2.0 *(model weights)* |
+### 🏷️ Label Schema
+| Label | Description |
+|:---:|:---|
+| `1` | ✅ Green / Climate-related proposal |
+| `0` | ❌ Non-green proposal |
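
A minimal inference sketch, assuming the repository name at the top of this page (`Jidi1997/ClimateBERT_GPROP_Detector`) is also the Hub model ID and that the checkpoint loads with the standard `transformers` sequence-classification classes. The class index with the highest logit is read directly as the label from the schema above. `LABELS`, `classify`, and `describe` are illustrative names, not part of the released model.

```python
# Illustrative mapping of class indices to the label schema in the table above.
LABELS = {1: "green", 0: "non-green"}


def classify(text: str, model_id: str = "Jidi1997/ClimateBERT_GPROP_Detector"):
    """Return (class_id, probability) for one formatted proposal text.

    Assumption: model_id is the Hub ID of this model. torch and transformers
    are imported lazily so that LABELS and describe() work without them.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Truncate inputs that exceed the model's maximum sequence length.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    probs = torch.softmax(logits, dim=-1)[0]
    class_id = int(probs.argmax())
    return class_id, float(probs[class_id])


def describe(class_id: int, prob: float) -> str:
    """Format a prediction as a short human-readable string."""
    return f"{LABELS[class_id]} (p={prob:.3f})"
```

Calling `describe(*classify(text))` on a formatted proposal would then yield a string such as `green (p=...)`; the probability shown depends on the input and is not reproduced here.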
+---
+## 🚀 Uses
+### ✅ Direct Use
+The model takes a structured text input describing a shareholder proposal and predicts whether it is conceptually focused on climate change or environmental sustainability.
+**📌 Recommended Input Format**
 To achieve optimal performance, input text should mirror the structure of the training data:
+```
+"A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an)
+{sic2_des}-sector company. This proposal requests: {resolution}.
+[It falls under a broader agenda class that may include items not directly
+relevant to this specific proposal: {AgendaCodeInformation}]"
+```
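
The template above can be filled programmatically. The sketch below is one possible reading of it, not the authors' preprocessing code: it assumes the line breaks in the template are only for display, that the surrounding quotation marks are not part of the input, and that the bracketed sentence is appended (with its square brackets and the literal `A(An)` / `a(an)` wording) only when agenda information is available. `build_input` and the example field values are hypothetical.

```python
from typing import Optional


def build_input(sponsor_type: str, sic2_des: str, resolution: str,
                agenda_code_information: Optional[str] = None) -> str:
    """Fill the recommended input template with one proposal's fields."""
    text = (
        f"A(An) {sponsor_type}-type sponsor has filed a shareholder proposal "
        f"to a(an) {sic2_des}-sector company. "
        f"This proposal requests: {resolution}."
    )
    if agenda_code_information:
        # Assumption: the bracketed sentence is optional and is kept verbatim,
        # square brackets included, when agenda information is provided.
        text += (
            " [It falls under a broader agenda class that may include items "
            "not directly relevant to this specific proposal: "
            f"{agenda_code_information}]"
        )
    return text


# Hypothetical field values, for illustration only.
example = build_input(
    sponsor_type="Individual",
    sic2_des="Oil and Gas Extraction",
    resolution="Report on GHG emissions",
    agenda_code_information="Environmental reporting",
)
print(example)
```

The dataset card linked under Training Data is the place to confirm the exact conventions used during training (article wording, bracket handling, punctuation).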
+### ⚠️ Out-of-Scope Use
+The following use cases are **not recommended**:
+- 🚫 Applying the model to **non-English** texts
+- 🚫 Using the model for **automated legal or compliance decision-making** without human oversight
+- 🚫 Generalizing to **broad ESG topics** outside of strict environmental/climate scopes *(e.g., social or governance issues like gender equality or animal welfare are explicitly trained as negative classes)*
+---
+## 📦 Training Data
+<div align="center">
+| Split | Examples |
+|:---:|:---:|
+| 🏋️ Train | 1,200 |
+| 🧪 Validation | 300 |
+| **Total** | **1,500** |
+</div>
+The model was fine-tuned on a custom **stratified dataset of 1,500 manually curated ISS shareholder proposals**. The dataset underwent rigorous rule-based correction to exclude tangentially environmental or purely social/governance proposals.
+📂 For full details on data sampling, text construction, and labeling rules, please refer to the **[gprop_training_dataset](https://huggingface.co/datasets/Jidi1997/gprop_training_dataset)**.
+---
+## ⚙️ Training Procedure
+### 🔧 Hyperparameters
+| Hyperparameter | Value |
+|:---|:---:|
+| 📐 Learning Rate | `2e-05` |
+| 📦 Train Batch Size | `16` |
+| 📦 Eval Batch Size | `16` |
+| 🎲 Seed | `42` |
+| ⚖️ Weight Decay | `0.05` |
+| 🔁 Optimizer | AdamW |
+| 🔄 Epochs | `10` |
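
As a rough sketch, the values in the table above map onto Hugging Face `TrainingArguments` as follows. This assumes a single device (so the listed batch sizes equal the per-device batch sizes) and leaves the optimizer at the `Trainer` default, which is an AdamW variant. Evaluation and checkpoint-saving settings are not stated in this card and are therefore omitted. `HYPERPARAMS` and `make_training_args` are illustrative names.

```python
# Values copied from the hyperparameter table; the keys are TrainingArguments
# argument names chosen for this sketch.
HYPERPARAMS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes training on a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "weight_decay": 0.05,
    "num_train_epochs": 10,
}


def make_training_args(output_dir: str):
    """Build TrainingArguments from HYPERPARAMS.

    transformers is imported lazily so the dictionary can be inspected
    without the library installed.
    """
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```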
+### 📈 Training Results
+The model weights from **Epoch 8 (`checkpoint-600`)** were selected as the best-performing checkpoint based on the validation F1 score.
+| Epoch | Train Loss | Val Loss | Accuracy | F1 (Binary) |
 |:---:|:---:|:---:|:---:|:---:|
 | 1 | 0.3060 | 0.0968 | 0.9667 | 0.9675 |
 | 2 | 0.0954 | 0.0898 | 0.9733 | 0.9740 |
 | 5 | 0.0395 | 0.1026 | 0.9800 | 0.9803 |
 | 6 | 0.0350 | 0.1308 | 0.9733 | 0.9744 |
 | 7 | 0.0094 | 0.1108 | 0.9767 | 0.9772 |
+| **8** ⭐ | **0.0003** | **0.1182** | **0.9800** | **0.9806** |
 | 9 | 0.0004 | 0.1154 | 0.9767 | 0.9773 |
 | 10 | 0.0002 | 0.1229 | 0.9767 | 0.9773 |
+> ⭐ **Best checkpoint selected at Epoch 8** — highest validation F1 of **0.9806**
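
The correspondence between `checkpoint-600` and Epoch 8 can be checked with a little arithmetic. The sketch below assumes one optimizer step per batch on a single device with no gradient accumulation (neither is stated in this card), and it uses only the validation F1 values listed in the table above; epochs 3 and 4 do not appear there and are left out.

```python
import math

TRAIN_EXAMPLES = 1200   # train split size
TRAIN_BATCH_SIZE = 16   # train batch size from the hyperparameter table

# Assumption: one device, no gradient accumulation.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)


def epoch_of_checkpoint(global_step: int) -> float:
    """Convert a checkpoint's global step into the epoch it was saved at."""
    return global_step / steps_per_epoch


# Validation F1 for the epochs listed in the results table.
VAL_F1 = {1: 0.9675, 2: 0.9740, 5: 0.9803, 6: 0.9744,
          7: 0.9772, 8: 0.9806, 9: 0.9773, 10: 0.9773}
best_epoch = max(VAL_F1, key=VAL_F1.get)

print(steps_per_epoch, epoch_of_checkpoint(600), best_epoch)  # 75 8.0 8
```

Under these assumptions each epoch takes 75 steps, so step 600 falls exactly at the end of Epoch 8, which is also the epoch with the highest listed validation F1.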
+---
+## ⚠️ Limitations and Bias
+While the model achieves high accuracy on the validation set, several limitations should be noted:
+- 🔗 **Domain dependency** — Performance is tightly coupled with the specific linguistic patterns and taxonomy of the ISS database *(e.g., SIC-2 sector descriptions, ISS agenda codes)*
+- 📰 **Unstructured text** — Lower confidence or accuracy is expected when processing unstructured news articles or raw corporate filings
+- 🌍 **Jurisdictional scope** — The model may not generalize well to proposals from jurisdictions outside the US/global norm represented in the training set
+---
+## 📚 Citation
 If you use this model in your research, please cite the associated working paper:
+```bibtex
+@misc{gprop_classifier,
+  title  = {Green Shareholder Proposal Classifier},
+  note   = {Citation details forthcoming},
+}
+```
+---
+<div align="center">
+*Built on top of [ClimateBERT](https://huggingface.co/climatebert) · Trained with 🤗 Hugging Face Transformers*
+</div>