---
license: apache-2.0
language:
- en
tags:
- climate
- ESG
- sustainable-finance
- sequence-classification
base_model: climatebert/distilroberta-base-climate-detector
metrics:
- f1
- accuracy
---
<div align="center">

# Green Shareholder Proposal Detector

<p align="center">
<img src="https://img.shields.io/badge/License-Apache%202.0-green.svg?style=for-the-badge&logo=apache" alt="License"/>
<img src="https://img.shields.io/badge/Language-English-blue?style=for-the-badge&logo=googletranslate&logoColor=white" alt="Language"/>
<img src="https://img.shields.io/badge/Task-Text%20Classification-orange?style=for-the-badge&logo=openai&logoColor=white" alt="Task"/>
<img src="https://img.shields.io/badge/Domain-ESG%20%7C%20Climate%20Finance-teal?style=for-the-badge&logo=leaflet&logoColor=white" alt="Domain"/>
</p>

*A fine-tuned BERT-based language model that detects "greenness" in shareholder proposals.*

</div>
---
## Model Summary

Shareholder resolutions are often terse and semantically ambiguous when read in isolation.
Consider a proposal requesting a report on **water risk management**: this may refer to
environmental water stress (a climate risk) or to the human right to water access (a social
issue). Such overlaps are pervasive in ESG discourse, where the same terminology routinely
spans environmental, social, and governance dimensions.

This model is a fine-tuned version of [ClimateBERT](https://huggingface.co/climatebert/distilroberta-base-climate-detector),
specifically engineered to classify shareholder proposals as **green** (climate/environmental)
or **non-green**. It is trained to resolve precisely this kind of ambiguity: rather than
surface-matching sustainability keywords, it learns to identify the **underlying environmental
intent** of a proposal from its full contextual framing.

As a result, the model is robust against false positives induced by generic ESG buzzwords
(terms such as *neutrality*, *waste*, or *water* that frequently appear across non-environmental
proposals) and maintains high precision in **mixed-ESG contexts** where environmental and
social/governance themes co-occur.

> **Designed for:** Extracting environmental signal from noisy, multi-topic ESG disclosures.
---
## Usage

### Quick Start
Install dependencies first:
```bash
pip install transformers torch datasets tqdm
```
Then run the following:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets
from tqdm.auto import tqdm

# -- Model --
model_name = "Jidi1997/ClimateBERT_GPROP_Detector"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)
# device=0 uses the first GPU; change to device=-1 if only a CPU is available
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer, device=0)

# -- Data --
# Option A: Load your own dataset from a local CSV / JSON file
# dataset = datasets.load_dataset("csv", data_files="your_proposals.csv", split="train")
# Option B: Construct proposals inline using the recommended input format
# Each entry should follow the structure below for best performance:
# "A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an)
# {sic2_des}-sector company. This proposal requests: {resolution}.
# It falls under a broader agenda class that may include items not directly
# relevant to this specific proposal: {AgendaCodeInformation}"
dataset = datasets.Dataset.from_dict({"text": [
    # Replace with your own proposals following the recommended input format above
    """A(An) institutional-type sponsor has filed a shareholder proposal to a(an)
energy-sector company. This proposal requests: the company to issue a report
on its greenhouse gas emissions reduction targets.
It falls under a broader agenda class: "..."""
]})

# -- Inference --
# label='yes' -> Green proposal (Label 1)
# label='no'  -> Non-green proposal (Label 0)
for out in tqdm(pipe(KeyDataset(dataset, "text"), padding=True, truncation=True)):
    print(out)
```
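
Each pipeline output is a dictionary with a `label` and a `score`. As a minimal sketch, assuming the `yes`/`no` label names noted in the inference comments, the outputs could be mapped to the binary coding as follows. The helper `to_binary_label` and the sample outputs are hypothetical, hand-written values for illustration, not real predictions from the model.

```python
# Hypothetical helper: map the pipeline's string labels to the 0/1 coding
# described in the Quick Start ('yes' -> 1 green, 'no' -> 0 non-green).
LABEL_TO_INT = {"yes": 1, "no": 0}


def to_binary_label(output: dict) -> int:
    """Convert one text-classification output dict to 1 (green) or 0 (non-green)."""
    return LABEL_TO_INT[output["label"]]


# Hand-written outputs in the pipeline's {'label', 'score'} shape.
# These are illustrative values only, not real model predictions.
sample_outputs = [
    {"label": "yes", "score": 0.98},
    {"label": "no", "score": 0.91},
]

binary_labels = [to_binary_label(out) for out in sample_outputs]
print(binary_labels)
```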
---
### Recommended Input Format

To reduce ambiguity in raw proposal text, wrap each resolution in structured proposal- and firm-level context, matching the format of the training data:
```
"A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an)
{sic2_des}-sector company. This proposal requests: {resolution}.
It falls under a broader agenda class that may include items not directly
relevant to this specific proposal: {AgendaCodeInformation}"
```
| Field | Description | Example |
|:---|:---|:---|
| `{sponsor_type}` | Type of proposal sponsor | `institutional`, `individual`, `SRI fund`, `pension fund` |
| `{sic2_des}` | SIC-2 industry sector description | `energy`, `manufacturing` |
| `{resolution}` | Full text of the proposal resolution | *"Report on Climate Change Performance Metrics Into Executive Compensation Program..."* |
| `{AgendaCodeInformation}` | Description of ISS agenda code | *"This code is used for proposals seeking..."* |
> **Tip:** The `{AgendaCodeInformation}` field is optional, but including it generally improves prediction confidence because it adds categorical context to an otherwise brief resolution text.
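
The template can be filled programmatically. The sketch below is one possible way to do so and is not the authors' preprocessing code: the function name `build_proposal_text` is hypothetical, the template lines are assumed to be joined with single spaces, and dropping the final sentence when no agenda description is supplied is also an assumption, since the card only states that the field is optional.

```python
from typing import Optional


def build_proposal_text(
    sponsor_type: str,
    sic2_des: str,
    resolution: str,
    agenda_code_information: Optional[str] = None,
) -> str:
    """Fill the recommended input template with proposal- and firm-level fields."""
    text = (
        f"A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an) "
        f"{sic2_des}-sector company. This proposal requests: {resolution}."
    )
    # Assumption: when no agenda description is available, omit the last sentence.
    if agenda_code_information:
        text += (
            " It falls under a broader agenda class that may include items not "
            f"directly relevant to this specific proposal: {agenda_code_information}"
        )
    return text


example = build_proposal_text(
    sponsor_type="institutional",
    sic2_des="energy",
    resolution=(
        "the company to issue a report on its greenhouse gas "
        "emissions reduction targets"
    ),
)
print(example)
```

The resulting strings can be placed in the `text` column of the dataset passed to the pipeline.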
## Training Data

The model was fine-tuned on a custom **stratified dataset of 1,500 manually curated ISS shareholder proposals**. The dataset underwent rule-based correction to exclude purely social/governance and blend proposals.

For full details on data sampling, text construction, and labeling rules, please refer to the **[gprop_training_dataset](https://huggingface.co/datasets/Jidi1997/gprop_training_dataset)**.

---
## Training Procedure

### Hyperparameters

| Hyperparameter | Value |
|:---|:---:|
| Learning Rate | `2e-05` |
| Train Batch Size | `16` |
| Eval Batch Size | `16` |
| Seed | `42` |
| Weight Decay | `0.05` |
| Optimizer | AdamW |
| Epochs | `10` |
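
For readers who want to reproduce a similar setup, the table can be written as keyword arguments. This is a sketch under assumptions rather than the authors' training script: the key names follow `transformers.TrainingArguments` parameters, the batch sizes are assumed to be per-device values, and `output_dir` is a placeholder path.

```python
# Keyword arguments mirroring the hyperparameter table. Key names follow
# transformers.TrainingArguments; treating the batch sizes as per-device
# values is an assumption.
training_kwargs = {
    "output_dir": "gprop_detector_checkpoints",  # hypothetical placeholder path
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "weight_decay": 0.05,
    "num_train_epochs": 10,
}

# With transformers installed, these could be passed as
# TrainingArguments(**training_kwargs); Trainer uses AdamW by default.
for name, value in training_kwargs.items():
    print(f"{name} = {value}")
```

Keeping the values in a plain dictionary makes them easy to log alongside the results.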
### Training Results

The model weights from **Epoch 8 (`checkpoint-600`)** were selected as the best-performing checkpoint based on the validation F1 score.
| Epoch | Train Loss | Val Loss | Accuracy | F1 (Binary) |
|:---:|:---:|:---:|:---:|:---:|
| 1 | 0.3060 | 0.0968 | 0.9667 | 0.9675 |
| 2 | 0.0954 | 0.0898 | 0.9733 | 0.9740 |
| 3 | 0.0956 | 0.1808 | 0.9600 | 0.9623 |
| 4 | 0.0029 | 0.0783 | 0.9800 | 0.9805 |
| 5 | 0.0395 | 0.1026 | 0.9800 | 0.9803 |
| 6 | 0.0350 | 0.1308 | 0.9733 | 0.9744 |
| 7 | 0.0094 | 0.1108 | 0.9767 | 0.9772 |
| **8** | **0.0003** | **0.1182** | **0.9800** | **0.9806** |
| 9 | 0.0004 | 0.1154 | 0.9767 | 0.9773 |
| 10 | 0.0002 | 0.1229 | 0.9767 | 0.9773 |
> **Best checkpoint selected at Epoch 8:** highest validation F1 of **0.9806**
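
The selection rule can be checked directly against the table. The short sketch below copies the validation loss and F1 columns and picks the epoch with the highest F1. It also reports the epoch with the lowest validation loss for comparison, since the two criteria point to different epochs (8 by F1, 4 by loss). The variable names are illustrative.

```python
# (epoch, validation loss, validation F1) copied from the results table.
results = [
    (1, 0.0968, 0.9675),
    (2, 0.0898, 0.9740),
    (3, 0.1808, 0.9623),
    (4, 0.0783, 0.9805),
    (5, 0.1026, 0.9803),
    (6, 0.1308, 0.9744),
    (7, 0.1108, 0.9772),
    (8, 0.1182, 0.9806),
    (9, 0.1154, 0.9773),
    (10, 0.1229, 0.9773),
]

# Criterion used by the card: highest validation F1.
best_by_f1 = max(results, key=lambda row: row[2])
# Alternative criterion, shown only for comparison: lowest validation loss.
best_by_loss = min(results, key=lambda row: row[1])

print(f"Best by F1: epoch {best_by_f1[0]} (F1={best_by_f1[2]:.4f})")
print(f"Lowest val loss: epoch {best_by_loss[0]} (loss={best_by_loss[1]:.4f})")
```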
---
## Citation

If you use this model in your research, please cite the associated working paper (forthcoming).

---
<div align="center">

*Built on top of [ClimateBERT](https://huggingface.co/climatebert) · Trained with 🤗 Hugging Face Transformers*

</div>