Update README.md
README.md
CHANGED
|
@@ -24,7 +24,7 @@ metrics:
|
|
| 24 |
<img src="https://img.shields.io/badge/Domain-ESG%20%7C%20Climate%20Finance-teal?style=for-the-badge&logo=leaflet&logoColor=white" alt="Domain"/>
|
| 25 |
</p>
|
| 26 |
|
| 27 |
-
*A fine-tuned BERT-based language model
|
| 28 |
|
| 29 |
</div>
|
| 30 |
|
|
@@ -32,29 +32,11 @@ metrics:
|
|
| 32 |
|
| 33 |
## Model Summary
|
| 34 |
|
| 35 |
-
This model is a fine-tuned version of
|
| 36 |
|
| 37 |
-
|
| 38 |
|
| 39 |
-
>
|
| 40 |
-
|
| 41 |
-
---
|
| 42 |
-
|
| 43 |
-
## Model Details
|
| 44 |
-
|
| 45 |
-
| Property | Value |
|
| 46 |
-
|:---|:---|
|
| 47 |
-
| 🧠 **Base Model** | `climatebert/distilroberta-base-climate-detector` |
|
| 48 |
-
| 🎯 **Task** | Binary Sequence Classification |
|
| 49 |
-
| **Language** | English |
|
| 50 |
-
| **License** | Apache 2.0 *(model weights)* |
|
| 51 |
-
|
| 52 |
-
### 🏷️ Label Schema
|
| 53 |
-
|
| 54 |
-
| Label | Description |
|
| 55 |
-
|:---:|:---|
|
| 56 |
-
| `1` | ✅ Green / Climate-related proposal |
|
| 57 |
-
| `0` | ❌ Non-green proposal |
|
| 58 |
|
| 59 |
---
|
| 60 |
|
|
@@ -66,28 +48,19 @@ The model takes a structured text input describing a shareholder proposal and pr
|
|
| 66 |
|
| 67 |
**Recommended Input Format**
|
| 68 |
|
| 69 |
-
To achieve optimal performance, input text
|
| 70 |
|
| 71 |
```
|
| 72 |
"A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an)
|
| 73 |
{sic2_des}-sector company. This proposal requests: {resolution}.
|
| 74 |
-
|
| 75 |
-
relevant to this specific proposal: {AgendaCodeInformation}
|
| 76 |
```
|
| 77 |
|
| 78 |
-
### ⚠️ Out-of-Scope Use
|
| 79 |
-
|
| 80 |
-
The following use cases are **not recommended**:
|
| 81 |
-
|
| 82 |
-
- 🚫 Applying the model to **non-English** texts
|
| 83 |
-
- 🚫 Using the model for **automated legal or compliance decision-making** without human oversight
|
| 84 |
-
- 🚫 Generalizing to **broad ESG topics** outside of strict environmental/climate scopes *(e.g., social or governance issues like gender equality or animal welfare are explicitly trained as negative classes)*
|
| 85 |
-
|
| 86 |
-
---
|
| 87 |
|
| 88 |
## 📦 Training Data
|
| 89 |
|
| 90 |
-
The model was fine-tuned on a custom **stratified dataset of 1,500 manually curated ISS shareholder proposals**. The dataset underwent rule-based correction to exclude
|
| 91 |
|
| 92 |
For full details on data sampling, text construction, and labeling rules, please refer to the **[gprop_training_dataset](https://huggingface.co/datasets/Jidi1997/gprop_training_dataset)**.
|
| 93 |
|
|
@@ -131,7 +104,7 @@ The model weights from **Epoch 8 (`checkpoint-600`)** were selected as the best
|
|
| 131 |
|
| 132 |
## Citation
|
| 133 |
|
| 134 |
-
If you use this model in your research, please cite the associated working paper:
|
| 135 |
|
| 136 |
---
|
| 137 |
|
|
|
|
| 24 |
<img src="https://img.shields.io/badge/Domain-ESG%20%7C%20Climate%20Finance-teal?style=for-the-badge&logo=leaflet&logoColor=white" alt="Domain"/>
|
| 25 |
</p>
|
| 26 |
|
| 27 |
+
*A fine-tuned BERT-based language model to detect green shareholder proposals with high precision.*
|
| 28 |
|
| 29 |
</div>
|
| 30 |
|
|
|
|
| 32 |
|
| 33 |
## Model Summary
|
| 34 |
|
| 35 |
+
This model is a fine-tuned version of `climatebert/distilroberta-base-climate-detector`. It classifies shareholder proposals as green (climate/environmental) or non-green.
|
| 36 |
|
| 37 |
+
Crucially, it is effective at isolating environmental topics from **broad, mixed-ESG contexts** without being distracted by generic sustainability or governance buzzwords (e.g., Neutrality, Waste, Water). Trained on a meticulously curated dataset, it achieves an F1 score of 0.981 on the validation set.
|
| 38 |
|
| 39 |
+
> 💡 Designed for: Precision text classification in sustainable finance, ESG analysis, and corporate governance contexts.
|
| 40 |
|
| 41 |
---
|
| 42 |
|
|
|
|
| 48 |
|
| 49 |
**Recommended Input Format**
|
| 50 |
|
| 51 |
+
For best performance, structure the input text the same way as the training data:
|
| 52 |
|
| 53 |
```
|
| 54 |
"A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an)
|
| 55 |
{sic2_des}-sector company. This proposal requests: {resolution}.
|
| 56 |
+
It falls under a broader agenda class that may include items not directly
|
| 57 |
+
relevant to this specific proposal: {AgendaCodeInformation}"
|
| 58 |
```
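As a minimal sketch of how the template above might be filled in, the helper below builds one input string from a proposal's fields. The function name `build_proposal_text`, the example field values, and the choice to join the wrapped template lines with single spaces (and to drop the surrounding quotation marks) are assumptions made for illustration; they are not taken from the training pipeline.

```python
# Template text copied from the recommended input format above.
# Assumption: the wrapped lines are joined with single spaces and the
# surrounding quotation marks are not part of the model input.
TEMPLATE = (
    "A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an) "
    "{sic2_des}-sector company. This proposal requests: {resolution}. "
    "It falls under a broader agenda class that may include items not directly "
    "relevant to this specific proposal: {AgendaCodeInformation}"
)


def build_proposal_text(sponsor_type, sic2_des, resolution, agenda_code_information):
    """Fill the template with one proposal's fields (hypothetical helper)."""
    return TEMPLATE.format(
        sponsor_type=sponsor_type.strip(),
        sic2_des=sic2_des.strip(),
        # The template already closes this sentence with a period,
        # so a trailing period on the resolution text is dropped.
        resolution=resolution.strip().rstrip("."),
        AgendaCodeInformation=agenda_code_information.strip(),
    )


# Illustrative values only; they do not come from the training dataset.
example = build_proposal_text(
    sponsor_type="Institutional",
    sic2_des="Oil and Gas Extraction",
    resolution="Report on Scope 3 greenhouse gas emissions",
    agenda_code_information="Climate change and GHG emissions reporting",
)
print(example)
```

The resulting string could then be passed to a `transformers` text-classification pipeline loaded with this model's repository id, which this diff does not state.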
|
| 59 |
|
| 60 |
|
| 61 |
## 📦 Training Data
|
| 62 |
|
| 63 |
+
The model was fine-tuned on a custom **stratified dataset of 1,500 manually curated ISS shareholder proposals**. The dataset underwent rule-based correction to exclude purely social/governance and blend proposals.
|
| 64 |
|
| 65 |
For full details on data sampling, text construction, and labeling rules, please refer to the **[gprop_training_dataset](https://huggingface.co/datasets/Jidi1997/gprop_training_dataset)**.
|
| 66 |
|
|
|
|
| 104 |
|
| 105 |
## Citation
|
| 106 |
|
| 107 |
+
If you use this model in your research, please cite the associated working paper (forthcoming).
|
| 108 |
|
| 109 |
---
|
| 110 |
|