Jidi1997 committed on
Commit d4454b9 · verified · 1 Parent(s): e5b3460

Update README.md

Files changed (1)
  1. README.md +9 -36
README.md CHANGED
@@ -24,7 +24,7 @@ metrics:
  <img src="https://img.shields.io/badge/Domain-ESG%20%7C%20Climate%20Finance-teal?style=for-the-badge&logo=leaflet&logoColor=white" alt="Domain"/>
  </p>
 
- *A fine-tuned BERT-based language model for classifying environmental and climate-related shareholder proposals with high precision.*
 
  </div>
 
@@ -32,29 +32,11 @@ metrics:
 
  ## 📋 Model Summary
 
- This model is a fine-tuned version of [`climatebert/distilroberta-base-climate-detector`](https://huggingface.co/climatebert/distilroberta-base-climate-detector), specifically designed to classify **shareholder proposals** into binary categories: **green** (climate/environmental) or **non-green**.
 
- It was trained on a highly curated dataset of Institutional Shareholder Services (ISS) proposals, achieving an **F1 score of 0.981** on the validation set.
 
- > 💡 **Designed for researchers and practitioners** in sustainable finance, ESG analysis, and corporate governance.
-
- ---
-
- ## 🔍 Model Details
-
- | Property | Value |
- |:---|:---|
- | 🧠 **Base Model** | `climatebert/distilroberta-base-climate-detector` |
- | 🎯 **Task** | Binary Sequence Classification |
- | 🌐 **Language** | English |
- | 📄 **License** | Apache 2.0 *(model weights)* |
-
- ### 🏷️ Label Schema
-
- | Label | Description |
- |:---:|:---|
- | `1` | ✅ Green / Climate-related proposal |
- | `0` | ❌ Non-green proposal |
 
  ---
 
@@ -66,28 +48,19 @@ The model takes a structured text input describing a shareholder proposal and pr
 
  **📌 Recommended Input Format**
 
- To achieve optimal performance, input text should mirror the structure of the training data:
 
  ```
  "A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an)
  {sic2_des}-sector company. This proposal requests: {resolution}.
- [It falls under a broader agenda class that may include items not directly
- relevant to this specific proposal: {AgendaCodeInformation}]"
  ```
 
- ### ⚠️ Out-of-Scope Use
-
- The following use cases are **not recommended**:
-
- - 🚫 Applying the model to **non-English** texts
- - 🚫 Using the model for **automated legal or compliance decision-making** without human oversight
- - 🚫 Generalizing to **broad ESG topics** outside of strict environmental/climate scopes *(e.g., social or governance issues like gender equality or animal welfare are explicitly trained as negative classes)*
-
- ---
 
  ## 📦 Training Data
 
- The model was fine-tuned on a custom **stratified dataset of 1,500 manually curated ISS shareholder proposals**. The dataset underwent rule-based correction to exclude tangentially environmental or purely social/governance proposals.
 
  📂 For full details on data sampling, text construction, and labeling rules, please refer to the **[gprop_training_dataset](https://huggingface.co/datasets/Jidi1997/gprop_training_dataset)**.
 
@@ -131,7 +104,7 @@ The model weights from **Epoch 8 (`checkpoint-600`)** were selected as the best
 
  ## 📚 Citation
 
- If you use this model in your research, please cite the associated working paper:
 
  ---
 
 
  <img src="https://img.shields.io/badge/Domain-ESG%20%7C%20Climate%20Finance-teal?style=for-the-badge&logo=leaflet&logoColor=white" alt="Domain"/>
  </p>
 
+ *A fine-tuned BERT-based language model to detect green shareholder proposals with high precision.*
 
  </div>
 
 
 
  ## 📋 Model Summary
 
+ This model is a fine-tuned version of climatebert/distilroberta-base-climate-detector. It is specifically engineered to classify shareholder proposals into green (climate/environmental) or non-green categories.
 
+ Crucially, it is highly effective at isolating environmental topics from **broad, mixed-ESG contexts** without being distracted by generic sustainability or governance buzzwords (e.g., Neutrality, Waste, Water). Trained on a meticulously curated dataset, it achieves an F1 score of 0.981 on the validation set.
 
+ > 💡 Designed for: precision text classification in sustainable finance, ESG analysis, and corporate governance contexts.
 
  ---
 
 
 
  **📌 Recommended Input Format**
 
+ For best performance, structure the input text like the training data:
 
  ```
  "A(An) {sponsor_type}-type sponsor has filed a shareholder proposal to a(an)
  {sic2_des}-sector company. This proposal requests: {resolution}.
+ It falls under a broader agenda class that may include items not directly
+ relevant to this specific proposal: {AgendaCodeInformation}"
  ```
 
 
  ## 📦 Training Data
 
+ The model was fine-tuned on a custom **stratified dataset of 1,500 manually curated ISS shareholder proposals**. The dataset underwent rule-based correction to exclude purely social/governance proposals and blended proposals.
 
  📂 For full details on data sampling, text construction, and labeling rules, please refer to the **[gprop_training_dataset](https://huggingface.co/datasets/Jidi1997/gprop_training_dataset)**.
 
 
 
  ## 📚 Citation
 
+ If you use this model in your research, please cite the associated working paper: (Forthcoming)
 
  ---