Jidi1997 committed on
Commit
9a73d3e
·
verified ·
1 Parent(s): 13d15fa

Update README.md

  1. README.md +94 -3
README.md CHANGED
@@ -1,3 +1,94 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ tags:
6
+ - climate
7
+ - ESG
8
+ - sustainable-finance
9
+ - sequence-classification
10
+ base_model: climatebert/distilroberta-base-climate-detector
11
+ metrics:
12
+ - f1
13
+ - accuracy
14
+ ---
15
+
16
+ # Green Shareholder Proposal Classifier
17
+
18
+ ## Model Summary
19
+
20
+ This model is a fine-tuned version of [`climatebert/distilroberta-base-climate-detector`](https://huggingface.co/climatebert/distilroberta-base-climate-detector) that classifies **shareholder proposals** into one of two categories: green (climate/environmental) or non-green.
21
+
22
+ It was fine-tuned on 1,500 manually curated Institutional Shareholder Services (ISS) proposals and reaches a binary **F1 score of 0.981** on the 300-example validation set.
23
+
24
+ ## Model Details
25
+
26
+ - **Base Model:** `climatebert/distilroberta-base-climate-detector`
27
+ - **Task:** Binary Sequence Classification
28
+   - `Label 1`: Green / Climate-related proposal
29
+   - `Label 0`: Non-green proposal
30
+ - **Language:** English
31
+ - **License:** Apache 2.0 (model weights). *Note: the fine-tuning dataset contains derived data subject to ISS licensing terms.*
32
+
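As a rough illustration of the label scheme above, the sketch below turns a pair of raw logits into a label and a probability. The string names `"non-green"` and `"green"` are assumptions made for readability; the model's own config may expose different names (for example `LABEL_0` / `LABEL_1`).

```python
import math

# Label ids as stated in this model card; the string names are illustrative.
ID2LABEL = {0: "non-green", 1: "green"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Map logits ordered as [label 0, label 1] to (label_name, probability)."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[label_id], probs[label_id]
```

The returned probability can be compared against a threshold of your choosing to route low-confidence predictions to a human reviewer.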
33
+ ## Uses
34
+
35
+ ### Direct Use
36
+ The model takes a structured text input describing a shareholder proposal and predicts whether it is conceptually focused on climate change or environmental sustainability.
37
+
38
+ **Recommended Input Format:**
39
+ To achieve optimal performance, input text should mirror the structure of the training data:
40
+ > "A {sponsor_type}-type sponsor has filed a shareholder proposal to a(an) {sic2_des}-sector company. This proposal requests: {resolution}. [It falls under a broader agenda class that may include items not directly relevant to this specific proposal: {AgendaCodeInformation}]"
41
+
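The template above could be filled in along the following lines. This is a minimal sketch, not the authors' preprocessing code: it assumes the bracketed sentence is optional and is dropped when no agenda information is available, and the `classify` helper expects a `model_id` that you supply yourself, since this card's diff does not state the repository id.

```python
def build_input(sponsor_type, sic2_des, resolution, agenda_code_information=None):
    """Render one proposal into the recommended input format."""
    text = (
        f"A {sponsor_type}-type sponsor has filed a shareholder proposal "
        f"to a(an) {sic2_des}-sector company. "
        f"This proposal requests: {resolution}."
    )
    # Assumption: the bracketed part of the template is optional.
    if agenda_code_information:
        text += (
            " It falls under a broader agenda class that may include items "
            "not directly relevant to this specific proposal: "
            f"{agenda_code_information}"
        )
    return text


def classify(text, model_id):
    """Run the classifier; model_id is a placeholder the caller must provide."""
    # Imported lazily so build_input works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    return classifier(text)
```

For example, `build_input("institutional", "Oil and Gas Extraction", "Report on greenhouse gas emissions targets")` yields a single sentence pair in the training format; the field values here are hypothetical.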
42
+ ### Out-of-Scope Use
43
+ - Applying the model to non-English texts.
44
+ - Using the model for automated legal or compliance decision-making without human oversight.
45
+ - Generalizing to ESG topics outside a strict environmental/climate scope. Social and governance issues such as gender equality or animal welfare were explicitly treated as negative examples during training.
46
+
47
+ ## Training Data
48
+
49
+ The model was fine-tuned on a custom stratified dataset of 1,500 manually curated ISS shareholder proposals. The dataset underwent rigorous rule-based correction to exclude tangentially environmental or purely social/governance proposals.
50
+
51
+ For full details on data sampling, text construction, and labeling rules, please refer to the **[Dataset Card](insert-your-dataset-link-here)**.
52
+
53
+ - **Train split:** 1,200 examples
54
+ - **Validation split:** 300 examples
55
+
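An 80/20 split of this size can be reproduced in spirit with a per-class shuffle, as sketched below. Whether the authors stratified the split itself (rather than only the sampling) is not stated here, so treat this as one plausible approach. The synthetic labels and the reuse of seed 42 are assumptions for illustration only; the real class balance is not given in this card.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Split indices into train/validation, preserving per-class proportions."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)


# Synthetic, balanced labels purely for illustration.
labels = [1] * 750 + [0] * 750
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 1200 300
```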
56
+ ## Training Procedure
57
+
58
+ ### Hyperparameters
59
+
60
+ The model was trained using the Hugging Face `Trainer` API with the following hyperparameters:
61
+
62
+ - **Learning rate:** 2e-05
63
+ - **Train batch size:** 16
64
+ - **Eval batch size:** 16
65
+ - **Seed:** 42
66
+ - **Weight decay:** 0.05
67
+ - **Optimizer:** AdamW
68
+ - **Number of epochs:** 10
69
+
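The listed values map onto `TrainingArguments` roughly as follows. This sketch assumes single-device training, so the batch sizes are passed as per-device values. AdamW is the `Trainer` default optimizer, so no optimizer argument is set. Evaluation, checkpointing, and best-model settings are not stated in this card and are left out.

```python
# Hyperparameters copied from the list above, keyed by TrainingArguments names.
HYPERPARAMETERS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,  # assumes a single device
    "per_device_eval_batch_size": 16,
    "seed": 42,
    "weight_decay": 0.05,
    "num_train_epochs": 10,
}


def make_training_arguments(output_dir):
    """Build TrainingArguments; output_dir is a path chosen by the caller."""
    # Imported lazily so the dict above is usable without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMETERS)
```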
70
+ ### Training Results
71
+
72
+ The model weights from **Epoch 8 (`checkpoint-600`)** were selected as the best-performing checkpoint based on the validation F1 score.
73
+
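The checkpoint name is consistent with the reported split and batch size. A quick check, assuming a single device and no gradient accumulation (neither is stated in this card):

```python
import math

TRAIN_EXAMPLES = 1200   # train split size from this card
TRAIN_BATCH_SIZE = 16   # train batch size from this card


def steps_per_epoch(n_examples=TRAIN_EXAMPLES, batch_size=TRAIN_BATCH_SIZE):
    """Optimizer steps per epoch when every batch triggers one update."""
    return math.ceil(n_examples / batch_size)


def checkpoint_step(epoch):
    """Global step reached at the end of the given epoch."""
    return epoch * steps_per_epoch()


print(steps_per_epoch())   # 75
print(checkpoint_step(8))  # 600
```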
74
+ | Epoch | Training Loss | Validation Loss | Accuracy | F1 (Binary) |
75
+ |:---:|:---:|:---:|:---:|:---:|
76
+ | 1 | 0.3060 | 0.0968 | 0.9667 | 0.9675 |
77
+ | 2 | 0.0954 | 0.0898 | 0.9733 | 0.9740 |
78
+ | 3 | 0.0956 | 0.1808 | 0.9600 | 0.9623 |
79
+ | 4 | 0.0029 | 0.0783 | 0.9800 | 0.9805 |
80
+ | 5 | 0.0395 | 0.1026 | 0.9800 | 0.9803 |
81
+ | 6 | 0.0350 | 0.1308 | 0.9733 | 0.9744 |
82
+ | 7 | 0.0094 | 0.1108 | 0.9767 | 0.9772 |
83
+ | **8** | **0.0003** | **0.1182** | **0.9800** | **0.9806** |
84
+ | 9 | 0.0004 | 0.1154 | 0.9767 | 0.9773 |
85
+ | 10 | 0.0002 | 0.1229 | 0.9767 | 0.9773 |
86
+
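The selection rule can be checked directly against the table. The snippet below copies the reported numbers and picks the epoch with the highest validation F1; it also shows that the lowest validation loss occurs at a different epoch, which is why the choice of selection metric matters.

```python
from collections import namedtuple

Row = namedtuple("Row", "epoch train_loss val_loss accuracy f1")

# Values copied from the training results table.
RESULTS = [
    Row(1, 0.3060, 0.0968, 0.9667, 0.9675),
    Row(2, 0.0954, 0.0898, 0.9733, 0.9740),
    Row(3, 0.0956, 0.1808, 0.9600, 0.9623),
    Row(4, 0.0029, 0.0783, 0.9800, 0.9805),
    Row(5, 0.0395, 0.1026, 0.9800, 0.9803),
    Row(6, 0.0350, 0.1308, 0.9733, 0.9744),
    Row(7, 0.0094, 0.1108, 0.9767, 0.9772),
    Row(8, 0.0003, 0.1182, 0.9800, 0.9806),
    Row(9, 0.0004, 0.1154, 0.9767, 0.9773),
    Row(10, 0.0002, 0.1229, 0.9767, 0.9773),
]

best_by_f1 = max(RESULTS, key=lambda r: r.f1)
best_by_val_loss = min(RESULTS, key=lambda r: r.val_loss)

print(best_by_f1.epoch)        # 8
print(best_by_val_loss.epoch)  # 4
```

Epoch 4 and epoch 8 differ by only 0.0001 in F1, so on a 300-example validation set the two checkpoints are close to tied.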
87
+ ## Limitations and Bias
88
+
89
+ While the model achieves high accuracy on the validation set, its performance is tightly coupled to the linguistic patterns and taxonomy of the ISS database (e.g., SIC-2 sector descriptions, ISS agenda codes). It may show lower confidence or accuracy on unstructured news articles, raw corporate filings, or proposals from jurisdictions whose conventions differ from the US/global norm represented in the training set.
90
+
91
+ ## Citation
92
+
93
+ If you use this model in your research, please cite the associated working paper:
94
+ *(Citation details forthcoming)*