---
library_name: transformers
language:
- en
license: apache-2.0
tags:
- text-classification
- climate
- esg
- environment
- adaptation
- roberta
- binary-classification
pipeline_tag: text-classification
base_model: ESGBERT/EnvRoBERTa-base
datasets:
- custom
model-index:
- name: AdaptationBERT
  results: []
---

# AdaptationBERT

A fine-tuned RoBERTa model for binary classification of climate adaptation and resilience texts in the ESG/environmental domain.

Built on top of [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), AdaptationBERT is additionally fine-tuned on a 2,000-sample adaptation dataset to detect whether a given text is related to **climate adaptation and resilience**.

## Model Details

### Model Description

AdaptationBERT is a domain-specific language model designed for the automatic classification of environmental texts. It identifies whether a text passage discusses climate adaptation topics such as resilience planning, adaptive capacity, vulnerability reduction, or climate risk management.

- **Model type:** RoBERTa-based binary text classifier (`RobertaForSequenceClassification`)
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from:** [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base)

### Architecture

| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Intermediate size | 3,072 |
| Vocabulary size | 50,265 |
| Max sequence length | 512 tokens |
| Parameters | ~125M |
| Model format | SafeTensors |

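Because inputs are capped at 512 tokens, longer documents (e.g. full report sections) must be truncated or split before classification. A minimal sketch of an overlapping sliding-window chunker over token ids — a generic helper, not part of this repository; the default window leaves room for the tokenizer's special tokens:

```python
def chunk_token_ids(ids, max_len=510, stride=128):
    """Split a long token-id sequence into overlapping windows so each
    chunk fits the 512-token limit after <s> and </s> are added."""
    if len(ids) <= max_len:
        return [ids]
    chunks, start = [], 0
    step = max_len - stride  # each advance keeps `stride` tokens of overlap
    while start < len(ids):
        chunks.append(ids[start:start + max_len])
        if start + max_len >= len(ids):
            break
        start += step
    return chunks
```

Each chunk can then be decoded (or passed as ids) to the classifier; a common convention is to flag a document as adaptation-related if any of its chunks is.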
### Labels

| Label | Description |
|---|---|
| `0` | Non-adaptation-related |
| `1` | Adaptation-related |

## Uses

### Direct Use

AdaptationBERT is designed for classifying English text passages as related or unrelated to climate adaptation. Typical use cases include:

- Screening corporate sustainability reports for adaptation-related disclosures
- Analyzing ESG filings and environmental policy documents
- Large-scale text mining of climate adaptation mentions across document corpora
- Supporting research on climate resilience discourse

### Recommended Pipeline

It is **highly recommended** to use a two-stage classification pipeline:

1. First, classify whether a text is "environmental" using the [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) model.
2. Then, apply **AdaptationBERT** only to texts classified as environmental to determine whether they are adaptation-related.

This two-stage approach improves precision by filtering out non-environmental texts before adaptation classification.

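The two stages above can be sketched as one helper that gates AdaptationBERT behind the environmental filter. The label strings (`environmental`, `non-environmental`) are assumptions for illustration — check each model's `id2label` config before relying on them:

```python
def classify_adaptation(texts, env_clf, adapt_clf):
    """Stage 1: environmental filter; Stage 2: adaptation classifier,
    applied only to texts the filter marks as environmental.

    `env_clf` / `adapt_clf` are callables with the Hugging Face
    text-classification pipeline interface: text(s) in, list of
    {'label', 'score'} dicts out.
    """
    env_preds = env_clf(list(texts), truncation=True, max_length=512)
    results = []
    for text, env in zip(texts, env_preds):
        if env["label"] != "environmental":  # assumed stage-1 label string
            results.append({"label": "non-environmental", "score": env["score"]})
        else:
            results.append(adapt_clf(text, truncation=True, max_length=512)[0])
    return results
```

Here `env_clf` and `adapt_clf` would be built with `pipeline("text-classification", model=...)` for EnvironmentalBERT-environmental and AdaptationBERT, respectively.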
### Out-of-Scope Use

- Texts in languages other than English
- Non-environmental domains (e.g., finance-only, legal, medical) without the upstream environmental filter
- Real-time or safety-critical decision systems where misclassification could cause harm
- As a sole basis for regulatory compliance decisions

## How to Get Started with the Model

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="your-username/AdaptationBERT",
    tokenizer="your-username/AdaptationBERT",
)

text = "The city implemented a flood resilience plan to protect coastal infrastructure from rising sea levels."
result = classifier(text)
print(result)
# e.g. [{'label': 'adaptation-related', 'score': 0.98}]
# (exact label strings follow the model's id2label config)
```

Or load the model and tokenizer directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("your-username/AdaptationBERT")
model = AutoModelForSequenceClassification.from_pretrained("your-username/AdaptationBERT")

text = "Communities are developing drought-resistant farming techniques to adapt to changing rainfall patterns."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_label = torch.argmax(predictions, dim=-1).item()

label_map = {0: "non-adaptation-related", 1: "adaptation-related"}
print(f"Prediction: {label_map[predicted_label]} (confidence: {predictions[0][predicted_label]:.4f})")
```

For detailed tutorials, see these guides by Tobias Schimanski on Medium:
- [Model usage and large-scale analysis](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-2-large-scale-analyses-of-environmental-actions-0735cc8dc9c2)
- [Fine-tuning your own models](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-3-fine-tune-your-own-models-e3692fc0b3c0)

130
+ ## Training Details
131
+
132
+ ### Training Data
133
+
134
+ The model was fine-tuned on a curated dataset of approximately **2,000 text samples** annotated for climate adaptation relevance. The dataset contains examples from ESG reports, sustainability disclosures, and environmental policy texts, with binary labels indicating whether each sample discusses climate adaptation and resilience.
135
+
136
+ ### Training Procedure
137
+
138
+ #### Base Model
139
+
140
+ Training starts from [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), which is itself a RoBERTa model further pre-trained on environmental text corpora. This provides a strong domain-specific foundation for the adaptation classification task.
141
+
142
+ #### Training Hyperparameters
143
+
144
+ - **Training regime:** fp32
145
+ - **Problem type:** Single-label classification
146
+ - **Framework:** PyTorch + Hugging Face Transformers (v4.40.2)
147
+
148
+ ## Bias, Risks, and Limitations
149
+
150
+ - **Training data size:** The model was fine-tuned on only ~2,000 samples, which may limit its ability to generalize across all types of adaptation-related text.
151
+ - **Language limitation:** The model only supports English text. Climate adaptation texts in other languages will not be classified correctly.
152
+ - **Domain specificity:** Performance is optimized for ESG/environmental domain text. Texts from other domains discussing adaptation in non-climate contexts (e.g., biological adaptation, software adaptation) may produce false positives.
153
+ - **Temporal bias:** The training data reflects adaptation terminology and framing as of the time of dataset creation. Emerging adaptation concepts or evolving terminology may not be captured.
154
+ - **Geographic bias:** The training corpus may over-represent adaptation discourse from certain regions or regulatory frameworks, potentially underperforming on texts from underrepresented geographies.
155
+
156
+ ### Recommendations
157
+
158
+ - Always use the recommended two-stage pipeline (environmental filter + adaptation classification) for best results.
159
+ - Validate model outputs on your specific corpus before using in production.
160
+ - Do not use model predictions as the sole input for policy or regulatory decisions.
161
+ - Consider supplementing with human review, especially for high-stakes applications.
162
+
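To support the validation step above, per-class precision and recall on a small hand-labeled sample can be computed with plain Python (no dependencies; `1` = adaptation-related, matching the label table):

```python
def precision_recall(preds, golds, positive=1):
    """Precision and recall for the positive (adaptation-related) class."""
    tp = sum(p == positive and g == positive for p, g in zip(preds, golds))
    fp = sum(p == positive and g != positive for p, g in zip(preds, golds))
    fn = sum(p != positive and g == positive for p, g in zip(preds, golds))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Low precision on your corpus suggests tightening the upstream environmental filter; low recall suggests the domain or terminology differs from the training data.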
## Technical Specifications

### Model Architecture and Objective

RoBERTa (Robustly Optimized BERT Pretraining Approach) with a sequence classification head. The model uses 12 transformer layers with 12 attention heads each, a hidden size of 768, and GELU activation. Classification is performed via a classification head on top of the representation of the start-of-sequence token `<s>` (RoBERTa's equivalent of BERT's `[CLS]`).

### Software

- **Transformers:** 4.40.2
- **Model format:** SafeTensors
- **Tokenizer:** RoBERTa BPE tokenizer (50,265-token vocabulary)

## Citation

If you use this model in your research, please cite:

**BibTeX:**

```bibtex
@misc{adaptationbert,
  title={AdaptationBERT: A Fine-tuned Language Model for Climate Adaptation Text Classification},
  author={Tobias Schimanski},
  year={2024},
  url={https://huggingface.co/ESGBERT/AdaptationBERT}
}
```

## More Information

This model is part of the [ESGBERT](https://huggingface.co/ESGBERT) family of models for ESG and environmental text analysis. Related models include:

- [EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base) - Base environmental language model
- [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) - Environmental text classifier (recommended upstream filter)