---
library_name: transformers
language:
- en
license: apache-2.0
tags:
- text-classification
- climate
- esg
- environment
- adaptation
- roberta
- binary-classification
pipeline_tag: text-classification
base_model: ESGBERT/EnvRoBERTa-base
datasets:
- custom
model-index:
- name: AdaptationBERT
results: []
---
# AdaptationBERT
A fine-tuned RoBERTa model for binary classification of climate adaptation and resilience texts in the ESG/environmental domain.
Built on top of [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), AdaptationBERT is additionally fine-tuned on a 2,000-sample adaptation dataset to detect whether a given text is related to **climate adaptation and resilience**.
## Model Details
### Model Description
AdaptationBERT is a domain-specific language model designed for the automatic classification of environmental texts. It identifies whether a text passage discusses climate adaptation topics such as resilience planning, adaptive capacity, vulnerability reduction, or climate risk management.
- **Model type:** RoBERTa-based binary text classifier (`RobertaForSequenceClassification`)
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from:** [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base)
### Architecture
| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Intermediate size | 3,072 |
| Vocabulary size | 50,265 |
| Max sequence length | 512 tokens |
| Parameters | ~125M |
| Model format | SafeTensors |
### Labels
| Label | Description |
|---|---|
| `0` | Non-adaptation-related |
| `1` | Adaptation-related |
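Depending on how the checkpoint's config was saved, pipeline outputs may carry generic names such as `LABEL_0`/`LABEL_1` rather than the descriptions above. A small helper (an illustrative sketch, not part of the model) can normalise either form to the table's labels:

```python
# Map raw class indices (or generic LABEL_<i> names) to the labels in the table above.
label_map = {0: "non-adaptation-related", 1: "adaptation-related"}

def to_label(pred):
    """Normalise a pipeline prediction such as {'label': 'LABEL_1', 'score': 0.98}."""
    raw = pred["label"]
    if raw.startswith("LABEL_"):
        return label_map.get(int(raw.rsplit("_", 1)[-1]), raw)
    return raw

print(to_label({"label": "LABEL_1", "score": 0.98}))  # adaptation-related
```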
## Uses
### Direct Use
AdaptationBERT is designed for classifying English text passages as related or unrelated to climate adaptation. Typical use cases include:
- Screening corporate sustainability reports for adaptation-related disclosures
- Analyzing ESG filings and environmental policy documents
- Large-scale text mining of climate adaptation mentions across document corpora
- Supporting research on climate resilience discourse
### Recommended Pipeline
It is **highly recommended** to use a two-stage classification pipeline:
1. First, classify whether a text is "environmental" using the [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) model.
2. Then, apply **AdaptationBERT** only to texts classified as environmental to determine if they are adaptation-related.
This two-stage approach improves precision by filtering out non-environmental texts before adaptation classification.
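The two-stage flow can be sketched as a small helper. The classifiers are shown here as stand-in callables so the control flow is clear; in practice they would be `transformers` pipelines for the two models above, and the `"environmental"` label name is an assumption that should be checked against the upstream model's actual output:

```python
def two_stage_classify(texts, env_classifier, adaptation_classifier):
    """Run the adaptation classifier only on texts that pass the
    environmental filter; everything else is non-adaptation-related."""
    results = []
    for text in texts:
        env = env_classifier(text)
        if env["label"] == "environmental":
            results.append(adaptation_classifier(text))
        else:
            results.append({"label": "non-adaptation-related", "score": env["score"]})
    return results

# Stand-in classifiers for illustration only; in practice these would be
# pipelines for EnvironmentalBERT-environmental and AdaptationBERT.
def env_stub(text):
    return {"label": "environmental" if "climate" in text else "other", "score": 0.90}

def adapt_stub(text):
    return {"label": "adaptation-related", "score": 0.95}

texts = ["A climate resilience plan for coastal towns.", "Quarterly earnings rose 4%."]
print(two_stage_classify(texts, env_stub, adapt_stub))
```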
### Out-of-Scope Use
- Texts in languages other than English
- Non-environmental domains (e.g., finance-only, legal, medical) without the upstream environmental filter
- Real-time or safety-critical decision systems where misclassification could cause harm
- As a sole basis for regulatory compliance decisions
## How to Get Started with the Model
```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ClimateLouie/AdaptationBERT",
    tokenizer="ClimateLouie/AdaptationBERT",
)

text = "The city implemented a flood resilience plan to protect coastal infrastructure from rising sea levels."
result = classifier(text)
print(result)
# [{'label': 'adaptation-related', 'score': 0.98}]
```
Or load the model and tokenizer directly:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ClimateLouie/AdaptationBERT")
model = AutoModelForSequenceClassification.from_pretrained("ClimateLouie/AdaptationBERT")

text = "Communities are developing drought-resistant farming techniques to adapt to changing rainfall patterns."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.softmax(outputs.logits, dim=-1)
predicted_label = torch.argmax(predictions, dim=-1).item()

label_map = {0: "non-adaptation-related", 1: "adaptation-related"}
print(f"Prediction: {label_map[predicted_label]} (confidence: {predictions[0][predicted_label]:.4f})")
```
For detailed tutorials, see these guides by Tobias Schimanski on Medium:
- [Model usage and large-scale analysis](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-2-large-scale-analyses-of-environmental-actions-0735cc8dc9c2)
- [Fine-tuning your own models](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-3-fine-tune-your-own-models-e3692fc0b3c0)
## Training Details
### Training Data
The model was fine-tuned on a curated dataset of approximately **2,000 text samples** annotated for climate adaptation relevance. The dataset contains examples from ESG reports, sustainability disclosures, and environmental policy texts, with binary labels indicating whether each sample discusses climate adaptation and resilience.
### Training Procedure
#### Base Model
Training starts from [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), which is itself a RoBERTa model further pre-trained on environmental text corpora. This provides a strong domain-specific foundation for the adaptation classification task.
#### Training Hyperparameters
- **Training regime:** fp32
- **Problem type:** Single-label classification
- **Framework:** PyTorch + Hugging Face Transformers (v4.40.2)
## Bias, Risks, and Limitations
- **Training data size:** The model was fine-tuned on only ~2,000 samples, which may limit its ability to generalize across all types of adaptation-related text.
- **Language limitation:** The model only supports English text. Climate adaptation texts in other languages will not be classified correctly.
- **Domain specificity:** Performance is optimized for ESG/environmental domain text. Texts from other domains discussing adaptation in non-climate contexts (e.g., biological adaptation, software adaptation) may produce false positives.
- **Temporal bias:** The training data reflects adaptation terminology and framing as of the time of dataset creation. Emerging adaptation concepts or evolving terminology may not be captured.
- **Geographic bias:** The training corpus may over-represent adaptation discourse from certain regions or regulatory frameworks, potentially underperforming on texts from underrepresented geographies.
### Recommendations
- Always use the recommended two-stage pipeline (environmental filter + adaptation classification) for best results.
- Validate model outputs on your specific corpus before using in production.
- Do not use model predictions as the sole input for policy or regulatory decisions.
- Consider supplementing with human review, especially for high-stakes applications.
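Validating outputs on your own corpus (the second recommendation above) can start with precision and recall over a small hand-labelled sample. A minimal sketch, using placeholder labels rather than real predictions:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the adaptation-related (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Placeholder data: gold labels from a hand-annotated sample vs. model output.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(precision_recall(y_true, y_pred))
```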
## Technical Specifications
### Model Architecture and Objective
RoBERTa (Robustly Optimized BERT Pretraining Approach) with a sequence classification head. The model uses 12 transformer layers with 12 attention heads each, a hidden size of 768, and GELU activation. Classification is performed via a linear head on top of the representation of the first token (`<s>`, RoBERTa's equivalent of BERT's `[CLS]`).
### Software
- **Transformers:** 4.40.2
- **Model format:** SafeTensors
- **Tokenizer:** RoBERTa BPE tokenizer (50,265 tokens)
## Citation
If you use this model in your research, please cite:
**BibTeX:**
```bibtex
@misc{adaptationbert,
  title={AdaptationBERT: A Fine-tuned Language Model for Climate Adaptation Text Classification},
  author={Woodall, Louie},
  note={Inspired by the work of Tobias Schimanski},
  year={2024},
  url={https://huggingface.co/ClimateLouie/AdaptationBERT}
}
```
## More Information
This model is part of the [ESGBERT](https://huggingface.co/ESGBERT) family of models for ESG and environmental text analysis. Related models include:
- [EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base) - Base environmental language model
- [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) - Environmental text classifier (recommended upstream filter)