---
library_name: transformers
language:
- en
license: apache-2.0
tags:
- text-classification
- climate
- esg
- environment
- adaptation
- roberta
- binary-classification
pipeline_tag: text-classification
base_model: ESGBERT/EnvRoBERTa-base
datasets:
- custom
model-index:
- name: AdaptationBERT
  results: []
---

# AdaptationBERT

A fine-tuned RoBERTa model for binary classification of climate adaptation and resilience texts in the ESG/environmental domain.

Built on top of [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), AdaptationBERT is additionally fine-tuned on a 2,000-sample adaptation dataset to detect whether a given text is related to **climate adaptation and resilience**.

## Model Details

### Model Description

AdaptationBERT is a domain-specific language model designed for the automatic classification of environmental texts. It identifies whether a text passage discusses climate adaptation topics such as resilience planning, adaptive capacity, vulnerability reduction, or climate risk management.

- **Model type:** RoBERTa-based binary text classifier (`RobertaForSequenceClassification`)
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from:** [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base)

### Architecture

| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Intermediate size | 3,072 |
| Vocabulary size | 50,265 |
| Max sequence length | 512 tokens |
| Parameters | ~125M |
| Model format | SafeTensors |

### Labels

| Label | Description |
|---|---|
| `0` | Non-adaptation-related |
| `1` | Adaptation-related |

## Uses

### Direct Use

AdaptationBERT is designed for classifying English text passages as related or unrelated to climate adaptation. Typical use cases include:

- Screening corporate sustainability reports for adaptation-related disclosures
- Analyzing ESG filings and environmental policy documents
- Large-scale text mining of climate adaptation mentions across document corpora
- Supporting research on climate resilience discourse

### Recommended Pipeline

It is **highly recommended** to use a two-stage classification pipeline:

1. First, classify whether a text is "environmental" using the [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) model.
2. Then, apply **AdaptationBERT** only to texts classified as environmental to determine whether they are adaptation-related.

This two-stage approach improves precision by filtering out non-environmental texts before adaptation classification.
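
The two-stage flow can be sketched as a small helper function. This is a minimal sketch, not part of the released model: it assumes both classifiers follow the Hugging Face text-classification pipeline convention (`classifier(text)` returning `[{"label": ..., "score": ...}]`), and the label strings used here are assumptions that should be checked against each model's actual `id2label` configuration.

```python
# Sketch of the two-stage pipeline described above. Each classifier is assumed
# to follow the Hugging Face text-classification pipeline call convention:
# classifier(text) -> [{"label": str, "score": float}].
# The label names below are assumptions; verify them against each model's config.

def two_stage_classify(texts, env_classifier, adaptation_classifier,
                       env_label="environmental"):
    """Apply the adaptation classifier only to texts that the upstream
    environmental filter accepts; everything else is negative by default."""
    results = []
    for text in texts:
        env_pred = env_classifier(text)[0]
        if env_pred["label"] != env_label:
            # Filtered out upstream: the text never reaches AdaptationBERT.
            results.append({"text": text,
                            "label": "non-adaptation-related",
                            "score": None})
        else:
            adapt_pred = adaptation_classifier(text)[0]
            results.append({"text": text,
                            "label": adapt_pred["label"],
                            "score": adapt_pred["score"]})
    return results
```

With real models, `env_classifier` and `adaptation_classifier` would each be created via `pipeline("text-classification", model=...)` using `ESGBERT/EnvironmentalBERT-environmental` and `ClimateLouie/AdaptationBERT` respectively.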

### Out-of-Scope Use

- Texts in languages other than English
- Non-environmental domains (e.g., finance-only, legal, medical) without the upstream environmental filter
- Real-time or safety-critical decision systems where misclassification could cause harm
- As a sole basis for regulatory compliance decisions

## How to Get Started with the Model

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ClimateLouie/AdaptationBERT",
    tokenizer="ClimateLouie/AdaptationBERT",
)

text = "The city implemented a flood resilience plan to protect coastal infrastructure from rising sea levels."
result = classifier(text)
print(result)
# [{'label': 'adaptation-related', 'score': 0.98}]
```

Or load the model and tokenizer directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ClimateLouie/AdaptationBERT")
model = AutoModelForSequenceClassification.from_pretrained("ClimateLouie/AdaptationBERT")

text = "Communities are developing drought-resistant farming techniques to adapt to changing rainfall patterns."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_label = torch.argmax(predictions, dim=-1).item()

label_map = {0: "non-adaptation-related", 1: "adaptation-related"}
print(f"Prediction: {label_map[predicted_label]} (confidence: {predictions[0][predicted_label]:.4f})")
```

For detailed tutorials, see these guides by Tobias Schimanski on Medium:

- [Model usage and large-scale analysis](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-2-large-scale-analyses-of-environmental-actions-0735cc8dc9c2)
- [Fine-tuning your own models](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-3-fine-tune-your-own-models-e3692fc0b3c0)

## Training Details

### Training Data

The model was fine-tuned on a curated dataset of approximately **2,000 text samples** annotated for climate adaptation relevance. The dataset contains examples from ESG reports, sustainability disclosures, and environmental policy texts, with binary labels indicating whether each sample discusses climate adaptation and resilience.

### Training Procedure

#### Base Model

Training starts from [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), which is itself a RoBERTa model further pre-trained on environmental text corpora. This provides a strong domain-specific foundation for the adaptation classification task.

#### Training Hyperparameters

- **Training regime:** fp32
- **Problem type:** Single-label classification
- **Framework:** PyTorch + Hugging Face Transformers (v4.40.2)

## Bias, Risks, and Limitations

- **Training data size:** The model was fine-tuned on only ~2,000 samples, which may limit its ability to generalize across all types of adaptation-related text.
- **Language limitation:** The model only supports English text. Climate adaptation texts in other languages will not be classified correctly.
- **Domain specificity:** Performance is optimized for ESG/environmental domain text. Texts from other domains discussing adaptation in non-climate contexts (e.g., biological adaptation, software adaptation) may produce false positives.
- **Temporal bias:** The training data reflects adaptation terminology and framing as of the time of dataset creation. Emerging adaptation concepts or evolving terminology may not be captured.
- **Geographic bias:** The training corpus may over-represent adaptation discourse from certain regions or regulatory frameworks, potentially underperforming on texts from underrepresented geographies.

### Recommendations

- Always use the recommended two-stage pipeline (environmental filter + adaptation classification) for best results.
- Validate model outputs on your specific corpus before using in production.
- Do not use model predictions as the sole input for policy or regulatory decisions.
- Consider supplementing with human review, especially for high-stakes applications.
|
| | ## Technical Specifications |
| |
|
| | ### Model Architecture and Objective |
| |
|
| | RoBERTa (Robustly Optimized BERT Pretraining Approach) with a sequence classification head. The model uses 12 transformer layers with 12 attention heads each, a hidden size of 768, and GELU activation. Classification is performed via a linear layer on top of the `[CLS]` token representation. |
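
As a rough sanity check, the ~125M figure in the architecture table can be reproduced from the other table values. This is a back-of-the-envelope count of the encoder only (the small classification head is omitted), and the position-embedding size of 514 is an assumption based on RoBERTa's usual `max_position_embeddings = 512 + 2` convention.

```python
# Approximate parameter count from the architecture table above.
# Encoder only; classification head and a few tiny terms are ignored.

hidden, layers, ffn, vocab = 768, 12, 3072, 50265
max_pos = 514  # assumption: RoBERTa convention of 512 usable positions + 2

embeddings = vocab * hidden + max_pos * hidden + 2 * hidden  # token + position + LayerNorm
per_layer = (
    4 * (hidden * hidden + hidden)        # Q, K, V, and output projections (+ biases)
    + 2 * (hidden * ffn) + ffn + hidden   # feed-forward up/down projections (+ biases)
    + 2 * 2 * hidden                      # two LayerNorms (weight + bias each)
)
total = embeddings + layers * per_layer
print(f"~{total / 1e6:.0f}M parameters")  # roughly 124M, consistent with ~125M
```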

### Software

- **Transformers:** 4.40.2
- **Model format:** SafeTensors
- **Tokenizer:** RoBERTa BPE tokenizer (50,265-token vocabulary)

## Citation

If you use this model in your research, please cite:

**BibTeX:**

```bibtex
@misc{adaptationbert,
  title={AdaptationBERT: A Fine-tuned Language Model for Climate Adaptation Text Classification},
  author={Woodall, Louie},
  note={Inspired by the work of Tobias Schimanski},
  year={2024},
  url={https://huggingface.co/ClimateLouie/AdaptationBERT}
}
```

## More Information

This model is part of the [ESGBERT](https://huggingface.co/ESGBERT) family of models for ESG and environmental text analysis. Related models include:

- [EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base) - Base environmental language model
- [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) - Environmental text classifier (recommended upstream filter)