---
library_name: transformers
language:
- en
license: apache-2.0
tags:
- text-classification
- climate
- esg
- environment
- adaptation
- roberta
- binary-classification
pipeline_tag: text-classification
base_model: ESGBERT/EnvRoBERTa-base
datasets:
- custom
model-index:
- name: AdaptationBERT
  results: []
---

# AdaptationBERT

A fine-tuned RoBERTa model for binary classification of climate adaptation and resilience texts in the ESG/environmental domain.

Built on top of [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), AdaptationBERT is additionally fine-tuned on a 2,000-sample adaptation dataset to detect whether a given text relates to **climate adaptation and resilience**.

## Model Details

### Model Description

AdaptationBERT is a domain-specific language model for the automatic classification of environmental texts. It identifies whether a text passage discusses climate adaptation topics such as resilience planning, adaptive capacity, vulnerability reduction, or climate risk management.

- **Model type:** RoBERTa-based binary text classifier (`RobertaForSequenceClassification`)
- **Language(s):** English
- **License:** Apache 2.0
- **Fine-tuned from:** [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base)

### Architecture

| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Layers | 12 |
| Attention heads | 12 |
| Intermediate size | 3,072 |
| Vocabulary size | 50,265 |
| Max sequence length | 512 tokens |
| Parameters | ~125M |
| Model format | SafeTensors |

### Labels

| Label | Description |
|---|---|
| `0` | Non-adaptation-related |
| `1` | Adaptation-related |

## Uses

### Direct Use

AdaptationBERT is designed for classifying English text passages as related or unrelated to climate adaptation.
Typical use cases include:

- Screening corporate sustainability reports for adaptation-related disclosures
- Analyzing ESG filings and environmental policy documents
- Large-scale text mining of climate adaptation mentions across document corpora
- Supporting research on climate resilience discourse

### Recommended Pipeline

It is **highly recommended** to use a two-stage classification pipeline:

1. First, classify whether a text is "environmental" using the [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) model.
2. Then, apply **AdaptationBERT** only to texts classified as environmental to determine whether they are adaptation-related.

This two-stage approach improves precision by filtering out non-environmental texts before adaptation classification.

### Out-of-Scope Use

- Texts in languages other than English
- Non-environmental domains (e.g., finance-only, legal, medical) without the upstream environmental filter
- Real-time or safety-critical decision systems where misclassification could cause harm
- As a sole basis for regulatory compliance decisions

## How to Get Started with the Model

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="ClimateLouie/AdaptationBERT",
    tokenizer="ClimateLouie/AdaptationBERT",
)

text = "The city implemented a flood resilience plan to protect coastal infrastructure from rising sea levels."
result = classifier(text)
print(result)
# [{'label': 'adaptation-related', 'score': 0.98}]
```

Or load the model and tokenizer directly:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ClimateLouie/AdaptationBERT")
model = AutoModelForSequenceClassification.from_pretrained("ClimateLouie/AdaptationBERT")

text = "Communities are developing drought-resistant farming techniques to adapt to changing rainfall patterns."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.softmax(outputs.logits, dim=-1)
predicted_label = torch.argmax(predictions, dim=-1).item()

label_map = {0: "non-adaptation-related", 1: "adaptation-related"}
print(f"Prediction: {label_map[predicted_label]} (confidence: {predictions[0][predicted_label]:.4f})")
```

For detailed tutorials, see these guides by Tobias Schimanski on Medium:

- [Model usage and large-scale analysis](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-2-large-scale-analyses-of-environmental-actions-0735cc8dc9c2)
- [Fine-tuning your own models](https://medium.com/@schimanski.tobi/analyzing-esg-with-ai-and-nlp-tutorial-3-fine-tune-your-own-models-e3692fc0b3c0)

## Training Details

### Training Data

The model was fine-tuned on a curated dataset of approximately **2,000 text samples** annotated for climate adaptation relevance. The dataset contains examples from ESG reports, sustainability disclosures, and environmental policy texts, with binary labels indicating whether each sample discusses climate adaptation and resilience.

### Training Procedure

#### Base Model

Training starts from [ESGBERT/EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base), itself a RoBERTa model further pre-trained on environmental text corpora. This provides a strong domain-specific foundation for the adaptation classification task.

#### Training Hyperparameters

- **Training regime:** fp32
- **Problem type:** Single-label classification
- **Framework:** PyTorch + Hugging Face Transformers (v4.40.2)

## Bias, Risks, and Limitations

- **Training data size:** The model was fine-tuned on only ~2,000 samples, which may limit its ability to generalize across all types of adaptation-related text.
- **Language limitation:** The model supports English text only. Climate adaptation texts in other languages will not be classified correctly.
- **Domain specificity:** Performance is optimized for ESG/environmental text. Texts from other domains that discuss adaptation in non-climate senses (e.g., biological adaptation, software adaptation) may produce false positives.
- **Temporal bias:** The training data reflects adaptation terminology and framing as of the time of dataset creation. Emerging adaptation concepts or evolving terminology may not be captured.
- **Geographic bias:** The training corpus may over-represent adaptation discourse from certain regions or regulatory frameworks, potentially underperforming on texts from underrepresented geographies.

### Recommendations

- Always use the recommended two-stage pipeline (environmental filter + adaptation classification) for best results.
- Validate model outputs on your specific corpus before using them in production.
- Do not use model predictions as the sole input for policy or regulatory decisions.
- Supplement with human review, especially for high-stakes applications.

## Technical Specifications

### Model Architecture and Objective

RoBERTa (Robustly Optimized BERT Pretraining Approach) with a sequence classification head. The model uses 12 transformer layers with 12 attention heads each, a hidden size of 768, and GELU activation. Classification is performed via a linear head on top of the representation of the first token, `<s>` (RoBERTa's equivalent of BERT's `[CLS]`).
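The classification head outputs one logit per label; a minimal sketch of how logits map to the two labels (the logit values below are hypothetical, not outputs of the actual model):

```python
import torch

# Hypothetical logits from the classification head for two input texts
# (illustrative values only, not real model outputs).
logits = torch.tensor([[-1.2, 2.3], [1.8, -0.5]])

probs = torch.softmax(logits, dim=-1)  # per-text probabilities over the 2 labels
preds = torch.argmax(probs, dim=-1)    # 0 = non-adaptation, 1 = adaptation

label_map = {0: "non-adaptation-related", 1: "adaptation-related"}
print([label_map[i] for i in preds.tolist()])
# ['adaptation-related', 'non-adaptation-related']
```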
### Software

- **Transformers:** 4.40.2
- **Model format:** SafeTensors
- **Tokenizer:** RoBERTa BPE tokenizer (50,265 tokens)

## Citation

If you use this model in your research, please cite:

**BibTeX:**

```bibtex
@misc{adaptationbert,
  title={AdaptationBERT: A Fine-tuned Language Model for Climate Adaptation Text Classification},
  author={Woodall, Louie},
  note={Inspired by the work of Tobias Schimanski},
  year={2024},
  url={https://huggingface.co/ClimateLouie/AdaptationBERT}
}
```

## More Information

This model is part of the [ESGBERT](https://huggingface.co/ESGBERT) family of models for ESG and environmental text analysis. Related models include:

- [EnvRoBERTa-base](https://huggingface.co/ESGBERT/EnvRoBERTa-base) - Base environmental language model
- [EnvironmentalBERT-environmental](https://huggingface.co/ESGBERT/EnvironmentalBERT-environmental) - Environmental text classifier (recommended upstream filter)
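The two-stage pipeline recommended in this card can be wired up with a small helper. This is an illustrative sketch only: `two_stage_classify` is not part of the model release, and it assumes the upstream environmental model's positive label is `environmental` (check the actual label names your pipeline returns before relying on this):

```python
from typing import Callable, Dict, List

# A classifier takes a batch of texts and returns one {"label", "score"} dict per text,
# matching the output shape of a transformers text-classification pipeline.
Classifier = Callable[[List[str]], List[Dict]]

def two_stage_classify(
    texts: List[str],
    env_classifier: Classifier,
    adapt_classifier: Classifier,
    env_label: str = "environmental",  # assumed positive label of the upstream filter
) -> Dict[str, Dict]:
    """Stage 1: keep only texts the environmental model flags as positive.
    Stage 2: run the adaptation classifier on the survivors.
    Returns a mapping from surviving text to its adaptation prediction."""
    env_results = env_classifier(texts)
    survivors = [t for t, r in zip(texts, env_results) if r["label"] == env_label]
    adapt_results = adapt_classifier(survivors) if survivors else []
    return dict(zip(survivors, adapt_results))
```

In practice, pass `pipeline("text-classification", model="ESGBERT/EnvironmentalBERT-environmental")` and `pipeline("text-classification", model="ClimateLouie/AdaptationBERT")` as the two classifiers.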