---
tags:
- text-classification
- sustainable-development-goals
- SDG
- transformers
- bert
- social-impact
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
---

# SDG Startup Classifier (18-label BERT-based Model)

[![Model](https://img.shields.io/badge/model-BERT--base--uncased-blue)](https://huggingface.co/bert-base-uncased)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-BERT%20SDG%20Classifier-green)](https://huggingface.co/amannor/bert-base-uncased-sdg-classifier)

---

## Model Overview

This model is a **BERT-base-uncased** transformer fine-tuned for multiclass classification of startup companies into **18 categories**: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label. It is based on the methodology and dataset described in the IJCAI 2022 paper by Kfir Bar:

> *Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals*
>
> Kfir Bar (2022), [Paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)

The model takes textual company descriptions, mission statements, and product summaries as input and predicts the single most relevant label (one of the 17 SDGs, or "no-impact") reflecting the company's social or environmental impact focus.

---

## Intended Use

- Automatic SDG classification of startup textual descriptions, mission statements, and product/service information.
- Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
- Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
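
The 18 output classes described above can be pictured as a simple lookup table from class index to a readable name. The sketch below is illustrative only: the SDG titles are the official UN goal names, but the index order (SDG 1 to 17 first, "No impact" last) is an assumption that this card does not confirm, and `build_id2label` is a hypothetical helper. Check the `id2label` field of the model's configuration before relying on any particular ordering.

```python
from typing import Dict

# Official UN titles for SDG 1..17, in goal order.
SDG_NAMES = [
    "No Poverty",
    "Zero Hunger",
    "Good Health and Well-being",
    "Quality Education",
    "Gender Equality",
    "Clean Water and Sanitation",
    "Affordable and Clean Energy",
    "Decent Work and Economic Growth",
    "Industry, Innovation and Infrastructure",
    "Reduced Inequalities",
    "Sustainable Cities and Communities",
    "Responsible Consumption and Production",
    "Climate Action",
    "Life Below Water",
    "Life on Land",
    "Peace, Justice and Strong Institutions",
    "Partnerships for the Goals",
]


def build_id2label(no_impact_last: bool = True) -> Dict[int, str]:
    """Build a hypothetical class-index -> label-name table with 18 entries.

    ASSUMPTION: the position of the no-impact class (first or last) is not
    stated in the model card, so it is exposed as a parameter here.
    """
    sdg_labels = [f"SDG {i}: {name}" for i, name in enumerate(SDG_NAMES, start=1)]
    if no_impact_last:
        labels = sdg_labels + ["No impact"]
    else:
        labels = ["No impact"] + sdg_labels
    return dict(enumerate(labels))


id2label = build_id2label()
print(len(id2label))  # 18 classes in total
print(id2label[6])    # under the assumed ordering: "SDG 7: Affordable and Clean Energy"
```

If the real ordering differs, only `build_id2label` needs to change; the rest of a pipeline can keep using the dictionary.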

---

## Model Details

- **Architecture:** BERT-base-uncased (`bert-base-uncased` from Hugging Face Transformers)
- **Number of labels:** 18 (17 SDGs + 1 no-impact)
- **Tokenizer:** BERT-base-uncased WordPiece tokenizer
- **Training data:** Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
- **Training details:** Fine-tuned with the AdamW optimizer at a learning rate of approximately 2e-5 for multiple epochs on the annotated dataset
- **Performance:** Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (per the original paper)

---

## How to Use

Minimal example code to load the model and run inference with the Hugging Face Transformers library:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "amannor/bert-base-uncased-sdg-classifier"

# Load tokenizer and model from Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input startup description text
text = "This startup develops affordable solar panels to improve clean energy access."

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Forward pass (gradients are not needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Predicted class index (0 to 17, aligned with SDGs + no-impact)
predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted SDG label ID: {predicted_label_id}")
```

---

## Limitations

- The model relies solely on **textual company descriptions**, which might be promotional or biased (“greenwashing”).
- Performance may degrade on short, noisy, or non-English inputs.
- The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
- The model is intended to assist, not replace, expert judgment.
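
One way to act on the limitations above is to turn the raw logits into probabilities and route low-confidence predictions to a human reviewer. The following is a minimal sketch in plain Python, not part of the original model or paper: the `triage` helper, the toy logits, and the 0.5 threshold are all hypothetical and would need tuning on held-out data.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits: List[float], min_confidence: float = 0.5) -> Tuple[int, float, bool]:
    """Return (predicted class index, its probability, needs_review flag).

    ASSUMPTION: `min_confidence` is an arbitrary illustrative threshold,
    not a value recommended by the model authors.
    """
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[label_id]
    return label_id, confidence, confidence < min_confidence


# Toy logits for the 18 classes (made up for illustration, not model output).
confident = [0.0] * 18
confident[6] = 8.0       # one class clearly dominates
uncertain = [0.0] * 18   # all classes equally likely

print(triage(confident))  # high confidence: no review flag
print(triage(uncertain))  # probability 1/18 per class: flagged for review
```

With the usage example from this card, the logits for a single input could be passed as `outputs.logits[0].tolist()`. Softmax confidence is only a rough signal and is not guaranteed to be well calibrated, so the flag is a prompt for review rather than a measure of correctness.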

---

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{bar2022ijcai,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2022}
}
```

You may also wish to reference the accompanying repository: https://github.com/Amannor/sdg-codebase

---

## License

This model is released under the **MIT License**. For more information, see the LICENSE file in this repository.

---

## Links and Resources

- [Full repository with code, notebooks, and datasets](https://github.com/Amannor/sdg-codebase)
- [IJCAI 2022 original paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)

---

*For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.*