---
tags:
- text-classification
- sustainable-development-goals
- SDG
- transformers
- bert
- social-impact
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
---
# SDG Startup Classifier (18-label BERT-based Model)
[![Model](https://img.shields.io/badge/model-BERT--base--uncased-blue)](https://huggingface.co/bert-base-uncased)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Hugging Face](https://img.shields.io/badge/HuggingFace-BERT%20SDG%20Classifier-green)](https://huggingface.co/amannor/bert-base-uncased-sdg-classifier)
---
## Model Overview
This model is a **BERT-base-uncased** transformer fine-tuned for multiclass classification of startup companies into **18 categories**: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.
It is based on the methodology and dataset described in the IJCAI 2022 paper by Kfir Bar:
> *Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals*
> Kfir Bar (2022) — [Paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)
The model takes a company's text (its description, mission statement, and product summary) as input and predicts the single most relevant label: one of the 17 SDGs, reflecting the company's social or environmental impact focus, or the no-impact class.
---
## Intended Use
- Automatic SDG classification of startup textual descriptions, mission statements, and product/service information.
- Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
- Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
---
## Model Details
- **Architecture:** BERT-base-uncased (`bert-base-uncased` from Hugging Face Transformers)
- **Number of labels:** 18 (17 SDGs + 1 no-impact)
- **Tokenizer:** BERT-base-uncased WordPiece tokenizer
- **Training data:** Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
- **Training details:** Fine-tuned with the AdamW optimizer (learning rate approx. 2e-5) for multiple epochs on the annotated dataset
- **Performance:** Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (as reported in the original paper)
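
The training setup listed above (AdamW, learning rate around 2e-5, 18 output labels) can be sketched as a single optimization step. This is an illustrative sketch, not the training script behind this model: the tiny randomly initialized `BertConfig`, the batch size, the sequence length, and the random token ids are all assumptions chosen so the snippet runs offline. Real fine-tuning would start from the pretrained `bert-base-uncased` weights and iterate over the labeled dataset for several epochs.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

NUM_LABELS = 18  # 17 SDGs + 1 no-impact, as stated in Model Details

# Assumption: a tiny, randomly initialised BERT so the sketch needs no download.
# Real fine-tuning would load the pretrained "bert-base-uncased" weights instead.
config = BertConfig(
    vocab_size=1000,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=NUM_LABELS,
)
model = BertForSequenceClassification(config)

# AdamW with the learning rate quoted above; all other settings are library defaults.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Hypothetical batch: 4 "descriptions" of 16 random token ids, with random labels.
torch.manual_seed(0)
input_ids = torch.randint(0, config.vocab_size, (4, 16))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, NUM_LABELS, (4,))

model.train()
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
outputs.loss.backward()  # cross-entropy loss over the 18 classes
optimizer.step()
optimizer.zero_grad()

print(outputs.logits.shape)  # one row of 18 logits per input
```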
---
## How to Use
Minimal example code to load and run inference using the Hugging Face Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "amannor/bert-base-uncased-sdg-classifier"

# Load tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input startup description text
text = "This startup develops affordable solar panels to improve clean energy access."

# Tokenize input text (inputs longer than the model's maximum length are truncated)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Forward pass; gradients are not needed for inference
with torch.no_grad():
    outputs = model(**inputs)

# Predicted class index (0 to 17, aligned with SDGs + no-impact)
predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted SDG label ID: {predicted_label_id}")
```
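
The example above prints only a numeric class index. The following sketch turns raw logits into the top-ranked labels with probabilities, which is often easier to interpret. It uses only the standard library. The helper name `top_k_labels`, the example logits, and the `LABEL_i` names are hypothetical stand-ins. With the real model, the logits would come from `outputs.logits[0].tolist()` and the mapping from `model.config.id2label`, which Transformers fills with generic `LABEL_0`, `LABEL_1`, ... names when a checkpoint does not define its own. Which index corresponds to which SDG is not stated in this card, so check the checkpoint's configuration before interpreting the output.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical inputs for illustration only: 18 logits with one clear winner.
# Index 7 is arbitrary and is not a claim about which SDG it denotes.
example_logits = [0.1] * 18
example_logits[7] = 4.0
example_id2label = {i: f"LABEL_{i}" for i in range(18)}

for label, prob in top_k_labels(example_logits, example_id2label):
    print(f"{label}: {prob:.3f}")
```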
---
## Limitations
- The model relies solely on **textual company descriptions**, which might be promotional or biased (“greenwashing”).
- Performance may degrade on short, noisy, or non-English inputs.
- The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
- Intended to assist, not replace, expert judgment.
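
Since the model is meant to assist rather than replace expert judgment, one possible pattern is to route uncertain predictions to a human reviewer. The sketch below flags a prediction when the top class probability is low or when the two highest probabilities are close together. The function name `needs_review` and both thresholds are hypothetical and would need tuning on held-out data. The input is assumed to be the list of softmax probabilities over the 18 classes.

```python
def needs_review(probabilities, min_confidence=0.6, min_margin=0.2):
    """Flag a prediction for human review when the model looks unsure.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    ranked = sorted(probabilities, reverse=True)
    top, runner_up = ranked[0], ranked[1]
    # Unsure if the best class is weak, or barely ahead of the second best.
    return top < min_confidence or (top - runner_up) < min_margin


# Hypothetical probability vectors over 18 classes.
confident = [0.9] + [0.1 / 17] * 17
ambiguous = [0.45, 0.40] + [0.15 / 16] * 16

print(needs_review(confident))  # False
print(needs_review(ambiguous))  # True
```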
---
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{bar2022ijcai,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2022}
}
```
You may also wish to reference the accompanying repository:
https://github.com/Amannor/sdg-codebase
---
## License
This model is released under the **MIT License**. For more information, see the LICENSE file in this repository.
---
## Links and Resources
- [Full repository with code, notebooks, and datasets](https://github.com/Amannor/sdg-codebase)
- [IJCAI 2022 original paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)
---
*For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.*