---
tags:
- text-classification
- sustainable-development-goals
- SDG
- transformers
- bert
- social-impact
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
---
# SDG Startup Classifier (18-label BERT-based Model)
---
## Model Overview
This model is a **BERT-base-uncased** transformer fine-tuned for multiclass classification of startup companies into **18 categories**: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.
It is based on the methodology and dataset described in the IJCAI 2022 paper by Kfir Bar:
> *Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals*
> Kfir Bar (2022) — [Paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)
The model takes as input textual company descriptions, mission statements, and product summaries and predicts the most relevant SDG label reflecting the company's social or environmental impact focus.
---
## Intended Use
- Automatic SDG classification of startup textual descriptions, mission statements, and product/service information.
- Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
- Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
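For the profiling use case above, per-company predictions can be aggregated into a distribution over labels. The following is a minimal sketch, assuming the predictions are already available as a list of label names; the helper `sdg_profile` and the sample labels are hypothetical and not part of the model or its repository.

```python
from collections import Counter


def sdg_profile(predicted_labels):
    """Return each label's share of the portfolio, most frequent first."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(label, count / total) for label, count in counts.most_common()]


# Hypothetical predictions for four startups, for illustration only
profile = sdg_profile(["SDG 7", "SDG 7", "SDG 3", "no-impact"])
for label, share in profile:
    print(f"{label}: {share:.0%}")
```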
---
## Model Details
- **Architecture:** BERT-base-uncased (`bert-base-uncased` from Hugging Face Transformers)
- **Number of labels:** 18 (17 SDGs + 1 no-impact)
- **Tokenizer:** BERT-base-uncased WordPiece tokenizer
- **Training data:** Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
- **Training details:** Fine-tuned using AdamW optimizer, learning rate approx. 2e-5, for multiple epochs on an annotated dataset
- **Performance:** Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (per original paper)
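To make the 18-label space concrete, the sketch below lists the UN's official short titles for the 17 goals and builds an index-to-name mapping. The index order is an assumption made for illustration (indices 0 to 16 for SDG 1 to 17, index 17 for no-impact); the order actually used by the checkpoint should be read from `model.config.id2label` rather than taken from this table.

```python
# Official short titles of the 17 UN Sustainable Development Goals.
SDG_TITLES = [
    "No Poverty",
    "Zero Hunger",
    "Good Health and Well-being",
    "Quality Education",
    "Gender Equality",
    "Clean Water and Sanitation",
    "Affordable and Clean Energy",
    "Decent Work and Economic Growth",
    "Industry, Innovation and Infrastructure",
    "Reduced Inequalities",
    "Sustainable Cities and Communities",
    "Responsible Consumption and Production",
    "Climate Action",
    "Life Below Water",
    "Life on Land",
    "Peace, Justice and Strong Institutions",
    "Partnerships for the Goals",
]

# ASSUMPTION: indices 0-16 map to SDG 1-17 and index 17 is "no-impact".
# The real order is defined by the checkpoint; verify with model.config.id2label.
ASSUMED_ID2LABEL = {i: f"SDG {i + 1}: {title}" for i, title in enumerate(SDG_TITLES)}
ASSUMED_ID2LABEL[len(SDG_TITLES)] = "no-impact"
```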
---
## How to Use
Minimal example code to load and run inference using the Hugging Face Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "amannor/bert-base-uncased-sdg-classifier"

# Load tokenizer and model from Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Input startup description text
text = "This startup develops affordable solar panels to improve clean energy access."

# Tokenize input text
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Forward pass (gradients are not needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# Predicted class index (0 to 17, aligned with SDGs + no-impact)
predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()
print(f"Predicted SDG label ID: {predicted_label_id}")
```
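The example above returns only the single most likely label ID. To see how strongly the model prefers it over the alternatives, the logits can be turned into probabilities with a softmax and ranked. The sketch below does this in plain Python, assuming the logits are passed as a list of floats (for instance `outputs.logits[0].tolist()`) and the label names come from a mapping such as `model.config.id2label`. The helper `top_k_labels` and the demo values are hypothetical.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Softmax the logits and return the k most probable (label, probability) pairs."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical 3-class logits and label names, for illustration only
demo = top_k_labels([2.0, 0.5, -1.0], {0: "A", 1: "B", 2: "C"}, k=2)
for label, prob in demo:
    print(f"{label}: {prob:.3f}")
```

With the loaded model, a call along the lines of `top_k_labels(outputs.logits[0].tolist(), model.config.id2label)` would rank the 18 labels for a single input.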
---
## Limitations
- The model relies solely on **textual company descriptions**, which might be promotional or biased (“greenwashing”).
- Performance may degrade on short, noisy, or non-English inputs.
- The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
- Intended to assist, not replace, expert judgment.
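One way to act on these limitations is to route uncertain predictions to a human reviewer. The sketch below flags a prediction when the input is very short or the top probability is low. The helper `needs_review` and both thresholds (10 words, 0.6) are arbitrary assumptions for illustration and do not come from the paper; they would need tuning on real data. The check does not detect non-English text or promotional wording.

```python
def needs_review(text, top_probability, min_words=10, min_probability=0.6):
    """Flag a prediction for expert review if the input is short or confidence is low."""
    too_short = len(text.split()) < min_words
    low_confidence = top_probability < min_probability
    return too_short or low_confidence


# Hypothetical inputs, for illustration only
short_case = needs_review("Solar panels", 0.95)
confident_case = needs_review(
    "This startup develops affordable solar panels to improve clean energy access.",
    0.90,
)
```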
---
## Citation
If you use this model, please cite:
```bibtex
@inproceedings{bar2022ijcai,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2022}
}
```
You may also wish to reference the accompanying repository:
https://github.com/Amannor/sdg-codebase
---
## License
This model is released under the **MIT License**. For more information, see the LICENSE file in this repository.
---
## Links and Resources
- [Full repository with code, notebooks, and datasets](https://github.com/Amannor/sdg-codebase)
- [IJCAI 2022 original paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)
---
*For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.*