---

tags:
- text-classification
- sustainable-development-goals
- SDG
- transformers
- bert
- social-impact
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
---


# SDG Startup Classifier (18-label BERT-based Model)

[![Model](https://img.shields.io/badge/model-BERT--base--uncased-blue)](https://huggingface.co/bert-base-uncased)  
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)  
[![Hugging Face](https://img.shields.io/badge/HuggingFace-BERT%20SDG%20Classifier-green)](https://huggingface.co/your-hf-username/your-model-repo-name)

---

## Model Overview

This model is a **BERT-base-uncased** transformer fine-tuned for multiclass classification of startup companies into **18 categories**: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.

It is based on the methodology and dataset described in the IJCAI 2022 paper by Kfir Bar:

> *Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals*  
> Kfir Bar (2022) — [Paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)

The model takes a startup's textual description (for example, a company description, mission statement, or product summary) as input and predicts the most relevant of the 18 labels, reflecting the company's social or environmental impact focus.
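When a startup has several separate text fields, one simple option is to join them into a single input string before tokenization. The sketch below assumes that approach; the helper name `build_input_text`, the field order, and the separator are hypothetical and are not taken from Bar (2022).

```python
def build_input_text(description: str, mission: str = "", product: str = "") -> str:
    """Join the non-empty text fields of a startup into one model input.

    Hypothetical helper: the field order and the ". " separator are
    illustrative assumptions, not the preprocessing used in Bar (2022).
    """
    # Trim whitespace and trailing periods so the separator is not doubled.
    parts = [p.strip().rstrip(".") for p in (description, mission, product)]
    parts = [p for p in parts if p]
    return ". ".join(parts) + "." if parts else ""


text = build_input_text(
    "We build affordable solar panels",
    mission="Clean energy for rural households.",
)
print(text)  # We build affordable solar panels. Clean energy for rural households.
```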

---

## Intended Use

- Automatic SDG classification of startup textual descriptions, mission statements, and product/service information.
- Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
- Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
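For portfolio-level profiling, one straightforward approach is to run the classifier over every startup and tally the predicted label IDs. The sketch below assumes the predictions have already been collected as integer IDs; `label_shares` is a hypothetical helper and is not part of the model or the accompanying repository.

```python
from collections import Counter
from typing import Dict, Iterable


def label_shares(predicted_ids: Iterable[int]) -> Dict[int, float]:
    """Return the fraction of startups assigned to each predicted label ID."""
    counts = Counter(predicted_ids)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by label ID so profiles of different portfolios are easy to compare.
    return {label: n / total for label, n in sorted(counts.items())}


# Toy predictions for five startups (IDs are illustrative only).
shares = label_shares([6, 6, 12, 17, 6])
print(shares)  # {6: 0.6, 12: 0.2, 17: 0.2}
```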

---

## Model Details

- **Architecture:** BERT-base-uncased (`bert-base-uncased` from Hugging Face Transformers)  
- **Number of labels:** 18 (17 SDGs + 1 no-impact)  
- **Tokenizer:** BERT-base-uncased WordPiece tokenizer  
- **Training data:** Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)  
- **Training details:** Fine-tuned with the AdamW optimizer at a learning rate of approximately 2e-5 for several epochs on the annotated dataset  
- **Performance:** Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (as reported in the original paper)
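Accuracy on aggregated groups is not directly comparable to accuracy on the full 18-label task: once fine-grained labels are collapsed into groups, a confusion between two labels in the same group counts as correct. A minimal sketch, assuming predictions and gold labels are integer IDs and that the caller supplies the label-to-group mapping; the toy mapping below is purely illustrative and is not the 5-group aggregation used in Bar (2022).

```python
from typing import Hashable, Mapping, Sequence


def accuracy(preds: Sequence[Hashable], golds: Sequence[Hashable]) -> float:
    """Fraction of positions where the prediction equals the gold label."""
    if len(preds) != len(golds) or not golds:
        raise ValueError("preds and golds must be non-empty and equal in length")
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def grouped_accuracy(
    preds: Sequence[int], golds: Sequence[int], group_of: Mapping[int, str]
) -> float:
    """Accuracy after mapping every fine-grained label ID to its group."""
    return accuracy([group_of[p] for p in preds], [group_of[g] for g in golds])


# Toy example: the grouping is illustrative only, not the paper's.
toy_groups = {0: "A", 1: "A", 2: "B", 3: "B"}
golds = [0, 2, 3, 1]
preds = [1, 2, 2, 0]
print(accuracy(preds, golds))                      # 0.25
print(grouped_accuracy(preds, golds, toy_groups))  # 1.0
```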

---

## How to Use

Minimal example code to load and run inference using the Hugging Face Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "amannor/bert-base-uncased-sdg-classifier"

# Load tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Input startup description text
text = "This startup develops affordable solar panels to improve clean energy access."

# Tokenize input text (truncates to BERT's maximum sequence length)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

# Forward pass without gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Predicted class index (0 to 17, aligned with SDGs + no-impact)
predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()

print(f"Predicted SDG label ID: {predicted_label_id}")
```
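The inference example returns only a class index. To report the top few candidates with probabilities, the logits can be passed through a softmax. This is a plain-Python sketch; the `ID2LABEL` ordering (IDs 0-16 as SDG 1-17, ID 17 as no-impact) is an assumption, so prefer `model.config.id2label` if the checkpoint defines meaningful label names. For a single input, the logits can be obtained with `outputs.logits[0].tolist()`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Convert raw logits for one input into the k most probable (label, prob) pairs."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]


# Hypothetical mapping: assumes IDs 0-16 are SDG 1-17 and ID 17 is "no-impact".
# Check model.config.id2label for the ordering the checkpoint actually uses.
ID2LABEL = {i: f"SDG {i + 1}" for i in range(17)}
ID2LABEL[17] = "no-impact"

# Toy logits favouring ID 6; with the model, use outputs.logits[0].tolist().
logits = [0.0] * 18
logits[6] = 4.0
for label, prob in top_k_labels(logits, ID2LABEL, k=2):
    print(f"{label}: {prob:.2f}")
```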

---

## Limitations

- The model relies solely on **textual company descriptions**, which might be promotional or biased (“greenwashing”).
- Performance may degrade on short, noisy, or non-English inputs.
- The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
- Intended to assist, not replace, expert judgment.
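Because the model is meant to assist rather than replace expert judgment, one way to act on these limitations is to route short inputs and low-confidence predictions to a human reviewer. A minimal sketch, assuming `top_prob` is the highest softmax probability for the input; the helper name `needs_review` and both thresholds are hypothetical and would need tuning on held-out data.

```python
def needs_review(
    text: str, top_prob: float, min_words: int = 15, min_prob: float = 0.5
) -> bool:
    """Return True when a prediction should be checked by a human expert.

    Hypothetical triage rule: both thresholds are illustrative defaults,
    not values validated on the training data.
    """
    too_short = len(text.split()) < min_words  # short inputs tend to be less reliable
    low_confidence = top_prob < min_prob
    return too_short or low_confidence


print(needs_review("Solar panels.", top_prob=0.9))  # True (too short)
```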

---

## Citation

If you use this model, please cite:

```bibtex
@inproceedings{bar2022ijcai,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
  year={2022}
}
```


You may also wish to reference the accompanying repository:  
https://github.com/Amannor/sdg-codebase

---

## License

This model is released under the **MIT License**. For more information, see the LICENSE file in this repository.

---

## Links and Resources

- [Full repository with code, notebooks, and datasets](https://github.com/Amannor/sdg-codebase)  
- [IJCAI 2022 original paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)

---

*For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.*