---
license: mit
language:
- en
base_model:
- google-bert/bert-base-uncased
pipeline_tag: text-classification
datasets: custom
tags:
- sdg
- sustainable-development-goals
- impact-tech
- text-classification
---

# BERT for Startup SDG Classification

This is a `bert-base-uncased` model fine-tuned to classify startup company descriptions into one of 18 classes: the 17 UN Sustainable Development Goals (SDGs) or a "no-impact" category.

The model was trained by Kfir Bar as part of the research paper "Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals" (2022).

This repository is hosted by Alon Mannor to make the original model weights publicly accessible.

## Model Details

* Base Model: bert-base-uncased
* Task: Text Classification
* Labels: 18 (0: No Impact, 1-17: corresponding SDG)

### Label Mapping (id2label)

The model outputs a logit for each of the 18 classes.
The mapping from the index (ID) to the label name is as follows:

````json
{
  "0": "0: No Impact",
  "1": "SDG 1: No Poverty",
  "2": "SDG 2: Zero Hunger",
  "3": "SDG 3: Good Health and Well-being",
  "4": "SDG 4: Quality Education",
  "5": "SDG 5: Gender Equality",
  "6": "SDG 6: Clean Water and Sanitation",
  "7": "SDG 7: Affordable and Clean Energy",
  "8": "SDG 8: Decent Work and Economic Growth",
  "9": "SDG 9: Industry, Innovation and Infrastructure",
  "10": "SDG 10: Reduced Inequality",
  "11": "SDG 11: Sustainable Cities and Communities",
  "12": "SDG 12: Responsible Consumption and Production",
  "13": "SDG 13: Climate Action",
  "14": "SDG 14: Life Below Water",
  "15": "SDG 15: Life on Land",
  "16": "SDG 16: Peace and Justice Strong Institutions",
  "17": "SDG 17: Partnerships to achieve the Goal"
}
````
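
As a rough illustration of how the 18 logits relate to this mapping, the sketch below applies a softmax and picks the highest-scoring index. It does not load the model: the logits are made up, `ID2LABEL` is an abbreviated copy of the mapping above, and the helper names `softmax` and `decode` are hypothetical.

```python
import math

# Abbreviated copy of the id2label mapping above; only the ids used here are listed.
ID2LABEL = {
    0: "0: No Impact",
    7: "SDG 7: Affordable and Clean Energy",
    13: "SDG 13: Climate Action",
}


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Fall back to the bare index if the id is missing from the mapping.
    return id2label.get(best, str(best)), probs[best]


# Made-up logits for illustration: 18 values, one per class.
example_logits = [0.0] * 18
example_logits[7] = 4.0
example_logits[13] = 1.5

label, prob = decode(example_logits, ID2LABEL)
print(label, round(prob, 3))
```

With real model output, the same lookup would use the full 18-entry mapping rather than the abbreviated one shown here.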

### How to Use

You can use this model directly with the text-classification pipeline.

````python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="amannor/bert-base-uncased-sdgclassifier")

# Example description
text = "Our company develops innovative, low-cost solar panels to bring electricity to rural communities."

# Get prediction
result = classifier(text)
print(result)
# [{'label': 'SDG 7: Affordable and Clean Energy', 'score': 0.98...}]

# Example of a non-impact startup
text_2 = "We are a B2B platform for optimizing advertising spend on social media."
result_2 = classifier(text_2)
print(result_2)
# [{'label': '0: No Impact', 'score': 0.95...}]
````
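
The pipeline returns a list of `{'label', 'score'}` dictionaries, so downstream code may want to treat low-scoring predictions differently. The sketch below is one possible post-processing step, not part of the model or the paper: the helper `confident_label`, the 0.5 threshold, and the sample scores are all assumptions chosen for illustration, and it runs without downloading the model.

```python
def confident_label(predictions, threshold=0.5, fallback="uncertain"):
    """Pick the top label from pipeline-style output, or a fallback if its score is low.

    `predictions` is a list of {'label': str, 'score': float} dicts, the shape
    printed by the text-classification pipeline for a single input.
    """
    if not predictions:
        return fallback
    best = max(predictions, key=lambda p: p["score"])
    return best["label"] if best["score"] >= threshold else fallback


# Illustrative values only, not real model output.
sample = [{"label": "SDG 7: Affordable and Clean Energy", "score": 0.98}]
borderline = [{"label": "SDG 9: Industry, Innovation and Infrastructure", "score": 0.41}]

print(confident_label(sample))
print(confident_label(borderline))
```

In practice the argument would be the list returned by `classifier(text)`, and the threshold would need tuning on held-out data.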

### Training Data

The model was trained on 4,247 startup descriptions (from the Gidron et al. 2023 extension), manually annotated by experts and aggregated from two main sources:

1. Rainmaking (Compass): A global database of impact-focused startups.
2. Start-up Nation Central (SNC): A database of Israeli startups, including both impact and non-impact companies.

### Performance

The model was evaluated on a test set of 866 startups from the original paper.

|      Task       | F1-Weighted |
| :-------------: | :---------: |
| 18-Label (Full) |    0.79     |
|  6-Label (5Ps)  |    0.836    |

Performance on the 6-label task (People, Planet, Prosperity, Peace, Partnerships, No-Impact) was computed by aggregating the 18-label predictions.
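
To make the aggregation concrete, here is a minimal sketch of collapsing 18-label ids into the six groups and computing a support-weighted F1. The SDG-to-5Ps grouping below is the commonly cited one and is an assumption here; the paper's exact grouping may differ. The helper names and the toy ids are hypothetical and are not data from the paper.

```python
from collections import Counter

# Assumed grouping of label ids into the 5Ps plus the no-impact class (id 0).
FIVE_PS = {
    "No-Impact": [0],
    "People": [1, 2, 3, 4, 5],
    "Planet": [6, 12, 13, 14, 15],
    "Prosperity": [7, 8, 9, 10, 11],
    "Peace": [16],
    "Partnerships": [17],
}
SDG_TO_P = {sdg: group for group, ids in FIVE_PS.items() for sdg in ids}


def to_five_ps(label_ids):
    """Map 18-label ids to their 6-label group names."""
    return [SDG_TO_P[i] for i in label_ids]


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to each class's true count."""
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for cls, n in support.items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = n - tp
        total += (2 * tp / (2 * tp + fp + fn)) * n
    return total / len(y_true)


# Toy ids for illustration, not data from the paper.
true_ids = [7, 13, 0, 3, 9]
pred_ids = [9, 13, 0, 3, 7]

print(weighted_f1(true_ids, pred_ids))
print(weighted_f1(to_five_ps(true_ids), to_five_ps(pred_ids)))
```

In this toy case the two errors confuse SDG 7 with SDG 9, which fall in the same group under the assumed mapping, so they disappear after aggregation. That is one reason a 6-label score can exceed the 18-label score.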

### Citation

If you use this model or its underlying research, please cite the original paper:

@inproceedings{bar2022usinglm,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Anonymous Submission to IJCAI-22},
  year={2022},
  url={https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf}
}