Update README.md
README.md
CHANGED
|
@@ -1,24 +1,55 @@
|
|
| 22 |
|
| 23 |
# Load the classifier
|
| 24 |
classifier = pipeline("text-classification", model="amannor/bert-base-uncased-sdgclassifier")
|
|
@@ -36,10 +67,24 @@ text_2 = "We are a B2B platform for optimizing advertising spend on social media
|
|
| 36 |
result_2 = classifier(text_2)
|
| 37 |
print(result_2)
|
| 38 |
# [{'label': '0: No Impact', 'score': 0.95...}]
|
| 44 |
url={https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf}
|
| 45 |
-
}
|
|
|
|
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
base_model:
|
| 6 |
+
- google-bert/bert-base-uncased
|
| 7 |
+
pipeline_tag: text-classification
|
| 8 |
+
datasets: custom
|
| 9 |
+
tags:
|
| 10 |
+
- sdg
|
| 11 |
+
- sustainable-development-goals
|
| 12 |
+
- impact-tech
|
| 13 |
+
- text-classification
|
| 14 |
+
- BERT for Startup SDG Classification
|
| 15 |
+
---
|
| 16 |
+
This is a bert-base-uncased model fine-tuned to classify startup company descriptions into one of the 17 UN Sustainable Development Goals (SDGs), plus a "no-impact" category.
|
| 17 |
+
This model was trained by Kfir Bar as part of the research paper: "Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals" (2022).
|
| 18 |
+
This repository is hosted by Alon Mannor to make the original model weights accessible to the public.
|
| 19 |
+
|
| 20 |
+
Model Details
|
| 21 |
+
Base Model: bert-base-uncased
|
| 22 |
+
Task: Text Classification
|
| 23 |
+
Labels: 18 (0: No Impact, 1-17: corresponding SDG)
|
| 24 |
+
Label Mapping (id2label): The model outputs a logit for each of the 18 classes.
|
| 25 |
+
The mapping from the index (ID) to the label name is as follows:
|
| 26 |
+
{
|
| 27 |
+
"0": "0: No Impact",
|
| 28 |
+
"1": "SDG 1: No Poverty",
|
| 29 |
+
"2": "SDG 2: Zero Hunger",
|
| 30 |
+
"3": "SDG 3: Good Health and Well-being",
|
| 31 |
+
"4": "SDG 4: Quality Education",
|
| 32 |
+
"5": "SDG 5: Gender Equality",
|
| 33 |
+
"6": "SDG 6: Clean Water and Sanitation",
|
| 34 |
+
"7": "SDG 7: Affordable and Clean Energy",
|
| 35 |
+
"8": "SDG 8: Decent Work and Economic Growth",
|
| 36 |
+
"9": "SDG 9: Industry, Innovation and Infrastructure",
|
| 37 |
+
"10": "SDG 10: Reduced Inequality",
|
| 38 |
+
"11": "SDG 11: Sustainable Cities and Communities",
|
| 39 |
+
"12": "SDG 12: Responsible Consumption and Production",
|
| 40 |
+
"13": "SDG 13: Climate Action",
|
| 41 |
+
"14": "SDG 14: Life Below Water",
|
| 42 |
+
"15": "SDG 15: Life on Land",
|
| 43 |
+
"16": "SDG 16: Peace and Justice Strong Institutions",
|
| 44 |
+
"17": "SDG 17: Partnerships to achieve the Goal"
|
| 45 |
+
}
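
As a rough illustration of how the mapping above is applied, the sketch below turns a vector of 18 logits into a label with a softmax and an argmax. The logits are made-up numbers, only three entries of the mapping are reproduced, and the helper name `logits_to_label` is hypothetical; in practice the text-classification pipeline performs this step for you.

```python
import math

# Subset of the id2label mapping from this model card (other ids omitted for brevity).
ID2LABEL = {
    0: "0: No Impact",
    7: "SDG 7: Affordable and Clean Energy",
    13: "SDG 13: Climate Action",
}


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits, id2label):
    """Hypothetical helper: return (label, probability) for the top-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label.get(best, f"id {best}"), probs[best]


# Made-up logits for the 18 classes; index 13 is deliberately the largest.
example_logits = [0.0] * 18
example_logits[13] = 4.0
example_logits[7] = 1.5

label, prob = logits_to_label(example_logits, ID2LABEL)
print(label, round(prob, 3))
```

The score reported by the pipeline corresponds to the probability of the winning class, so a low value suggests the model is spreading its confidence across several goals.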
|
| 46 |
+
|
| 47 |
+
How to Use:
|
| 48 |
+
|
| 49 |
+
You can use this model directly with the text-classification pipeline.
|
| 50 |
+
|
| 51 |
+
````python
|
| 52 |
+
from transformers import pipeline
|
| 53 |
|
| 54 |
# Load the classifier
|
| 55 |
classifier = pipeline("text-classification", model="amannor/bert-base-uncased-sdgclassifier")
|
|
|
|
| 67 |
result_2 = classifier(text_2)
|
| 68 |
print(result_2)
|
| 69 |
# [{'label': '0: No Impact', 'score': 0.95...}]
|
| 70 |
+
````
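
The pipeline returns a list of dictionaries with `label` and `score` keys, as in the commented output above. A small sketch of how one might pull the numeric class id out of such a result follows; `parse_sdg_id` is a hypothetical helper, and the sample result is hard-coded with an illustrative score rather than produced by the model.

```python
import re


def parse_sdg_id(label):
    """Hypothetical helper: extract the class id from labels such as
    '0: No Impact' or 'SDG 13: Climate Action'."""
    match = re.match(r"^(?:SDG\s+)?(\d+):", label)
    if match is None:
        raise ValueError(f"Unrecognized label format: {label!r}")
    return int(match.group(1))


# Hard-coded stand-in for classifier(...) output; the score is illustrative.
sample_result = [{"label": "0: No Impact", "score": 0.95}]

top = sample_result[0]
sdg_id = parse_sdg_id(top["label"])
has_impact = sdg_id != 0  # id 0 is the "no impact" class
print(sdg_id, has_impact)
```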
|
| 71 |
+
|
| 72 |
+
Training Data:
|
| 73 |
+
The model was trained on 4,247 startup descriptions (from the Gidron et al. 2023 extension), manually annotated by experts and aggregated from two main sources:
|
| 74 |
+
Rainmaking (Compass): A global database of impact-focused startups.
|
| 75 |
+
Start-up Nation Central (SNC): A database of Israeli startups, including both impact and non-impact companies.
|
| 76 |
+
|
| 77 |
+
Performance
|
| 78 |
+
The model was evaluated on a test set of 866 startups from the original paper.
|
| 79 |
+
| Task | F1-Weighted | F1-Macro | F1-Micro |
|---|---|---|---|
| 18-Label (Full) | 0.790 | 0.473 | 0.790 |
| 6-Label (5Ps) | 0.836 | 0.602 | 0.836 |
|
| 80 |
+
The performance for the 6-label task (People, Planet, Prosperity, Peace, Partnerships, No-Impact) was aggregated from the 18-label predictions.
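
One way to picture this aggregation is a lookup from each of the 18 class ids to a 5Ps group. The grouping below follows a commonly used assignment of SDGs to People, Planet, Prosperity, Peace, and Partnerships; it is an assumption here, since this README does not list the grouping used in the paper, and `to_five_ps` is a hypothetical name.

```python
# Assumed 5Ps grouping (not taken from this README); id 0 is the no-impact class.
FIVE_PS = {
    "People": [1, 2, 3, 4, 5],
    "Planet": [6, 12, 13, 14, 15],
    "Prosperity": [7, 8, 9, 10, 11],
    "Peace": [16],
    "Partnerships": [17],
    "No-Impact": [0],
}

# Invert the grouping into a flat class-id -> group lookup.
ID_TO_GROUP = {sdg_id: group for group, ids in FIVE_PS.items() for sdg_id in ids}


def to_five_ps(predicted_ids):
    """Map 18-label predictions (class ids) onto the 6-label scheme."""
    return [ID_TO_GROUP[i] for i in predicted_ids]


print(to_five_ps([0, 3, 13, 16]))
```

Because several SDGs collapse into one group, a prediction that confuses two goals within the same group still counts as correct at the 6-label level, which is consistent with the higher 6-label scores reported above.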
|
| 81 |
+
|
| 82 |
+
Citation:
|
| 83 |
+
If you use this model or its underlying research, please cite the original paper:
|
| 84 |
+
@inproceedings{bar2022usinglm,
|
| 85 |
+
title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
|
| 86 |
+
author={Bar, Kfir},
|
| 87 |
+
booktitle={Anonymous Submission to IJCAI-22},
|
| 88 |
+
year={2022},
|
| 89 |
url={https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf}
|
| 90 |
+
}
|