Commit c4bc385 by amannor · verified · 1 parent: 6b25a75

Initial commit

README.md CHANGED
@@ -1,3 +1,88 @@

---
language: en
license: mit
pipeline_tag: text-classification
base_model: bert-base-uncased
datasets:
- custom
tags:
- sdg
- sustainable-development-goals
- impact-tech
- text-classification
---

# BERT for Startup SDG Classification

This is a `bert-base-uncased` model fine-tuned to classify startup company descriptions into one of the 17 UN Sustainable Development Goals (SDGs) or an additional "No Impact" category, for 18 classes in total.

This model was trained by Kfir Bar as part of the research paper "Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals" (2022).

This repository is hosted by Alon Mannor to make the original model weights publicly accessible.

## Model Details

- **Base model:** `bert-base-uncased`
- **Task:** text classification (single-label)
- **Labels:** 18 (0: No Impact; 1-17: the corresponding SDG)

### Label Mapping (id2label)

The model outputs one logit per class (18 in total). Class IDs map to label names as follows. Note that `config.json` itself stores only the generic names `LABEL_0` to `LABEL_17`, so these readable names are not returned automatically.

```json
{
  "0": "0: No Impact",
  "1": "SDG 1: No Poverty",
  "2": "SDG 2: Zero Hunger",
  "3": "SDG 3: Good Health and Well-being",
  "4": "SDG 4: Quality Education",
  "5": "SDG 5: Gender Equality",
  "6": "SDG 6: Clean Water and Sanitation",
  "7": "SDG 7: Affordable and Clean Energy",
  "8": "SDG 8: Decent Work and Economic Growth",
  "9": "SDG 9: Industry, Innovation and Infrastructure",
  "10": "SDG 10: Reduced Inequality",
  "11": "SDG 11: Sustainable Cities and Communities",
  "12": "SDG 12: Responsible Consumption and Production",
  "13": "SDG 13: Climate Action",
  "14": "SDG 14: Life Below Water",
  "15": "SDG 15: Life on Land",
  "16": "SDG 16: Peace and Justice Strong Institutions",
  "17": "SDG 17: Partnerships to achieve the Goal"
}
```
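
As a minimal pure-Python sketch of how the mapping is applied, the block below turns one 18-element logit vector (assumed to be ordered by class ID, as above) into a readable prediction and converts the generic `LABEL_<id>` strings defined in `config.json` into readable names. `ID2LABEL`, `label_name`, `softmax` and `predict` are hypothetical helpers for illustration, not part of this repository.

```python
import math

# Readable class names indexed by class ID, copied from the mapping above.
ID2LABEL = [
    "0: No Impact",
    "SDG 1: No Poverty",
    "SDG 2: Zero Hunger",
    "SDG 3: Good Health and Well-being",
    "SDG 4: Quality Education",
    "SDG 5: Gender Equality",
    "SDG 6: Clean Water and Sanitation",
    "SDG 7: Affordable and Clean Energy",
    "SDG 8: Decent Work and Economic Growth",
    "SDG 9: Industry, Innovation and Infrastructure",
    "SDG 10: Reduced Inequality",
    "SDG 11: Sustainable Cities and Communities",
    "SDG 12: Responsible Consumption and Production",
    "SDG 13: Climate Action",
    "SDG 14: Life Below Water",
    "SDG 15: Life on Land",
    "SDG 16: Peace and Justice Strong Institutions",
    "SDG 17: Partnerships to achieve the Goal",
]


def label_name(generic_label: str) -> str:
    """Hypothetical helper: convert a generic config label such as 'LABEL_7' to its readable name."""
    prefix = "LABEL_"
    if not generic_label.startswith(prefix):
        raise ValueError(f"unexpected label: {generic_label!r}")
    return ID2LABEL[int(generic_label[len(prefix):])]


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits: list[float]) -> tuple[str, float]:
    """Hypothetical helper: return (readable label, probability) of the top-scoring class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]
```

For example, `label_name(classifier(text)[0]["label"])` would turn a pipeline result of `LABEL_7` into `"SDG 7: Affordable and Clean Energy"`.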

## How to Use

You can use this model directly with the `text-classification` pipeline. Because `config.json` defines only generic label names, the pipeline returns labels such as `LABEL_7`; translate them with the label mapping above.

```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="amannor/bert-base-uncased-sdgclassifier")

# Example description of an impact startup
text = "Our company develops innovative, low-cost solar panels to bring electricity to rural communities."
result = classifier(text)
print(result)
# [{'label': 'LABEL_7', 'score': 0.98...}]  i.e. SDG 7: Affordable and Clean Energy

# Example description of a non-impact startup
text_2 = "We are a B2B platform for optimizing advertising spend on social media."
result_2 = classifier(text_2)
print(result_2)
# [{'label': 'LABEL_0', 'score': 0.95...}]  i.e. 0: No Impact
```

## Training Data

The model was trained on 4,247 startup descriptions (from the Gidron et al. 2023 extension), aggregated from two main sources and manually annotated by experts:

- **Rainmaking (Compass):** a global database of impact-focused startups.
- **Start-up Nation Central (SNC):** a database of Israeli startups, covering both impact and non-impact companies.

## Performance

The model was evaluated on a test set of 866 startups from the original paper.

| Task | F1-Weighted | F1-Macro | F1-Micro |
|---|---|---|---|
| 18-label (full) | 0.790 | 0.473 | 0.790 |
| 6-label (5Ps) | 0.836 | 0.602 | 0.836 |
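
The three F1 columns average per-class scores differently. The sketch below is a plain-Python rendering of the standard definitions: per-class F1 is 2TP / (2TP + FP + FN); macro is the unweighted mean over classes; weighted averages by each class's true count; micro pools all counts, which for single-label predictions equals accuracy. It is an illustration, not the paper's evaluation code; `f1_scores` is a hypothetical helper, and scikit-learn's `f1_score(y_true, y_pred, average=...)` with `"micro"`, `"macro"` or `"weighted"` computes the same quantities. Under these definitions, a macro score well below the weighted score (0.473 vs 0.790 above) suggests that the rarer classes are predicted less accurately.

```python
from collections import Counter


def f1_scores(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Micro, macro and support-weighted F1 for single-label multi-class predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    # With exactly one label per sample, pooled FP equals pooled FN, so micro F1 is accuracy.
    micro = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"micro": micro, "macro": macro, "weighted": weighted}
```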

The 6-label task (People, Planet, Prosperity, Peace, Partnerships, No Impact) was scored by aggregating the 18-label predictions into these groups.
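
The card does not say which SDGs fall under each of the five Ps. The sketch below **assumes** the commonly cited grouping (People: SDGs 1-5; Planet: 6, 12-15; Prosperity: 7-11; Peace: 16; Partnerships: 17); the paper's exact grouping may differ. `FIVE_PS`, `ID_TO_6` and `to_six_labels` are hypothetical names.

```python
# ASSUMPTION: commonly cited SDG-to-5Ps grouping; not confirmed by this model card.
FIVE_PS = {
    "People": (1, 2, 3, 4, 5),
    "Planet": (6, 12, 13, 14, 15),
    "Prosperity": (7, 8, 9, 10, 11),
    "Peace": (16,),
    "Partnerships": (17,),
}

# Lookup from an 18-label class ID (0 = No Impact, 1-17 = SDG) to one of six labels.
ID_TO_6 = {0: "No Impact"}
for group, sdgs in FIVE_PS.items():
    for sdg in sdgs:
        ID_TO_6[sdg] = group


def to_six_labels(class_ids: list[int]) -> list[str]:
    """Collapse 18-label class IDs into the 6-label (5Ps + No Impact) scheme."""
    return [ID_TO_6[i] for i in class_ids]
```

Collapsing both gold labels and predictions this way before scoring yields 6-label metrics. Confusions between two SDGs in the same group stop counting as errors, which is one reason the 6-label scores can only match or exceed the 18-label accuracy.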

## Citation

If you use this model or its underlying research, please cite the original paper:

```bibtex
@inproceedings{bar2022usinglm,
  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
  author={Bar, Kfir},
  booktitle={Anonymous Submission to IJCAI-22},
  year={2022},
  url={https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf}
}
```
config.json ADDED
@@ -0,0 +1,67 @@
```json
{
  "_name_or_path": "models/bert-base-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3",
    "4": "LABEL_4",
    "5": "LABEL_5",
    "6": "LABEL_6",
    "7": "LABEL_7",
    "8": "LABEL_8",
    "9": "LABEL_9",
    "10": "LABEL_10",
    "11": "LABEL_11",
    "12": "LABEL_12",
    "13": "LABEL_13",
    "14": "LABEL_14",
    "15": "LABEL_15",
    "16": "LABEL_16",
    "17": "LABEL_17"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_10": 10,
    "LABEL_11": 11,
    "LABEL_12": 12,
    "LABEL_13": 13,
    "LABEL_14": 14,
    "LABEL_15": 15,
    "LABEL_16": 16,
    "LABEL_17": 17,
    "LABEL_2": 2,
    "LABEL_3": 3,
    "LABEL_4": 4,
    "LABEL_5": 5,
    "LABEL_6": 6,
    "LABEL_7": 7,
    "LABEL_8": 8,
    "LABEL_9": 9
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.15.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
```
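
In this config the `label2id` keys appear in lexicographic string order (`LABEL_10` before `LABEL_2`), which looks out of sequence but does not affect the mapping. As a minimal sketch, assuming the config has been loaded into a dict (for example with `json.load`), the hypothetical helper below checks that `id2label` and `label2id` are exact inverses over IDs 0 to n-1 and returns n.

```python
def check_label_maps(config: dict) -> int:
    """Check that id2label and label2id are inverse maps over IDs 0..n-1; return n."""
    id2label = {int(k): v for k, v in config["id2label"].items()}  # JSON keys are strings
    label2id = config["label2id"]
    n = len(id2label)
    if sorted(id2label) != list(range(n)):
        raise ValueError("id2label keys are not the contiguous range 0..n-1")
    if len(label2id) != n:
        raise ValueError("id2label and label2id have different sizes")
    for class_id, name in id2label.items():
        if label2id.get(name) != class_id:
            raise ValueError(f"label2id[{name!r}] does not map back to {class_id}")
    return n
```

Run on the `config.json` above, the check passes and returns 18, matching the 18 classes described in the model card.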
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
```text
version https://git-lfs.github.com/spec/v1
oid sha256:ebc44bf07e6099bb364051b4670f85826008359c0c555159f782133a23fba546
size 438068461
```
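
The weights file is committed as a Git LFS pointer. Its three lines record the pointer spec version, the SHA-256 of the real file, and its size in bytes. The size, 438,068,461 bytes (about 438 MB), is roughly what about 110 million float32 parameters at 4 bytes each would occupy. The minimal standard-library sketch below parses such a pointer and checks a downloaded file against it. `parse_lfs_pointer`, `matches_pointer` and `file_chunks` are hypothetical helpers, not Git LFS tooling.

```python
import hashlib
from collections.abc import Iterable, Iterator


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(pointer_text: str, chunks: Iterable[bytes]) -> bool:
    """True if the streamed bytes have the size and SHA-256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo!r}")
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return size == int(fields["size"]) and digest.hexdigest() == expected


def file_chunks(path: str, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a file in 1 MiB chunks so large weights are never fully loaded into memory."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

Assuming the pointer text above is stored in a string `POINTER` (a hypothetical name) and the real weights have been downloaded to `pytorch_model.bin`, `matches_pointer(POINTER, file_chunks("pytorch_model.bin"))` should return `True`.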
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
```json
{
  "cls_token": "[CLS]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": "[UNK]"
}
```
tokenizer.json ADDED
The diff for this file is too large to render.
tokenizer_config.json ADDED
@@ -0,0 +1,56 @@
```json
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "100": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "101": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "102": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "103": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": true,
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 512,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "strip_accents": null,
  "tokenize_chinese_chars": true,
  "tokenizer_class": "BertTokenizer",
  "unk_token": "[UNK]"
}
```
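
The tokenizer files fix the special-token IDs (`[PAD]` = 0, `[UNK]` = 100, `[CLS]` = 101, `[SEP]` = 102, `[MASK]` = 103) and a `model_max_length` of 512. The simplified sketch below shows how a single description's word-piece IDs are framed for BERT sequence classification: `[CLS]` first, `[SEP]` last, truncation so the sequence fits, then `[PAD]` to a fixed length with a matching attention mask. It does not perform WordPiece tokenization itself. The input IDs are assumed to come from the real tokenizer, and `frame_for_bert` is a hypothetical helper. With truncation, only the first 510 word pieces of a long description reach the model.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # IDs taken from tokenizer_config.json above
MODEL_MAX_LENGTH = 512


def frame_for_bert(token_ids: list[int], pad_to: int = MODEL_MAX_LENGTH) -> tuple[list[int], list[int]]:
    """Wrap word-piece IDs as [CLS] ... [SEP], truncate, pad, and build the attention mask."""
    if not 2 <= pad_to <= MODEL_MAX_LENGTH:
        raise ValueError("pad_to must be between 2 and MODEL_MAX_LENGTH")
    body = token_ids[: pad_to - 2]  # reserve two positions for [CLS] and [SEP]
    input_ids = [CLS_ID, *body, SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token, 0 = padding
    padding = pad_to - len(input_ids)
    return input_ids + [PAD_ID] * padding, attention_mask + [0] * padding
```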
vocab.txt ADDED
The diff for this file is too large to render.