	---
	tags:
	- text-classification
	- sustainable-development-goals
	- SDG
	- transformers
	- bert
	- social-impact
	license: mit
	language:
	- en
	base_model:
	- google-bert/bert-base-uncased
	---

	# SDG Startup Classifier (18-label BERT-based Model)

	[![Model](https://img.shields.io/badge/model-BERT--base--uncased-blue)](https://huggingface.co/bert-base-uncased)
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
	[![Hugging Face](https://img.shields.io/badge/HuggingFace-BERT%20SDG%20Classifier-green)](https://huggingface.co/amannor/bert-base-uncased-sdg-classifier)

	---

	## Model Overview

	This model is a BERT-base-uncased transformer fine-tuned for multiclass classification of startup companies into 18 categories: the 17 United Nations Sustainable Development Goals (SDGs) plus a "no-impact" label.

	It is based on the methodology and dataset described in the IJCAI 2022 paper by Kfir Bar:

	> Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals
	> Kfir Bar (2022) — [Paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)

	The model takes textual company descriptions, mission statements, and product summaries as input and predicts the single most relevant SDG label, reflecting the focus of the company's social or environmental impact.
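
As a rough illustration of preparing such input, the sketch below joins the three kinds of text into one string. The field names (`description`, `mission`, `products`) and the `build_input_text` helper are hypothetical and are not part of the model or the paper's pipeline. BERT-base accepts at most 512 tokens, so text beyond that limit is dropped when the tokenizer is called with `truncation=True`. The ordering here puts the description first on the assumption that it is the most informative field.

```python
import re

# Hypothetical field names; adapt them to however your records are stored.
FIELD_ORDER = ("description", "mission", "products")


def build_input_text(record: dict) -> str:
    """Join the non-empty text fields of a company record into one string."""
    parts = []
    for field in FIELD_ORDER:
        value = record.get(field) or ""
        # Collapse runs of whitespace (newlines, tabs) into single spaces.
        value = re.sub(r"\s+", " ", value).strip()
        if value:
            parts.append(value)
    return " ".join(parts)


example = {
    "description": "Affordable solar panels for off-grid homes.",
    "mission": "Clean energy\naccess for everyone.",
    "products": "",
}
print(build_input_text(example))
```

The resulting string can be passed to the tokenizer in place of a single description.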

	---

	## Intended Use

	- Automatic SDG classification of startup textual descriptions, mission statements, and product/service information.
	- Support for impact investors, researchers, policymakers, and analysts interested in assessing startup alignment with SDGs.
	- Multiclass classification into all 17 SDGs plus a no-impact class, useful for comprehensive sustainability profiling.
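
For the profiling use case, one simple option is to tally predicted label IDs across a set of startups. The sketch below assumes the predictions have already been computed, for example with the snippet in "How to Use". The `label_distribution` helper and the sample IDs are illustrative only.

```python
from collections import Counter


def label_distribution(predicted_ids: list[int]) -> dict[int, float]:
    """Return the share of predictions per label ID, sorted by label ID."""
    if not predicted_ids:
        return {}
    counts = Counter(predicted_ids)
    total = len(predicted_ids)
    return {label: counts[label] / total for label in sorted(counts)}


# Illustrative label IDs only, not real model output.
portfolio_predictions = [6, 6, 12, 2, 6, 17, 12, 6]
for label_id, share in label_distribution(portfolio_predictions).items():
    print(f"label {label_id}: {share:.0%}")
```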

	---

	## Model Details

	- Architecture: BERT-base-uncased (`bert-base-uncased` from Hugging Face Transformers)
	- Number of labels: 18 (17 SDGs + 1 no-impact)
	- Tokenizer: BERT-base-uncased WordPiece tokenizer
	- Training data: Proprietary dataset of startup descriptions labeled by SDG, as described in Bar (2022)
	- Training details: Fine-tuned using AdamW optimizer, learning rate approx. 2e-5, for multiple epochs on an annotated dataset
	- Performance: Approximately 77% accuracy on the 5 aggregated SDG groups, with competitive performance on the full 18-label task (per original paper)
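
To make the training setup more concrete, here is a minimal sketch of a single fine-tuning step with AdamW at a learning rate of 2e-5 and an 18-way classification head. It is not the original training script: the tiny random `BertConfig`, the batch size, the sequence length, and the random inputs are assumptions chosen so that the example runs offline in a few seconds.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Tiny, randomly initialised stand-in so the sketch runs offline and fast.
# For real fine-tuning you would instead load the pretrained checkpoint:
#   BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=18)
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=18,  # 17 SDGs + 1 no-impact
)
model = BertForSequenceClassification(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Random token IDs and labels standing in for one tokenized batch.
batch_size, seq_len = 4, 16  # assumed values, not taken from the paper
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))
attention_mask = torch.ones_like(input_ids)
labels = torch.randint(0, config.num_labels, (batch_size,))

model.train()
outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
outputs.loss.backward()  # cross-entropy loss over the 18 classes
optimizer.step()
optimizer.zero_grad()

print(outputs.logits.shape, float(outputs.loss))
```

A full run would wrap this step in a loop over batches and epochs and evaluate on held-out data.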

	---

	## How to Use

	Minimal example that loads the model and runs inference with the Hugging Face Transformers library:

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	model_name = "amannor/bert-base-uncased-sdg-classifier"

	# Load tokenizer and model from the Hugging Face Hub
	tokenizer = AutoTokenizer.from_pretrained(model_name)
	model = AutoModelForSequenceClassification.from_pretrained(model_name)
	model.eval()  # disable dropout for inference

	# Input startup description text
	text = "This startup develops affordable solar panels to improve clean energy access."

	# Tokenize input text
	inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

	# Forward pass (no gradients are needed for inference)
	with torch.no_grad():
	    outputs = model(**inputs)

	# Predicted class index (0 to 17, aligned with SDGs + no-impact)
	predicted_label_id = torch.argmax(outputs.logits, dim=-1).item()

	print(f"Predicted SDG label ID: {predicted_label_id}")
	```
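
The example above prints only a numeric ID. The sketch below shows one way to turn logits into probabilities and readable names. The SDG titles are the official UN short names, but the ID ordering is an assumption: it treats IDs 0 to 16 as SDG 1 to 17 and ID 17 as "no-impact". Check `model.config.id2label` (or the accompanying repository) before relying on this mapping. The logits here are made up; in practice you would pass `outputs.logits[0].tolist()`.

```python
import math

SDG_NAMES = [
    "No Poverty", "Zero Hunger", "Good Health and Well-being",
    "Quality Education", "Gender Equality", "Clean Water and Sanitation",
    "Affordable and Clean Energy", "Decent Work and Economic Growth",
    "Industry, Innovation and Infrastructure", "Reduced Inequalities",
    "Sustainable Cities and Communities",
    "Responsible Consumption and Production", "Climate Action",
    "Life Below Water", "Life on Land",
    "Peace, Justice and Strong Institutions", "Partnerships for the Goals",
]

# ASSUMPTION: IDs 0-16 map to SDG 1-17 and ID 17 is "no-impact".
# Verify against model.config.id2label before using this in earnest.
ASSUMED_ID2LABEL = {i: f"SDG {i + 1}: {name}" for i, name in enumerate(SDG_NAMES)}
ASSUMED_ID2LABEL[17] = "no-impact"


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, k=3):
    """Return the k most probable (label, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(ASSUMED_ID2LABEL[i], p) for i, p in ranked[:k]]


# Made-up logits for illustration only.
example_logits = [0.0] * 18
example_logits[6] = 4.0
example_logits[12] = 2.0
for label, prob in top_k(example_logits):
    print(f"{label}: {prob:.3f}")
```

Looking at the top few labels rather than only the argmax can help when a startup plausibly touches several goals.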

	---

	## Limitations

	- The model relies solely on textual company descriptions, which might be promotional or biased (“greenwashing”).
	- Performance may degrade on short, noisy, or non-English inputs.
	- The training dataset was geographically and linguistically limited; generalization outside these domains may be suboptimal.
	- Intended to assist, not replace, expert judgment.
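
One way to act on these limitations is to route doubtful predictions to a human reviewer. The sketch below flags very short inputs, inputs with many non-ASCII characters (a crude proxy for non-English text), and low-confidence predictions. The `needs_review` helper and all thresholds are hypothetical and untuned, and softmax probabilities are not guaranteed to be well calibrated. It does not detect promotional or "greenwashed" descriptions.

```python
def needs_review(text, top_probability, min_words=8, min_confidence=0.6):
    """Return a list of reasons why a prediction should go to a human reviewer."""
    reasons = []
    if len(text.split()) < min_words:
        reasons.append("text is very short")
    # Crude non-English heuristic: share of non-ASCII characters.
    non_ascii = sum(1 for ch in text if ord(ch) > 127)
    if text and non_ascii / len(text) > 0.3:
        reasons.append("text may not be English")
    if top_probability < min_confidence:
        reasons.append("low model confidence")
    return reasons


description = "This startup develops affordable solar panels to improve clean energy access."
print(needs_review("Solar panels.", 0.95))
print(needs_review(description, 0.45))
print(needs_review(description, 0.90))
```

An empty list means none of the checks fired, not that the prediction is correct.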

	---

	## Citation

	If you use this model, please cite:

	```bibtex
	@inproceedings{bar2022ijcai,
	  title={Using Language Models for Classifying Startups Into the UN’s 17 Sustainable Development Goals},
	  author={Bar, Kfir},
	  booktitle={Proceedings of the 31st International Joint Conference on Artificial Intelligence (IJCAI)},
	  year={2022}
	}
	```


	You may also wish to reference the accompanying repository:
	https://github.com/Amannor/sdg-codebase

	---

	## License

	This model is released under the MIT License. For more information, see the LICENSE file in this repository.

	---

	## Links and Resources

	- [Full repository with code, notebooks, and datasets](https://github.com/Amannor/sdg-codebase)
	- [IJCAI 2022 original paper PDF](https://github.com/Amannor/sdg-codebase/blob/master/articles/IJCAI_2022_SDGs_Methodology.pdf)

	---

	For questions or issues, please open an issue in the GitHub repository or contact the maintainer via Hugging Face.