license: mit
language:
  - en
metrics:
  - accuracy
  - matthews_correlation
widget:
  - text: >-
      Highway work zones create potential risks for both traffic and workers in
      addition to traffic congestion and delays that result in increased road
      user delay.
  - text: >-
      A circular economy is a way of achieving sustainable consumption and
      production, as well as nature positive outcomes.

sadickam/sdgBERT

sdgBERT (previously named "sdg-classification-bert") is an NLP model for classifying text with respect to the United Nations Sustainable Development Goals (SDGs).

Source: https://www.un.org/development/desa/disabilities/about-us/sustainable-development-goals-sdgs-and-disability.html

Model Details

Model Description

This text classification model was developed by fine-tuning the bert-base-uncased pre-trained model. The training data was sourced from the publicly available OSDG Community Dataset (OSDG-CD), Version 2023.10, at https://zenodo.org/records/8397907. The model was built as part of academic research at Deakin University, with the goal of providing a transformer-based SDG text classification model that anyone can use. Only the first 16 UN SDGs are supported. The primary model details are listed below:

Model type: Text classification
Language(s) (NLP): English
License: MIT
Fine-tuned from model: bert-base-uncased

Model Sources

Repository: https://huggingface.co/sadickam/sdg-classification-bert
Demo: option 1 (copy/paste text and CSV): https://sadickam-sdg-text-classifier.hf.space/; option 2 (PDF documents): https://sadickam-document-sdg-app-cpu.hf.space

Direct Use

This is a fine-tuned model and therefore requires no further training.

How to Get Started with the Model

Use the code below to get started with the model.

# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("sadickam/sdg-classification-bert")
model = AutoModelForSequenceClassification.from_pretrained("sadickam/sdg-classification-bert")
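
The snippet above only loads the tokenizer and model. As a minimal sketch of how a prediction might then be obtained, the code below tokenizes a text, runs a forward pass, and ranks the classes by softmax probability. The helper names (`softmax`, `top_k`, `classify`), the 512-token truncation limit, and the default `k=3` are illustrative assumptions, not part of the original model card. Label names are read from `model.config.id2label`, so whatever names the published config contains are what gets returned.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]


def classify(text, model_id="sadickam/sdg-classification-bert", k=3):
    """Hypothetical helper: predict the k most likely labels for `text`."""
    # Imported lazily so the helpers above work without torch/transformers.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    # Assumption: truncate to BERT's 512-token input limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_k(softmax(logits), model.config.id2label, k)
```

Calling `classify("A circular economy is a way of achieving sustainable consumption and production.")` downloads the model on first use and returns a list of (label, probability) pairs. For many texts, it would be more efficient to load the tokenizer and model once and reuse them.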

Training Data

The training data includes text from a wide range of industries and academic research fields. Hence, this fine-tuned model is not for a specific industry.

See the training data here: https://zenodo.org/records/8397907

Training Hyperparameters

Num_epoch = 3
Learning rate = 5e-5
Batch size = 16
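
For anyone wanting to reproduce a similar fine-tuning run, the values above could be captured in a small configuration object. This is only a sketch: the model card does not say which training loop was used, so the class name `FineTuneConfig`, the mapping onto `transformers.TrainingArguments` field names, and the reading of the batch size as per-device are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters reported in the model card (class name is hypothetical)."""
    num_epochs: int = 3
    learning_rate: float = 5e-5
    batch_size: int = 16

    def total_steps(self, num_examples: int) -> int:
        """Optimizer steps for `num_examples`, assuming no gradient accumulation."""
        return math.ceil(num_examples / self.batch_size) * self.num_epochs

    def trainer_kwargs(self) -> dict:
        """Keyword arguments named after transformers.TrainingArguments fields.

        Assumption: the reported batch size is the per-device batch size.
        """
        return {
            "num_train_epochs": self.num_epochs,
            "learning_rate": self.learning_rate,
            "per_device_train_batch_size": self.batch_size,
        }
```

With the `transformers` library installed, the dictionary could be passed as `TrainingArguments(output_dir="out", **FineTuneConfig().trainer_kwargs())`, where `"out"` is a placeholder directory.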

Evaluation

Metrics

Accuracy = 0.90
Matthews correlation = 0.89
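
To make the two metrics concrete, the sketch below computes accuracy and the multi-class Matthews correlation coefficient from lists of true and predicted labels. It is a plain-Python illustration of the standard definitions, not the evaluation script used for this model, and the function names are hypothetical. Unlike accuracy, the Matthews correlation accounts for how predictions are distributed across classes, which makes it more informative when classes are imbalanced.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient.

    Uses the confusion-matrix form:
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the sample count, c the number of correct predictions,
    t_k the number of true occurrences of class k, and p_k the number of
    times class k was predicted.
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)

    numerator = c * s - sum(true_counts[k] * pred_counts[k] for k in labels)
    spread_true = s * s - sum(true_counts[k] ** 2 for k in labels)
    spread_pred = s * s - sum(pred_counts[k] ** 2 for k in labels)
    denominator = math.sqrt(spread_true * spread_pred)
    # Convention: return 0.0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0
```

Both functions accept any hashable labels, so they work equally with integer class ids or label strings.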

Citation

If you use this model, please cite the journal article below. The publication year will be added once the paper is published online; the DOI will remain unchanged.

Sadick, A.-M., Hasan, A. and Ahiaga-Dagbui, D.D. (Forthcoming), "Modeling sustainability discourse in the construction industry: A deep-learning approach". Journal of Construction Engineering and Management. DOI: 10.1061/JCEMD4/COENG-16205

Model Card Contact

s.sadick@deakin.edu.au