---
language:
- en
---
# Model Card for ESG-BERT

Domain-Specific BERT Model for Text Mining in Sustainable Investing

# Model Details

## Model Description

- **Developed by:** [Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/)
- **Shared by [Optional]:** HuggingFace
- **Model type:** Language model
- **Language(s) (NLP):** en
- **License:** More information needed
- **Related Models:**
  - **Parent Model:** BERT
- **Resources for more information:**
  - [GitHub Repo](https://github.com/mukut03/ESG-BERT)
  - [Blog Post](https://towardsdatascience.com/nlp-meets-sustainable-investing-d0542b3c264b?source=friends_link&sk=1f7e6641c3378aaff319a81decf387bf)

# Uses

## Direct Use

Text mining in sustainable investing.

## Downstream Use [Optional]

The applications of ESG-BERT extend well beyond text classification: the model can be fine-tuned to perform various other downstream NLP tasks in the domain of sustainable investing.

## Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

# Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., [Sheng et al. (2021)](https://aclanthology.org/2021.acl-long.330.pdf) and [Bender et al. (2021)](https://dl.acm.org/doi/pdf/10.1145/3442188.3445922)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

# Training Details

## Training Data

More information needed

## Training Procedure

<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

### Preprocessing

More information needed

### Speeds, Sizes, Times

More information needed

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

The fine-tuned model for text classification is also available [here](https://drive.google.com/drive/folders/1Qz4HP3xkjLfJ6DGCFNeJ7GmcPq65_HVe?usp=sharing). It can be used directly to make predictions in just a few steps. First, download the fine-tuned `pytorch_model.bin`, `config.json`, and `vocab.txt` files, then serve them with TorchServe as described in the How to Get Started with the Model section.

### Factors

More information needed

### Metrics

More information needed

## Results

ESG-BERT was further pre-trained on unstructured text data, reaching accuracies of 100% on Next Sentence Prediction and 98% on Masked Language Modelling. Fine-tuning ESG-BERT for text classification yielded an F1 score of 0.90. For comparison, the general BERT (BERT-base) model scored 0.79 after fine-tuning, and the scikit-learn approach scored 0.67.

# Model Examination

More information needed

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** More information needed
- **Hours used:** More information needed
- **Cloud Provider:** More information needed
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed
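
No figures are available for ESG-BERT yet, but once the fields above are known, a rough estimate follows from the basic relationship that calculators of this kind build on: energy used (power draw multiplied by hours) multiplied by the carbon intensity of the local grid. The sketch below is a simplification under that assumption; the function name `estimate_co2_kg` is hypothetical, and it ignores factors such as data-center overhead or provider offsets that the calculator may account for.

```python
def estimate_co2_kg(power_kw: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Rough CO2-equivalent estimate (kg) for a training or serving run.

    Simplified assumption: emissions = power draw (kW) * hours * grid carbon
    intensity (kg CO2eq per kWh). Overheads and offsets are ignored.
    """
    if min(power_kw, hours, kg_co2_per_kwh) < 0:
        raise ValueError("all inputs must be non-negative")
    energy_kwh = power_kw * hours  # total energy drawn by the hardware
    return energy_kwh * kg_co2_per_kwh
```

For example, with purely hypothetical inputs of a 0.25 kW GPU running for 8 hours on a grid emitting 0.5 kg CO2eq per kWh, `estimate_co2_kg(0.25, 8, 0.5)` returns `1.0`.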

# Technical Specifications [optional]

## Model Architecture and Objective

More information needed

## Compute Infrastructure

More information needed

### Hardware

More information needed

### Software

JDK 11 is needed to serve the model with TorchServe.

# Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

More information needed

**APA:**

More information needed

# Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

More information needed

# More Information [optional]

More information needed

# Model Card Authors [optional]

[Charan Pothireddi](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/) and [Parabole.ai](https://www.linkedin.com/in/sree-charan-pothireddi-6a0a3587/), in collaboration with Ezi Ozoani and the HuggingFace team

# Model Card Contact

More information needed

# How to Get Started with the Model

Use the code below to get started with the model.

<details>
<summary> Click to expand </summary>

```bash
pip install torchserve torch-model-archiver
pip install torchvision
pip install transformers
```

Next, we'll set up the handler script. It is a basic handler for text classification that can be improved upon. Save this script as `handler.py` in your directory. [1]

```python
from abc import ABC
import json
import logging
import os

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from ts.torch_handler.base_handler import BaseHandler

logger = logging.getLogger(__name__)


class TransformersClassifierHandler(BaseHandler, ABC):
    """
    Transformers text classifier handler class. This handler takes a text (string)
    as input and returns the predicted class based on the serialized transformers checkpoint.
    """

    def __init__(self):
        super().__init__()
        self.initialized = False
        # Optional index -> class-name mapping; stays None if index_to_name.json is absent
        self.mapping = None

    def initialize(self, ctx):
        self.manifest = ctx.manifest
        properties = ctx.system_properties
        model_dir = properties.get("model_dir")
        self.device = torch.device(
            "cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu"
        )

        # Read the serialized model (pytorch_model.bin) plus config.json and vocab.txt
        self.model = AutoModelForSequenceClassification.from_pretrained(model_dir)
        self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
        self.model.to(self.device)
        self.model.eval()
        logger.debug("Transformer model from path %s loaded successfully", model_dir)

        # Read the mapping file, index to object name
        mapping_file_path = os.path.join(model_dir, "index_to_name.json")
        if os.path.isfile(mapping_file_path):
            with open(mapping_file_path) as f:
                self.mapping = json.load(f)
        else:
            logger.warning("Missing the index_to_name.json file. Inference output will not include class name.")

        self.initialized = True

    def preprocess(self, data):
        """Very basic preprocessing code - only tokenizes.
        Extend with your own preprocessing steps as needed.
        """
        text = data[0].get("data")
        if text is None:
            text = data[0].get("body")
        # The request body may arrive as bytes or as an already-decoded string
        if isinstance(text, (bytes, bytearray)):
            text = text.decode("utf-8")
        logger.info("Received text: '%s'", text)

        inputs = self.tokenizer.encode_plus(
            text,
            add_special_tokens=True,
            truncation=True,  # BERT-base position embeddings cover at most 512 tokens
            max_length=512,
            return_tensors="pt",
        )
        return inputs

    def inference(self, inputs):
        """
        Predict the class of a text using a trained transformer model.
        """
        # NOTE: This assumes the model expects text tokenized into "input_ids" and
        # "token_type_ids", which is true for BERT-style models such as ESG-BERT.
        # If your transformer model expects different inputs, adapt this code.
        with torch.no_grad():
            logits = self.model(
                inputs["input_ids"].to(self.device),
                token_type_ids=inputs["token_type_ids"].to(self.device),
            )[0]
        prediction = logits.argmax().item()
        logger.info("Model predicted: '%s'", prediction)

        if self.mapping:
            prediction = self.mapping[str(prediction)]
        return [prediction]

    def postprocess(self, inference_output):
        # TODO: Add any needed post-processing of the model predictions here
        return inference_output


_service = TransformersClassifierHandler()


def handle(data, context):
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None

    data = _service.preprocess(data)
    data = _service.inference(data)
    data = _service.postprocess(data)
    return data
```

TorchServe uses a format called MAR (Model Archive). Assuming the downloaded files are in `./bert_model`, we can convert our PyTorch model to a .mar file using this command:

```bash
torch-model-archiver --model-name "bert" --version 1.0 --serialized-file ./bert_model/pytorch_model.bin --extra-files "./bert_model/config.json,./bert_model/vocab.txt" --handler "./handler.py"
```
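
Before serving, it can help to confirm that the weights, `config.json`, `vocab.txt`, and `handler.py` actually ended up in the archive. The sketch below assumes torch-model-archiver's default output format, a ZIP file that keeps its metadata in `MAR-INF/MANIFEST.json`; the helper names `inspect_mar` and `missing_files` are hypothetical.

```python
import json
import zipfile


def inspect_mar(mar_path):
    """Return (member names, parsed manifest) for a .mar file.

    Assumes the default torch-model-archiver format: a ZIP archive that
    stores its metadata in MAR-INF/MANIFEST.json.
    """
    with zipfile.ZipFile(mar_path) as archive:
        names = archive.namelist()
        manifest = json.loads(archive.read("MAR-INF/MANIFEST.json"))
    return names, manifest


def missing_files(mar_path, required):
    """List the required file names that are not present in the archive."""
    names, _ = inspect_mar(mar_path)
    # Compare base names so the check does not depend on the directory layout
    present = {name.rsplit("/", 1)[-1] for name in names}
    return [name for name in required if name not in present]
```

For example, `missing_files("bert.mar", ["pytorch_model.bin", "config.json", "vocab.txt", "handler.py"])` should return an empty list when everything was packaged.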

Move the .mar file into a new directory:

```bash
mkdir model_store && mv bert.mar model_store
```

Finally, we can start TorchServe using the command:

```bash
torchserve --start --model-store model_store --models bert=bert.mar
```

We can now query the model from another terminal window using the Inference API, passing a text file (`predict.txt`) that contains the text the model should classify:

```bash
curl -X POST http://127.0.0.1:8080/predictions/bert -T predict.txt
```
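
The same request can be sent from Python, for example to classify many snippets in a script. This is a minimal sketch using only the standard library; it assumes TorchServe is running locally on the default inference port 8080 with the model registered as `bert`, as above. The helper names are hypothetical, and the `application/octet-stream` content type is an assumption chosen so the body reaches the handler as raw bytes, which its `preprocess` step decodes as UTF-8.

```python
import urllib.request

# Assumed local TorchServe inference endpoint for the model registered as "bert"
PREDICT_URL = "http://127.0.0.1:8080/predictions/bert"


def build_request(text, url=PREDICT_URL):
    """Build a POST request whose body is the raw UTF-8 encoded text."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )


def classify(text, url=PREDICT_URL, timeout=30):
    """Send text to the running TorchServe endpoint and return the decoded reply."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `print(classify(open("predict.txt", encoding="utf-8").read()))` should print the same label number as the curl command.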

This returns a label number that corresponds to a textual label. The mapping is stored in the `label_dict.txt` dictionary file:

```
__label__Business_Ethics : 0
__label__Data_Security : 1
__label__Access_And_Affordability : 2
__label__Business_Model_Resilience : 3
__label__Competitive_Behavior : 4
__label__Critical_Incident_Risk_Management : 5
__label__Customer_Welfare : 6
__label__Director_Removal : 7
__label__Employee_Engagement_Inclusion_And_Diversity : 8
__label__Employee_Health_And_Safety : 9
__label__Human_Rights_And_Community_Relations : 10
__label__Labor_Practices : 11
__label__Management_Of_Legal_And_Regulatory_Framework : 12
__label__Physical_Impacts_Of_Climate_Change : 13
__label__Product_Quality_And_Safety : 14
__label__Product_Design_And_Lifecycle_Management : 15
__label__Selling_Practices_And_Product_Labeling : 16
__label__Supply_Chain_Management : 17
__label__Systemic_Risk_Management : 18
__label__Waste_And_Hazardous_Materials_Management : 19
__label__Water_And_Wastewater_Management : 20
__label__Air_Quality : 21
__label__Customer_Privacy : 22
__label__Ecological_Impacts : 23
__label__Energy_Management : 24
__label__GHG_Emissions : 25
```
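
The handler above looks for an optional `index_to_name.json` in the model directory and, when it finds one, returns class names instead of numbers. One way to produce that file is to convert `label_dict.txt`. A minimal sketch, assuming every entry follows the `__label__Name : index` pattern shown above; the function names `parse_label_dict` and `write_index_to_name` are hypothetical.

```python
import json
import re

# Matches lines such as "__label__Business_Ethics : 0"
LABEL_LINE = re.compile(r"^__label__(?P<name>\S+)\s*:\s*(?P<index>\d+)\s*$")


def parse_label_dict(text):
    """Map each class index (as a string key, as the handler expects) to its label name."""
    mapping = {}
    for line in text.splitlines():
        match = LABEL_LINE.match(line.strip())
        if match:  # blank or unrelated lines are skipped
            mapping[match.group("index")] = match.group("name")
    return mapping


def write_index_to_name(label_dict_path, out_path="index_to_name.json"):
    """Convert label_dict.txt into the index_to_name.json file the handler reads."""
    with open(label_dict_path, encoding="utf-8") as f:
        mapping = parse_label_dict(f.read())
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2)
    return mapping
```

After running `write_index_to_name("label_dict.txt", "./bert_model/index_to_name.json")`, add `./bert_model/index_to_name.json` to the `--extra-files` list of the `torch-model-archiver` command and rebuild `bert.mar`; a prediction of `25` should then come back as `GHG_Emissions`.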

</details>
