<div align="center">

**⚠️ Disclaimer:**
The Hugging Face models currently give different results from the detoxify library (see [this issue](https://github.com/unitaryai/detoxify/issues/15)). For the most up-to-date models, we recommend using the models from https://github.com/unitaryai/detoxify.

# 🙊 Detoxify
## Toxic Comment Classification with ⚡ Pytorch Lightning and 🤗 Transformers

![CI testing](https://github.com/unitaryai/detoxify/workflows/CI%20testing/badge.svg)
![Lint](https://github.com/unitaryai/detoxify/workflows/Lint/badge.svg)

</div>

## Description

Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic Comment Classification, Unintended Bias in Toxic Comments, and Multilingual Toxic Comment Classification.

Built by [Laura Hanu](https://laurahanu.github.io/) at [Unitary](https://www.unitary.ai/), where we are working to stop harmful content online by interpreting visual content in context.

Dependencies:
- For inference:
  - 🤗 Transformers
  - ⚡ Pytorch Lightning
- For training, you will also need:
  - Kaggle API (to download data)

| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score | Detoxify Score |
|-|-|-|-|-|-|-|
| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model that’s capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate. | Wikipedia Comments | `original` | 0.98856 | 0.98636 |
| | [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes this type of unintended bias with respect to mentions of identities. You'll be using a dataset labeled for identity mentions and optimizing a metric designed to measure unintended bias. | Civil Comments | `unbiased` | 0.94734 | 0.93639 | |
| | [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 0.9536 | 0.91655* | |

*Score not directly comparable since it was obtained on the provided validation set rather than on the test set. This will be updated when the test labels are made available.

It is also worth noting that the top leaderboard scores were achieved using model ensembles; the purpose of this library is to offer something user-friendly and straightforward to use.

## Limitations and ethical considerations

If words associated with swearing, insults, or profanity are present in a comment, it is likely to be classified as toxic regardless of the tone or intent of the author, e.g. humorous or self-deprecating. This could present some biases towards already vulnerable minority groups.

The intended use of this library is for research purposes, for fine-tuning on carefully constructed datasets that reflect real-world demographics, and/or to aid content moderators in flagging harmful content more quickly.

Some useful resources about the risk of different biases in toxicity or hate speech detection are:
- [The Risk of Racial Bias in Hate Speech Detection](https://homes.cs.washington.edu/~msap/pdfs/sap2019risk.pdf)
- [Automated Hate Speech Detection and the Problem of Offensive Language](https://arxiv.org/pdf/1703.04009.pdf)
- [Racial Bias in Hate Speech and Abusive Language Detection Datasets](https://arxiv.org/pdf/1905.12516.pdf)

## Quick prediction

The `multilingual` model has been trained on 7 different languages, so it should only be tested on: `english`, `french`, `spanish`, `italian`, `portuguese`, `turkish` or `russian`.

```bash
# install detoxify
pip install detoxify
```

```python
from detoxify import Detoxify

# each model takes in either a string or a list of strings
results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo',
              'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# optional: display results nicely (requires pandas, `pip install pandas`)
import pandas as pd
print(pd.DataFrame(results, index=input_text).round(5))
```

For more details, check the Prediction section.

## Labels

All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators, according to the following schema:
- **Very Toxic** (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)
- **Toxic** (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)
- **Hard to Say**
- **Not Toxic**

More information about the labelling schema can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).
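
Concretely, these ratings get reduced to a soft per-comment toxicity target. As a small illustration (assuming the aggregate is the simple fraction of annotators who rated the comment toxic, which is how the Unintended Bias data expresses its target):

```python
# illustrative only: aggregate annotator votes into a soft toxicity target;
# this helper is ours, not part of detoxify or the Kaggle tooling
def toxicity_target(toxic_votes: int, num_annotators: int) -> float:
    """Fraction of annotators who rated the comment as toxic."""
    return toxic_votes / num_annotators

assert toxicity_target(6, 10) == 0.6  # 6 of 10 annotators -> target 0.6
```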

### Toxic Comment Classification Challenge

This challenge includes the following labels:
- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`
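
For illustration, `predict` returns one score per label. A minimal sketch for the `original` model (the values shown are made up, and the exact key names in the returned dict may differ slightly from the challenge column names):

```python
from detoxify import Detoxify

results = Detoxify('original').predict('example text')
# results maps each label to a score in [0, 1], e.g. (illustrative values only):
# {'toxic': 0.001, 'severe_toxic': 0.0001, 'obscene': 0.0002,
#  'threat': 0.0001, 'insult': 0.0002, 'identity_hate': 0.0001}
```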

### Jigsaw Unintended Bias in Toxicity Classification

This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments. Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.

Main toxicity labels:
- `toxicity`
- `severe_toxicity`
- `obscene`
- `threat`
- `insult`
- `identity_attack`
- `sexual_explicit`

Identity labels used:
- `male`
- `female`
- `homosexual_gay_or_lesbian`
- `christian`
- `jewish`
- `muslim`
- `black`
- `white`
- `psychiatric_or_mental_illness`

A complete list of all the identity labels available can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).

### Jigsaw Multilingual Toxic Comment Classification

Since this challenge combines the data from the previous 2 challenges, it includes all the labels from above; however, the final evaluation is only on:
- `toxicity`

## How to run

First, install dependencies:

```bash
# clone project
git clone https://github.com/unitaryai/detoxify

# create virtual env
python3 -m venv toxic-env
source toxic-env/bin/activate

# install project
pip install -e detoxify
cd detoxify

# for training
pip install -r requirements.txt
```
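
To check that the install worked, a quick import test (an optional sanity check of ours, not a script from the repo):

```bash
python -c "from detoxify import Detoxify; print('ok')"
```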

## Prediction

Trained models summary:

|Model name| Transformer type| Data from |
|:--:|:--:|:--:|
| |`original`| `bert-base-uncased` | Toxic Comment Classification Challenge | |
| |`unbiased`| `roberta-base`| Unintended Bias in Toxicity Classification | |
| |`multilingual`| `xlm-roberta-base`| Multilingual Toxic Comment Classification | |

For a quick prediction, you can run the example script on a comment directly or from a txt file containing a list of comments.

```bash
# load model via torch.hub
python run_prediction.py --input 'example' --model_name original

# load model from checkpoint path
python run_prediction.py --input 'example' --from_ckpt_path model_path

# save results to a .csv file
python run_prediction.py --input test_set.txt --model_name original --save_to results.csv

# to see usage
python run_prediction.py --help
```
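
The exact txt format expected by `run_prediction.py` is not spelled out here; we assume `test_set.txt` holds one comment per line, along these lines:

```
example comment one
example comment two
example comment three
```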

Checkpoints can be downloaded from the latest release or via the Pytorch hub API with the following names:
- `toxic_bert`
- `unbiased_toxic_roberta`
- `multilingual_toxic_xlm_r`

```python
import torch

model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')
```

Importing detoxify in Python:

```python
from detoxify import Detoxify

results = Detoxify('original').predict('some text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo',
              'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# to display results nicely
import pandas as pd
print(pd.DataFrame(results, index=input_text).round(5))
```

## Training

If you do not already have a Kaggle account:
- you need to create one to be able to download the data
- go to My Account and click on Create New API Token; this will download a `kaggle.json` file
- make sure this file is located in `~/.kaggle`

```bash
# create data directory
mkdir jigsaw_data
cd jigsaw_data

# download data
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge
kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification
kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification
```
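
The Kaggle CLI saves each competition as a zip archive named after its slug, so the data will likely need extracting before training, e.g.:

```bash
# extract the downloaded archives into per-competition folders
unzip jigsaw-toxic-comment-classification-challenge.zip -d jigsaw-toxic-comment-classification-challenge
unzip jigsaw-unintended-bias-in-toxicity-classification.zip -d jigsaw-unintended-bias-in-toxicity-classification
unzip jigsaw-multilingual-toxic-comment-classification.zip -d jigsaw-multilingual-toxic-comment-classification
```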

## Start Training

### Toxic Comment Classification Challenge

```bash
python create_val_set.py

python train.py --config configs/Toxic_comment_classification_BERT.json
```

### Unintended Bias in Toxicity Challenge

```bash
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json
```

### Multilingual Toxic Comment Classification

This model is trained in 2 stages: first on all the available data, and then only on the translated versions of the first challenge's data. The [translated data](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).

```bash
# stage 1
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json

# stage 2
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json
```

### Monitor progress with tensorboard

```bash
tensorboard --logdir=./saved
```

## Model Evaluation

### Toxic Comment Classification Challenge

This challenge is evaluated on the mean AUC score of all the labels.

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
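
For reference, the metric itself can be computed along these lines (a minimal sketch using scikit-learn, not the repo's `evaluate.py`; column names follow the challenge labels):

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

LABELS = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

def mean_auc(targets: pd.DataFrame, predictions: pd.DataFrame) -> float:
    """Mean of the per-label ROC AUC scores, as used by the 2018 challenge."""
    return sum(roc_auc_score(targets[l], predictions[l]) for l in LABELS) / len(LABELS)
```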

### Unintended Bias in Toxicity Challenge

This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance with unintended bias. More information on this metric can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/overview/evaluation).

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv

# to get the final bias metric
python model_eval/compute_bias_metric.py
```
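
Concretely, the final score combines the overall AUC with three per-identity bias AUCs (subgroup, BPSN, and BNSP), each aggregated with a generalized power mean with p = -5. A sketch of that combination step (the per-identity AUC lists are assumed to have been computed already):

```python
import numpy as np

def power_mean(values, p=-5):
    """Generalized mean; a strongly negative p penalises the worst subgroups."""
    values = np.asarray(values, dtype=float)
    return np.mean(values ** p) ** (1 / p)

def final_bias_metric(overall_auc, subgroup_aucs, bpsn_aucs, bnsp_aucs, weight=0.25):
    """Weighted sum of the overall AUC and the three power-mean bias AUCs."""
    return (weight * overall_auc
            + weight * power_mean(subgroup_aucs)
            + weight * power_mean(bpsn_aucs)
            + weight * power_mean(bnsp_aucs))
```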

### Multilingual Toxic Comment Classification

This challenge is evaluated on the AUC score of the main toxic label.

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```

## Citation

```
@misc{Detoxify,
  title = {Detoxify},
  author = {Hanu, Laura and {Unitary team}},
  howpublished = {Github. https://github.com/unitaryai/detoxify},
  year = {2020}
}
```