duddaladeepak committed
Commit 7f69c3b · 1 Parent(s): fe20226

Update README.md

Files changed (1): README.md (+329 −0)
README.md CHANGED
@@ -8,3 +8,332 @@ widget:
- text: ["Is this review positive or negative? Review: Best cast iron skillet you will ever buy."]

---

<div align="center">

**⚠️ Disclaimer:**
The Hugging Face models currently give different results to the detoxify library (see the issue [here](https://github.com/unitaryai/detoxify/issues/15)). For the most up-to-date models we recommend using the models from https://github.com/unitaryai/detoxify

# 🙊 Detoxify
## Toxic Comment Classification with ⚡ PyTorch Lightning and 🤗 Transformers

![CI testing](https://github.com/unitaryai/detoxify/workflows/CI%20testing/badge.svg)
![Lint](https://github.com/unitaryai/detoxify/workflows/Lint/badge.svg)

</div>

![Examples image](examples.png)

## Description

Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic Comment Classification, Unintended Bias in Toxic Comments, and Multilingual Toxic Comment Classification.

Built by [Laura Hanu](https://laurahanu.github.io/) at [Unitary](https://www.unitary.ai/), where we are working to stop harmful content online by interpreting visual content in context.

Dependencies:
- For inference:
  - 🤗 Transformers
  - ⚡ PyTorch Lightning
- For training you will also need:
  - Kaggle API (to download data)

| Challenge | Year | Goal | Original Data Source | Detoxify Model Name | Top Kaggle Leaderboard Score | Detoxify Score |
|-|-|-|-|-|-|-|
| [Toxic Comment Classification Challenge](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge) | 2018 | build a multi-headed model capable of detecting different types of toxicity like threats, obscenity, insults, and identity-based hate | Wikipedia Comments | `original` | 0.98856 | 0.98636 |
| [Jigsaw Unintended Bias in Toxicity Classification](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification) | 2019 | build a model that recognizes toxicity and minimizes unintended bias with respect to mentions of identities, using a dataset labelled for identity mentions and optimizing a metric designed to measure unintended bias | Civil Comments | `unbiased` | 0.94734 | 0.93639 |
| [Jigsaw Multilingual Toxic Comment Classification](https://www.kaggle.com/c/jigsaw-multilingual-toxic-comment-classification) | 2020 | build effective multilingual models | Wikipedia Comments + Civil Comments | `multilingual` | 0.9536 | 0.91655* |

*Score not directly comparable since it is obtained on the provided validation set rather than the test set. To be updated when the test labels are made available.

It is also worth noting that the top leaderboard scores have been achieved using model ensembles. The purpose of this library is to build something user-friendly and straightforward to use.
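Detoxify itself does not ship an ensembling utility, but a minimal sketch of what score averaging across models could look like is below, assuming each model's `predict()` output is a dict mapping labels to per-input score lists; the `ensemble_average` helper and the toy numbers are hypothetical, not part of the library:

```python
# Hypothetical sketch: average per-label scores from several models' outputs.
# Assumes each model returned a dict mapping label -> list of scores per input
# (the shape Detoxify's predict() returns for a list of inputs).

def ensemble_average(results_per_model):
    """Average the scores of multiple models label by label."""
    labels = results_per_model[0].keys()
    n_models = len(results_per_model)
    return {
        label: [
            sum(model[label][i] for model in results_per_model) / n_models
            for i in range(len(results_per_model[0][label]))
        ]
        for label in labels
    }

# toy outputs from two hypothetical models on the same two comments
model_a = {'toxicity': [0.90, 0.10]}
model_b = {'toxicity': [0.80, 0.20]}

print(ensemble_average([model_a, model_b]))
```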

## Limitations and ethical considerations

If words associated with swearing, insults or profanity are present in a comment, it is likely to be classified as toxic regardless of the tone or intent of the author, e.g. humorous or self-deprecating. This could introduce biases against already vulnerable minority groups.

The intended use of this library is for research purposes, for fine-tuning on carefully constructed datasets that reflect real-world demographics, and/or to help content moderators flag harmful content more quickly.

Some useful resources about the risk of different biases in toxicity or hate speech detection:
- [The Risk of Racial Bias in Hate Speech Detection](https://homes.cs.washington.edu/~msap/pdfs/sap2019risk.pdf)
- [Automated Hate Speech Detection and the Problem of Offensive Language](https://arxiv.org/pdf/1703.04009.pdf%201.pdf)
- [Racial Bias in Hate Speech and Abusive Language Detection Datasets](https://arxiv.org/pdf/1905.12516.pdf)

## Quick prediction

The `multilingual` model has been trained on 7 different languages, so it should only be tested on: `english`, `french`, `spanish`, `italian`, `portuguese`, `turkish` or `russian`.

```bash
# install detoxify
pip install detoxify
```
```python
from detoxify import Detoxify

# each model takes in either a string or a list of strings
results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# optional: display results nicely (requires `pip install pandas`)
import pandas as pd

print(pd.DataFrame(results, index=input_text).round(5))
```
For more details check the Prediction section.

## Labels
All challenges have a toxicity label. The toxicity labels represent the aggregate ratings of up to 10 annotators according to the following schema:
- **Very Toxic** (a very hateful, aggressive, or disrespectful comment that is very likely to make you leave a discussion or give up on sharing your perspective)
- **Toxic** (a rude, disrespectful, or unreasonable comment that is somewhat likely to make you leave a discussion or give up on sharing your perspective)
- **Hard to Say**
- **Not Toxic**

More information about the labelling schema can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).
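For illustration, aggregating annotator ratings can be sketched as the fraction of ratings that fall in a toxic category. This is an assumption about how the schema above is collapsed into a single score, not code from this repo, and the rating strings are made up:

```python
# Illustrative sketch (not code from this repo): derive a single toxicity
# score for a comment as the fraction of annotators who rated it
# "Toxic" or "Very Toxic".

def aggregate_toxicity(ratings):
    """Fraction of annotator ratings that fall in a toxic category."""
    toxic_categories = {'very_toxic', 'toxic'}
    votes = sum(1 for r in ratings if r in toxic_categories)
    return votes / len(ratings)

# 10 hypothetical annotator ratings for one comment
ratings = ['toxic', 'not_toxic', 'very_toxic', 'not_toxic', 'not_toxic',
           'toxic', 'hard_to_say', 'not_toxic', 'not_toxic', 'not_toxic']
print(aggregate_toxicity(ratings))  # 0.3
```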

### Toxic Comment Classification Challenge
This challenge includes the following labels:

- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`

### Jigsaw Unintended Bias in Toxicity Classification
This challenge has 2 types of labels: the main toxicity labels and some additional identity labels that represent the identities mentioned in the comments.

Only identities with more than 500 examples in the test set (combined public and private) are included during training as additional labels and in the evaluation calculation.

- `toxicity`
- `severe_toxicity`
- `obscene`
- `threat`
- `insult`
- `identity_attack`
- `sexual_explicit`

Identity labels used:
- `male`
- `female`
- `homosexual_gay_or_lesbian`
- `christian`
- `jewish`
- `muslim`
- `black`
- `white`
- `psychiatric_or_mental_illness`

A complete list of all the identity labels available can be found [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/data).
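The 500-example rule above can be sketched on toy data as follows; the `select_identity_labels` helper is hypothetical (the real rule is applied to the combined public and private test set, not to this toy structure):

```python
# Hedged sketch of the "more than 500 examples" filter described above.
# Assumes a mapping from identity label to per-comment annotation values,
# where None means the identity was not annotated for that comment.

def select_identity_labels(identity_values, min_examples=500):
    """Keep identity labels with more than `min_examples` positive examples."""
    return [
        label for label, values in identity_values.items()
        if sum(1 for v in values if v is not None and v > 0) > min_examples
    ]

# toy data: 'male' has 600 positive examples, 'other_disability' only 40
identity_values = {
    'male': [1.0] * 600,
    'other_disability': [0.5] * 40 + [None] * 10,
}
print(select_identity_labels(identity_values))  # ['male']
```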

### Jigsaw Multilingual Toxic Comment Classification

Since this challenge combines the data from the previous 2 challenges, it includes all the labels above; however, the final evaluation is only on:

- `toxicity`

## How to run

First, install dependencies:
```bash
# clone project
git clone https://github.com/unitaryai/detoxify

# create virtual env
python3 -m venv toxic-env
source toxic-env/bin/activate

# install project
pip install -e detoxify
cd detoxify

# for training
pip install -r requirements.txt
```

## Prediction

Trained models summary:

| Model name | Transformer type | Data from |
|:--:|:--:|:--:|
| `original` | `bert-base-uncased` | Toxic Comment Classification Challenge |
| `unbiased` | `roberta-base` | Unintended Bias in Toxicity Classification |
| `multilingual` | `xlm-roberta-base` | Multilingual Toxic Comment Classification |

For a quick prediction, you can run the example script on a comment directly or on a txt file containing a list of comments.
```bash
# load model via torch.hub
python run_prediction.py --input 'example' --model_name original

# load model from a checkpoint path
python run_prediction.py --input 'example' --from_ckpt_path model_path

# save results to a .csv file
python run_prediction.py --input test_set.txt --model_name original --save_to results.csv

# to see usage
python run_prediction.py --help
```

Checkpoints can be downloaded from the latest release or via the PyTorch hub API with the following names:
- `toxic_bert`
- `unbiased_toxic_roberta`
- `multilingual_toxic_xlm_r`
```python
import torch

model = torch.hub.load('unitaryai/detoxify', 'toxic_bert')
```

Importing detoxify in python:

```python
from detoxify import Detoxify

results = Detoxify('original').predict('some text')

results = Detoxify('unbiased').predict(['example text 1', 'example text 2'])

input_text = ['example text', 'exemple de texte', 'texto de ejemplo', 'testo di esempio', 'texto de exemplo', 'örnek metin', 'пример текста']
results = Detoxify('multilingual').predict(input_text)

# to display results nicely
import pandas as pd

print(pd.DataFrame(results, index=input_text).round(5))
```
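As a downstream usage sketch, assuming `results` has the shape `predict()` returns for a list of inputs (a dict mapping each label to a list of scores), you could flag comments whose score exceeds a chosen cut-off. The `flag_toxic` helper and the 0.5 threshold are illustrative, not part of the library:

```python
# A minimal sketch: flag the inputs whose score exceeds a threshold on any
# label. Assumes `results` maps label -> list of per-input scores.
# The 0.5 threshold is an arbitrary example, not a recommended operating point.

def flag_toxic(results, threshold=0.5):
    """Return indices of inputs scoring above `threshold` on any label."""
    n_inputs = len(next(iter(results.values())))
    return [
        i for i in range(n_inputs)
        if any(scores[i] > threshold for scores in results.values())
    ]

# toy scores for two comments
results = {'toxicity': [0.97, 0.02], 'insult': [0.61, 0.01]}
print(flag_toxic(results))  # [0]
```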

## Training

If you do not already have a Kaggle account:
- you need to create one to be able to download the data
- go to My Account and click on Create New API Token - this will download a kaggle.json file
- make sure this file is located in ~/.kaggle

```bash
# create data directory
mkdir jigsaw_data
cd jigsaw_data

# download data
kaggle competitions download -c jigsaw-toxic-comment-classification-challenge

kaggle competitions download -c jigsaw-unintended-bias-in-toxicity-classification

kaggle competitions download -c jigsaw-multilingual-toxic-comment-classification
```
## Start Training
### Toxic Comment Classification Challenge

```bash
python create_val_set.py

python train.py --config configs/Toxic_comment_classification_BERT.json
```
### Unintended Bias in Toxicity Challenge

```bash
python train.py --config configs/Unintended_bias_toxic_comment_classification_RoBERTa.json
```
### Multilingual Toxic Comment Classification

This model is trained in 2 stages: first, train on all available data, and second, train only on the translated versions of the first challenge.

The [translated data](https://www.kaggle.com/miklgr500/jigsaw-train-multilingual-coments-google-api) can be downloaded from Kaggle in French, Spanish, Italian, Portuguese, Turkish, and Russian (the languages available in the test set).

```bash
# stage 1
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR.json

# stage 2
python train.py --config configs/Multilingual_toxic_comment_classification_XLMR_stage2.json
```
### Monitor progress with tensorboard

```bash
tensorboard --logdir=./saved
```
## Model Evaluation

### Toxic Comment Classification Challenge

This challenge is evaluated on the mean AUC score of all the labels.

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```
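For intuition about the metric, here is a minimal, self-contained sketch of a mean-AUC computation on toy data. This is not the repo's `evaluate.py`; AUC is computed via the Mann-Whitney rank formula, and the labels and scores below are made up:

```python
# Illustrative sketch of "mean AUC over all labels" on toy data.
# AUC is computed with the rank (Mann-Whitney U) formula; the toy
# scores below contain no ties, so no tie-handling is needed.

def auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney U statistic."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = {i: r + 1 for r, i in enumerate(order)}
    pos = [i for i, y in enumerate(y_true) if y == 1]
    n_pos, n_neg = len(pos), len(y_true) - len(pos)
    rank_sum = sum(ranks[i] for i in pos)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def mean_auc(labels_true, labels_score):
    """Average the per-label AUCs, as in the challenge metric."""
    return sum(auc(labels_true[l], labels_score[l]) for l in labels_true) / len(labels_true)

# toy ground truth and predictions for two labels over four comments
labels_true = {'toxic': [1, 0, 1, 0], 'insult': [0, 0, 1, 0]}
labels_score = {'toxic': [0.9, 0.6, 0.5, 0.4], 'insult': [0.1, 0.2, 0.7, 0.3]}
print(mean_auc(labels_true, labels_score))  # 0.875
```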
308
+ ### Unintended Bias in Toxicicity Challenge
309
+
310
+ This challenge is evaluated on a novel bias metric that combines different AUC scores to balance overall performance. More information on this metric [here](https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification/overview/evaluation).
311
+
312
+ ```bash
313
+
314
+ python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
315
+
316
+ # to get the final bias metric
317
+ python model_eval/compute_bias_metric.py
318
+
319
+ ```
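Based on the metric description on the Kaggle evaluation page, the combination step can be sketched as a weighted sum of the overall AUC and the generalized (power) means, with p = -5, of three per-identity AUC families (Subgroup, BPSN, BNSP), each component weighted 0.25. The helper names and toy AUC values below are illustrative, not taken from `compute_bias_metric.py`:

```python
# Hedged sketch of the final bias metric: weighted sum of the overall AUC
# and the power means of three per-subgroup AUC families.

def power_mean(values, p=-5):
    """Generalized mean M_p; p = -5 emphasizes the worst-performing subgroups."""
    return (sum(v ** p for v in values) / len(values)) ** (1 / p)

def final_bias_metric(overall_auc, subgroup_aucs, bpsn_aucs, bnsp_aucs, p=-5, w=0.25):
    """Combine overall AUC with the three bias AUC families, each weighted `w`."""
    return w * overall_auc + sum(
        w * power_mean(aucs, p) for aucs in (subgroup_aucs, bpsn_aucs, bnsp_aucs)
    )

overall = 0.96
subgroup = [0.93, 0.90, 0.88]   # per-identity Subgroup AUCs (toy values)
bpsn = [0.92, 0.91, 0.89]       # Background Positive, Subgroup Negative
bnsp = [0.95, 0.94, 0.93]       # Background Negative, Subgroup Positive
print(round(final_bias_metric(overall, subgroup, bpsn, bnsp), 5))
```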
### Multilingual Toxic Comment Classification

This challenge is evaluated on the AUC score of the main toxic label.

```bash
python evaluate.py --checkpoint saved/lightning_logs/checkpoints/example_checkpoint.pth --test_csv test.csv
```

### Citation
```
@misc{Detoxify,
  title={Detoxify},
  author={Hanu, Laura and {Unitary team}},
  howpublished={Github. https://github.com/unitaryai/detoxify},
  year={2020}
}
```