Commit c191b19
Parent(s): b23ba8b
Update README.md

README.md CHANGED
@@ -29,7 +29,7 @@ You can apply the pipeline on a data set.
 ```python

 df['result'] = df['comment_text'].apply(lambda x: pipeline(x[:512]))  # Cuts off input after the maximum length for this model, which is 512 tokens.
-#
+# Afterwards, you can make two new columns out of the column "result": one including the label, one including the score.
 df['toxic_label'] = df['result'].str[0].str['label']
 df['score'] = df['result'].str[0].str['score']
 ```
@@ -45,7 +45,7 @@ As toxic, we defined comments that are inappropriate in whole or in part. By ina
 **Language model:** bert-base-cased (~ 12GB)
 **Language:** German
 **Labels:** Toxicity (binary classification)
-**Training data:** User comments posted to
+**Training data:** User comments posted to websites and Facebook pages of German news media, and user comments posted to online participation platforms (~ 14,000)
 **Labeling procedure:** Crowd annotation
 **Batch size:** 32
 **Epochs:** 4
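The snippet in the hunk above relies on pandas' `.str` accessor to unpack the list-of-dicts that a Hugging Face pipeline returns. Here is a minimal, self-contained sketch of that unpacking step; the `fake_pipeline` function is a stand-in (an assumption, not the README's model) that mimics the output shape of `transformers.pipeline`, so the example runs without downloading the actual toxicity model.

```python
import pandas as pd

# Stand-in for the Hugging Face pipeline: returns the same output shape,
# a list with one {"label", "score"} dict per input. The README instead
# calls transformers.pipeline() with the fine-tuned German toxicity model.
def fake_pipeline(text):
    return [{"label": "non toxic", "score": 0.99}]

df = pd.DataFrame({"comment_text": ["An example comment."]})

# Truncate to the first 512 characters before classifying, as in the README.
df["result"] = df["comment_text"].apply(lambda x: fake_pipeline(x[:512]))

# Unpack the list-of-dicts result into two flat columns:
# .str[0] takes the first element of each list, .str["key"] indexes each dict.
df["toxic_label"] = df["result"].str[0].str["label"]
df["score"] = df["result"].str[0].str["score"]
```

Note that `x[:512]` truncates characters, not tokens; for inputs near the model's limit, the pipeline's own `truncation=True` option would be the token-accurate alternative.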