Commit 281ae55
1 Parent(s): 1024a95

Update README.md

README.md CHANGED
@@ -25,17 +25,18 @@ print(pipeline('du bist blöd.'))
 ```
 
 
-## Training
+## Training
 
 The pre-trained model [bert-base-german-cased model by deepset](https://huggingface.co/bert-base-german-cased) was fine-tuned on a crowd-annotated data set of over 14,000 user comments that has been labeled for toxicity in a binary classification task.
 
 As toxic, we defined comments that are inappropriate in whole or in part. By inappropriate, we mean comments that are rude, insulting, hateful, or otherwise make users feel disrespected.
 
-## Training procedure
 
 **Language model:** bert-base-cased (~ 12GB)
 **Language:** German
+**Labels:** Toxicity (binary classification)
 **Training data:** User comments posted to webistes and facebook pages of German news media, user comments posted to online participation platforms (~ 14,000)
+**Labeling procedure:** Crowd annotation
 **Batch size:** 32
 **Epochs:** 4
 **Max. tokens length:** 512
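
To put the hyperparameters in the diff above into perspective, here is a minimal Python sketch that collects them in a small config object and estimates the number of optimizer steps they imply. It is not the authors' training script. The names `FineTuneConfig`, `total_optimizer_steps`, and `ASSUMED_NUM_EXAMPLES` are hypothetical, the base model name is taken from the linked model in the prose, and the sketch assumes exactly 14,000 training comments (the README only says "over 14,000"), no held-out split, and no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters as listed in the README diff (container name is hypothetical)."""
    base_model: str = "bert-base-german-cased"  # model linked in the README prose
    batch_size: int = 32
    epochs: int = 4
    max_token_length: int = 512
    num_labels: int = 2  # binary toxicity classification


def total_optimizer_steps(num_examples: int, cfg: FineTuneConfig) -> int:
    """Estimate optimizer steps, assuming one step per batch and a partial final batch."""
    steps_per_epoch = math.ceil(num_examples / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = FineTuneConfig()
ASSUMED_NUM_EXAMPLES = 14_000  # assumption: the README gives only an approximate count
print(total_optimizer_steps(ASSUMED_NUM_EXAMPLES, cfg))
```

Under these assumptions one epoch is 438 batches, so the four epochs amount to 1,752 steps. A validation split or gradient accumulation would lower that figure.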