Commit c191b19
Parent(s): b23ba8b
Update README.md

README.md CHANGED
@@ -29,7 +29,7 @@ You can apply the pipeline on a data set.
 ```python

 df['result'] = df['comment_text'].apply(lambda x: pipeline(x[:512]))  # Cuts off input after the maximum length for this model, which is 512 tokens.
-#
+# Afterwards, you can make two new columns out of the column "result": one including the label, one including the score.
 df['toxic_label'] = df['result'].str[0].str['label']
 df['score'] = df['result'].str[0].str['score']
 ```
@@ -45,7 +45,7 @@ As toxic, we defined comments that are inappropriate in whole or in part. By ina
 **Language model:** bert-base-cased (~ 12GB)
 **Language:** German
 **Labels:** Toxicity (binary classification)
-**Training data:** User comments posted to
+**Training data:** User comments posted to websites and Facebook pages of German news media, and user comments posted to online participation platforms (~ 14,000)
 **Labeling procedure:** Crowd annotation
 **Batch size:** 32
 **Epochs:** 4
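The snippet in the hunk above relies on pandas' `.str` accessor to unpack the list-of-dicts that a Hugging Face pipeline returns. Here is a minimal, self-contained sketch of that unpacking step; the `fake_pipeline` function is a stand-in (an assumption, not the README's model) that mimics the output shape of `transformers.pipeline`, so the example runs without downloading the actual toxicity model.

```python
import pandas as pd

# Stand-in for the Hugging Face pipeline: returns the same output shape,
# a list with one {"label", "score"} dict per input. The README instead
# calls transformers.pipeline() with the fine-tuned German toxicity model.
def fake_pipeline(text):
    return [{"label": "non toxic", "score": 0.99}]

df = pd.DataFrame({"comment_text": ["An example comment."]})

# Truncate to the first 512 characters before classifying, as in the README.
df["result"] = df["comment_text"].apply(lambda x: fake_pipeline(x[:512]))

# Unpack the list-of-dicts result into two flat columns:
# .str[0] takes the first element of each list, .str["key"] indexes each dict.
df["toxic_label"] = df["result"].str[0].str["label"]
df["score"] = df["result"].str[0].str["score"]
```

Note that `x[:512]` truncates characters, not tokens; for inputs near the model's limit, the pipeline's own `truncation=True` option would be the token-accurate alternative.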