EIStakovskii committed on
Commit bfc16b7 · 1 Parent(s): bebd17b

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -18,6 +18,6 @@ This model was trained for toxicity labeling. Label_1 means TOXIC, Label_0 means
 
  The model was fine-tuned based off the already existing sentiment classifier oliverguhr/german-sentiment-bert . The aforementioned classifier performed poorly (44% accuracy on my test sample), so I trained the current toxicity classifier. It was noted that the same performance achieved training on the https://huggingface.co/dbmdz/bert-base-german-cased
 
- The accuracy is 91% on the test split during training and 83% on a manually picked (and thus harder) sample of 200 sentences (100 label 1, 100 label 0) at the end of the training.
+ The accuracy is 91% on the test split during training and 83% on a manually picked (and thus harder) sample of 200 sentences (100 label 1, 100 label 0) at the end of the training.
 
- The model was finetuned on 37k sentences. The train data was the translations of the english data (around 30k sentences) from https://github.com/s-nlp/multilingual_detox with https://huggingface.co/Helsinki-NLP/opus-mt-en-de and semi-manually collected data (around 7 k) by crawling https://www.dict.cc/ and https://context.reverso.net/translation/ websites
+ The model was finetuned on 37k sentences. The train data was the translations of the English data (around 30k sentences) from [the multilingual_detox dataset](https://github.com/s-nlp/multilingual_detox) by [Skolkovo Institute](https://huggingface.co/SkolkovoInstitute) using [the opus-mt-en-de translation model](https://huggingface.co/Helsinki-NLP/opus-mt-en-de) by [Helsinki-NLP](https://huggingface.co/Helsinki-NLP) and semi-manually collected data (around 7 k) by crawling [the dict.cc web dictionary](https://www.dict.cc/) and [the Reverso Context](https://context.reverso.net/translation/).
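
The translation step mentioned in the README (English sentences translated to German with opus-mt-en-de) could look roughly like the sketch below. It is not the author's script: it assumes the `transformers` library is installed, the helper names `translate_batch` and `make_opus_mt_translator` are hypothetical, and the batch size of 32 is an arbitrary choice.

```python
from typing import Callable, Iterable, List


def translate_batch(
    sentences: Iterable[str],
    translate: Callable[[List[str]], List[str]],
    batch_size: int = 32,  # arbitrary; tune to available memory
) -> List[str]:
    """Translate sentences in fixed-size batches using the supplied callable."""
    items = list(sentences)
    translated: List[str] = []
    for start in range(0, len(items), batch_size):
        translated.extend(translate(items[start:start + batch_size]))
    return translated


def make_opus_mt_translator() -> Callable[[List[str]], List[str]]:
    """Build an English-to-German translator backed by Helsinki-NLP/opus-mt-en-de.

    The model is downloaded on first use, so the import is kept lazy.
    """
    from transformers import pipeline

    pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")
    # The translation pipeline returns one dict per input sentence.
    return lambda batch: [result["translation_text"] for result in pipe(batch)]
```

A call such as `translate_batch(english_sentences, make_opus_mt_translator())` would then return the German sentences in the same order as the input; keeping the translator as a plain callable makes it easy to swap in another model or a stub for testing.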