---
language:
- nl
tags:
- text-classification
- pytorch
license: apache-2.0
metrics:
- accuracy
- f1
- recall
- precision
---
# RobBERT-dutch-base-toxic-comments

## Model description:
This model was created to detect toxic or potentially harmful comments.

For this model, we finetuned a Dutch RoBERTa-based model called [RobBERT](https://huggingface.co/pdelobelle/robbert-v2-dutch-base) on the translated [Jigsaw Toxicity dataset](https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge).

The original dataset was translated using the appropriate [MarianMT model](https://huggingface.co/Helsinki-NLP/opus-mt-en-nl).

The model was trained for 2 epochs on 90% of the dataset, with the following arguments:
```
training_args = TrainingArguments(
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=6,
    load_best_model_at_end=True,
    metric_for_best_model="recall",
    num_train_epochs=2,
    evaluation_strategy="steps",
    save_strategy="steps",
    save_total_limit=10,
    logging_steps=100,
    eval_steps=250,
    save_steps=250,
    weight_decay=0.001,
    report_to="wandb")
```
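Note that with gradient accumulation, the effective batch size per optimizer step is larger than the per-device batch size. A quick sketch, assuming training ran on a single device (the number of devices is not stated above):

```python
# Effective batch size per optimizer update = per-device batch size
# x gradient accumulation steps x number of devices.
per_device_train_batch_size = 8
gradient_accumulation_steps = 6
num_devices = 1  # assumption: single-device training

effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # → 48
```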

## Model Performance:

Model evaluation was done on the remaining 10% of the dataset, which served as the test set.

| Accuracy | F1 Score | Recall | Precision |
| --- | --- | --- | --- |
| 94.52 | 65.42 | 75.50 | 57.71 |
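As a reminder of how these metrics relate, precision, recall, F1, and accuracy can all be derived from confusion-matrix counts. A minimal sketch using illustrative counts for the toxic class, chosen only to roughly match the scores above (the actual evaluation counts are not published here):

```python
# Illustrative (hypothetical) confusion-matrix counts for the "toxic" class.
tp, fp, fn, tn = 151, 110, 49, 2591

precision = tp / (tp + fp)          # share of flagged comments that are toxic
recall = tp / (tp + fn)             # share of toxic comments that get flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
accuracy = (tp + tn) / (tp + fp + fn + tn)
```

Optimizing `metric_for_best_model="recall"` during training reflects a preference for catching toxic comments at the cost of some precision, which matches the gap between the recall and precision scores in the table.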

## Dataset:
We will soon open-source the dataset as well.