Upload benchmark_results.md with huggingface_hub
benchmark_results.md CHANGED (+8 -8)
@@ -6,9 +6,9 @@ The best performing model was chosen based on the highest average F1 score acros
 ## Best Hyperparameters
 
 ```
-learning_rate:
-per_device_train_batch_size:
-num_train_epochs:
+learning_rate: 5e-05
+per_device_train_batch_size: 32
+num_train_epochs: 8
 weight_decay: 0.05
 lr_scheduler_type: cosine
 ```
@@ -17,8 +17,8 @@ lr_scheduler_type: cosine
 
 | dataset                           |   accuracy |   f1_score |   recall |   precision |
 |:----------------------------------|-----------:|-----------:|---------:|------------:|
-| BRlkl/BingoGuard-train-test-pt    | 0.
-| BRlkl/openai-moderation-eval-pt   | 0.
-| BRlkl/WildGuardTest-pt            | 0.
-| BRlkl/XSTest-pt                   | 0.
-| BRlkl/toxic-chat-pt (40% holdout) | 0.
+| BRlkl/BingoGuard-train-test-pt    |   0.897773 |   0.946133 | 0.897773 |    1        |
+| BRlkl/openai-moderation-eval-pt   |   0.69881  |   0.651994 | 0.908046 |    0.508584 |
+| BRlkl/WildGuardTest-pt            |   0.839317 |   0.810021 | 0.771883 |    0.852123 |
+| BRlkl/XSTest-pt                   |   0.831111 |   0.824885 | 0.895    |    0.764957 |
+| BRlkl/toxic-chat-pt (40% holdout) |   0.97296  |   0.836795 | 0.927632 |    0.762162 |
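The reported `f1_score` values can be sanity-checked against the `recall` and `precision` columns, and the "average F1" selection criterion can be reproduced from the table. The sketch below is only an illustration: the numbers are copied from the added rows, the helper names (`f1_from`, `max_f1_gap`, `average_f1`) are hypothetical, and the unweighted mean over the five datasets is an assumption, since the truncated sentence does not say how the average was computed.

```python
# Rows copied from the updated results table: (recall, precision, reported f1_score).
RESULTS = {
    "BRlkl/BingoGuard-train-test-pt": (0.897773, 1.0, 0.946133),
    "BRlkl/openai-moderation-eval-pt": (0.908046, 0.508584, 0.651994),
    "BRlkl/WildGuardTest-pt": (0.771883, 0.852123, 0.810021),
    "BRlkl/XSTest-pt": (0.895, 0.764957, 0.824885),
    "BRlkl/toxic-chat-pt (40% holdout)": (0.927632, 0.762162, 0.836795),
}


def f1_from(recall: float, precision: float) -> float:
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def max_f1_gap(results: dict) -> float:
    """Largest absolute gap between recomputed and reported F1."""
    return max(abs(f1_from(r, p) - f1) for r, p, f1 in results.values())


def average_f1(results: dict) -> float:
    """Unweighted mean of the reported F1 scores (assumed averaging scheme)."""
    return sum(f1 for _, _, f1 in results.values()) / len(results)


if __name__ == "__main__":
    print(f"max |recomputed F1 - reported F1| = {max_f1_gap(RESULTS):.2e}")
    print(f"unweighted average F1 = {average_f1(RESULTS):.6f}")
```

A small gap here only indicates that the three columns are mutually consistent up to rounding; it says nothing about how the predictions themselves were produced.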