Update src/about.py
src/about.py +1 -0

@@ -93,6 +93,7 @@ And here find all the translated benchmarks provided by the Language evaluation
To ensure a fair and unbiased assessment of the models' true capabilities, all evaluations are conducted in a zero-shot setting (`0-shot`). This approach removes any advantage a model could gain from task-specific in-context examples, providing a clear indication of how well the models generalize to new tasks.
+
Also, given the nature of the tasks, which include multiple-choice and yes/no questions, the leaderboard primarily uses normalized log-likelihood accuracy (`loglikelihood_acc_norm`) for all tasks. This metric was chosen because it provides a clear and fair measurement of model performance across the different question types.
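The scoring described in the diff above can be sketched as follows. This is a minimal illustration, not the leaderboard's actual harness code: the per-token log-probabilities are made-up stand-ins for what a model would return, and `loglikelihood_acc_norm` here is a hypothetical helper named after the metric, not a real API.

```python
def loglikelihood_acc_norm(choice_logprobs: dict[str, list[float]], gold: str) -> int:
    """Score one multiple-choice item: pick the choice whose total
    log-likelihood, normalized by its length in tokens, is highest,
    and return 1 if that choice matches the gold answer."""
    scores = {
        choice: sum(lps) / len(lps)  # normalize by number of tokens
        for choice, lps in choice_logprobs.items()
    }
    prediction = max(scores, key=scores.get)
    return int(prediction == gold)

# Zero-shot: the prompt contains only the question, no solved examples.
# Longer answers accumulate more negative log-probability, which an
# unnormalized likelihood comparison would penalize; normalizing by
# length corrects for that. The numbers below are invented for the demo.
logprobs = {
    "yes": [-0.2],
    "no": [-1.6],
    "maybe not": [-0.3, -0.4, -0.5],
}
print(loglikelihood_acc_norm(logprobs, "yes"))
```

Here `"maybe not"` totals -1.2 in raw log-likelihood, worse than `"yes"` at -0.2, but its normalized score (-0.4) still ranks it fairly against the one-token choices.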