Experiment: can DUS (depth up-scaling) be taken one or more steps further?
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 62.87 |
| AI2 Reasoning Challenge (25-shot) | 60.92 |
| HellaSwag (10-shot) | 82.92 |
| MMLU (5-shot) | 65.11 |
| TruthfulQA (0-shot) | 43.67 |
| Winogrande (5-shot) | 81.14 |
| GSM8k (5-shot) | 43.44 |
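
As a quick sanity check on the table, the "Avg." row appears to be the unweighted mean of the six benchmark scores. The sketch below recomputes it under that assumption. The dictionary and variable names are illustrative only and do not come from any evaluation tooling.

```python
# Benchmark scores copied from the table above.
# Assumption: "Avg." is the plain arithmetic mean of these six values.
scores = {
    "AI2 Reasoning Challenge": 60.92,
    "HellaSwag": 82.92,
    "MMLU": 65.11,
    "TruthfulQA": 43.67,
    "Winogrande": 81.14,
    "GSM8k": 43.44,
}

# Unweighted mean across all benchmarks.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")
```

Rounded to two decimals, this matches the 62.87 reported in the table.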