The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
| Parameters | Model     | MMLU  | ARC-C | HellaSwag | PIQA  | Winogrande | Average |
|------------|-----------|-------|-------|-----------|-------|------------|---------|
| 500M       | qwen 2    | 44.13 | 28.92 | 49.05     | 69.31 | 56.99      | 49.68   |
| 500M       | qwen 2.5  | 47.29 | 31.83 | 52.17     | 70.29 | 57.06      | 51.72   |
| 1.24B      | llama 3.2 | 36.75 | 36.18 | 63.70     | 74.54 | 60.54      | 54.34   |
| 514M       | archeon   | NA    | 32.34 | 47.80     | 74.37 | 62.12      | 54.16   |
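The Average column appears to be the arithmetic mean of each model's available benchmark scores, with missing entries (the NA MMLU score for archeon) excluded from that model's mean. A minimal sketch of that computation, with scores copied from the table above (per-model rounding may differ from the table by ±0.01):

```python
def average_score(scores):
    """Mean of the available benchmark scores, ignoring missing (None) entries."""
    available = [s for s in scores if s is not None]
    return round(sum(available) / len(available), 2)

# Scores in the order: MMLU, ARC-C, HellaSwag, PIQA, Winogrande
results = {
    "qwen 2":    [44.13, 28.92, 49.05, 69.31, 56.99],
    "qwen 2.5":  [47.29, 31.83, 52.17, 70.29, 57.06],
    "llama 3.2": [36.75, 36.18, 63.70, 74.54, 60.54],
    "archeon":   [None, 32.34, 47.80, 74.37, 62.12],  # MMLU not reported
}
for model, scores in results.items():
    print(model, average_score(scores))
```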

- ARC Challenge: The model performs reasonably well at answering general-knowledge questions.
- HellaSwag: The model shows strong commonsense reasoning, performing well at predicting the most plausible continuation of a given scenario.