The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
| Parameters | Model     | MMLU  | ARC-C | HellaSwag | PIQA  | Winogrande | Average |
|------------|-----------|-------|-------|-----------|-------|------------|---------|
| 500M       | qwen 2    | 44.13 | 28.92 | 49.05     | 69.31 | 56.99      | 49.68   |
| 500M       | qwen 2.5  | 47.29 | 31.83 | 52.17     | 70.29 | 57.06      | 51.72   |
| 1.24B      | llama 3.2 | 36.75 | 36.18 | 63.70     | 74.54 | 60.54      | 54.34   |
| 514M       | archeon   | NA    | 32.34 | 47.80     | 74.37 | 62.12      | 54.16   |
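The Average column appears to be the arithmetic mean of each model's available benchmark scores, with missing entries (the NA MMLU score for archeon) excluded from that model's mean. A minimal sketch of that computation, with scores copied from the table above (per-model rounding may differ from the table by ±0.01):

```python
def average_score(scores):
    """Mean of the available benchmark scores, ignoring missing (None) entries."""
    available = [s for s in scores if s is not None]
    return round(sum(available) / len(available), 2)

# Scores in the order: MMLU, ARC-C, HellaSwag, PIQA, Winogrande
results = {
    "qwen 2":    [44.13, 28.92, 49.05, 69.31, 56.99],
    "qwen 2.5":  [47.29, 31.83, 52.17, 70.29, 57.06],
    "llama 3.2": [36.75, 36.18, 63.70, 74.54, 60.54],
    "archeon":   [None, 32.34, 47.80, 74.37, 62.12],  # MMLU not reported
}
for model, scores in results.items():
    print(model, average_score(scores))
```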

- ARC Challenge: The model performs reasonably well at answering general-knowledge questions.
- HellaSwag: The model shows strong commonsense reasoning, performing well at predicting the most plausible continuation of a given scenario.