Update README.md
README.md
CHANGED
@@ -22,17 +22,12 @@ Evaluation Results
 
 The model was evaluated across a range of tasks. Below are the final evaluation results (after removing GSM8k):
 
-
-
-
-
-
-
-
-• ARC Challenge: The model performs decently in answering general knowledge questions.
-• HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
-• PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
-• Winogrande: It also shows competitive performance in linguistic reasoning tasks.
+
+
+- ARC Challenge: The model performs decently in answering general knowledge questions.
+- HellaSwag: The model is strong in commonsense reasoning, performing well in predicting the next sequence of events in a given scenario.
+- PIQA: The model excels in physical reasoning, showcasing a solid understanding of everyday physical interactions.
+- Winogrande: It also shows competitive performance in linguistic reasoning tasks.
 
 Ethical Considerations
 