Update README.md
README.md
CHANGED
@@ -75,6 +75,11 @@ Despite its compact size, Arcee Spark offers deep reasoning capabilities, making
 <div style="display: flex; justify-content: center; margin: 20px 0;">
 <img src="https://i.ibb.co/BLX8GmZ/Screenshot-2024-06-23-at-10-43-50-PM.png" alt="Additional Benchmark Results" style="border-radius: 10px; max-width: 90%; height: auto; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);">
 </div>
+
+<div style="display: flex; justify-content: center; margin: 20px 0;">
+<img src="https://i.postimg.cc/Vs7v0Vbn/Screenshot-2024-06-24-at-1-10-58-AM.png" alt="Bigbenchhard Results" style="border-radius: 10px; max-width: 90%; height: auto; box-shadow: 0 4px 8px 0 rgba(0, 0, 0, 0.2), 0 6px 20px 0 rgba(0, 0, 0, 0.19);">
+</div>
+
 ### MT-Bench
 
 ```markdown
@@ -144,6 +149,32 @@ AGI-eval average: 51.11
 
 Gpt4al Average: 69.37
 
+## Big Bench Hard
+
+|                      Task                      |Version|       Metric        |Value |   |Stderr|
+|------------------------------------------------|------:|---------------------|-----:|---|-----:|
+|bigbench_causal_judgement                       |      0|multiple_choice_grade|0.6053|±  |0.0356|
+|bigbench_date_understanding                     |      0|multiple_choice_grade|0.6450|±  |0.0249|
+|bigbench_disambiguation_qa                      |      0|multiple_choice_grade|0.5233|±  |0.0312|
+|bigbench_geometric_shapes                       |      0|multiple_choice_grade|0.2006|±  |0.0212|
+|                                                |       |exact_str_match      |0.0000|±  |0.0000|
+|bigbench_logical_deduction_five_objects         |      0|multiple_choice_grade|0.2840|±  |0.0202|
+|bigbench_logical_deduction_seven_objects        |      0|multiple_choice_grade|0.2429|±  |0.0162|
+|bigbench_logical_deduction_three_objects        |      0|multiple_choice_grade|0.4367|±  |0.0287|
+|bigbench_movie_recommendation                   |      0|multiple_choice_grade|0.4720|±  |0.0223|
+|bigbench_navigate                               |      0|multiple_choice_grade|0.4980|±  |0.0158|
+|bigbench_reasoning_about_colored_objects        |      0|multiple_choice_grade|0.5600|±  |0.0111|
+|bigbench_ruin_names                             |      0|multiple_choice_grade|0.4375|±  |0.0235|
+|bigbench_salient_translation_error_detection    |      0|multiple_choice_grade|0.2685|±  |0.0140|
+|bigbench_snarks                                 |      0|multiple_choice_grade|0.7348|±  |0.0329|
+|bigbench_sports_understanding                   |      0|multiple_choice_grade|0.6978|±  |0.0146|
+|bigbench_temporal_sequences                     |      0|multiple_choice_grade|0.4060|±  |0.0155|
+|bigbench_tracking_shuffled_objects_five_objects |      0|multiple_choice_grade|0.2072|±  |0.0115|
+|bigbench_tracking_shuffled_objects_seven_objects|      0|multiple_choice_grade|0.1406|±  |0.0083|
+|bigbench_tracking_shuffled_objects_three_objects|      0|multiple_choice_grade|0.4367|±  |0.0287|
+
+Big Bench average: 45.78
+
 ## License
 
 Arcee Spark is released under the Apache 2.0 license.
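
The "Big Bench average" line added in this diff can be sanity-checked against the table rows. The sketch below is only one plausible aggregation, not necessarily the one the README authors used: it assumes a plain unweighted mean over the 18 `multiple_choice_grade` rows and leaves out the `exact_str_match` row. The helper name `unweighted_mean_percent` is hypothetical.

```python
# multiple_choice_grade values copied from the Big Bench Hard table in the diff.
# Assumption: the exact_str_match row for geometric_shapes is excluded.
scores = {
    "bigbench_causal_judgement": 0.6053,
    "bigbench_date_understanding": 0.6450,
    "bigbench_disambiguation_qa": 0.5233,
    "bigbench_geometric_shapes": 0.2006,
    "bigbench_logical_deduction_five_objects": 0.2840,
    "bigbench_logical_deduction_seven_objects": 0.2429,
    "bigbench_logical_deduction_three_objects": 0.4367,
    "bigbench_movie_recommendation": 0.4720,
    "bigbench_navigate": 0.4980,
    "bigbench_reasoning_about_colored_objects": 0.5600,
    "bigbench_ruin_names": 0.4375,
    "bigbench_salient_translation_error_detection": 0.2685,
    "bigbench_snarks": 0.7348,
    "bigbench_sports_understanding": 0.6978,
    "bigbench_temporal_sequences": 0.4060,
    "bigbench_tracking_shuffled_objects_five_objects": 0.2072,
    "bigbench_tracking_shuffled_objects_seven_objects": 0.1406,
    "bigbench_tracking_shuffled_objects_three_objects": 0.4367,
}


def unweighted_mean_percent(values):
    """Return the plain arithmetic mean of the values, scaled to percent."""
    values = list(values)
    return 100 * sum(values) / len(values)


mean_pct = unweighted_mean_percent(scores.values())
print(f"{mean_pct:.2f}")  # prints 43.32
```

Under these assumptions the mean comes to about 43.32 rather than the 45.78 stated in the README, so the reported figure was presumably aggregated some other way (for example over a different task subset or with weighting); the diff itself does not say how.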