Added updated evaluation for 2907 model
README.md (changed)
@@ -24,6 +24,7 @@ This table tracks the performance of our model on various tasks over time.
 
 | Date (YYYY-MM-DD) | Metric | arc_easy | hellaswag | sglue_rte | truthfulqa | Avg |
 |-------------------|----------|---------------|---------------|---------------|---------------| ---- |
+| 2024-07-29 | acc | 32.24% ± 0.96% | 25.74% ± 0.44% | 47.29% ± 3.01% | 39.91% ± 1.11% | 36.30% |
 | 2024-07-27 | acc | 27.40% ± 0.92% | 25.52% ± 0.44% | 52.71% ± 3.01% | 39.52% ± 1.11% | 36.29% |
 
 ## Legend
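The Avg column appears to be the unweighted mean of the four per-task accuracies (arc_easy, hellaswag, sglue_rte, truthfulqa). The sketch below is a minimal check under that assumption. The `ROWS` dict and the `table_avg` helper are illustrative names, not part of the repository. It uses `decimal` so that a tie such as 36.295 rounds predictably. Because the per-task figures in the table are already rounded, a mean recomputed from them could in general differ in the last digit from an Avg computed on unrounded scores.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-task accuracies (in %) copied from the two README rows in this diff.
# Hypothetical structure for illustration only.
ROWS = {
    "2024-07-29": {"arc_easy": "32.24", "hellaswag": "25.74",
                   "sglue_rte": "47.29", "truthfulqa": "39.91"},
    "2024-07-27": {"arc_easy": "27.40", "hellaswag": "25.52",
                   "sglue_rte": "52.71", "truthfulqa": "39.52"},
}


def table_avg(scores: dict[str, str]) -> Decimal:
    """Unweighted mean of the per-task accuracies, rounded to two decimals.

    Assumption: the README's Avg column is a plain mean over the four tasks.
    Decimal arithmetic avoids binary floating-point error on ties like 36.295.
    """
    values = [Decimal(v) for v in scores.values()]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for date, scores in ROWS.items():
        print(f"{date}: {table_avg(scores)}%")
```

Running it prints 36.30% for 2024-07-29 and 36.29% for 2024-07-27, which matches the Avg column for both rows.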