Clémentine committed
Commit: 2f717b4
Parent(s): 21e3c8a
title update
content.py CHANGED (+0 -1)
@@ -12,7 +12,6 @@ GAIA is made of more than 450 non-trivial question with an unambiguous answer, r
 It is divided in 3 levels, where level 1 should be breakable by very good LLMs, and level 3 indicate a strong jump in model capabilities, each divided into a fully public dev set for validation, and a test set with private answers and metadata.
 Results can be submitted for both validation and test. Scores are expressed as the percentage of correct answers for a given split.
 
-
 ## Submissions
 We expect submissions to be json-line files with the following format. The first two fields are mandatory, `reasoning_trace` is optionnal:
 ```