GAIA-bucket

109 MB

119 files

Updated 16 days ago

Name	Size	Uploaded	Xet hash
2023		16 days ago	117 items
.gitattributes	2.73 kB xet	16 days ago	38dc6ff9
README.md	3.29 kB xet	16 days ago	23092f07

README.md

GAIA dataset

GAIA is a benchmark for evaluating next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc.).

We added gating to prevent bots from scraping the dataset. Please do not reshare the validation or test set in a crawlable format.

Data and leaderboard

GAIA is made of more than 450 non-trivial questions, each with an unambiguous answer, requiring different levels of tooling and autonomy to solve. It is therefore divided into 3 levels, where level 1 should be breakable by very good LLMs, and level 3 indicates a strong jump in model capabilities. Each level is divided into a fully public dev set for validation, and a test set with private answers and metadata.

The GAIA leaderboard can be found in this space (https://huggingface.co/spaces/gaia-benchmark/leaderboard).

Questions are contained in metadata.jsonl. Some questions come with an additional file, which can be found in the same folder and whose id is given in the file_name field.
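As a rough illustration of the layout described above, the sketch below reads a metadata.jsonl file with the standard library and pairs each question with its attachment in the same folder. The helper name iter_questions is hypothetical. The sketch assumes one JSON object per line and that file_name is empty or absent when a question has no attachment.

```python
import json
from pathlib import Path


def iter_questions(split_dir):
    """Yield (record, attachment_path) pairs from a split folder.

    Hypothetical helper: assumes metadata.jsonl holds one JSON object per
    line and that file_name is empty or absent when there is no attachment.
    """
    split_dir = Path(split_dir)
    with open(split_dir / "metadata.jsonl", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            file_name = record.get("file_name") or ""
            # Attachments live in the same folder as metadata.jsonl.
            attachment = split_dir / file_name if file_name else None
            yield record, attachment
```

Calling iter_questions on a split folder such as 2023/test would yield each question record together with either a path to its attachment or None.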

More details are available in the paper for now, and will be added here soon.

Dataset Format update (October 2025)

To keep GAIA compatible with HF datasets 4.x, where code-based dataset loaders are deprecated, we now ship Parquet-backed splits that mirror the former JSONL structure:

metadata.parquet carries the full split, and companion files like metadata.level1.parquet retain the per-level views exposed in the configs.
Columns remain task_id, Question, Level, Final answer, file_name, file_path, and the struct-valued Annotator Metadata, so existing processing pipelines can continue unchanged.
file_path keeps pointing to attachments relative to the repository root (for example, 2023/test/<attachment-id>.pdf), ensuring offline access to PDFs, media, and other auxiliary files.
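To make the column contract and the file_path convention above concrete, here is a minimal sketch that works on a single record represented as a plain dict. The names EXPECTED_COLUMNS, missing_columns, and resolve_attachment are hypothetical. The sketch assumes file_path is empty or None when a question has no attachment, which the text above does not state.

```python
from pathlib import Path

# Column names as listed in the format description above.
EXPECTED_COLUMNS = (
    "task_id", "Question", "Level", "Final answer",
    "file_name", "file_path", "Annotator Metadata",
)


def missing_columns(record):
    """Return the expected columns that are absent from a record (a dict)."""
    return [name for name in EXPECTED_COLUMNS if name not in record]


def resolve_attachment(repo_root, record):
    """Return the local attachment path for a record, or None.

    Assumption: file_path is empty or None when there is no attachment.
    """
    relative = record.get("file_path")
    if not relative:
        return None
    # file_path is relative to the repository root.
    return Path(repo_root) / relative
```

A pipeline could call missing_columns on the first record of a split as a quick sanity check before processing the rest.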

Load datasets

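import os

from datasets import load_dataset
from huggingface_hub import snapshot_download

# Download the full repository (Parquet files and attachments) to the local cache.
data_dir = snapshot_download(repo_id="gaia-benchmark/GAIA", repo_type="dataset")
# Load the level 1 test split from the local snapshot.
dataset = load_dataset(data_dir, "2023_level1", split="test")
for example in dataset:
    question = example["Question"]
    # file_path is relative to the repository root, so join it with data_dir.
    file_path = os.path.join(data_dir, example["file_path"])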
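Once a split is loaded, a quick tally per level can confirm that the expected configuration was picked up. The sketch below is one possible way to do this over any iterable of example dicts. The helper name count_by_level is hypothetical, and Level values are converted to strings because their stored type is not specified here.

```python
from collections import Counter


def count_by_level(examples):
    """Count examples per level, keyed by the level as a string.

    Hypothetical helper: Level is normalized with str() because its
    stored type (integer or string) is an unknown here.
    """
    return Counter(str(example["Level"]) for example in examples)
```

For instance, count_by_level(dataset) could be run on the dataset object from the loading snippet above, since iterating over it yields one dict per example.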

Last updated: Jun 11
