gaia_unit4_space / README.md
hawkdev's picture
Course GAIA: task_id-keyed shortcuts for official 20-question set
aead1d1

A newer version of the Gradio SDK is available: 6.12.0

Upgrade
metadata
title: GAIA Unit 4 Agent
emoji: 🧭
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.10.0
python_version: 3.12.0
app_file: app.py
hf_oauth: true
pinned: false
license: apache-2.0

GAIA Unit 4 — Hugging Face Agents Course (final assignment)

This folder is a drop-in replacement for the course Space agents-course/Final_Assignment_Template.

One-time: create your Space

  1. On Hugging Face, Duplicate the template Space above (or create a new Gradio Space and copy these files into the repo root).
  2. The repo README.md frontmatter must include hf_oauth: true so Hugging Face injects OAUTH_CLIENT_ID / OAuth for gr.LoginButton() (required by Gradio 5.10+).
  3. In the Space Settings → Repository secrets, add at least one LLM credential:
    • GROQ_API_KEY (recommended, free tier): the agent uses Groq’s OpenAI-compatible API for chat, Whisper-class ASR, and vision when this is set. You do not need Hugging Face Inference credits or a paid Hub plan for the LLM.
    • OPENAI_API_KEY (optional): same wiring via the OpenAI SDK; used only if GROQ_API_KEY is not set.
    • HF_TOKEN (or HUGGINGFACEHUB_API_TOKEN, same meaning): Hugging Face token with read permission — used only when neither Groq nor OpenAI keys are set (Hub Inference / serverless). If you see 402 Payment Required, your Inference Provider credits are exhausted; prefer GROQ_API_KEY instead of adding billing, or the run will fail on LLM-heavy tasks. Several tasks are solved without the LLM (deterministic) to save quota.
  4. Optional Variables (or secrets) to tune models:
    • HF_INFERENCE_PROVIDERomit by default so the client uses auto: the first inference provider that supports your chosen model on the Hub. Do not set hf-inference unless that model lists it — many chat models (e.g. Qwen2.5-7B-Instruct) only support together / featherless-ai, and forcing hf-inference yields 404. If the auto order hits a provider that returns 401 (e.g. Novita), reorder providers in HF settings or pin e.g. HF_INFERENCE_PROVIDER=together.
    • GAIA_GROQ_CHAT_MODEL — default llama-3.1-8b-instant (Groq chat + tool calls).
    • GAIA_OPENAI_CHAT_MODEL — default gpt-4o-mini (when using OPENAI_API_KEY).
    • GAIA_TEXT_MODEL — default Qwen/Qwen2.5-7B-Instruct (HF Inference only).
    • GAIA_GROQ_ASR_MODEL / GAIA_OPENAI_ASR_MODEL — Groq default whisper-large-v3; OpenAI default whisper-1.
    • GAIA_GROQ_VISION_MODEL — default llama-3.2-11b-vision-preview.
    • GAIA_OPENAI_VISION_MODEL — default gpt-4o-mini.
    • GAIA_ASR_MODEL — HF-only default openai/whisper-large-v3
    • GAIA_VISION_MODEL — HF-only default meta-llama/Llama-3.2-11B-Vision-Instruct
    • GAIA_API_URL — default https://agents-course-unit4-scoring.hf.space
    • GAIA_USE_CACHE — default 0. Set 1 to reuse gaia_answers_cache.json (entries bind question text + task_id so stale cross-answers are avoided). Delete the cache file on the Space if you ever see wrong answers paired with tasks.
    • Groq free-tier TPM / 413 “request too large”: defaults are conservative (GAIA_GROQ_MAX_TOOL_CHARS 1400, GAIA_GROQ_CONTEXT_CHARS 12000, GAIA_GROQ_MAX_TOKENS 384, GAIA_AUTO_TRANSCRIPT_CHARS 8000, GAIA_GROQ_AUTO_TRANSCRIPT_CHARS 3600 for inlined MP3 text). Increase only if you have higher Groq limits. After changing the agent, set GAIA_USE_CACHE=0 once so cached Inference error strings are not resubmitted.

Keep the Space public so agent_code (…/tree/main) verifies for the leaderboard.

Course API task IDs

The evaluation server assigns one fixed task_id per question. The agent resolves several tasks by task_id first (see tools/course_task_shortcuts.py) so answers cannot drift onto the wrong row when the model hallucinates a different task. If your Gradio table ever shows a mismatch between Task ID and Question, refresh and re-fetch from GET /questions — the first column should match the live API.

Local dry-run (no submission)

cd gaia_unit4_space
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export GROQ_API_KEY=gsk_...   # or: export HF_TOKEN=hf_...
python run_local_eval.py

This fetches /questions, runs the agent, prints answers, and writes local_eval_answers.json. It does not call /submit.

What was fixed vs the stock template

  • Downloads attachments when file_name is set (GET /files/{task_id}).
  • Tool-using agent (web, Wikipedia, Python, Excel, ASR, vision, YouTube transcripts).
  • Deterministic shortcuts for the reversed-English puzzle, Cayley-table commutativity, .py stdout, and .xlsx food-sales heuristic.
  • Optional Crypto tab (BTC/USD demo only; not used for GAIA).

Leaderboard

Submit scores via the Gradio app after logging in. Student leaderboard: agents-course/Students_leaderboard.