Spaces:
Sleeping
A newer version of the Gradio SDK is available: 6.12.0
title: GAIA Unit 4 Agent
emoji: 🧭
colorFrom: gray
colorTo: blue
sdk: gradio
sdk_version: 5.10.0
python_version: 3.12.0
app_file: app.py
hf_oauth: true
pinned: false
license: apache-2.0
GAIA Unit 4 — Hugging Face Agents Course (final assignment)
This folder is a drop-in replacement for the course Space
agents-course/Final_Assignment_Template.
One-time: create your Space
- On Hugging Face, Duplicate the template Space above (or create a new Gradio Space and copy these files into the repo root).
- The repo
README.mdfrontmatter must includehf_oauth: trueso Hugging Face injectsOAUTH_CLIENT_ID/ OAuth forgr.LoginButton()(required by Gradio 5.10+). - In the Space Settings → Repository secrets, add at least one LLM credential:
GROQ_API_KEY(recommended, free tier): the agent uses Groq’s OpenAI-compatible API for chat, Whisper-class ASR, and vision when this is set. You do not need Hugging Face Inference credits or a paid Hub plan for the LLM.OPENAI_API_KEY(optional): same wiring via the OpenAI SDK; used only ifGROQ_API_KEYis not set.HF_TOKEN(orHUGGINGFACEHUB_API_TOKEN, same meaning): Hugging Face token with read permission — used only when neither Groq nor OpenAI keys are set (Hub Inference / serverless). If you see 402 Payment Required, your Inference Provider credits are exhausted; preferGROQ_API_KEYinstead of adding billing, or the run will fail on LLM-heavy tasks. Several tasks are solved without the LLM (deterministic) to save quota.
- Optional Variables (or secrets) to tune models:
HF_INFERENCE_PROVIDER— omit by default so the client usesauto: the first inference provider that supports your chosen model on the Hub. Do not sethf-inferenceunless that model lists it — many chat models (e.g. Qwen2.5-7B-Instruct) only support together / featherless-ai, and forcinghf-inferenceyields 404. If the auto order hits a provider that returns 401 (e.g. Novita), reorder providers in HF settings or pin e.g.HF_INFERENCE_PROVIDER=together.GAIA_GROQ_CHAT_MODEL— defaultllama-3.1-8b-instant(Groq chat + tool calls).GAIA_OPENAI_CHAT_MODEL— defaultgpt-4o-mini(when usingOPENAI_API_KEY).GAIA_TEXT_MODEL— defaultQwen/Qwen2.5-7B-Instruct(HF Inference only).GAIA_GROQ_ASR_MODEL/GAIA_OPENAI_ASR_MODEL— Groq defaultwhisper-large-v3; OpenAI defaultwhisper-1.GAIA_GROQ_VISION_MODEL— defaultllama-3.2-11b-vision-preview.GAIA_OPENAI_VISION_MODEL— defaultgpt-4o-mini.GAIA_ASR_MODEL— HF-only defaultopenai/whisper-large-v3GAIA_VISION_MODEL— HF-only defaultmeta-llama/Llama-3.2-11B-Vision-InstructGAIA_API_URL— defaulthttps://agents-course-unit4-scoring.hf.spaceGAIA_USE_CACHE— default0. Set1to reusegaia_answers_cache.json(entries bind question text + task_id so stale cross-answers are avoided). Delete the cache file on the Space if you ever see wrong answers paired with tasks.- Groq free-tier TPM / 413 “request too large”: defaults are conservative (
GAIA_GROQ_MAX_TOOL_CHARS1400,GAIA_GROQ_CONTEXT_CHARS12000,GAIA_GROQ_MAX_TOKENS384,GAIA_AUTO_TRANSCRIPT_CHARS8000,GAIA_GROQ_AUTO_TRANSCRIPT_CHARS3600for inlined MP3 text). Increase only if you have higher Groq limits. After changing the agent, setGAIA_USE_CACHE=0once so cached Inference error strings are not resubmitted.
Keep the Space public so agent_code (…/tree/main) verifies for the leaderboard.
Course API task IDs
The evaluation server assigns one fixed task_id per question. The agent resolves several tasks by task_id first (see tools/course_task_shortcuts.py) so answers cannot drift onto the wrong row when the model hallucinates a different task. If your Gradio table ever shows a mismatch between Task ID and Question, refresh and re-fetch from GET /questions — the first column should match the live API.
Local dry-run (no submission)
cd gaia_unit4_space
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
export GROQ_API_KEY=gsk_... # or: export HF_TOKEN=hf_...
python run_local_eval.py
This fetches /questions, runs the agent, prints answers, and writes local_eval_answers.json. It does not call /submit.
What was fixed vs the stock template
- Downloads attachments when
file_nameis set (GET /files/{task_id}). - Tool-using agent (web, Wikipedia, Python, Excel, ASR, vision, YouTube transcripts).
- Deterministic shortcuts for the reversed-English puzzle, Cayley-table commutativity,
.pystdout, and.xlsxfood-sales heuristic. - Optional Crypto tab (BTC/USD demo only; not used for GAIA).
Leaderboard
Submit scores via the Gradio app after logging in. Student leaderboard:
agents-course/Students_leaderboard.