hawkdev committed
Commit ac299d5 · 1 parent: c1ea04f

initial commit
README.md CHANGED
@@ -1,13 +1,55 @@
 ---
-title: Gaia Unit4 Space
-emoji: 📊
+title: GAIA Unit 4 Agent
+emoji: 🧭
 colorFrom: gray
-colorTo: pink
+colorTo: blue
 sdk: gradio
-sdk_version: 6.9.0
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
-short_description: Final project for Agents Course
+license: apache-2.0
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# GAIA Unit 4 Hugging Face Agents Course (final assignment)
+
+This folder is a **drop-in replacement** for the course Space
+[`agents-course/Final_Assignment_Template`](https://huggingface.co/spaces/agents-course/Final_Assignment_Template).
+
+## One-time: create your Space
+
+1. On Hugging Face, **Duplicate** the template Space above (or create a new Gradio Space and copy these files into the repo root).
+2. In the Space **Settings → Repository secrets**, add:
+   - **`HF_TOKEN`**: a Hugging Face access token with **read** permission (for Inference API / serverless models).
+3. Optional **Variables** (or secrets) to tune models:
+   - `HF_INFERENCE_PROVIDER` — **omit by default** so the client uses **`auto`**: the first [inference provider](https://hf.co/settings/inference-providers) that supports your **chosen model** on the Hub. Do **not** set `hf-inference` unless the model lists it — many chat models (e.g. Qwen2.5-7B-Instruct) are served only via **together** / **featherless-ai**, and forcing `hf-inference` yields a **404**. If the auto order hits a provider that returns **401** (e.g. Novita), reorder providers in your HF settings or pin one, e.g. `HF_INFERENCE_PROVIDER=together`.
+   - `GAIA_TEXT_MODEL` — default `Qwen/Qwen2.5-7B-Instruct` (broad provider mapping via Together).
+   - `GAIA_ASR_MODEL` — default `openai/whisper-large-v3`.
+   - `GAIA_VISION_MODEL` — default `meta-llama/Llama-3.2-11B-Vision-Instruct`.
+   - `GAIA_API_URL` — default `https://agents-course-unit4-scoring.hf.space`.
+   - `GAIA_USE_CACHE` — `1` (default) or `0` to disable `gaia_answers_cache.json`.
+
+Keep the Space **public** so the submitted `agent_code` URL (`…/tree/main`) can be verified for the leaderboard.
+
+## Local dry-run (no submission)
+
+```bash
+cd gaia_unit4_space
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+export HF_TOKEN=hf_...
+python run_local_eval.py
+```
+
+This fetches `/questions`, runs the agent, prints answers, and writes `local_eval_answers.json`. It does **not** call `/submit`.
+
+## What was fixed vs the stock template
+
+- Downloads attachments when `file_name` is set (`GET /files/{task_id}`).
+- Tool-using agent (web, Wikipedia, Python, Excel, ASR, vision, YouTube transcripts).
+- Deterministic shortcuts for the reversed-English puzzle, Cayley-table commutativity, `.py` stdout, and the `.xlsx` food-sales heuristic.
+- Optional **Crypto** tab (BTC/USD demo only; not used for GAIA).
+
+## Leaderboard
+
+Submit scores via the Gradio app after logging in. Student leaderboard:
+[`agents-course/Students_leaderboard`](https://huggingface.co/spaces/agents-course/Students_leaderboard).
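The reversed-English shortcut mentioned under "What was fixed" can be sketched as follows. This is a simplified illustration (the repo's `tools/code_tools.py` carries the full check): the GAIA task arrives with its entire text reversed, and reading it backwards reveals a request for the opposite of the word "left".

```python
def reversed_puzzle_answer(question: str):
    # Reverse the text, then look for the telltale phrase. Simplified sketch;
    # the real helper also requires the word "answer" to appear.
    rev = question[::-1]
    if "opposite" in rev.lower() and '"left"' in rev:
        return "right"
    return None
```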
__pycache__/agent.cpython-312.pyc ADDED (binary, 7.4 kB)
__pycache__/answer_normalize.cpython-312.pyc ADDED (binary, 2.49 kB)
__pycache__/debug_ndjson.cpython-312.pyc ADDED (binary, 1.53 kB)
__pycache__/inference_client_factory.cpython-312.pyc ADDED (binary, 2.42 kB)
agent.py ADDED
@@ -0,0 +1,151 @@
+"""GAIA Unit 4 agent: tool-calling loop via Hugging Face Inference API."""
+
+from __future__ import annotations
+
+import os
+from typing import Any, Optional
+
+from huggingface_hub import InferenceClient
+
+from answer_normalize import normalize_answer
+from inference_client_factory import inference_client_kwargs
+from tools.registry import TOOL_DEFINITIONS, deterministic_attempt, dispatch_tool
+
+SYSTEM_PROMPT = """You solve GAIA benchmark questions for the Hugging Face Agents Course.
+
+Hard rules:
+- Call tools as needed (search, Wikipedia, fetch URL, Python, audio, image, Excel).
+- Your final assistant message must contain ONLY the answer text required by the question — no labels like "FINAL ANSWER", no markdown fences, no extra sentences.
+- Match the question's format exactly (comma-separated, alphabetical order, IOC codes, algebraic notation, two-decimal USD, first name only, etc.).
+- When a local attachment path is given, use the appropriate tool with that exact path.
+- For English Wikipedia tasks, use wikipedia_* tools; cross-check with web_search if needed.
+- For YouTube URLs in the question, try youtube_transcript first.
+"""
+
+
+class GaiaAgent:
+    def __init__(
+        self,
+        *,
+        hf_token: Optional[str] = None,
+        text_model: Optional[str] = None,
+        max_iterations: int = 14,
+    ):
+        self.hf_token = (
+            hf_token
+            or os.environ.get("HF_TOKEN")
+            or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
+        )
+        self.text_model = text_model or os.environ.get(
+            "GAIA_TEXT_MODEL", "Qwen/Qwen2.5-7B-Instruct"
+        )
+        self.max_iterations = max_iterations
+        self._client: Optional[InferenceClient] = None
+
+    def _get_client(self) -> InferenceClient:
+        if self._client is None:
+            if not self.hf_token:
+                raise RuntimeError(
+                    "HF_TOKEN or HUGGINGFACEHUB_API_TOKEN is required for GaiaAgent."
+                )
+            kw = inference_client_kwargs(self.hf_token)
+            self._client = InferenceClient(**kw)
+        return self._client
+
+    def __call__(
+        self,
+        question: str,
+        attachment_path: Optional[str] = None,
+        task_id: Optional[str] = None,
+    ) -> str:
+        det = deterministic_attempt(question, attachment_path)
+        if det is not None:
+            return normalize_answer(det)
+
+        if not self.hf_token:
+            return normalize_answer(
+                "Error: missing HF_TOKEN; cannot run LLM tools for this question."
+            )
+
+        user_text = _build_user_payload(question, attachment_path, task_id)
+        messages: list[dict[str, Any]] = [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": user_text},
+        ]
+
+        client = self._get_client()
+        last_text = ""
+
+        for _ in range(self.max_iterations):
+            try:
+                completion = client.chat_completion(
+                    messages=messages,
+                    model=self.text_model,
+                    tools=TOOL_DEFINITIONS,
+                    tool_choice="auto",
+                    max_tokens=1024,
+                    temperature=0.15,
+                )
+            except Exception as e:
+                last_text = f"Inference error: {e}"
+                break
+
+            choice = completion.choices[0]
+            msg = choice.message
+            last_text = (msg.content or "").strip()
+
+            if msg.tool_calls:
+                messages.append(
+                    {
+                        "role": "assistant",
+                        "content": msg.content if msg.content else None,
+                        "tool_calls": [
+                            {
+                                "id": tc.id,
+                                "type": "function",
+                                "function": {
+                                    "name": tc.function.name,
+                                    "arguments": tc.function.arguments,
+                                },
+                            }
+                            for tc in msg.tool_calls
+                        ],
+                    }
+                )
+                for tc in msg.tool_calls:
+                    name = tc.function.name
+                    args = tc.function.arguments or "{}"
+                    result = dispatch_tool(name, args, hf_token=self.hf_token)
+                    messages.append(
+                        {
+                            "role": "tool",
+                            "tool_call_id": tc.id,
+                            "content": result[:24_000],
+                        }
+                    )
+                continue
+
+            if last_text:
+                break
+
+            if choice.finish_reason == "length":
+                last_text = "Error: model hit max length without an answer."
+                break
+
+        return normalize_answer(last_text or "Error: empty response.")
+
+
+def _build_user_payload(
+    question: str,
+    attachment_path: Optional[str],
+    task_id: Optional[str],
+) -> str:
+    parts = []
+    if task_id:
+        parts.append(f"task_id: {task_id}")
+    parts.append(f"Question:\n{question.strip()}")
+    if attachment_path:
+        parts.append(f"\nAttachment path (use with tools): {attachment_path}")
+    else:
+        parts.append("\nNo attachment.")
+    return "\n".join(parts)
answer_normalize.py ADDED
@@ -0,0 +1,44 @@
+"""Post-process model output for GAIA exact-match submission."""
+
+import re
+from typing import Union
+
+
+_FINAL_ANSWER_RE = re.compile(
+    r"^\s*(?:FINAL\s*ANSWER\s*[::]?\s*)",
+    re.IGNORECASE,
+)
+
+
+def normalize_answer(raw: Union[str, int, float, None]) -> Union[str, int, float]:
+    """
+    Strip wrappers and forbidden prefixes. Prefer returning a string for API compatibility.
+    """
+    if raw is None:
+        return ""
+    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
+        return raw
+    text = str(raw).strip()
+    if not text:
+        return ""
+    text = _FINAL_ANSWER_RE.sub("", text, count=1).strip()
+    # Strip common wrappers (single line)
+    for prefix in ("The answer is", "Answer:", "ANSWER:", "```", "`"):
+        if text.lower().startswith(prefix.lower()):
+            text = text[len(prefix) :].strip()
+    if text.startswith('"') and text.endswith('"') and len(text) >= 2:
+        text = text[1:-1].strip()
+    if text.startswith("```"):
+        text = re.sub(r"^```\w*\s*", "", text)
+        text = re.sub(r"\s*```$", "", text).strip()
+    return text.strip()
+
+
+def maybe_numeric(text: str) -> Union[str, int, float]:
+    """If the prompt expects a plain number, allow int/float submission."""
+    t = text.strip()
+    if re.fullmatch(r"-?\d+", t):
+        return int(t)
+    if re.fullmatch(r"-?\d+\.\d+", t):
+        return float(t)
+    return text
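The core of `normalize_answer` above can be illustrated with a minimal sketch. Assumptions: ASCII colon only, and only the "FINAL ANSWER" prefix and surrounding quotes are handled here (the module additionally strips code fences, backticks, and a full-width colon).

```python
import re

def strip_answer_wrappers(text: str) -> str:
    # Drop a leading "FINAL ANSWER:" label (case-insensitive), then any
    # surrounding double quotes. Simplified sketch of normalize_answer.
    text = re.sub(r"^\s*FINAL\s*ANSWER\s*:?\s*", "", text, flags=re.IGNORECASE)
    text = text.strip()
    if text.startswith('"') and text.endswith('"') and len(text) >= 2:
        text = text[1:-1]
    return text.strip()
```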
app.py ADDED
@@ -0,0 +1,289 @@
+import json
+import os
+import tempfile
+from pathlib import Path
+
+import gradio as gr
+import pandas as pd
+import requests
+
+from agent import GaiaAgent
+from answer_normalize import normalize_answer
+
+DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+CACHE_FILENAME = "gaia_answers_cache.json"
+
+
+def _cache_path() -> Path:
+    return Path(__file__).resolve().parent / CACHE_FILENAME
+
+
+def _load_cache() -> dict:
+    p = _cache_path()
+    if not p.is_file():
+        return {}
+    try:
+        return json.loads(p.read_text(encoding="utf-8"))
+    except json.JSONDecodeError:
+        return {}
+
+
+def _save_cache(cache: dict) -> None:
+    _cache_path().write_text(json.dumps(cache, indent=2), encoding="utf-8")
+
+
+def _download_attachment(api_url: str, task_id: str, file_name: str) -> str | None:
+    """Save task attachment to a temp file; return path or None."""
+    if not file_name or not str(file_name).strip():
+        return None
+    url = f"{api_url}/files/{task_id}"
+    try:
+        r = requests.get(url, timeout=120)
+    except requests.RequestException:
+        return None
+    if r.status_code != 200:
+        return None
+    ctype = (r.headers.get("Content-Type") or "").lower()
+    if "application/json" in ctype:
+        try:
+            data = r.json()
+            if isinstance(data, dict) and data.get("detail"):
+                return None
+        except json.JSONDecodeError:
+            pass
+    suffix = Path(file_name).suffix or ""
+    fd, path = tempfile.mkstemp(suffix=suffix, prefix=f"gaia_{task_id[:8]}_")
+    try:
+        with os.fdopen(fd, "wb") as f:
+            f.write(r.content)
+    except OSError:
+        return None
+    return path
+
+
+def run_and_submit_all(profile: gr.OAuthProfile | None):
+    space_id = os.getenv("SPACE_ID")
+    use_cache = os.getenv("GAIA_USE_CACHE", "1").lower() in ("1", "true", "yes")
+
+    if profile:
+        username = f"{profile.username}"
+        print(f"User logged in: {username}")
+    else:
+        print("User not logged in.")
+        return "Please Login to Hugging Face with the button.", None
+
+    api_url = os.getenv("GAIA_API_URL", DEFAULT_API_URL)
+    questions_url = f"{api_url}/questions"
+    submit_url = f"{api_url}/submit"
+
+    try:
+        agent = GaiaAgent()
+    except Exception as e:
+        print(f"Error instantiating agent: {e}")
+        return f"Error initializing agent: {e}", None
+
+    agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
+    print(agent_code)
+
+    print(f"Fetching questions from: {questions_url}")
+    try:
+        response = requests.get(questions_url, timeout=60)
+        response.raise_for_status()
+        questions_data = response.json()
+        if not questions_data:
+            return "Fetched questions list is empty or invalid format.", None
+        print(f"Fetched {len(questions_data)} questions.")
+    except requests.exceptions.RequestException as e:
+        return f"Error fetching questions: {e}", None
+    except json.JSONDecodeError as e:
+        return f"Error decoding server response for questions: {e}", None
+
+    cache = _load_cache() if use_cache else {}
+    results_log = []
+    answers_payload = []
+
+    print(f"Running agent on {len(questions_data)} questions...")
+    for item in questions_data:
+        task_id = item.get("task_id")
+        question_text = item.get("question")
+        file_name = item.get("file_name") or ""
+
+        if not task_id or question_text is None:
+            print(f"Skipping item with missing task_id or question: {item}")
+            continue
+
+        cache_key = str(task_id)
+        if use_cache and cache_key in cache:
+            submitted_answer = normalize_answer(cache[cache_key])
+            print(f"Cache hit for {task_id}")
+        else:
+            local_path: str | None = None
+            try:
+                if file_name and str(file_name).strip():
+                    local_path = _download_attachment(api_url, str(task_id), str(file_name))
+                    if local_path:
+                        print(f"Downloaded attachment for {task_id} -> {local_path}")
+                submitted_answer = agent(
+                    str(question_text),
+                    attachment_path=local_path,
+                    task_id=str(task_id),
+                )
+                submitted_answer = normalize_answer(submitted_answer)
+                if use_cache:
+                    cache[cache_key] = (
+                        submitted_answer
+                        if isinstance(submitted_answer, str)
+                        else str(submitted_answer)
+                    )
+                    _save_cache(cache)
+            except Exception as e:
+                print(f"Error running agent on task {task_id}: {e}")
+                submitted_answer = f"AGENT ERROR: {e}"
+            finally:
+                if local_path and Path(local_path).is_file():
+                    try:
+                        Path(local_path).unlink(missing_ok=True)
+                    except OSError:
+                        pass
+
+        answers_payload.append(
+            {
+                "task_id": task_id,
+                "submitted_answer": submitted_answer,
+            }
+        )
+        results_log.append(
+            {
+                "Task ID": task_id,
+                "Question": question_text,
+                "Submitted Answer": submitted_answer,
+            }
+        )
+
+    if not answers_payload:
+        return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
+
+    submission_data = {
+        "username": username.strip(),
+        "agent_code": agent_code,
+        "answers": answers_payload,
+    }
+    status_update = (
+        f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
+    )
+    print(status_update)
+
+    print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
+    try:
+        response = requests.post(submit_url, json=submission_data, timeout=600)
+        response.raise_for_status()
+        result_data = response.json()
+        final_status = (
+            f"Submission Successful!\n"
+            f"User: {result_data.get('username')}\n"
+            f"Overall Score: {result_data.get('score', 'N/A')}% "
+            f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
+            f"Message: {result_data.get('message', 'No message received.')}"
+        )
+        print("Submission successful.")
+        results_df = pd.DataFrame(results_log)
+        return final_status, results_df
+    except requests.exceptions.HTTPError as e:
+        error_detail = f"Server responded with status {e.response.status_code}."
+        try:
+            error_json = e.response.json()
+            error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
+        except json.JSONDecodeError:
+            error_detail += f" Response: {e.response.text[:500]}"
+        status_message = f"Submission Failed: {error_detail}"
+        print(status_message)
+        return status_message, pd.DataFrame(results_log)
+    except requests.exceptions.Timeout:
+        status_message = "Submission Failed: The request timed out."
+        print(status_message)
+        return status_message, pd.DataFrame(results_log)
+    except requests.exceptions.RequestException as e:
+        status_message = f"Submission Failed: Network error - {e}"
+        print(status_message)
+        return status_message, pd.DataFrame(results_log)
+    except Exception as e:
+        status_message = f"An unexpected error occurred during submission: {e}"
+        print(status_message)
+        return status_message, pd.DataFrame(results_log)
+
+
+def crypto_btc_price() -> str:
+    """Optional demo: live BTC/USD (not used for GAIA scoring)."""
+    try:
+        r = requests.get(
+            "https://api.coingecko.com/api/v3/simple/price",
+            params={"ids": "bitcoin", "vs_currencies": "usd"},
+            timeout=20,
+        )
+        r.raise_for_status()
+        data = r.json()
+        usd = data.get("bitcoin", {}).get("usd")
+        return f"Bitcoin (BTC) ~ ${usd:,.2f} USD (CoinGecko public API)."
+    except Exception as e:
+        return f"Could not fetch price: {e}"
+
+
+with gr.Blocks() as demo:
+    gr.Markdown("# GAIA Unit 4 — Agent Evaluation Runner")
+    gr.Markdown(
+        """
+        **Instructions**
+
+        1. Duplicate this Space from the course template (or push this repo) and set **Secrets**: `HF_TOKEN` (read access to Inference).
+        2. Optional env vars: `GAIA_TEXT_MODEL`, `GAIA_ASR_MODEL`, `GAIA_VISION_MODEL`, `GAIA_API_URL`, `GAIA_USE_CACHE` (default `1`).
+        3. Log in with Hugging Face below (username is used for the leaderboard).
+        4. Run **Evaluate & Submit** to answer all questions and post scores.
+
+        Attachment tasks download `GET /files/{task_id}` automatically when `file_name` is set.
+
+        ---
+        **Crypto demo (optional):** unrelated to GAIA; quick BTC spot check.
+        """
+    )
+
+    gr.LoginButton()
+
+    with gr.Tab("GAIA evaluation"):
+        run_button = gr.Button("Run Evaluation & Submit All Answers")
+        status_output = gr.Textbox(
+            label="Run Status / Submission Result", lines=6, interactive=False
+        )
+        results_table = gr.DataFrame(
+            label="Questions and Agent Answers", wrap=True
+        )
+        run_button.click(
+            fn=run_and_submit_all,
+            outputs=[status_output, results_table],
+        )
+
+    with gr.Tab("Crypto intelligence (demo)"):
+        gr.Markdown(
+            "This tab does not affect GAIA scores. It demonstrates a simple public market data fetch."
+        )
+        cp_btn = gr.Button("Fetch BTC / USD")
+        cp_out = gr.Textbox(label="Output", interactive=False)
+        cp_btn.click(fn=crypto_btc_price, outputs=cp_out)
+
+if __name__ == "__main__":
+    print("\n" + "-" * 30 + " App Starting " + "-" * 30)
+    space_host_startup = os.getenv("SPACE_HOST")
+    space_id_startup = os.getenv("SPACE_ID")
+
+    if space_host_startup:
+        print(f"SPACE_HOST found: {space_host_startup}")
+    else:
+        print("SPACE_HOST not set (local run?).")
+
+    if space_id_startup:
+        print(f"SPACE_ID found: {space_id_startup}")
+        print(f"Repo tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
+    else:
+        print("SPACE_ID not set (local run?).")
+
+    print("-" * 62 + "\n")
+    demo.launch(debug=True, share=False)
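The answer cache that `app.py` reads and writes is a plain JSON round-trip. A minimal self-contained sketch (the file name here is illustrative; the app derives its path next to `app.py`):

```python
import json
import tempfile
from pathlib import Path

def save_cache(path: Path, cache: dict) -> None:
    # Same layout app.py uses: pretty-printed JSON, UTF-8.
    path.write_text(json.dumps(cache, indent=2), encoding="utf-8")

def load_cache(path: Path) -> dict:
    # Missing or corrupt cache degrades to an empty dict, as in _load_cache.
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "gaia_answers_cache.json"
    save_cache(p, {"task-1": "Paris"})
    roundtrip = load_cache(p)
```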
inference_client_factory.py ADDED
@@ -0,0 +1,31 @@
+"""Build InferenceClient with a provider that accepts the user's HF token."""
+
+from __future__ import annotations
+
+import os
+
+from huggingface_hub import InferenceClient
+
+
+def inference_client_kwargs(token: str) -> dict:
+    """
+    Default: **no** ``provider`` → the library uses ``auto``: the first provider for this
+    model per your https://hf.co/settings/inference-providers order.
+
+    Forcing ``hf-inference`` breaks many chat models (e.g. Qwen2.5-7B-Instruct is only on
+    together / featherless-ai — the router then returns **404** for …/hf-inference/models/…).
+
+    Set ``HF_INFERENCE_PROVIDER`` to pin one provider (e.g. ``together``, ``sambanova``)
+    or ``auto`` explicitly. Use ``hf-inference`` only for models that actually list it.
+    """
+    raw = os.environ.get("HF_INFERENCE_PROVIDER")
+    if raw is None:
+        return {"token": token}
+    r = raw.strip().lower()
+    if r in ("", "auto"):
+        return {"token": token}
+    return {"token": token, "provider": r}
+
+
+def make_inference_client(token: str) -> InferenceClient:
+    return InferenceClient(**inference_client_kwargs(token))
requirements.txt ADDED
@@ -0,0 +1,11 @@
+gradio>=4.44.0
+requests>=2.31.0
+pandas>=2.0.0
+openpyxl>=3.1.0
+beautifulsoup4>=4.12.0
+lxml>=5.0.0
+duckduckgo-search>=6.0.0
+wikipedia>=1.4.0
+huggingface_hub>=0.26.0
+youtube-transcript-api>=0.6.0
+Pillow>=10.0.0
run_local_eval.py ADDED
@@ -0,0 +1,111 @@
+#!/usr/bin/env python3
+"""Fetch GAIA course questions, run GaiaAgent, save JSON — does not submit."""
+
+from __future__ import annotations
+
+import argparse
+import json
+import os
+import sys
+import tempfile
+from pathlib import Path
+
+import requests
+
+ROOT = Path(__file__).resolve().parent
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
+from agent import GaiaAgent  # noqa: E402
+from answer_normalize import normalize_answer  # noqa: E402
+
+DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+
+
+def download_file(api_url: str, task_id: str, file_name: str) -> str | None:
+    if not file_name or not str(file_name).strip():
+        return None
+    url = f"{api_url}/files/{task_id}"
+    r = requests.get(url, timeout=120)
+    if r.status_code != 200:
+        return None
+    ctype = (r.headers.get("Content-Type") or "").lower()
+    if "application/json" in ctype:
+        try:
+            data = r.json()
+            if isinstance(data, dict) and data.get("detail"):
+                return None
+        except json.JSONDecodeError:
+            pass
+    suffix = Path(file_name).suffix or ""
+    fd, path = tempfile.mkstemp(suffix=suffix, prefix=f"gaia_{task_id[:8]}_")
+    with os.fdopen(fd, "wb") as f:
+        f.write(r.content)
+    return path
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--api-url",
+        default=os.environ.get("GAIA_API_URL", DEFAULT_API_URL),
+    )
+    parser.add_argument(
+        "-o",
+        "--output",
+        default=str(ROOT / "local_eval_answers.json"),
+        help="Write answers JSON here",
+    )
+    args = parser.parse_args()
+
+    q_url = f"{args.api_url.rstrip('/')}/questions"
+    print(f"GET {q_url}")
+    r = requests.get(q_url, timeout=60)
+    r.raise_for_status()
+    items = r.json()
+    print(f"{len(items)} questions")
+
+    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
+    agent = GaiaAgent(hf_token=token) if token else None
+
+    out: list[dict] = []
+    for item in items:
+        tid = item.get("task_id")
+        q = item.get("question")
+        fn = item.get("file_name") or ""
+        if not tid or q is None:
+            continue
+        local = None
+        try:
+            if fn and str(fn).strip():
+                local = download_file(args.api_url, str(tid), str(fn))
+            if agent is not None:
+                ans = agent(str(q), attachment_path=local, task_id=str(tid))
+            else:
+                from tools.registry import deterministic_attempt
+
+                d = deterministic_attempt(str(q), local)
+                ans = d if d is not None else "NO_HF_TOKEN"
+        finally:
+            if local and Path(local).is_file():
+                Path(local).unlink(missing_ok=True)
+
+        if isinstance(ans, (int, float)) and not isinstance(ans, bool):
+            sub = ans
+        else:
+            sub = normalize_answer(ans)
+        out.append(
+            {
+                "task_id": tid,
+                "question": q,
+                "submitted_answer": sub,
+            }
+        )
+        print(f"--- {tid[:8]}… -> {out[-1]['submitted_answer']!r}")
+
+    Path(args.output).write_text(json.dumps(out, indent=2), encoding="utf-8")
+    print(f"Wrote {args.output}")
+
+
+if __name__ == "__main__":
+    main()
tools/__init__.py ADDED
@@ -0,0 +1,5 @@
+"""Tool implementations for the GAIA agent."""
+
+from .registry import TOOL_DEFINITIONS, dispatch_tool
+
+__all__ = ["TOOL_DEFINITIONS", "dispatch_tool"]
tools/__pycache__/__init__.cpython-312.pyc ADDED (binary, 348 Bytes)
tools/__pycache__/code_tools.cpython-312.pyc ADDED (binary, 5.27 kB)
tools/__pycache__/excel_tools.cpython-312.pyc ADDED (binary, 4.13 kB)
tools/__pycache__/media_tools.cpython-312.pyc ADDED (binary, 4.66 kB)
tools/__pycache__/registry.cpython-312.pyc ADDED (binary, 5.69 kB)
tools/__pycache__/web_tools.cpython-312.pyc ADDED (binary, 5.57 kB)
tools/__pycache__/wiki_tools.cpython-312.pyc ADDED (binary, 2.26 kB)
tools/code_tools.py ADDED
@@ -0,0 +1,103 @@
+import os
+import subprocess
+import sys
+from typing import Optional
+
+
+def run_python_snippet(code: str, timeout_sec: int = 45) -> str:
+    """Execute Python in a subprocess (no network). For short derived calculations."""
+    if not code.strip():
+        return "Error: empty code."
+    try:
+        proc = subprocess.run(
+            [sys.executable, "-c", code],
+            capture_output=True,
+            text=True,
+            timeout=timeout_sec,
+            env={**os.environ, "PYTHONHASHSEED": "0"},
+        )
+    except subprocess.TimeoutExpired:
+        return "Error: execution timed out."
+    out = (proc.stdout or "").strip()
+    err = (proc.stderr or "").strip()
+    if proc.returncode != 0:
+        return f"Exit {proc.returncode}. stderr: {err[:2000]}"
+    if err:
+        out = f"{out}\n(stderr: {err[:1500]})" if out else err
+    return out[:30_000] if out else "(no stdout)"
+
+
+def run_python_file(file_path: str, timeout_sec: int = 60) -> str:
+    """Run an attached .py file and capture stdout."""
+    if not file_path.endswith(".py"):
+        return "Error: not a .py path."
+    try:
+        proc = subprocess.run(
+            [sys.executable, file_path],
+            capture_output=True,
+            text=True,
+            timeout=timeout_sec,
+            env={**os.environ, "PYTHONHASHSEED": "0"},
+        )
+    except subprocess.TimeoutExpired:
+        return "Error: execution timed out."
+    out = (proc.stdout or "").strip()
+    err = (proc.stderr or "").strip()
+    if proc.returncode != 0:
+        return f"Exit {proc.returncode}. stderr: {err[:2000]}"
+    return (out or err)[:30_000]
+
+
+def solve_cayley_noncommutative_subset(question: str) -> Optional[str]:
+    """
+    Parse a Cayley table from the question (markdown) and return the sorted
+    comma-separated elements involved in any non-commuting pair.
+    """
+    if "* on the set S" not in question and "not commutative" not in question:
+        return None
+    lines = [ln.strip() for ln in question.splitlines() if ln.strip().startswith("|")]
+    if len(lines) < 3:
+        return None
+
+    def split_row(ln: str) -> list[str]:
+        return [p.strip() for p in ln.strip("|").split("|")]
+
+    header = split_row(lines[0])
+    if len(header) < 2 or header[0] != "*":
+        return None
+    cols = header[1:]
+    op: dict[tuple[str, str], str] = {}
+    for ln in lines[1:]:
+        cells = split_row(ln)
+        if len(cells) < 2:
+            continue
+        row_sym = cells[0]
+        for j, c in enumerate(cols):
+            if j + 1 >= len(cells):
+                break
+            op[(row_sym, c)] = cells[j + 1]
+    elems = cols
+    involved: set[str] = set()
+    for a in elems:
+        for b in elems:
+            ab = op.get((a, b))
+            ba = op.get((b, a))
+            if ab is None or ba is None:
+                continue
+            if ab != ba:
+                involved.add(a)
+                involved.add(b)
+    if not involved:
+        return None
+    return ", ".join(sorted(involved))
+
+
+def reverse_english_puzzle_answer(question: str) -> Optional[str]:
+    """If the question is reversed English about 'left', return 'right'."""
+    q = question.strip()
+    if not q:
+        return None
+    rev = q[::-1]
+    if "opposite" in rev.lower() and '"left"' in rev and "answer" in rev.lower():
+        return "right"
+    return None
@@ -0,0 +1,101 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+import re
+from pathlib import Path
+from typing import Optional
+
+import pandas as pd
+
+
+def excel_food_sales_total_usd(path: str) -> str:
+    """
+    Sum sales for food items excluding drinks from the GAIA-style fast-food workbook.
+    Heuristic: classify rows using a Category/Type/Menu column; exclude drink/beverage/soda.
+    """
+    p = Path(path)
+    if not p.exists():
+        return f"Error: file not found: {path}"
+    try:
+        xl = pd.ExcelFile(path)
+    except Exception as e:
+        return f"Error opening Excel: {e}"
+
+    total = 0.0
+    notes: list[str] = []
+
+    for sheet in xl.sheet_names:
+        try:
+            df = pd.read_excel(xl, sheet_name=sheet)
+        except Exception as e:
+            notes.append(f"{sheet}: read error {e}")
+            continue
+        if df.empty:
+            continue
+        cols_lower = {str(c).lower(): c for c in df.columns}
+
+        cat_col = None
+        for key in (
+            "category",
+            "type",
+            "menu category",
+            "item type",
+            "group",
+        ):
+            if key in cols_lower:
+                cat_col = cols_lower[key]
+                break
+
+        money_col = None
+        for c in df.columns:
+            cl = str(c).lower()
+            if any(
+                w in cl
+                for w in ("sales", "revenue", "total", "amount", "usd", "price")
+            ):
+                if df[c].dtype == object or pd.api.types.is_numeric_dtype(df[c]):
+                    money_col = c
+                    break
+        if money_col is None:
+            for c in df.columns:
+                if pd.api.types.is_numeric_dtype(df[c]):
+                    money_col = c
+                    break
+
+        if money_col is None:
+            notes.append(f"{sheet}: no numeric sales column found")
+            continue
+
+        for _, row in df.iterrows():
+            val = row[money_col]
+            if pd.isna(val):
+                continue
+            try:
+                amount = float(val)
+            except (TypeError, ValueError):
+                s = str(val).replace("$", "").replace(",", "").strip()
+                try:
+                    amount = float(s)
+                except ValueError:
+                    continue
+
+            if cat_col is not None:
+                raw = row[cat_col]
+                label = str(raw).lower() if pd.notna(raw) else ""
+                if any(
+                    d in label
+                    for d in (
+                        "drink",
+                        "beverage",
+                        "soda",
+                        "coffee",
+                        "tea",
+                        "juice",
+                        "water",
+                        "shake",
+                        "smoothie",
+                    )
+                ):
+                    continue
+            total += amount
+
+    if total == 0 and not notes:
+        return "Error: could not aggregate (no matching rows)."
+    return f"{total:.2f}"
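The per-cell coercion in the loop above (try `float` first, then strip currency formatting) can be isolated as a small sketch, useful for sanity-checking the heuristic without a workbook:

```python
from typing import Optional


def parse_money(val) -> Optional[float]:
    """Same coercion the tool above applies per cell: try float() first,
    then strip '$' and ',' from string values; None means 'skip row'."""
    try:
        return float(val)
    except (TypeError, ValueError):
        s = str(val).replace("$", "").replace(",", "").strip()
        try:
            return float(s)
        except ValueError:
            return None


print(parse_money("$1,234.50"))  # 1234.5
print(parse_money(12))           # 12.0
print(parse_money("n/a"))        # None
```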
tools/media_tools.py ADDED
@@ -0,0 +1,97 @@
+import base64
+import os
+from pathlib import Path
+from typing import Optional
+
+from inference_client_factory import make_inference_client
+
+
+def transcribe_audio(
+    file_path: str,
+    *,
+    hf_token: Optional[str] = None,
+    model: Optional[str] = None,
+) -> str:
+    token = hf_token or os.environ.get("HF_TOKEN") or os.environ.get(
+        "HUGGINGFACEHUB_API_TOKEN"
+    )
+    if not token:
+        return "Error: HF_TOKEN not set for speech recognition."
+    mid = model or os.environ.get("GAIA_ASR_MODEL", "openai/whisper-large-v3")
+    client = make_inference_client(token)
+    try:
+        out = client.automatic_speech_recognition(file_path, model=mid)
+        return (out.text or "").strip()
+    except Exception as e:
+        return f"ASR error: {e}"
+
+
+def analyze_image_with_vlm(
+    file_path: str,
+    question: str,
+    *,
+    hf_token: Optional[str] = None,
+    model: Optional[str] = None,
+) -> str:
+    """Use a vision-language chat model via HF Inference (image as data URL)."""
+    token = hf_token or os.environ.get("HF_TOKEN") or os.environ.get(
+        "HUGGINGFACEHUB_API_TOKEN"
+    )
+    if not token:
+        return "Error: HF_TOKEN not set for vision."
+    mid = model or os.environ.get(
+        "GAIA_VISION_MODEL", "meta-llama/Llama-3.2-11B-Vision-Instruct"
+    )
+    path = Path(file_path)
+    if not path.is_file():
+        return f"Error: image not found: {file_path}"
+    raw = path.read_bytes()
+    b64 = base64.b64encode(raw).decode("ascii")
+    mime = "image/png" if path.suffix.lower() == ".png" else "image/jpeg"
+    data_url = f"data:{mime};base64,{b64}"
+    client = make_inference_client(token)
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {"type": "text", "text": question},
+                {"type": "image_url", "image_url": {"url": data_url}},
+            ],
+        }
+    ]
+    try:
+        comp = client.chat_completion(
+            messages=messages,
+            model=mid,
+            max_tokens=512,
+            temperature=0.2,
+        )
+        msg = comp.choices[0].message
+        return (msg.content or "").strip()
+    except Exception as e:
+        return f"Vision error: {e}"
+
+
+def visual_question_short(
+    file_path: str,
+    question: str,
+    *,
+    hf_token: Optional[str] = None,
+    model: Optional[str] = None,
+) -> str:
+    """Fallback VQA task (shorter answers)."""
+    token = hf_token or os.environ.get("HF_TOKEN") or os.environ.get(
+        "HUGGINGFACEHUB_API_TOKEN"
+    )
+    if not token:
+        return "Error: HF_TOKEN not set for VQA."
+    mid = model or "Salesforce/blip-vqa-base"
+    client = make_inference_client(token)
+    try:
+        answers = client.visual_question_answering(
+            image=file_path, question=question, model=mid, top_k=5
+        )
+        lines = [f"{a.answer} ({a.score:.3f})" for a in answers]
+        return "\n".join(lines)
+    except Exception as e:
+        return f"VQA error: {e}"
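The image-to-data-URL step in `analyze_image_with_vlm` is the only part that runs locally, so it is worth checking in isolation. A standalone sketch of that encoding (same suffix-based MIME guess; the throwaway file is illustrative only):

```python
import base64
import tempfile
from pathlib import Path


def to_data_url(path: str) -> str:
    """Mirror of the encoding above: guess MIME from the suffix,
    base64-encode the bytes, wrap as a data: URL."""
    p = Path(path)
    mime = "image/png" if p.suffix.lower() == ".png" else "image/jpeg"
    b64 = base64.b64encode(p.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{b64}"


# Demonstrate on a throwaway 3-byte file (first bytes of the PNG magic):
tmp = Path(tempfile.mkdtemp()) / "pixel.png"
tmp.write_bytes(b"\x89PN")
url = to_data_url(str(tmp))
print(url)  # data:image/png;base64,iVBO
```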
tools/registry.py ADDED
@@ -0,0 +1,227 @@
+"""OpenAI-style tool schemas and dispatch for Hugging Face chat_completion."""
+
+from __future__ import annotations
+
+import json
+from typing import Any, Callable, Optional
+
+from tools.code_tools import (
+    run_python_file,
+    run_python_snippet,
+    solve_cayley_noncommutative_subset,
+    reverse_english_puzzle_answer,
+)
+from tools.excel_tools import excel_food_sales_total_usd
+from tools.media_tools import analyze_image_with_vlm, transcribe_audio, visual_question_short
+from tools.web_tools import fetch_url, web_search, youtube_transcript
+from tools.wiki_tools import wikipedia_search, wikipedia_summary
+
+TOOL_DEFINITIONS: list[dict[str, Any]] = [
+    {
+        "type": "function",
+        "function": {
+            "name": "web_search",
+            "description": "Search the web (DuckDuckGo). Use for facts, papers, news, sports, cast lists.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {"type": "string"},
+                    "max_results": {"type": "integer"},
+                },
+                "required": ["query"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "fetch_url",
+            "description": "Download a public web page and return extracted text (truncated).",
+            "parameters": {
+                "type": "object",
+                "properties": {"url": {"type": "string"}},
+                "required": ["url"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "wikipedia_search",
+            "description": "Search English Wikipedia article titles.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {"type": "string"},
+                    "results": {"type": "integer"},
+                },
+                "required": ["query"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "wikipedia_summary",
+            "description": "Get English Wikipedia summary text for a specific article title.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "title": {"type": "string"},
+                    "sentences": {"type": "integer"},
+                },
+                "required": ["title"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "run_python_snippet",
+            "description": (
+                "Run a short Python snippet in a subprocess (stdlib only; no network). "
+                "Use for parsing, counting, or arithmetic on data you already have."
+            ),
+            "parameters": {
+                "type": "object",
+                "properties": {"code": {"type": "string"}},
+                "required": ["code"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "youtube_transcript",
+            "description": "Fetch YouTube captions/transcript when available.",
+            "parameters": {
+                "type": "object",
+                "properties": {"video_url": {"type": "string"}},
+                "required": ["video_url"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "transcribe_audio",
+            "description": "Transcribe a local audio file (.mp3, etc.) using HF Whisper inference.",
+            "parameters": {
+                "type": "object",
+                "properties": {"file_path": {"type": "string"}},
+                "required": ["file_path"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "analyze_image",
+            "description": (
+                "Ask a vision-language model about a local image file (PNG/JPEG). "
+                "Use for diagrams, screenshots, chess positions, etc."
+            ),
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "file_path": {"type": "string"},
+                    "question": {"type": "string"},
+                },
+                "required": ["file_path", "question"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "visual_question_short",
+            "description": "Lightweight VQA (short label-style answers). Fallback if analyze_image fails.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "file_path": {"type": "string"},
+                    "question": {"type": "string"},
+                },
+                "required": ["file_path", "question"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "excel_food_sales_total_usd",
+            "description": (
+                "Sum food sales in USD from the attached fast-food style spreadsheet, excluding drinks."
+            ),
+            "parameters": {
+                "type": "object",
+                "properties": {"file_path": {"type": "string"}},
+                "required": ["file_path"],
+            },
+        },
+    },
+]
+
+
+def dispatch_tool(
+    name: str,
+    arguments_json: str,
+    *,
+    hf_token: Optional[str] = None,
+) -> str:
+    try:
+        args = json.loads(arguments_json) if arguments_json.strip() else {}
+    except json.JSONDecodeError as e:
+        return f"Invalid JSON arguments: {e}"
+
+    try:
+        if name == "web_search":
+            return web_search(
+                args["query"], max_results=int(args.get("max_results", 8))
+            )
+        if name == "fetch_url":
+            return fetch_url(args["url"])
+        if name == "wikipedia_search":
+            return wikipedia_search(
+                args["query"], results=int(args.get("results", 5))
+            )
+        if name == "wikipedia_summary":
+            return wikipedia_summary(
+                args["title"], sentences=int(args.get("sentences", 16))
+            )
+        if name == "run_python_snippet":
+            return run_python_snippet(args["code"])
+        if name == "youtube_transcript":
+            return youtube_transcript(args["video_url"])
+        if name == "transcribe_audio":
+            return transcribe_audio(args["file_path"], hf_token=hf_token)
+        if name == "analyze_image":
+            return analyze_image_with_vlm(
+                args["file_path"], args["question"], hf_token=hf_token
+            )
+        if name == "visual_question_short":
+            return visual_question_short(
+                args["file_path"], args["question"], hf_token=hf_token
+            )
+        if name == "excel_food_sales_total_usd":
+            return excel_food_sales_total_usd(args["file_path"])
+        return f"Unknown tool: {name}"
+    except Exception as e:
+        return f"Tool error ({name}): {e}"
+
+
+def deterministic_attempt(question: str, attachment_path: Optional[str]) -> Optional[str]:
+    """Return an answer without LLM when we can solve reliably."""
+    r = reverse_english_puzzle_answer(question)
+    if r is not None:
+        return r
+    c = solve_cayley_noncommutative_subset(question)
+    if c is not None:
+        return c
+    if attachment_path:
+        low = attachment_path.lower()
+        if low.endswith(".py"):
+            return run_python_file(attachment_path).strip()
+        if low.endswith(".xlsx"):
+            return excel_food_sales_total_usd(attachment_path)
+    return None
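The `dispatch_tool` pattern above (decode the model's JSON argument string, route by name, wrap errors as strings) can be sketched generically. This is a simplified stand-in with a toy `echo` tool, not the registry itself:

```python
import json
from typing import Callable, Dict


def dispatch(name: str, arguments_json: str, tools: Dict[str, Callable[..., str]]) -> str:
    """Minimal sketch of the dispatch above: parse the tool-call
    arguments a chat model emits, call the named tool, and return
    every failure as a plain string the model can read."""
    try:
        args = json.loads(arguments_json) if arguments_json.strip() else {}
    except json.JSONDecodeError as e:
        return f"Invalid JSON arguments: {e}"
    fn = tools.get(name)
    if fn is None:
        return f"Unknown tool: {name}"
    try:
        return fn(**args)
    except Exception as e:
        return f"Tool error ({name}): {e}"


toy_tools = {"echo": lambda text="": text.upper()}
print(dispatch("echo", '{"text": "gaia"}', toy_tools))  # GAIA
print(dispatch("nope", "{}", toy_tools))                # Unknown tool: nope
```

Returning errors as strings (rather than raising) matters here: the tool result goes straight back into the chat loop, so the model can recover by trying another tool.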
tools/web_tools.py ADDED
@@ -0,0 +1,109 @@
+import re
+from html import unescape
+from typing import Optional
+from urllib.parse import urlparse
+
+import requests
+from bs4 import BeautifulSoup
+
+try:
+    from duckduckgo_search import DDGS
+except ImportError:
+    DDGS = None  # type: ignore
+
+DEFAULT_UA = (
+    "Mozilla/5.0 (compatible; GAIA-Agent/1.0; +https://huggingface.co/spaces)"
+)
+MAX_FETCH_BYTES = 1_500_000
+
+
+def web_search(query: str, max_results: int = 8) -> str:
+    """Return short snippets and URLs from DuckDuckGo text search."""
+    if not query.strip():
+        return "Error: empty query."
+    if DDGS is None:
+        return "Error: duckduckgo_search is not installed."
+    lines: list[str] = []
+    try:
+        with DDGS() as ddgs:
+            for i, r in enumerate(ddgs.text(query, max_results=max_results)):
+                title = r.get("title") or ""
+                body = r.get("body") or ""
+                href = r.get("href") or ""
+                lines.append(f"{i + 1}. {title}\n   {body[:400]}\n   URL: {href}")
+    except Exception as e:
+        return f"Search error: {e}"
+    if not lines:
+        return "No results."
+    return "\n\n".join(lines)
+
+
+def _visible_text(html: str) -> str:
+    soup = BeautifulSoup(html, "lxml")
+    for tag in soup(["script", "style", "noscript"]):
+        tag.decompose()
+    text = soup.get_text(separator="\n")
+    text = unescape(text)
+    text = re.sub(r"\n{3,}", "\n\n", text)
+    return text.strip()
+
+
+def fetch_url(url: str, max_chars: int = 25_000) -> str:
+    """Fetch a URL and return extracted plain text (truncated)."""
+    if not url.strip():
+        return "Error: empty URL."
+    parsed = urlparse(url)
+    if parsed.scheme not in ("http", "https"):
+        return "Error: only http(s) URLs are allowed."
+    try:
+        r = requests.get(
+            url,
+            timeout=45,
+            headers={"User-Agent": DEFAULT_UA},
+            stream=True,
+        )
+        r.raise_for_status()
+        chunks: list[bytes] = []
+        total = 0
+        for chunk in r.iter_content(chunk_size=65536):
+            if not chunk:
+                continue
+            chunks.append(chunk)
+            total += len(chunk)
+            if total >= MAX_FETCH_BYTES:
+                break
+        raw = b"".join(chunks)
+        ctype = r.headers.get("Content-Type", "").lower()
+        if "pdf" in ctype or url.lower().endswith(".pdf"):
+            return (
+                "Error: PDF binary not parsed here. "
+                "Search for an HTML abstract page or use web_search instead."
+            )
+        text = raw.decode("utf-8", errors="replace")
+        plain = _visible_text(text) if "<html" in text.lower() else text
+        plain = plain[:max_chars]
+        return plain if plain.strip() else "(empty body after parse)"
+    except Exception as e:
+        return f"Fetch error: {e}"
+
+
+def youtube_transcript(video_url: str) -> str:
+    """Return transcript text when the video exposes captions (unofficial API)."""
+    try:
+        from youtube_transcript_api import YouTubeTranscriptApi
+    except ImportError:
+        return "Error: youtube_transcript_api not installed."
+
+    m = re.search(
+        r"(?:youtube\.com/watch\?v=|youtu\.be/)([a-zA-Z0-9_-]{6,})",
+        video_url,
+    )
+    if not m:
+        return "Error: could not parse YouTube video id from URL."
+    vid = m.group(1)
+    try:
+        transcript = YouTubeTranscriptApi.get_transcript(vid)
+    except Exception as e:
+        return f"No transcript available: {e}"
+    lines = [entry.get("text", "") for entry in transcript]
+    return "\n".join(lines)[:50_000]
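The video-id extraction in `youtube_transcript` is the part most likely to bite on unusual URLs, so here is the same regex pulled out for a quick check (standalone sketch; the example IDs are arbitrary):

```python
import re
from typing import Optional

# Same pattern as youtube_transcript above: watch-page and short-link forms.
_YT_ID = re.compile(r"(?:youtube\.com/watch\?v=|youtu\.be/)([a-zA-Z0-9_-]{6,})")


def youtube_video_id(url: str) -> Optional[str]:
    m = _YT_ID.search(url)
    return m.group(1) if m else None


print(youtube_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(youtube_video_id("https://youtu.be/dQw4w9WgXcQ"))                 # dQw4w9WgXcQ
print(youtube_video_id("https://example.com/clip"))                     # None
```

Note the character class deliberately includes `_` and `-`, which occur in real video IDs; the `{6,}` lower bound just filters out obviously-too-short matches.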
tools/wiki_tools.py ADDED
@@ -0,0 +1,32 @@
+import wikipedia
+
+
+def wikipedia_search(query: str, results: int = 5) -> str:
+    """Search English Wikipedia titles."""
+    if not query.strip():
+        return "Error: empty query."
+    wikipedia.set_lang("en")
+    try:
+        titles = wikipedia.search(query, results=results)
+    except Exception as e:
+        return f"Wikipedia search error: {e}"
+    if not titles:
+        return "No titles found."
+    return "\n".join(f"- {t}" for t in titles)
+
+
+def wikipedia_summary(title: str, sentences: int = 12) -> str:
+    """Fetch a Wikipedia page summary by title (English)."""
+    if not title.strip():
+        return "Error: empty title."
+    wikipedia.set_lang("en")
+    try:
+        page = wikipedia.page(title, auto_suggest=True)
+        summary = wikipedia.summary(title, sentences=sentences, auto_suggest=True)
+        out = f"Title: {page.title}\nURL: {page.url}\n\n{summary}"
+        return out[:40_000]
+    except wikipedia.DisambiguationError as e:
+        opts = ", ".join(e.options[:8])
+        return f"Disambiguation; try one of: {opts}"
+    except Exception as e:
+        return f"Wikipedia error: {e}"
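The `DisambiguationError` branch above keeps the tool reply model-friendly by capping the suggestion list. A tiny standalone sketch of that formatting (the option strings are made up for illustration):

```python
def format_disambiguation(options: list[str], limit: int = 8) -> str:
    """Mirror of the DisambiguationError branch above: show at most
    `limit` candidate titles so the tool reply stays short."""
    return "Disambiguation; try one of: " + ", ".join(options[:limit])


opts = [f"Mercury ({k})" for k in ("planet", "element", "mythology")]
print(format_disambiguation(opts))
# Disambiguation; try one of: Mercury (planet), Mercury (element), Mercury (mythology)
```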