Spaces: agents-course/Final_Assignment_Template

Upload 7 files

#346

by W01fAI - opened Mar 22

base: refs/heads/main ← from: refs/pr/346

+586 -107

Files changed (7)

README.md +49 -9
agent.py +151 -0
answer_normalize.py +44 -0
app.py +189 -96
inference_client_factory.py +31 -0
requirements.txt +11 -2
run_local_eval.py +111 -0

README.md CHANGED

@@ -1,15 +1,55 @@
 ---
-title: Template Final Assignment
-emoji: 🕵🏻‍♂️
-colorFrom: indigo
-colorTo: indigo
 sdk: gradio
-sdk_version: 5.25.2
 app_file: app.py
 pinned: false
-hf_oauth: true
-# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
-hf_oauth_expiration_minutes: 480
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: GAIA Unit 4 Agent
+emoji: 🧭
+colorFrom: gray
+colorTo: blue
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
 pinned: false
+license: apache-2.0
+hf_oauth: true
 ---
+# GAIA Unit 4 — Hugging Face Agents Course (final assignment)
+This folder is a **drop-in replacement** for the course Space
+[`agents-course/Final_Assignment_Template`](https://huggingface.co/spaces/agents-course/Final_Assignment_Template).
+## One-time: create your Space
+1. On Hugging Face, **Duplicate** the template Space above (or create a new Gradio Space and copy these files into the repo root).
+2. In the Space **Settings → Repository secrets**, add:
+   - **`HF_TOKEN`**: a Hugging Face access token with **read** permission (for Inference API / serverless models).
+3. Optional **Variables** (or secrets) to tune models:
+   - `HF_INFERENCE_PROVIDER` — **omit by default** so the client uses **`auto`**: the first [inference provider](https://hf.co/settings/inference-providers) that supports your **chosen model** on the Hub. Do **not** set `hf-inference` unless that model lists it — many chat models (e.g. Qwen2.5-7B-Instruct) only support **together** / **featherless-ai**, and forcing `hf-inference` yields **404**. If the auto order hits a provider that returns **401** (e.g. Novita), reorder providers in HF settings or pin e.g. `HF_INFERENCE_PROVIDER=together`.
+   - `GAIA_TEXT_MODEL` — default `Qwen/Qwen2.5-7B-Instruct` (broad provider mapping via Together).
+   - `GAIA_ASR_MODEL` — default `openai/whisper-large-v3`
+   - `GAIA_VISION_MODEL` — default `meta-llama/Llama-3.2-11B-Vision-Instruct`
+   - `GAIA_API_URL` — default `https://agents-course-unit4-scoring.hf.space`
+   - `GAIA_USE_CACHE` — `1` (default) or `0` to disable `gaia_answers_cache.json`
+Keep the Space **public** so the `agent_code` link (`…/tree/main`) submitted with your answers can be checked for the leaderboard.
+## Local dry-run (no submission)
+```bash
+cd gaia_unit4_space
+python -m venv .venv && source .venv/bin/activate
+pip install -r requirements.txt
+export HF_TOKEN=hf_...
+python run_local_eval.py
+```
+This fetches `/questions`, runs the agent, prints answers, and writes `local_eval_answers.json`. It does **not** call `/submit`.
+## What was fixed vs the stock template
+- Downloads attachments when `file_name` is set (`GET /files/{task_id}`).
+- Tool-using agent (web, Wikipedia, Python, Excel, ASR, vision, YouTube transcripts).
+- Deterministic shortcuts for the reversed-English puzzle, Cayley-table commutativity, `.py` stdout, and `.xlsx` food-sales heuristic.
+- Optional **Crypto** tab (BTC/USD demo only; not used for GAIA).
+## Leaderboard
+Submit scores via the Gradio app after logging in. Student leaderboard:
+[`agents-course/Students_leaderboard`](https://huggingface.co/spaces/agents-course/Students_leaderboard).
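
Review note: the README lists a deterministic shortcut for Cayley-table commutativity, but `tools/registry.py` is not among the seven files in this diff. A minimal sketch of how such a check could work, assuming the table has already been parsed into a nested dict. The function name `non_commutative_elements` and the input shape are hypothetical and are not taken from the PR:

```python
from itertools import combinations


def non_commutative_elements(table: dict[str, dict[str, str]]) -> list[str]:
    """Return, sorted, every element that appears in a pair (a, b) with a*b != b*a.

    Assumption: table[a][b] holds the product a*b.
    """
    involved: set[str] = set()
    for a, b in combinations(sorted(table), 2):
        if table[a][b] != table[b][a]:
            involved.update((a, b))
    return sorted(involved)


# Hypothetical 3-element table: only x*y and y*x disagree.
example = {
    "x": {"x": "x", "y": "z", "z": "x"},
    "y": {"x": "y", "y": "y", "z": "z"},
    "z": {"x": "x", "y": "z", "z": "z"},
}
print(", ".join(non_commutative_elements(example)))
```

Joining the sorted elements with `", "` matches the comma-separated, alphabetical format that the system prompt in `agent.py` asks for.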

agent.py ADDED

	@@ -0,0 +1,151 @@

+"""GAIA Unit 4 agent: tool-calling loop via Hugging Face Inference API."""
+from __future__ import annotations
+import os
+from typing import Any, Optional
+from huggingface_hub import InferenceClient
+from answer_normalize import normalize_answer
+from inference_client_factory import inference_client_kwargs
+from tools.registry import TOOL_DEFINITIONS, deterministic_attempt, dispatch_tool
+SYSTEM_PROMPT = """You solve GAIA benchmark questions for the Hugging Face Agents Course.
+Hard rules:
+- Call tools as needed (search, Wikipedia, fetch URL, Python, audio, image, Excel).
+- Your final assistant message must contain ONLY the answer text required by the question — no labels like "FINAL ANSWER", no markdown fences, no extra sentences.
+- Match the question's format exactly (comma-separated, alphabetical order, IOC codes, algebraic notation, two-decimal USD, first name only, etc.).
+- When a local attachment path is given, use the appropriate tool with that exact path.
+- For English Wikipedia tasks, use wikipedia_* tools; cross-check with web_search if needed.
+- For YouTube URLs in the question, try youtube_transcript first.
+"""
+class GaiaAgent:
+    def __init__(
+        self,
+        *,
+        hf_token: Optional[str] = None,
+        text_model: Optional[str] = None,
+        max_iterations: int = 14,
+    ):
+        self.hf_token = (
+            hf_token
+            or os.environ.get("HF_TOKEN")
+            or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
+        )
+        self.text_model = text_model or os.environ.get(
+            "GAIA_TEXT_MODEL", "Qwen/Qwen2.5-7B-Instruct"
+        )
+        self.max_iterations = max_iterations
+        self._client: Optional[InferenceClient] = None
+    def _get_client(self) -> InferenceClient:
+        if self._client is None:
+            if not self.hf_token:
+                raise RuntimeError(
+                    "HF_TOKEN or HUGGINGFACEHUB_API_TOKEN is required for GaiaAgent."
+                )
+            kw = inference_client_kwargs(self.hf_token)
+            self._client = InferenceClient(**kw)
+        return self._client
+    def __call__(
+        self,
+        question: str,
+        attachment_path: Optional[str] = None,
+        task_id: Optional[str] = None,
+    ) -> str:
+        det = deterministic_attempt(question, attachment_path)
+        if det is not None:
+            return normalize_answer(det)
+        if not self.hf_token:
+            return normalize_answer(
+                "Error: missing HF_TOKEN; cannot run LLM tools for this question."
+            )
+        user_text = _build_user_payload(question, attachment_path, task_id)
+        messages: list[dict[str, Any]] = [
+            {"role": "system", "content": SYSTEM_PROMPT},
+            {"role": "user", "content": user_text},
+        ]
+        client = self._get_client()
+        last_text = ""
+        for _ in range(self.max_iterations):
+            try:
+                completion = client.chat_completion(
+                    messages=messages,
+                    model=self.text_model,
+                    tools=TOOL_DEFINITIONS,
+                    tool_choice="auto",
+                    max_tokens=1024,
+                    temperature=0.15,
+                )
+            except Exception as e:
+                last_text = f"Inference error: {e}"
+                break
+            choice = completion.choices[0]
+            msg = choice.message
+            last_text = (msg.content or "").strip()
+            if msg.tool_calls:
+                messages.append(
+                    {
+                        "role": "assistant",
+                        "content": msg.content if msg.content else None,
+                        "tool_calls": [
+                            {
+                                "id": tc.id,
+                                "type": "function",
+                                "function": {
+                                    "name": tc.function.name,
+                                    "arguments": tc.function.arguments,
+                                },
+                            }
+                            for tc in msg.tool_calls
+                        ],
+                    }
+                )
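+                for tc in msg.tool_calls:
+                    name = tc.function.name
+                    args = tc.function.arguments or "{}"
+                    # Report tool failures back to the model instead of aborting the question.
+                    try:
+                        result = str(dispatch_tool(name, args, hf_token=self.hf_token))
+                    except Exception as e:
+                        result = f"Tool error in {name}: {e}"
+                    messages.append(
+                        {
+                            "role": "tool",
+                            "tool_call_id": tc.id,
+                            "content": result[:24_000],
+                        }
+                    )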
+                continue
+            if last_text:
+                break
+            if choice.finish_reason == "length":
+                last_text = "Error: model hit max length without an answer."
+                break
+        return normalize_answer(last_text or "Error: empty response.")
+def _build_user_payload(
+    question: str,
+    attachment_path: Optional[str],
+    task_id: Optional[str],
+) -> str:
+    parts = []
+    if task_id:
+        parts.append(f"task_id: {task_id}")
+    parts.append(f"Question:\n{question.strip()}")
+    if attachment_path:
+        parts.append(f"\nAttachment path (use with tools): {attachment_path}")
+    else:
+        parts.append("\nNo attachment.")
+    return "\n".join(parts)
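
Review note: `agent.py` imports `TOOL_DEFINITIONS` and `dispatch_tool` from `tools.registry`, which this diff does not include. The sketch below shows one shape those two names could take, inferred only from how the loop above calls them (`dispatch_tool(name, args, hf_token=...)` returning text that is sliced and sent back as a `tool` message). The `word_count` tool and the `_TOOLS` table are hypothetical placeholders, and accepting either a JSON string or an already-parsed dict for the arguments is an assumption:

```python
import json
from typing import Any, Callable, Optional, Union


def word_count(text: str) -> str:
    """Hypothetical tool used only for illustration."""
    return str(len(text.split()))


_TOOLS: dict[str, Callable[..., str]] = {"word_count": word_count}

# OpenAI-style function schema, one entry per tool.
TOOL_DEFINITIONS: list[dict[str, Any]] = [
    {
        "type": "function",
        "function": {
            "name": "word_count",
            "description": "Count whitespace-separated words in a text.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }
]


def dispatch_tool(
    name: str,
    arguments: Union[str, dict, None],
    *,
    hf_token: Optional[str] = None,  # unused here; real tools may need it
) -> str:
    """Run one tool call and always return a string, even on failure."""
    fn = _TOOLS.get(name)
    if fn is None:
        return f"Error: unknown tool {name!r}"
    try:
        kwargs = arguments if isinstance(arguments, dict) else json.loads(arguments or "{}")
        if not isinstance(kwargs, dict):
            return "Error: tool arguments must be a JSON object"
        return str(fn(**kwargs))
    except Exception as e:  # hand the failure to the model instead of raising
        return f"Error running {name}: {e}"


print(dispatch_tool("word_count", '{"text": "a b c"}'))
```

Returning an error string rather than raising lets the model see the failure in the next turn and retry with different arguments.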

answer_normalize.py ADDED

	@@ -0,0 +1,44 @@

+"""Post-process model output for GAIA exact-match submission."""
+import re
+from typing import Union
+_FINAL_ANSWER_RE = re.compile(
+    r"^\s*(?:FINAL\s*ANSWER\s*[:：]?\s*)",
+    re.IGNORECASE,
+)
+def normalize_answer(raw: Union[str, int, float, None]) -> Union[str, int, float]:
+    """
+    Strip wrappers and forbidden prefixes. Prefer returning a string for API compatibility.
+    """
+    if raw is None:
+        return ""
+    if isinstance(raw, (int, float)) and not isinstance(raw, bool):
+        return raw
+    text = str(raw).strip()
+    if not text:
+        return ""
+    text = _FINAL_ANSWER_RE.sub("", text, count=1).strip()
+    # Strip fenced-code wrappers first so the closing fence is removed as well.
+    if text.startswith("```"):
+        text = re.sub(r"^```(?:\w*\n)?", "", text)
+        text = re.sub(r"\s*```$", "", text).strip()
+    # Strip common single-line prefixes (compared case-insensitively).
+    for prefix in ("The answer is", "Answer:"):
+        if text.lower().startswith(prefix.lower()):
+            text = text[len(prefix) :].strip()
+    # Strip one pair of surrounding double quotes or backticks.
+    for quote in ('"', "`"):
+        if len(text) >= 2 and text.startswith(quote) and text.endswith(quote):
+            text = text[1:-1].strip()
+    return text.strip()
+def maybe_numeric(text: str) -> Union[str, int, float]:
+    """If the prompt expects a plain number, allow int/float submission."""
+    t = text.strip()
+    if re.fullmatch(r"-?\d+", t):
+        return int(t)
+    if re.fullmatch(r"-?\d+\.\d+", t):
+        return float(t)
+    return text
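
Review note: a short, self-contained check of the two pieces of `answer_normalize.py` that decide what gets submitted: the `FINAL ANSWER` pattern and `maybe_numeric`. The regex and the numeric logic are copied from the diff above; the helper name `strip_final_answer` and the sample strings are made up for illustration:

```python
import re
from typing import Union

# Same pattern as _FINAL_ANSWER_RE in the diff above.
FINAL_ANSWER_RE = re.compile(r"^\s*(?:FINAL\s*ANSWER\s*[:：]?\s*)", re.IGNORECASE)


def strip_final_answer(text: str) -> str:
    """Drop one leading 'FINAL ANSWER:' label, if present."""
    return FINAL_ANSWER_RE.sub("", text, count=1).strip()


def maybe_numeric(text: str) -> Union[str, int, float]:
    """Same logic as maybe_numeric in the diff: plain integers and decimals only."""
    t = text.strip()
    if re.fullmatch(r"-?\d+", t):
        return int(t)
    if re.fullmatch(r"-?\d+\.\d+", t):
        return float(t)
    return text


for raw in ("FINAL ANSWER: 42", "final answer： 3.50", "Final Answer: 1,000", "b, e"):
    print(repr(raw), "->", repr(maybe_numeric(strip_final_answer(raw))))
```

Converting `"3.50"` to a float drops the trailing zero, which matters for questions that ask for two-decimal USD. In the files shown in this diff, `maybe_numeric` is defined but never called, so answers stay strings unless a caller opts in.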

app.py CHANGED

@@ -1,105 +1,181 @@
 import os
 import gradio as gr
-import requests
-import inspect
 import pandas as pd
-# (Keep Constants as is)
-# --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
-# --- Basic Agent Definition ---
-# ----- THIS IS WERE YOU CAN BUILD WHAT YOU WANT ------
-class BasicAgent:
-    def __init__(self):
-        print("BasicAgent initialized.")
-    def __call__(self, question: str) -> str:
-        print(f"Agent received question (first 50 chars): {question[:50]}...")
-        fixed_answer = "This is a default answer."
-        print(f"Agent returning fixed answer: {fixed_answer}")
-        return fixed_answer
-def run_and_submit_all( profile: gr.OAuthProfile | None):
-    """
-    Fetches all questions, runs the BasicAgent on them, submits all answers,
-    and displays the results.
-    """
-    # --- Determine HF Space Runtime URL and Repo URL ---
-    space_id = os.getenv("SPACE_ID") # Get the SPACE_ID for sending link to the code
     if profile:
-        username= f"{profile.username}"
         print(f"User logged in: {username}")
     else:
         print("User not logged in.")
         return "Please Login to Hugging Face with the button.", None
-    api_url = DEFAULT_API_URL
     questions_url = f"{api_url}/questions"
     submit_url = f"{api_url}/submit"
-    # 1. Instantiate Agent ( modify this part to create your agent)
     try:
-        agent = BasicAgent()
     except Exception as e:
         print(f"Error instantiating agent: {e}")
         return f"Error initializing agent: {e}", None
-    # In the case of an app running as a hugging Face space, this link points toward your codebase ( usefull for others so please keep it public)
     agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
     print(agent_code)
-    # 2. Fetch Questions
     print(f"Fetching questions from: {questions_url}")
     try:
-        response = requests.get(questions_url, timeout=15)
         response.raise_for_status()
         questions_data = response.json()
         if not questions_data:
-             print("Fetched questions list is empty.")
-             return "Fetched questions list is empty or invalid format.", None
         print(f"Fetched {len(questions_data)} questions.")
     except requests.exceptions.RequestException as e:
-        print(f"Error fetching questions: {e}")
         return f"Error fetching questions: {e}", None
-    except requests.exceptions.JSONDecodeError as e:
-         print(f"Error decoding JSON response from questions endpoint: {e}")
-         print(f"Response text: {response.text[:500]}")
-         return f"Error decoding server response for questions: {e}", None
-    except Exception as e:
-        print(f"An unexpected error occurred fetching questions: {e}")
-        return f"An unexpected error occurred fetching questions: {e}", None
-    # 3. Run your Agent
     results_log = []
     answers_payload = []
     print(f"Running agent on {len(questions_data)} questions...")
     for item in questions_data:
         task_id = item.get("task_id")
         question_text = item.get("question")
         if not task_id or question_text is None:
             print(f"Skipping item with missing task_id or question: {item}")
             continue
-        try:
-            submitted_answer = agent(question_text)
-            answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
-            results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
-        except Exception as e:
-             print(f"Error running agent on task {task_id}: {e}")
-             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})
     if not answers_payload:
-        print("Agent did not produce any answers to submit.")
         return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
-    # 4. Prepare Submission
-    submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
-    status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
     print(status_update)
-    # 5. Submit
     print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
     try:
-        response = requests.post(submit_url, json=submission_data, timeout=60)
         response.raise_for_status()
         result_data = response.json()
         final_status = (
@@ -117,80 +193,97 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
         try:
             error_json = e.response.json()
             error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
-        except requests.exceptions.JSONDecodeError:
             error_detail += f" Response: {e.response.text[:500]}"
         status_message = f"Submission Failed: {error_detail}"
         print(status_message)
-        results_df = pd.DataFrame(results_log)
-        return status_message, results_df
     except requests.exceptions.Timeout:
         status_message = "Submission Failed: The request timed out."
         print(status_message)
-        results_df = pd.DataFrame(results_log)
-        return status_message, results_df
     except requests.exceptions.RequestException as e:
         status_message = f"Submission Failed: Network error - {e}"
         print(status_message)
-        results_df = pd.DataFrame(results_log)
-        return status_message, results_df
     except Exception as e:
         status_message = f"An unexpected error occurred during submission: {e}"
         print(status_message)
-        results_df = pd.DataFrame(results_log)
-        return status_message, results_df
-# --- Build Gradio Interface using Blocks ---
 with gr.Blocks() as demo:
-    gr.Markdown("# Basic Agent Evaluation Runner")
     gr.Markdown(
         """
-        **Instructions:**
-        1.  Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
-        2.  Log in to your Hugging Face account using the button below. This uses your HF username for submission.
-        3.  Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
-        ---
-        **Disclaimers:**
-        Once clicking on the "submit button, it can take quite some time ( this is the time for the agent to go through all the questions).
-        This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance for the delay process of the submit button, a solution could be to cache the answers and submit in a seperate action or even to answer the questions in async.
         """
     )
     gr.LoginButton()
-    run_button = gr.Button("Run Evaluation & Submit All Answers")
-    status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
-    # Removed max_rows=10 from DataFrame constructor
-    results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
-    run_button.click(
-        fn=run_and_submit_all,
-        outputs=[status_output, results_table]
-    )
 if __name__ == "__main__":
-    print("\n" + "-"*30 + " App Starting " + "-"*30)
-    # Check for SPACE_HOST and SPACE_ID at startup for information
     space_host_startup = os.getenv("SPACE_HOST")
-    space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
     if space_host_startup:
-        print(f"✅ SPACE_HOST found: {space_host_startup}")
-        print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
     else:
-        print("ℹ️  SPACE_HOST environment variable not found (running locally?).")
-    if space_id_startup: # Print repo URLs if SPACE_ID is found
-        print(f"✅ SPACE_ID found: {space_id_startup}")
-        print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
-        print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
     else:
-        print("ℹ️  SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
-    print("-"*(60 + len(" App Starting ")) + "\n")
-    print("Launching Gradio Interface for Basic Agent Evaluation...")
-    demo.launch(debug=True, share=False)

+import json
 import os
+import tempfile
+from pathlib import Path
 import gradio as gr
 import pandas as pd
+import requests
+from agent import GaiaAgent
+from answer_normalize import normalize_answer
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+CACHE_FILENAME = "gaia_answers_cache.json"
+def _cache_path() -> Path:
+    return Path(__file__).resolve().parent / CACHE_FILENAME
+def _load_cache() -> dict:
+    p = _cache_path()
+    if not p.is_file():
+        return {}
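+    try:
+        data = json.loads(p.read_text(encoding="utf-8"))
+    except (OSError, json.JSONDecodeError):
+        return {}
+    # Ignore a cache file whose top level is not a JSON object.
+    return data if isinstance(data, dict) else {}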
+def _save_cache(cache: dict) -> None:
+    _cache_path().write_text(json.dumps(cache, indent=2), encoding="utf-8")
+def _download_attachment(api_url: str, task_id: str, file_name: str) -> str | None:
+    """Save task attachment to a temp file; return path or None."""
+    if not file_name or not str(file_name).strip():
+        return None
+    url = f"{api_url}/files/{task_id}"
+    try:
+        r = requests.get(url, timeout=120)
+    except requests.RequestException:
+        return None
+    if r.status_code != 200:
+        return None
+    ctype = (r.headers.get("Content-Type") or "").lower()
+    if "application/json" in ctype:
+        try:
+            data = r.json()
+            if isinstance(data, dict) and data.get("detail"):
+                return None
+        except json.JSONDecodeError:
+            pass
+    suffix = Path(file_name).suffix or ""
+    fd, path = tempfile.mkstemp(suffix=suffix, prefix=f"gaia_{task_id[:8]}_")
+    try:
+        with os.fdopen(fd, "wb") as f:
+            f.write(r.content)
+    except OSError:
+        return None
+    return path
+def run_and_submit_all(profile: gr.OAuthProfile | None):
+    space_id = os.getenv("SPACE_ID")
+    use_cache = os.getenv("GAIA_USE_CACHE", "1").lower() in ("1", "true", "yes")
     if profile:
+        username = f"{profile.username}"
         print(f"User logged in: {username}")
     else:
         print("User not logged in.")
         return "Please Login to Hugging Face with the button.", None
+    api_url = os.getenv("GAIA_API_URL", DEFAULT_API_URL)
     questions_url = f"{api_url}/questions"
     submit_url = f"{api_url}/submit"
     try:
+        agent = GaiaAgent()
     except Exception as e:
         print(f"Error instantiating agent: {e}")
         return f"Error initializing agent: {e}", None
     agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
     print(agent_code)
     print(f"Fetching questions from: {questions_url}")
     try:
+        response = requests.get(questions_url, timeout=60)
         response.raise_for_status()
         questions_data = response.json()
         if not questions_data:
+            return "Fetched questions list is empty or invalid format.", None
         print(f"Fetched {len(questions_data)} questions.")
     except requests.exceptions.RequestException as e:
         return f"Error fetching questions: {e}", None
+    except json.JSONDecodeError as e:
+        return f"Error decoding server response for questions: {e}", None
+    cache = _load_cache() if use_cache else {}
     results_log = []
     answers_payload = []
     print(f"Running agent on {len(questions_data)} questions...")
     for item in questions_data:
         task_id = item.get("task_id")
         question_text = item.get("question")
+        file_name = item.get("file_name") or ""
         if not task_id or question_text is None:
             print(f"Skipping item with missing task_id or question: {item}")
             continue
+        cache_key = str(task_id)
+        if use_cache and cache_key in cache:
+            submitted_answer = normalize_answer(cache[cache_key])
+            print(f"Cache hit for {task_id}")
+        else:
+            local_path: str | None = None
+            try:
+                if file_name and str(file_name).strip():
+                    local_path = _download_attachment(api_url, str(task_id), str(file_name))
+                    if local_path:
+                        print(f"Downloaded attachment for {task_id} -> {local_path}")
+                submitted_answer = agent(
+                    str(question_text),
+                    attachment_path=local_path,
+                    task_id=str(task_id),
+                )
+                submitted_answer = normalize_answer(submitted_answer)
+                if use_cache:
+                    cache[cache_key] = (
+                        submitted_answer
+                        if isinstance(submitted_answer, str)
+                        else str(submitted_answer)
+                    )
+                    _save_cache(cache)
+            except Exception as e:
+                print(f"Error running agent on task {task_id}: {e}")
+                submitted_answer = f"AGENT ERROR: {e}"
+            finally:
+                if local_path and Path(local_path).is_file():
+                    try:
+                        Path(local_path).unlink(missing_ok=True)
+                    except OSError:
+                        pass
+        answers_payload.append(
+            {
+                "task_id": task_id,
+                "submitted_answer": submitted_answer,
+            }
+        )
+        results_log.append(
+            {
+                "Task ID": task_id,
+                "Question": question_text,
+                "Submitted Answer": submitted_answer,
+            }
+        )
     if not answers_payload:
         return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
+    submission_data = {
+        "username": username.strip(),
+        "agent_code": agent_code,
+        "answers": answers_payload,
+    }
+    status_update = (
+        f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
+    )
     print(status_update)
     print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
     try:
+        response = requests.post(submit_url, json=submission_data, timeout=600)
         response.raise_for_status()
         result_data = response.json()
         final_status = (
@@ -117,80 +193,97 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
         try:
             error_json = e.response.json()
             error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
+        except json.JSONDecodeError:
             error_detail += f" Response: {e.response.text[:500]}"
         status_message = f"Submission Failed: {error_detail}"
         print(status_message)
+        return status_message, pd.DataFrame(results_log)
     except requests.exceptions.Timeout:
         status_message = "Submission Failed: The request timed out."
         print(status_message)
+        return status_message, pd.DataFrame(results_log)
     except requests.exceptions.RequestException as e:
         status_message = f"Submission Failed: Network error - {e}"
         print(status_message)
+        return status_message, pd.DataFrame(results_log)
     except Exception as e:
         status_message = f"An unexpected error occurred during submission: {e}"
         print(status_message)
+        return status_message, pd.DataFrame(results_log)
+def crypto_btc_price() -> str:
+    """Optional demo: live BTC/USD (not used for GAIA scoring)."""
+    try:
+        r = requests.get(
+            "https://api.coingecko.com/api/v3/simple/price",
+            params={"ids": "bitcoin", "vs_currencies": "usd"},
+            timeout=20,
+        )
+        r.raise_for_status()
+        data = r.json()
+        usd = data.get("bitcoin", {}).get("usd")
+        if usd is None:
+            return "Could not fetch price: response did not include a BTC/USD quote."
+        return f"Bitcoin (BTC) ~ ${usd:,.2f} USD (CoinGecko public API)."
+    except Exception as e:
+        return f"Could not fetch price: {e}"
 with gr.Blocks() as demo:
+    gr.Markdown("# GAIA Unit 4 — Agent Evaluation Runner")
     gr.Markdown(
         """
+**Instructions**
+1. Duplicate this Space from the course template (or push this repo) and set **Secrets**: `HF_TOKEN` (read access to Inference).
+2. Optional env vars: `GAIA_TEXT_MODEL`, `GAIA_ASR_MODEL`, `GAIA_VISION_MODEL`, `GAIA_API_URL`, `GAIA_USE_CACHE` (default `1`).
+3. Log in with Hugging Face below (username is used for the leaderboard).
+4. Run **Evaluate & Submit** to answer all questions and post scores.
+Attachment tasks download `GET /files/{task_id}` automatically when `file_name` is set.
+---
+**Crypto demo (optional):** unrelated to GAIA; quick BTC spot check.
         """
     )
     gr.LoginButton()
+    with gr.Tab("GAIA evaluation"):
+        run_button = gr.Button("Run Evaluation & Submit All Answers")
+        status_output = gr.Textbox(
+            label="Run Status / Submission Result", lines=6, interactive=False
+        )
+        results_table = gr.DataFrame(
+            label="Questions and Agent Answers", wrap=True
+        )
+        run_button.click(
+            fn=run_and_submit_all,
+            outputs=[status_output, results_table],
+        )
+    with gr.Tab("Crypto intelligence (demo)"):
+        gr.Markdown(
+            "This tab does not affect GAIA scores. It demonstrates a simple public market data fetch."
+        )
+        cp_btn = gr.Button("Fetch BTC / USD")
+        cp_out = gr.Textbox(label="Output", interactive=False)
+        cp_btn.click(fn=crypto_btc_price, outputs=cp_out)
 if __name__ == "__main__":
+    print("\n" + "-" * 30 + " App Starting " + "-" * 30)
     space_host_startup = os.getenv("SPACE_HOST")
+    space_id_startup = os.getenv("SPACE_ID")
     if space_host_startup:
+        print(f"SPACE_HOST found: {space_host_startup}")
     else:
+        print("SPACE_HOST not set (local run?).")
+    if space_id_startup:
+        print(f"SPACE_ID found: {space_id_startup}")
+        print(f"Repo tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
     else:
+        print("SPACE_ID not set (local run?).")
+    print("-" * 74 + "\n")
+    demo.launch(debug=True, share=False)
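
Review note: `_save_cache` rewrites `gaia_answers_cache.json` in place after every answered question, so an interrupted run could leave a truncated file that `_load_cache` then treats as empty. One possible hardening, shown as a sketch rather than as part of the PR, is to write to a temporary file in the same directory and rename it over the target. The name `save_cache_atomic` is hypothetical:

```python
import json
import os
import tempfile
from pathlib import Path


def save_cache_atomic(path: Path, cache: dict) -> None:
    """Write JSON to a temp file next to `path`, then rename it over `path`."""
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(cache, f, indent=2)
        # os.replace swaps the file in one step when both paths share a filesystem.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file, then let the original error propagate.
        Path(tmp_name).unlink(missing_ok=True)
        raise
```

Readers of the cache then see either the previous complete file or the new complete file, never a half-written one.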

inference_client_factory.py ADDED

	@@ -0,0 +1,31 @@

+"""Build InferenceClient with a provider that accepts the user's HF token."""
+from __future__ import annotations
+import os
+from huggingface_hub import InferenceClient
+def inference_client_kwargs(token: str) -> dict:
+    """
+    Default: **no** ``provider`` → the library uses ``auto``: first provider for this
+    model per your https://hf.co/settings/inference-providers order.
+    Forcing ``hf-inference`` breaks many chat models (e.g. Qwen2.5-7B-Instruct is only on
+    together / featherless-ai — the router then returns **404** for …/hf-inference/models/…).
+    Set ``HF_INFERENCE_PROVIDER`` to pin one provider (e.g. ``together``, ``sambanova``)
+    or ``auto`` explicitly. Use ``hf-inference`` only for models that actually list it.
+    """
+    raw = os.environ.get("HF_INFERENCE_PROVIDER")
+    if raw is None:
+        return {"token": token}
+    r = raw.strip().lower()
+    if r in ("", "auto"):
+        return {"token": token}
+    return {"token": token, "provider": r}
+def make_inference_client(token: str) -> InferenceClient:
+    return InferenceClient(**inference_client_kwargs(token))
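
Review note: the mapping from `HF_INFERENCE_PROVIDER` to client keyword arguments is easy to verify in isolation. The sketch below mirrors `inference_client_kwargs` but takes the environment as a parameter so it can be exercised without touching `os.environ`; the name `client_kwargs` and the token value `hf_xxx` are placeholders:

```python
from typing import Mapping, Optional


def client_kwargs(token: str, env: Mapping[str, str]) -> dict:
    """Mirror of inference_client_kwargs with the environment passed in explicitly."""
    raw: Optional[str] = env.get("HF_INFERENCE_PROVIDER")
    provider = (raw or "").strip().lower()
    if provider in ("", "auto"):
        # No provider key: the client falls back to its default selection.
        return {"token": token}
    return {"token": token, "provider": provider}


for env in ({}, {"HF_INFERENCE_PROVIDER": " Auto "}, {"HF_INFERENCE_PROVIDER": "Together"}):
    print(env, "->", client_kwargs("hf_xxx", env))
```

Unset, empty, and `auto` all produce the same kwargs, and any other value is lower-cased before being pinned as the provider.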

requirements.txt CHANGED

@@ -1,2 +1,11 @@
-gradio
-requests

+gradio>=4.44.0
+requests>=2.31.0
+pandas>=2.0.0
+openpyxl>=3.1.0
+beautifulsoup4>=4.12.0
+lxml>=5.0.0
+duckduckgo-search>=6.0.0
+wikipedia>=1.4.0
+huggingface_hub>=0.26.0
+youtube-transcript-api>=0.6.0
+Pillow>=10.0.0

run_local_eval.py ADDED

	@@ -0,0 +1,111 @@

+#!/usr/bin/env python3
+"""Fetch GAIA course questions, run GaiaAgent, save JSON — does not submit."""
+from __future__ import annotations
+import argparse
+import json
+import os
+import sys
+import tempfile
+from pathlib import Path
+import requests
+ROOT = Path(__file__).resolve().parent
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+from agent import GaiaAgent  # noqa: E402
+from answer_normalize import normalize_answer  # noqa: E402
+DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+def download_file(api_url: str, task_id: str, file_name: str) -> str | None:
+    if not file_name or not str(file_name).strip():
+        return None
+    url = f"{api_url}/files/{task_id}"
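+    # Treat a network failure as "no attachment" instead of aborting the whole run.
+    try:
+        r = requests.get(url, timeout=120)
+    except requests.RequestException:
+        return None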
+    if r.status_code != 200:
+        return None
+    ctype = (r.headers.get("Content-Type") or "").lower()
+    if "application/json" in ctype:
+        try:
+            data = r.json()
+            if isinstance(data, dict) and data.get("detail"):
+                return None
+        except json.JSONDecodeError:
+            pass
+    suffix = Path(file_name).suffix or ""
+    fd, path = tempfile.mkstemp(suffix=suffix, prefix=f"gaia_{task_id[:8]}_")
+    with os.fdopen(fd, "wb") as f:
+        f.write(r.content)
+    return path
+def main() -> None:
+    parser = argparse.ArgumentParser()
+    parser.add_argument(
+        "--api-url",
+        default=os.environ.get("GAIA_API_URL", DEFAULT_API_URL),
+    )
+    parser.add_argument(
+        "-o",
+        "--output",
+        default=str(ROOT / "local_eval_answers.json"),
+        help="Write answers JSON here",
+    )
+    args = parser.parse_args()
+    q_url = f"{args.api_url.rstrip('/')}/questions"
+    print(f"GET {q_url}")
+    r = requests.get(q_url, timeout=60)
+    r.raise_for_status()
+    items = r.json()
+    print(f"{len(items)} questions")
+    token = os.environ.get("HF_TOKEN") or os.environ.get("HUGGINGFACEHUB_API_TOKEN")
+    agent = GaiaAgent(hf_token=token) if token else None
+    out: list[dict] = []
+    for item in items:
+        tid = item.get("task_id")
+        q = item.get("question")
+        fn = item.get("file_name") or ""
+        if not tid or q is None:
+            continue
+        local = None
+        try:
+            if fn and str(fn).strip():
+                local = download_file(args.api_url.rstrip("/"), str(tid), str(fn))
+            if agent is not None:
+                ans = agent(str(q), attachment_path=local, task_id=str(tid))
+            else:
+                from tools.registry import deterministic_attempt
+                d = deterministic_attempt(str(q), local)
+                ans = d if d is not None else "NO_HF_TOKEN"
+        finally:
+            if local and Path(local).is_file():
+                Path(local).unlink(missing_ok=True)
+        if isinstance(ans, (int, float)) and not isinstance(ans, bool):
+            sub = ans
+        else:
+            sub = normalize_answer(ans)
+        out.append(
+            {
+                "task_id": tid,
+                "question": q,
+                "submitted_answer": sub,
+            }
+        )
+        print(f"--- {tid[:8]}… -> {out[-1]['submitted_answer']!r}")
+    Path(args.output).write_text(json.dumps(out, indent=2), encoding="utf-8")
+    print(f"Wrote {args.output}")
+if __name__ == "__main__":
+    main()
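
Review note: `run_local_eval.py` writes a list of `{task_id, question, submitted_answer}` rows and never submits. To inspect what a submission built from that file would contain, the rows can be reshaped into the payload `app.py` posts to `/submit`. This is a sketch only: `build_submission` is a hypothetical helper, and skipping rows that start with `NO_HF_TOKEN`, `Error:` or `Inference error:` (the placeholder strings produced in this diff) is an assumption about what is worth sending:

```python
import json
from pathlib import Path
from typing import Any

# Placeholder prefixes emitted by run_local_eval.py and agent.py in this diff.
_PLACEHOLDER_PREFIXES = ("NO_HF_TOKEN", "Error:", "Inference error:")


def build_submission(answers_file: Path, username: str, space_id: str) -> dict[str, Any]:
    """Reshape run_local_eval.py output into the payload shape used by app.py."""
    rows = json.loads(answers_file.read_text(encoding="utf-8"))
    answers = [
        {"task_id": row["task_id"], "submitted_answer": row["submitted_answer"]}
        for row in rows
        if not str(row["submitted_answer"]).startswith(_PLACEHOLDER_PREFIXES)
    ]
    return {
        "username": username.strip(),
        "agent_code": f"https://huggingface.co/spaces/{space_id}/tree/main",
        "answers": answers,
    }
```

The function only builds the dictionary and sends nothing, so it keeps the dry-run property of the script.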