sajjadzeak committed on
Commit
4844294
·
verified ·
1 Parent(s): fe9889e

Upload 10 files

Browse files
Files changed (10)
  1. README.md +113 -0
  2. agent-architecture.mermaid +33 -0
  3. app.py +443 -0
  4. debug_agent.py +57 -0
  5. helpers.py +77 -0
  6. prompts.yaml +84 -0
  7. pyproject.toml +50 -0
  8. requirements-dev.txt +5 -0
  9. requirements.txt +26 -0
  10. tools.py +289 -0
README.md ADDED
@@ -0,0 +1,113 @@
1
+ ---
2
+ title: GAIA Agent (Final Assignment of HF Agents Course)
3
+ emoji: 🕵🏻‍♂️
4
+ colorFrom: indigo
5
+ colorTo: indigo
6
+ sdk: gradio
7
+ sdk_version: 5.33.0
8
+ app_file: app.py
9
+ pinned: false
10
+ hf_oauth: true
11
+ # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
12
+ hf_oauth_expiration_minutes: 480
13
+ ---
14
+
15
+
16
+ # GAIA AI Agent via LangGraph
17
+
18
+ This repository contains a **LangGraph‑powered** agent that scores over 30% on the GAIA Level‑1 benchmark *without any RAG leaks*.
19
+ It routes questions, invokes the right tool, and returns an exact‑match string for the grader.
20
+
21
+ ## 📜 What is GAIA?
22
+
23
+ **GAIA = _“General AI Assistants”_** – a multi-domain benchmark introduced in the paper [GAIA: A Benchmark for General AI Assistants](https://arxiv.org/abs/2311.12983).
24
+ The public leaderboard is hosted on Hugging Face:
25
+ <https://huggingface.co/spaces/gaia-benchmark/leaderboard>
26
+
27
+ ---
28
+
29
+ ## ✨ Key features
30
+
31
+ | Capability | Implementation |
32
+ |------------|---------------|
33
+ | Multi‑step routing | LangGraph state machine (`route_question → invoke_tools → synthesize_response → format_output`) |
34
+ | Web & Wiki search | Tavily ➜ DuckDuckGo fallback |
35
+ | YouTube | `youtube_transcript_api` ➜ auto‑generated captions fallback |
36
+ | Spreadsheets | `analyze_excel_file` (*pandas* one‑liner generator) |
37
+ | Attached code | Safe `subprocess` sandbox via `run_py` |
38
+ | Audio | OpenAI‑Whisper |
39
+ | Vision | VLM (GPT-4o-mini)|
40
+
41
+ ---
42
+
43
+ ## 📂 Repository guide
44
+
45
+ | File | Purpose |
46
+ |------|---------|
47
+ | `app.py` | Gradio UI, API submission, LangGraph workflow |
48
+ | `tools.py` | All custom LangChain tools (search, Excel, Whisper, *etc*.) |
49
+ | `prompts.yaml` | LLM prompts |
50
+ | `helpers.py` | Tiny utilities (debug prints *etc*.) |
51
+ | `debug_agent.py` | Run agent on a single GAIA question from CLI |
52
+ | `requirements.txt` | Runtime deps |
53
+ | `requirements-dev.txt` | Dev / lint deps |
54
+
55
+ ---
56
+
57
+ ## 🚀 Quick start
58
+
59
+ ```
+ # clone repo / space
+ pip install -r requirements.txt   # Python ≥ 3.11
+ python app.py                     # launches local Gradio UI
+ ```
62
+
63
+ Run **one** task from CLI (handy while tuning prompts):
64
+
65
+ ```
+ python debug_agent.py --task_id <GAIA_task_id>
+ ```
66
+
67
+ ### Environment variables
68
+
69
+ | Var | Used for | Example |
70
+ |-----|----------|---------|
71
+ | `OPENAI_API_KEY` | Router & answer LLM (OpenAI) | `sk‑…` |
72
+ | `TAVILY_API_KEY` | Higher‑quality web search (optional) | `tvly_…` |
73
+
74
+ *(Agent falls back to DuckDuckGo if `TAVILY_API_KEY` is absent.)*
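The Tavily ➜ DuckDuckGo fallback can be sketched as a plain environment check. This is a minimal sketch: the two search functions here are stubs standing in for the real tools in `tools.py`.

```python
import os

def tavily_search(query: str) -> str:      # stub for the real Tavily tool
    return f"tavily:{query}"

def duckduckgo_search(query: str) -> str:  # stub for the real DuckDuckGo tool
    return f"ddg:{query}"

def web_multi_search(query: str) -> str:
    # Prefer Tavily when a key is configured, else fall back to DuckDuckGo
    if os.getenv("TAVILY_API_KEY"):
        return tavily_search(query)
    return duckduckgo_search(query)

os.environ.pop("TAVILY_API_KEY", None)     # simulate the no-key case
print(web_multi_search("GAIA benchmark"))  # ddg:GAIA benchmark
```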
75
+
76
+ ---
77
+
78
+ ## Agent Routing & Tool-Execution Flow
79
+
80
+
81
+ ![GAIA Agent Routing & Tool-Execution Flow](agent_routing.png)
82
+
83
+ - **route_question** routes to one of eight labels.
84
+ - **invoke_tools** invokes the matching tool and stores context.
85
+ - **synthesize_response** calls the answer LLM unless the answer was computed.
86
+ - **format_output** normalizes output for GAIA’s exact‑match scorer.
87
+
88
+
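The four-node flow above can be sketched with plain functions that pass a shared state dict down the chain. The node bodies here are toy heuristics standing in for the LLM-backed implementations in `app.py`.

```python
def route_question(state: dict) -> dict:
    # The real node asks an LLM router; a keyword heuristic stands in here
    q = state["question"].lower()
    state["label"] = "math" if any(c.isdigit() for c in q) else "general"
    return state

def invoke_tools(state: dict) -> dict:
    if state["label"] == "math":
        # Stand-in for the calculator tool
        state["answer"] = str(eval(state["question"], {"__builtins__": {}}))
    return state

def synthesize_response(state: dict) -> dict:
    if not state["answer"]:
        # The real node calls the answer LLM with question + context
        state["answer"] = "UNKNOWN"
    return state

def format_output(state: dict) -> dict:
    state["answer"] = state["answer"].strip().rstrip(".")
    return state

def run(question: str) -> str:
    state = {"question": question, "label": "", "context": "", "answer": ""}
    for node in (route_question, invoke_tools, synthesize_response, format_output):
        state = node(state)
    return state["answer"]

print(run("2+3*4"))  # 14
```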
89
+ ## 📝 Prompt snippet
90
+
91
+ All LLM prompts live in `prompts.yaml`.
92
+
93
+ ## 🛠️ Dev helpers
94
+
95
+ 1️⃣ Create the virtual environment and activate it.
96
+
97
+ ```
98
+ uv venv --python 3.11
99
+ source ./.venv/bin/activate
100
+ ```
101
+
102
+ 2️⃣ Install Python dependencies:
103
+
104
+ ```
105
+ uv pip install -r requirements.txt
106
+ uv pip install -r requirements-dev.txt
107
+ ```
108
+
109
+ 3️⃣ [Optional] Install Git hooks for code quality checks:
110
+
111
+ ```
112
+ pre-commit install
113
+ ```
agent-architecture.mermaid ADDED
@@ -0,0 +1,33 @@
1
+ graph TD
2
+ Start([User Question]) --> RouteQuestion[📋 Route Question]
3
+ RouteQuestion --> ExecuteTools[🔧 Execute Tools]
4
+ ExecuteTools --> CheckAttachment{Has Attachment?}
5
+ CheckAttachment -->|Yes| AttachmentType{Attachment Type?}
6
+ CheckAttachment -->|No| CheckLabel{Label Type?}
7
+
8
+ AttachmentType -->|Python Code| RunPy[🐍 run_py]
9
+ AttachmentType -->|Excel/CSV| AnalyzeExcel[📊 analyze_excel_file]
10
+ AttachmentType -->|Audio| TranscribeAudio[🎵 transcribe_via_whisper]
11
+ AttachmentType -->|Image| VisionTask[👁️ vision_task]
12
+
13
+ CheckLabel -->|math| Calculator[🧮 calculator]
14
+ CheckLabel -->|youtube| YouTubeTranscript[📹 youtube_transcript]
15
+ CheckLabel -->|search| WebSearch[🔍 web_multi_search]
16
+ CheckLabel -->|general| NoTool[💭 No specific tool]
17
+
18
+ RunPy --> SynthesizeResponse[🧠 Synthesize Response]
19
+ AnalyzeExcel --> SynthesizeResponse
20
+ TranscribeAudio --> SynthesizeResponse
21
+ VisionTask --> SynthesizeResponse
22
+ Calculator --> SynthesizeResponse
23
+ YouTubeTranscript --> SynthesizeResponse
24
+ WebSearch --> SynthesizeResponse
25
+ NoTool --> SynthesizeResponse
26
+
27
+ SynthesizeResponse --> NeedsSynthesis{Needs Additional<br/>Synthesis?}
28
+ NeedsSynthesis -->|No: code excel<br/>image math| DirectAnswer[✅ Use tool output directly<br/>Already complete]
29
+ NeedsSynthesis -->|Yes: youtube audio<br/>search general| UseSynthesisLLM[🤖 Additional LLM synthesis<br/>Combine with context]
30
+
31
+ DirectAnswer --> FormatOutput[✨ Format Output]
32
+ UseSynthesisLLM --> FormatOutput
33
+ FormatOutput --> End([Final Answer])
app.py ADDED
@@ -0,0 +1,443 @@
1
+ import os
2
+ import re
3
+ from typing import Literal, TypedDict, get_args
4
+
5
+ import gradio as gr
6
+ import pandas as pd
7
+ import requests
8
+ from langchain_core.messages import HumanMessage, SystemMessage
9
+ from langchain_openai import ChatOpenAI
10
+ from langgraph.graph import END, StateGraph
11
+
12
+ from helpers import fetch_task_attachment, get_prompt, sniff_excel_type
13
+ from tools import (
14
+ analyze_excel_file,
15
+ calculator,
16
+ run_py,
17
+ transcribe_via_whisper,
18
+ vision_task,
19
+ web_multi_search,
20
+ wiki_search,
21
+ youtube_transcript,
22
+ )
23
+
24
+ # --------------------------------------------------------------------------- #
25
+ # CONFIGURATION #
26
+ # --------------------------------------------------------------------------- #
27
+ DEFAULT_API_URL: str = "https://agents-course-unit4-scoring.hf.space"
28
+ MODEL_NAME: str = "o4-mini" # "gpt-4.1-mini"
29
+ TEMPERATURE: float = 0.1
30
+
31
+ # --------------------------------------------------------------------------- #
32
+ # QUESTION CLASSIFIER #
33
+ # --------------------------------------------------------------------------- #
34
+ _LABELS = Literal[
35
+ "math",
36
+ "youtube",
37
+ "image",
38
+ "code",
39
+ "excel",
40
+ "audio",
41
+ "search",
+ "general",
42
+ ]
43
+
44
+
45
+ # --------------------------------------------------------------------------- #
46
+ # ------------------------------- AGENT STATE ----------------------------- #
47
+ # --------------------------------------------------------------------------- #
48
+ class AgentState(TypedDict):
49
+ question: str
50
+ label: str
51
+ context: str
52
+ answer: str
53
+ task_id: str | None  # TypedDict fields cannot have defaults; set at construction
54
+
55
+
56
+ # --------------------------------------------------------------------------- #
57
+ # NODES (LangGraph functions) #
58
+ # --------------------------------------------------------------------------- #
59
+
60
+ _llm_router = ChatOpenAI(model=MODEL_NAME)
61
+ _llm_answer = ChatOpenAI(model=MODEL_NAME)
62
+
63
+
64
+ def route_question(state: AgentState) -> AgentState: # noqa: D401
65
+ """Label the task so we know which toolchain to invoke."""
66
+ question = state["question"]
67
+
68
+ label_values = set(get_args(_LABELS))  # -> {"math", "youtube", ...}
69
+ prompt = get_prompt(
70
+ prompt_key="router",
71
+ question=question,
72
+ labels=", ".join(repr(v) for v in label_values),
73
+ )
74
+ resp = _llm_router.invoke(prompt).content.strip().lower()
75
+ state["label"] = resp if resp in label_values else "general"
76
+ return state
77
+
78
+
79
+ def invoke_tools_context(state: AgentState) -> AgentState:
80
+ question, label, task_id = state["question"], state["label"], state["task_id"]
81
+
82
+ matched_pattern = r"https?://\S+"
83
+ matched_obj = re.search(matched_pattern, question)
84
+
85
+ # ---- attachment detection ------------------------------------------------
86
+ if task_id:
87
+ blob, ctype = fetch_task_attachment(api_url=DEFAULT_API_URL, task_id=task_id)
88
+
89
+ if blob or ctype:
90
+ print(f"[DEBUG] attachment type={ctype} ")
91
+ # ── Python code ------------------------------------------------------
92
+ if "python" in ctype:
93
+ print("[DEBUG] Working with a Python attachment file")
94
+ state["answer"] = run_py.invoke({"code": blob.decode("utf-8")})
95
+ state["label"] = "code"
96
+ return state
97
+
98
+ # ── Excel / CSV ------------------------------------------------------
99
+ # 1) Header hints
100
+ header_says_sheet = any(key in ctype for key in ("excel", "sheet", "csv"))
101
+ # 2) Magic-number sniff (works when ctype is application/octet-stream)
102
+ blob_says_sheet = sniff_excel_type(blob) in {"xlsx", "xls", "csv"}
103
+
104
+ if header_says_sheet or blob_says_sheet:
105
+ if blob_says_sheet:
106
+ print(f"[DEBUG] octet-stream sniffed as {sniff_excel_type(blob)}")
107
+
108
+ print("[DEBUG] Working with an Excel/CSV attachment file")
109
+ state["answer"] = analyze_excel_file.invoke(
110
+ {"xls_bytes": blob, "question": question}
111
+ )
112
+ state["label"] = "excel"
113
+ return state
114
+
115
+ # ── Audio --------------------------------------------------------
116
+ if "audio" in ctype:
117
+ print("[DEBUG] Working with an audio attachment file")
118
+ state["context"] = transcribe_via_whisper.invoke({"audio_bytes": blob})
119
+ state["label"] = "audio"
120
+ return state
121
+
122
+ # ── Image --------------------------------------------------------
123
+ if "image" in ctype:
124
+ print("[DEBUG] Working with an image attachment file")
125
+ state["answer"] = vision_task.invoke(
126
+ {"img_bytes": blob, "question": question}
127
+ )
128
+ state["label"] = "image"
129
+ return state
130
+
131
+ if label == "math":
132
+ print("[TOOL] calculator")
133
+ expr = re.sub(r"\s+", "", question)
134
+ state["answer"] = calculator.invoke({"expression": expr})
135
+ elif label == "youtube" and matched_obj:
136
+ print("[TOOL] youtube_transcript")
137
+ url = matched_obj[0]
+ state["context"] = youtube_transcript.invoke({"url": url})
140
+ elif label == "search":
141
+ print("[TOOL] web search")
142
+ search_json = web_multi_search.invoke({"query": question})
143
+ wiki_text = wiki_search.invoke({"query": question})
144
+ state["context"] = f"{search_json}\n\n{wiki_text}"
145
+ else:
146
+ print("[TOOL] reasoning only (no search)")
147
+ state["context"] = ""
148
+ return state
149
+
150
+
151
+ def synthesize_response(state: AgentState) -> AgentState:
152
+ # Skip LLM for deterministic labels or tasks that already used LLMs
153
+ if state["label"] in {"code", "excel", "image", "math"}:
154
+ print(f"[DEBUG] ANSWER ({state['label']}) >>> {state['answer']}")
155
+ return state
156
+
157
+ prompt = [
158
+ SystemMessage(content=get_prompt("final_llm_system")),
159
+ HumanMessage(
160
+ content=get_prompt(
161
+ prompt_key="final_llm_user",
162
+ question=state["question"],
163
+ context=state["context"],
164
+ )
165
+ ),
166
+ ]
167
+ raw = _llm_answer.invoke(prompt).content.strip()
168
+ state["answer"] = raw
169
+ return state
170
+
171
+
172
+ def format_output(state: AgentState) -> AgentState:
173
+ txt = re.sub(r"^(final answer:?\s*)", "", state["answer"], flags=re.I).strip()
174
+
175
+ # If question demands a single token (first name / one word), enforce it
176
+ if any(kw in state["question"].lower() for kw in ["first name", "single word"]):
177
+ txt = txt.split(" ")[0]
178
+
179
+ state["answer"] = txt.rstrip(".")
180
+ return state
181
+
182
+
183
+ # --------------------------------------------------------------------------- #
184
+ # BUILD THE GRAPH #
185
+ # --------------------------------------------------------------------------- #
186
+ def build_graph() -> StateGraph:
187
+ g = StateGraph(AgentState)
188
+ g.set_entry_point("route_question")
189
+
190
+ g.add_node("route_question", route_question)
191
+ g.add_node("invoke_tools", invoke_tools_context)
192
+ g.add_node("synthesize_response", synthesize_response)
193
+ g.add_node("format_output", format_output)
194
+
195
+ g.add_edge("route_question", "invoke_tools")
196
+ g.add_edge("invoke_tools", "synthesize_response")
197
+ g.add_edge("synthesize_response", "format_output")
198
+ g.add_edge("format_output", END)
199
+
200
+ return g.compile()
201
+
202
+
203
+ # --------------------------------------------------------------------------- #
204
+ # ------------------------------- GAIA AGENT ------------------------------ #
205
+ # --------------------------------------------------------------------------- #
206
+ class GAIAAgent:
207
+ """Callable wrapper used by run_and_submit_all."""
208
+
209
+ def __init__(self) -> None:
210
+ self.graph = build_graph()
211
+
212
+ def __call__(self, question: str, task_id: str | None = None) -> str:
213
+ state: AgentState = {
214
+ "question": question,
215
+ "label": "general",
216
+ "context": "",
217
+ "answer": "",
218
+ "task_id": task_id,
219
+ }
220
+ final = self.graph.invoke(state)
221
+
222
+ # ── Debug trace ───────────────────────────────────────────────
223
+ route = final["label"]
224
+ llm_used = route not in {"code", "excel", "image", "math"}  # these labels skip the synthesis LLM
225
+ print(f"[DEBUG] route='{route}' | LLM_used={llm_used}")
226
+ # ─────────────────────────────────────────────────────────────
227
+
228
+ return final["answer"]
229
+
230
+
231
+ def run_and_submit_all(
232
+ profile: gr.OAuthProfile | None,
233
+ ) -> tuple[str, pd.DataFrame | None]:
234
+ """
235
+ Fetches all questions, runs the GAIAAgent on them, submits all answers,
236
+ and displays the results.
237
+ """
238
+ # --- Determine HF Space Runtime URL and Repo URL ---
239
+ space_id = os.getenv("SPACE_ID") # Get the SPACE_ID for sending link to the code
240
+
241
+ if profile:
242
+ username = f"{profile.username}"
243
+ print(f"User logged in: {username}")
244
+ else:
245
+ print("User not logged in.")
246
+ return "Please Login to Hugging Face with the button.", None
247
+
248
+ api_url = DEFAULT_API_URL
249
+ questions_url = f"{api_url}/questions"
250
+ submit_url = f"{api_url}/submit"
251
+
252
+ # 1. Instantiate Agent (modify this part to create your agent)
253
+ try:
254
+ agent = GAIAAgent()
255
+ print("GAIA Agent initialized successfully")
256
+ except Exception as e:
257
+ print(f"Error instantiating agent: {e}")
258
+ return f"Error initializing agent: {e}", None
259
+ # When the app runs as a Hugging Face Space, this link points to your codebase (useful for others, so please keep it public)
260
+ agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
261
+ print(agent_code)
262
+
263
+ # 2. Fetch Questions
264
+ print(f"Fetching questions from: {questions_url}")
265
+ try:
266
+ response = requests.get(questions_url, timeout=15)
267
+ response.raise_for_status()
268
+ questions_data = response.json()
269
+ if not questions_data:
270
+ print("Fetched questions list is empty.")
271
+ return "Fetched questions list is empty or invalid format.", None
272
+ print(f"Fetched {len(questions_data)} questions.")
273
+ except requests.exceptions.RequestException as e:
274
+ print(f"Error fetching questions: {e}")
275
+ return f"Error fetching questions: {e}", None
276
+ except requests.exceptions.JSONDecodeError as e:
277
+ print(f"Error decoding JSON response from questions endpoint: {e}")
278
+ print(f"Response text: {response.text[:500]}")
279
+ return f"Error decoding server response for questions: {e}", None
280
+ except Exception as e:
281
+ print(f"An unexpected error occurred fetching questions: {e}")
282
+ return f"An unexpected error occurred fetching questions: {e}", None
283
+
284
+ # 3. Run your Agent
285
+ results_log = []
286
+ answers_payload = []
287
+ print(f"Running agent on {len(questions_data)} questions...")
288
+ for item in questions_data:
289
+ task_id = item.get("task_id")
290
+ question_text = item.get("question")
291
+ if not task_id or question_text is None:
292
+ print(f"Skipping item with missing task_id or question: {item}")
293
+ continue
294
+ try:
295
+ submitted_answer = agent(question=question_text, task_id=task_id)
296
+ answers_payload.append(
297
+ {"task_id": task_id, "submitted_answer": submitted_answer}
298
+ )
299
+ results_log.append(
300
+ {
301
+ "Task ID": task_id,
302
+ "Question": question_text,
303
+ "Submitted Answer": submitted_answer,
304
+ }
305
+ )
306
+ except Exception as e:
307
+ print(f"Error running agent on task {task_id}: {e}")
308
+ results_log.append(
309
+ {
310
+ "Task ID": task_id,
311
+ "Question": question_text,
312
+ "Submitted Answer": f"AGENT ERROR: {e}",
313
+ }
314
+ )
315
+
316
+ if not answers_payload:
317
+ print("Agent did not produce any answers to submit.")
318
+ return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
319
+
320
+ # 4. Prepare Submission
321
+ submission_data = {
322
+ "username": username.strip(),
323
+ "agent_code": agent_code,
324
+ "answers": answers_payload,
325
+ }
326
+ status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
327
+ print(status_update)
328
+
329
+ # 5. Submit
330
+ print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
331
+ try:
332
+ response = requests.post(submit_url, json=submission_data, timeout=60)
333
+ response.raise_for_status()
334
+ result_data = response.json()
335
+ final_status = (
336
+ f"Submission Successful!\n"
337
+ f"User: {result_data.get('username')}\n"
338
+ f"Overall Score: {result_data.get('score', 'N/A')}% "
339
+ f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
340
+ f"Message: {result_data.get('message', 'No message received.')}"
341
+ )
342
+ print("Submission successful.")
343
+ results_df = pd.DataFrame(results_log)
344
+ return final_status, results_df
345
+ except requests.exceptions.HTTPError as e:
346
+ error_detail = f"Server responded with status {e.response.status_code}."
347
+ try:
348
+ error_json = e.response.json()
349
+ error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
350
+ except requests.exceptions.JSONDecodeError:
351
+ error_detail += f" Response: {e.response.text[:500]}"
352
+ status_message = f"Submission Failed: {error_detail}"
353
+ print(status_message)
354
+ results_df = pd.DataFrame(results_log)
355
+ return status_message, results_df
356
+ except requests.exceptions.Timeout:
357
+ status_message = "Submission Failed: The request timed out."
358
+ print(status_message)
359
+ results_df = pd.DataFrame(results_log)
360
+ return status_message, results_df
361
+ except requests.exceptions.RequestException as e:
362
+ status_message = f"Submission Failed: Network error - {e}"
363
+ print(status_message)
364
+ results_df = pd.DataFrame(results_log)
365
+ return status_message, results_df
366
+ except Exception as e:
367
+ status_message = f"An unexpected error occurred during submission: {e}"
368
+ print(status_message)
369
+ results_df = pd.DataFrame(results_log)
370
+ return status_message, results_df
371
+
372
+
373
+ # --- Build Gradio Interface using Blocks ---
374
+ with gr.Blocks() as demo:
375
+ gr.Markdown("# Basic Agent Evaluation Runner")
376
+ gr.Markdown(
377
+ """
378
+ **Instructions:**
379
+
380
+ 1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc ...
381
+ 2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
382
+ 3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
383
+
384
+ ---
385
+ **Disclaimers:**
386
+ Once you click the submit button, it can take quite some time (this is the time the agent needs to go through all the questions).
387
+ This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to address the slow submit step, a solution could be to cache the answers and submit them in a separate action, or even to answer the questions asynchronously.
388
+ """
389
+ )
390
+
391
+ gr.LoginButton()
392
+
393
+ run_button = gr.Button("Run Evaluation & Submit All Answers")
394
+
395
+ status_output = gr.Textbox(
396
+ label="Run Status / Submission Result", lines=5, interactive=False
397
+ )
398
+ # Removed max_rows=10 from DataFrame constructor
399
+ results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
400
+
401
+ run_button.click(fn=run_and_submit_all, outputs=[status_output, results_table])
402
+
403
+
404
+ if __name__ == "__main__":
405
+ print("\n" + "-" * 30 + " App Starting " + "-" * 30)
406
+ # Check for SPACE_HOST and SPACE_ID at startup for information
407
+ space_host_startup = os.getenv("SPACE_HOST")
408
+ space_id_startup = os.getenv("SPACE_ID") # Get SPACE_ID at startup
409
+
410
+ if space_host_startup:
411
+ print(f"✅ SPACE_HOST found: {space_host_startup}")
412
+ print(f" Runtime URL should be: https://{space_host_startup}.hf.space")
413
+ else:
414
+ print("ℹ️ SPACE_HOST environment variable not found (running locally?).")
415
+
416
+ if space_id_startup: # Print repo URLs if SPACE_ID is found
417
+ print(f"✅ SPACE_ID found: {space_id_startup}")
418
+ print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
419
+ print(
420
+ f" Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main"
421
+ )
422
+ else:
423
+ print(
424
+ "ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined."
425
+ )
426
+
427
+ print("-" * (60 + len(" App Starting ")) + "\n")
428
+
429
+ print("Launching Gradio Interface for Basic Agent Evaluation...")
430
+ demo.launch(debug=True, share=False)
431
+
432
+
433
+ ## For Local testing
434
+ # if __name__ == "__main__":
435
+ # agent = GAIAAgent()
436
+ # while True:
437
+ # try:
438
+ # q = input("\nEnter question (or blank to quit): ")
439
+ # except KeyboardInterrupt:
440
+ # break
441
+ # if not q.strip():
442
+ # break
443
+ # print("Answer:", agent(q))
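The normalization done by `format_output` can be exercised in isolation; this standalone copy reproduces the same regex and single-token logic from `app.py`.

```python
import re

def normalize(answer: str, question: str = "") -> str:
    # Strip a leading "final answer:" prefix (case-insensitive)
    txt = re.sub(r"^(final answer:?\s*)", "", answer, flags=re.I).strip()
    # Single-token questions keep only the first word
    if any(kw in question.lower() for kw in ["first name", "single word"]):
        txt = txt.split(" ")[0]
    # GAIA's exact-match scorer dislikes trailing periods
    return txt.rstrip(".")

print(normalize("Final Answer: Mars."))                    # Mars
print(normalize("John Smith", "What is his first name?"))  # John
```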
debug_agent.py ADDED
@@ -0,0 +1,57 @@
1
+ import argparse
2
+ import textwrap
3
+ from typing import Any
4
+
5
+ import requests
6
+
7
+ from app import DEFAULT_API_URL, GAIAAgent
8
+
9
+
10
+ def fetch_question_row(task_id: str, api: str = DEFAULT_API_URL) -> dict[str, Any]:
11
+ """Return the question dict associated with *task_id* (raises if not found)."""
12
+ resp = requests.get(f"{api}/questions", timeout=15)
13
+ resp.raise_for_status()
14
+ for row in resp.json():
15
+ if row["task_id"] == task_id:
16
+ return row
17
+ raise ValueError(f"task_id '{task_id}' not present in /questions.")
18
+
19
+
20
+ def run_one(task_id: str | None, question: str | None) -> None:
21
+ agent = GAIAAgent()
22
+
23
+ if task_id:
24
+ row = fetch_question_row(task_id)
25
+ question = row["question"]
26
+ print(f"\n{row}\n") # show full row incl. metadata
27
+
28
+ # --- show pretty question
29
+ print("=" * 90)
30
+ print(f"QUESTION ({task_id or 'adhoc'})")
31
+ print(textwrap.fill(question or "", width=90))
32
+ print("=" * 90)
33
+
34
+ assert question is not None, "Internal error: question was None"
35
+ answer = agent(question, task_id=task_id)
36
+ print(f"\nFINAL ANSWER --> {answer}")
37
+
38
+
39
+ def parse_args() -> argparse.Namespace:
40
+ parser = argparse.ArgumentParser(description="Run one GAIAAgent query locally.")
41
+ parser.add_argument("--task_id", help="GAIA task_id to fetch & run")
42
+ parser.add_argument("question", nargs="?", help="Ad-hoc question text (positional)")
43
+
44
+ ns = parser.parse_args()
45
+
46
+ # mutual-exclusion checks
47
+ if ns.task_id and ns.question:
48
+ parser.error("Provide either --task_id OR a question, not both.")
49
+ if ns.task_id is None and ns.question is None:
50
+ parser.error("You must supply a GAIA --task_id or a question.")
51
+
52
+ return ns
53
+
54
+
55
+ if __name__ == "__main__":
56
+ args = parse_args()
57
+ run_one(task_id=args.task_id, question=args.question)
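The mutual-exclusion check above can be unit-tested by passing `argv` explicitly, as in this small variation on `parse_args`. (The stdlib's `add_mutually_exclusive_group` is the usual alternative, but it cannot mix an option with a positional argument, hence the manual checks.)

```python
import argparse

def parse_args(argv: list[str]) -> argparse.Namespace:
    # Same mutual-exclusion pattern as debug_agent.py, made testable via argv
    parser = argparse.ArgumentParser(description="Run one GAIAAgent query locally.")
    parser.add_argument("--task_id", help="GAIA task_id to fetch & run")
    parser.add_argument("question", nargs="?", help="Ad-hoc question text (positional)")
    ns = parser.parse_args(argv)
    if ns.task_id and ns.question:
        parser.error("Provide either --task_id OR a question, not both.")
    if ns.task_id is None and ns.question is None:
        parser.error("You must supply a GAIA --task_id or a question.")
    return ns

print(parse_args(["--task_id", "abc123"]).task_id)  # abc123
```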
helpers.py ADDED
@@ -0,0 +1,77 @@
1
+ import csv
2
+ from io import BytesIO
3
+ from pathlib import Path
4
+ from sys import stderr
5
+ from traceback import print_exception
6
+ from zipfile import BadZipFile, ZipFile
7
+
8
+ import requests
9
+ from yaml import safe_load
10
+
11
+ CURRENT_DIR = Path(__file__).parent
12
+
13
+ _PROMPTS = safe_load(CURRENT_DIR.joinpath("prompts.yaml").read_text())
14
+
15
+
16
+ def fetch_task_attachment(api_url: str, task_id: str) -> tuple[bytes, str]:
17
+ """
18
+ Returns (file_bytes, content_type) or (b'', '') if no attachment found.
19
+ Follows any redirect the endpoint issues.
20
+ """
21
+ url = f"{api_url}/files/{task_id}"
22
+ try:
23
+ r = requests.get(url, timeout=15, allow_redirects=True)
24
+ except requests.RequestException as e:
25
+ print(f"[DEBUG] GET {url} failed → {e}")
26
+ return b"", ""
27
+ if r.status_code != 200:
28
+ print(f"[DEBUG] GET {url} → {r.status_code}")
29
+ return b"", ""
30
+ return r.content, r.headers.get("content-type", "").lower()
31
+
32
+
33
+ def sniff_excel_type(blob: bytes) -> str:
34
+ """
35
+ Return one of 'xlsx', 'xls', 'csv', or '' (unknown) given raw bytes.
36
+ """
37
+ # 1️⃣ XLSX / XLSM / ODS (ZIP container)
38
+ if blob[:4] == b"PK\x03\x04":
39
+ try:
40
+ with ZipFile(BytesIO(blob)) as zf:
41
+ names = set(zf.namelist())
42
+ if {"xl/workbook.xml", "[Content_Types].xml"} & names:
43
+ return "xlsx"
44
+ except BadZipFile:
45
+ pass # fall through
46
+
47
+ # 2️⃣ Legacy XLS (OLE Compound File)
48
+ if blob[:8] == b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1":
49
+ return "xls"
50
+
51
+ # 3️⃣ Text-like -> CSV/TSV
52
+ try:
53
+ sample = blob[:1024].decode("utf-8", "ignore")
54
+ first_line = sample.splitlines()[0]
55
+ if any(sep in first_line for sep in (",", ";", "\t")):
56
+ # Confirm via csv.Sniffer to avoid random text
57
+ csv.Sniffer().sniff(sample)
58
+ return "csv"
59
+ except (UnicodeDecodeError, csv.Error, IndexError):  # IndexError: empty blob
60
+ pass
61
+
62
+ return ""
63
+
64
+
65
+ def get_prompt(prompt_key: str, **kwargs: str) -> str:
66
+ """Get a prompt by key and fill in placeholders via `.format(**kwargs)`"""
67
+ return _PROMPTS[prompt_key].format(**kwargs)
68
+
69
+
70
+ def print_debug_trace(err: Exception, label: str = "") -> None:
71
+ """
72
+ Print the full stack trace of `err` to STDERR so it shows up in HF logs.
73
+ """
74
+ banner = f"[TRACE {label}]" if label else "[TRACE]"
75
+ print(banner, file=stderr)
76
+ print_exception(type(err), err, err.__traceback__, file=stderr)
77
+ print("-" * 60, file=stderr)
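A quick self-contained check of the ZIP magic-number idea used by `sniff_excel_type` (only the XLSX branch is reimplemented here; the OLE constant is shown for reference):

```python
import io
import zipfile

XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE header used by legacy .xls

def looks_like_xlsx(blob: bytes) -> bool:
    # XLSX files are ZIP archives that contain xl/workbook.xml
    if blob[:4] != b"PK\x03\x04":
        return False
    try:
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            return "xl/workbook.xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False

# Build a minimal in-memory ZIP that mimics an XLSX container
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/workbook.xml", "<workbook/>")

print(looks_like_xlsx(buf.getvalue()))    # True
print(looks_like_xlsx(b"name,age\n1,2"))  # False
```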
prompts.yaml ADDED
@@ -0,0 +1,84 @@
1
+ router: |
2
+ You are a *routing* assistant.
3
+ Your ONLY job is to print **one** of the allowed labels - nothing else.
4
+
5
+ Allowed labels
6
+ ==============
7
+ {labels}
8
+
9
+ Guidelines
10
+ ----------
11
+ - **math**: the question is a pure arithmetic/numeric expression.
12
+ - **youtube**: the question contains a YouTube URL and asks about its content.
13
+ - **code**: the task references attached Python code; caller wants its output.
14
+ - **excel**: the task references an attached .xlsx/.xls/.csv and asks for a sum, average, etc.
15
+ - **audio**: the task references an attached audio file and asks for its transcript or facts in it.
16
+ - **image**: the question is either generic (e.g. "Which animal is shown?") or a puzzle asking for a *move, count, coordinate,* or other board-game tactic that needs an exact piece layout (e.g. "What is Black's winning move?").
17
+ - **search**: the answer needs external factual information from the web.
18
+ - **reason**: the answer can be produced by analyzing the question text alone.
19
+
20
+ Examples
21
+ ----------
22
+ (search) What is the last name of the person who founded Mercedes Benz company?
23
+ (reason) What is the third item of the following list that is a fruit after sorting it alphabetically: ['parsley', 'orange', 'apple', 'coriander', 'lettuce', 'kiwi', 'apricot']? Answer is 'kiwi'.
24
+
25
+ ~~~
26
+ User question:
27
+ {question}
28
+ ~~~
29
+
30
+ IMPORTANT: Respond with **one label exactly**, no punctuation, no explanation.
31
+
32
+ final_llm_system: |
33
+ You are a precise research assistant.
34
+ Return ONLY the literal answer - no preamble.
35
+
36
+ Formatting rules
37
+ 1. If the question asks for a *first name*, output the first given name only.
38
+ 2. If the answer is purely numeric, output digits only (no commas, units, words) as a string.
39
+ 3. Otherwise capitalize the first character of your answer **unless** doing so would change the original spelling of text you are quoting verbatim.
40
+
41
+ Examples
42
+ Q: Which planet is fourth from the Sun?
43
+ A: Mars <-- capitalized
44
+
45
+ Q: What Unix command lists files?
46
+ A: ls <-- lower-case preserved
47
+
48
+ final_llm_user: |
49
+ Question: {question}
50
+
51
+ Context: {context}
52
+
53
+ Answer:
54
+
55
+ vision_system: |
56
+ You are a terse assistant. Respond with ONLY the answer to the user's question—no explanations, no punctuation except what the answer itself requires.
57
+ If the answer is a chess move, output it in algebraic notation.
58
+ IMPORTANT: Only respond with the final answer with no extra text.
59
+
60
+ excel_system: |
61
+ You are a **pandas one-liner generator**.
62
+
63
+ Context
64
+ -------
65
+ - A full DataFrame named `df` is already loaded.
66
+ - Only the preview below is shown for reference.
67
+ - IMPORTANT: use column names from the preview to determine which columns are needed.
68
+
69
+ Preview
70
+ -------
71
+ {preview}
72
+
73
+ Formatting rules
74
+ ----------------
75
+ 1. Result must be a plain Python scalar (use .item(), float(), int() …).
76
+ 2. If the question asks for currency / 2 decimals --> wrap in an f-string.
77
+ 3. If the question asks for a count --> wrap in int().
78
+ 4. **Return exactly one line.**
79
+ 5. DO NOT include any unit or currency in the output.
80
+ 6. **Do **NOT** wrap the expression in ``` or other markdown fences.**
81
+
82
+ Question
83
+ --------
84
+ {question}
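The `{preview}` and `{question}` slots in these templates are plain `str.format` placeholders. A minimal sketch of how a loader might fill them; the inline `PROMPTS` dict and `render_prompt` function are hypothetical stand-ins for `prompts.yaml` and the repo's `get_prompt` helper:

```python
# Hypothetical mini-version of the prompt loader: a dict stands in for the
# parsed prompts.yaml, and rendering is ordinary str.format substitution.
PROMPTS = {
    "excel_system": (
        "Preview\n-------\n{preview}\n\n"
        "Question\n--------\n{question}"
    ),
}


def render_prompt(prompt_key: str, **kwargs: object) -> str:
    """Look up a template by key and fill its placeholders."""
    return PROMPTS[prompt_key].format(**kwargs)


filled = render_prompt(
    "excel_system",
    preview={"Sales": [120.5, 99.0]},
    question="What is the total sales amount?",
)
print(filled)
```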
pyproject.toml ADDED
@@ -0,0 +1,50 @@
+ # pyproject.toml (trimmed to just tooling - no build backend)
+
+ [tool.ruff]
+ line-length = 88
+ target-version = "py311"
+ lint.select = [
+     "E",   # pycodestyle errors
+     "W",   # pycodestyle warnings
+     "F",   # pyflakes
+     "I",   # isort
+     "C",   # flake8-comprehensions
+     "B",   # flake8-bugbear
+     "UP",  # pyupgrade
+ ]
+ lint.ignore = [
+     "E501",  # line too long, handled by black
+     "B008",  # do not perform function calls in argument defaults
+     "C901",  # too complex
+ ]
+ fix = true
+
+ [tool.mypy]
+ python_version = "3.11"
+ disallow_any_generics = true
+ disallow_subclassing_any = true
+ disallow_untyped_calls = true
+ disallow_untyped_defs = true
+ disallow_incomplete_defs = true
+ check_untyped_defs = true
+ disallow_untyped_decorators = true
+ no_implicit_optional = true
+ warn_redundant_casts = true
+ warn_unused_ignores = true
+ warn_return_any = true
+ implicit_reexport = false  # config-file form of --no-implicit-reexport
+ strict_equality = true
+ disable_error_code = [
+     "misc",           # untyped decorator
+     "no-any-return",  # allow Any returns temporarily
+     "operator",       # calls on unknown operator types
+ ]
+ plugins = ["pydantic.mypy"]
+
+ follow_imports = "silent"
+
+ [tool.pydantic-mypy]
+ init_forbid_extra = true
+ init_typed = true
+ warn_required_dynamic_aliases = true
requirements-dev.txt ADDED
@@ -0,0 +1,5 @@
+ pre-commit
+ ruff
+ mypy
+ detect-secrets
+ gradio[oauth]
requirements.txt ADDED
@@ -0,0 +1,26 @@
+ # ── core UI / infra
+ gradio
+ requests
+ pandas==2.2.3
+
+ # ── LangGraph + LangChain stack
+ langgraph==0.4.7
+ langchain_openai==0.3.18
+ langchain_core==0.3.61
+ langchain==0.3.25
+ langchain_community==0.3.24
+
+ # ── Retrieval helpers
+ duckduckgo-search==8.0.2      # DuckDuckGo wrapper
+ tavily-python==0.3.3          # TavilySearchResults tool
+ wikipedia==1.4.0              # WikipediaLoader
+
+ # ── Media utilities
+ youtube-transcript-api==1.0.3 # YouTube transcripts
+ openpyxl==3.1.5               # Excel parsing when GAIA attaches .xlsx
+ Pillow>=10.2.0                # image handling for transformers
+ openai-whisper==20240930
+
+ # ── Lightweight vision model
+ transformers>=4.41.2
+ torch>=2.3.0                  # auto-installs CPU wheels on HF Spaces
tools.py ADDED
@@ -0,0 +1,289 @@
+ import ast
+ import json
+ import operator
+ import re
+ import subprocess
+ from base64 import b64encode
+ from functools import lru_cache
+ from io import BytesIO
+ from tempfile import NamedTemporaryFile
+
+ import numpy as np
+ import pandas as pd
+ from langchain_community.document_loaders import WikipediaLoader
+ from langchain_community.tools.tavily_search import TavilySearchResults
+ from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
+ from langchain_core.messages import HumanMessage, SystemMessage
+ from langchain_core.tools import tool
+ from langchain_openai import ChatOpenAI
+ from youtube_transcript_api import YouTubeTranscriptApi
+
+ from helpers import get_prompt, print_debug_trace
+
+ # --------------------------------------------------------------------------- #
+ #                        ARITHMETIC (SAFE CALCULATOR)                         #
+ # --------------------------------------------------------------------------- #
+ _ALLOWED_AST_OPS = {
+     ast.Add: operator.add,
+     ast.Sub: operator.sub,
+     ast.Mult: operator.mul,
+     ast.Div: operator.truediv,
+     ast.Pow: operator.pow,
+     ast.USub: operator.neg,
+ }
+
+
+ def _safe_eval(node: ast.AST) -> float | int | complex:
+     """Recursively evaluate a *restricted* AST expression tree."""
+     if isinstance(node, ast.Constant) and isinstance(node.value, (int, float, complex)):
+         return node.value  # `.value` replaces the deprecated `.n` accessor
+     if isinstance(node, ast.UnaryOp) and type(node.op) in _ALLOWED_AST_OPS:
+         return _ALLOWED_AST_OPS[type(node.op)](_safe_eval(node.operand))
+     if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_AST_OPS:
+         return _ALLOWED_AST_OPS[type(node.op)](
+             _safe_eval(node.left), _safe_eval(node.right)
+         )
+     raise ValueError("Unsafe or unsupported expression")
+
+
+ @tool
+ def calculator(expression: str) -> str:
+     """Safely evaluate basic arithmetic expressions (no variables or functions)."""
+     try:
+         tree = ast.parse(expression, mode="eval")
+         value = _safe_eval(tree.body)
+         return str(value)
+     except Exception as exc:
+         print_debug_trace(exc, "Calculator")
+         return f"calc_error:{exc}"
+
+
+ # --------------------------------------------------------------------------- #
+ #                              WEB & WIKI SEARCH                              #
+ # --------------------------------------------------------------------------- #
+ @lru_cache(maxsize=256)
+ def _ddg_search(query: str, k: int = 6) -> list[dict[str, str]]:
+     """Cached DuckDuckGo JSON search."""
+     wrapper = DuckDuckGoSearchAPIWrapper(max_results=k)
+     hits = wrapper.results(query)
+     return [
+         {
+             "title": hit.get("title", "")[:500],
+             "snippet": hit.get("snippet", "")[:750],
+             "link": hit.get("link", "")[:300],
+         }
+         for hit in hits[:k]
+     ]
+
+
+ @tool
+ def web_multi_search(query: str, k: int = 6) -> str:
+     """Run DuckDuckGo search with a Tavily fallback. Returns a JSON list[dict]."""
+     try:
+         hits = _ddg_search(query, k)
+         if hits:
+             return json.dumps(hits, ensure_ascii=False)
+     except Exception:  # fall through to Tavily
+         pass
+
+     try:
+         tavily_results = TavilySearchResults(
+             max_results=5,
+             # include_answer=True,
+             # search_depth="advanced",
+         )
+         search_result = tavily_results.invoke({"query": query})
+         print(
+             f"[TOOL] Tavily search triggered with the following response: {search_result}"
+         )
+         formatted = [
+             {
+                 "title": d.get("title", "")[:500],
+                 "snippet": d.get("content", "")[:750],
+                 "link": d.get("url", "")[:300],
+             }
+             for d in search_result
+         ]
+         return json.dumps(formatted, ensure_ascii=False)
+     except Exception as exc:
+         print_debug_trace(exc, "Multi Search")
+         return f"search_error:{exc}"
+
+
+ @tool
+ def wiki_search(query: str, max_pages: int = 2) -> str:
+     """Lightweight wrapper around WikipediaLoader; returns concatenated page texts."""
+     print(f"[TOOL] wiki_search called with query: {query}")
+     docs = WikipediaLoader(query=query, load_max_docs=max_pages).load()
+     joined = "\n\n---\n\n".join(d.page_content for d in docs)
+     return joined[:8_000]  # simple guardrail - stay within the context window
+
+
+ # --------------------------------------------------------------------------- #
+ #                             YOUTUBE TRANSCRIPT                              #
+ # --------------------------------------------------------------------------- #
+ @tool
+ def youtube_transcript(url: str, chars: int = 10_000) -> str:
+     """Fetch the full YouTube transcript (first *chars* characters)."""
+     video_id_match = re.search(r"[?&]v=([A-Za-z0-9_\-]{11})", url)
+     if not video_id_match:
+         return "yt_error:id_not_found"
+     try:
+         transcript = YouTubeTranscriptApi.get_transcript(video_id_match.group(1))
+         text = " ".join(piece["text"] for piece in transcript)
+         return text[:chars]
+     except Exception as exc:
+         print_debug_trace(exc, "YouTube")
+         return f"yt_error:{exc}"
+
+
+ # --------------------------------------------------------------------------- #
+ #                              IMAGE DESCRIPTION                              #
+ # --------------------------------------------------------------------------- #
+
+ # Instantiate a lightweight CLIP-based zero-shot image classifier (runs on CPU).
+ ### The model 'openai/clip-vit-base-patch32' is a vision transformer (ViT) model trained as part of OpenAI's CLIP project.
+ ### It performs zero-shot image classification by mapping images and labels into the same embedding space.
+ # _image_pipe = pipeline(
+ #     "image-classification", model="openai/clip-vit-base-patch32", device="cpu"
+ # )
+
+ # @tool
+ # def image_describe(img_bytes: bytes, top_k: int = 3) -> str:
+ #     """Return the top-k CLIP labels for an image supplied as raw bytes.
+
+ #     A typical result for a random cat photo:
+ #     [
+ #         {'label': 'tabby, tabby cat', 'score': 0.41},
+ #         {'label': 'tiger cat', 'score': 0.24},
+ #         {'label': 'Egyptian cat', 'score': 0.22}
+ #     ]
+ #     """
+
+ #     try:
+ #         labels = _image_pipe(BytesIO(img_bytes))[:top_k]
+ #         return ", ".join(f"{d['label']} (score={d['score']:.2f})" for d in labels)
+ #     except Exception as exc:
+ #         return f"img_error:{exc}"
+
+
+ @tool
+ def vision_task(img_bytes: bytes, question: str) -> str:
+     """
+     Pass the user's question AND the referenced image to a multimodal LLM and
+     return its first line of text as the answer. No domain assumptions made.
+     """
+     vision_llm = ChatOpenAI(
+         model="gpt-4o-mini",  # set OPENAI_API_KEY in env
+         temperature=0,
+         max_tokens=64,
+     )
+     try:
+         b64 = b64encode(img_bytes).decode()
+         messages = [
+             SystemMessage(content=get_prompt(prompt_key="vision_system")),
+             HumanMessage(
+                 content=[
+                     {"type": "text", "text": question.strip()},
+                     {
+                         "type": "image_url",
+                         "image_url": {"url": f"data:image/png;base64,{b64}"},
+                     },
+                 ]
+             ),
+         ]
+         reply = vision_llm.invoke(messages).content.strip()
+         return reply
+     except Exception as exc:
+         print_debug_trace(exc, "vision")
+         return f"img_error:{exc}"
+
+
+ # --------------------------------------------------------------------------- #
+ #                                 FILE UTILS                                  #
+ # --------------------------------------------------------------------------- #
+ @tool
+ def run_py(code: str) -> str:
+     """Execute Python code in a sandboxed subprocess and return the last stdout line."""
+     try:
+         with NamedTemporaryFile(delete=False, suffix=".py", mode="w") as f:
+             f.write(code)
+             path = f.name
+         proc = subprocess.run(
+             ["python", path], capture_output=True, text=True, timeout=45
+         )
+         out = proc.stdout.strip().splitlines()
+         return out[-1] if out else ""
+     except Exception as exc:
+         print_debug_trace(exc, "run_py")
+         return f"py_error:{exc}"
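The write-temp-file, run, capture-last-stdout-line pattern in `run_py` can be exercised standalone. This sketch uses `sys.executable` instead of the bare `"python"` string so it resolves the current interpreter in any venv:

```python
import subprocess
import sys
from tempfile import NamedTemporaryFile

# A tiny program whose last stdout line is the "answer".
code = "print('intermediate output')\nprint(6 * 7)"

# Write the code to a temporary .py file.
with NamedTemporaryFile(mode="w", suffix=".py", delete=False) as f:
    f.write(code)
    path = f.name

# Run it in a subprocess and keep only the final stdout line.
proc = subprocess.run(
    [sys.executable, path], capture_output=True, text=True, timeout=45
)
lines = proc.stdout.strip().splitlines()
print(lines[-1] if lines else "")  # 42
```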
+
+
+ @tool
+ def transcribe_via_whisper(audio_bytes: bytes) -> str:
+     """Transcribe audio with Whisper (CPU)."""
+     with NamedTemporaryFile(suffix=".mp3", delete=False) as f:
+         f.write(audio_bytes)
+         path = f.name
+     try:
+         import whisper  # openai-whisper
+
+         model = whisper.load_model("base")
+         output = model.transcribe(path)["text"].strip()
+         print(f"[DEBUG] Whisper transcript (first 200 chars): {output[:200]}")
+         return output
+     except Exception as exc:
+         print_debug_trace(exc, "Whisper")
+         return f"asr_error:{exc}"
+
+
+ @tool
+ def analyze_excel_file(xls_bytes: bytes, question: str) -> str:
+     """Analyze an Excel or CSV file by sending a data preview to the LLM and running the pandas one-liner it returns."""
+     llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, max_tokens=64)
+
+     try:
+         df = pd.read_excel(BytesIO(xls_bytes))
+     except Exception:
+         df = pd.read_csv(BytesIO(xls_bytes))
+
+     for col in df.select_dtypes(include="number").columns:
+         df[col] = df[col].astype(float)
+
+     # Ask the LLM for a single expression
+     prompt = get_prompt(
+         prompt_key="excel_system",
+         question=question,
+         preview=df.head(5).to_dict(orient="list"),
+     )
+     expr = llm.invoke(prompt).content.strip()
+
+     # Run the generated one-line pandas expression
+     try:
+         result = eval(expr, {"df": df, "pd": pd, "__builtins__": {}})
+         # Normalize numpy scalars to a plain Python float
+         if isinstance(result, np.generic):
+             result = float(result)
+             return f"{result:.2f}"  # or str(result) if no decimals are needed
+
+         # DataFrame / Series -> single-line string
+         return (
+             result.to_string(index=False)
+             if hasattr(result, "to_string")
+             else str(result)
+         )
+     except Exception as e:
+         print_debug_trace(e, "Excel")
+         return f"eval_error:{e}"
278
+
279
+
280
+ __all__ = [
281
+ "calculator",
282
+ "web_multi_search",
283
+ "wiki_search",
284
+ "youtube_transcript",
285
+ "vision_task",
286
+ "run_py",
287
+ "transcribe_via_whisper",
288
+ "analyze_excel_file",
289
+ ]