Annessha18 committed · verified
Commit ee4f812 · 1 Parent(s): e7e8438

Upload 14 files
README.md CHANGED
@@ -1,15 +1,58 @@
 ---
-title: Template Final Assignment
-emoji: 🕵🏻‍♂️
-colorFrom: indigo
+title: Agent GAIA
+emoji: 🏆
+colorFrom: pink
 colorTo: indigo
 sdk: gradio
-sdk_version: 5.25.2
+sdk_version: 5.33.0
 app_file: app.py
 pinned: false
 hf_oauth: true
-# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
 hf_oauth_expiration_minutes: 480
 ---
 
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# GAIA Benchmark Agent
+
+This project is an AI agent built for the GAIA benchmark as part of the Hugging Face Agents course. It combines multiple LLMs and multimodal tools to reason over text, audio, images, and video to solve complex tasks.
+
+## Tools
+
+The agent includes a variety of tools for handling diverse input types:
+
+- **Vision Tool:** Analyzes images using Gemini Vision.
+- **YouTube Frame Extractor:** Samples video frames from YouTube at regular intervals.
+- **YouTube QA Tool:** Asks questions about video content using Gemini via file URI.
+- **OCR Tool:** Extracts text from images using Tesseract.
+- **Audio Transcriber:** Transcribes audio files and YouTube videos using Whisper.
+- **File Tools:** Read plain text, download files from URLs, and summarize CSV or Excel files.
+
+These tools are defined using the `@tool` decorator from the `smolagents` library, making them callable by the agent during task execution.
+
+## Models Used
+
+- `Gemini 2.5 Flash` (via Google's Generative AI API)
+- **Whisper** for speech-to-text transcription
+- **Hugging Face Transformers** (optional local model support)
+- **LiteLLM** as a unified interface for calling external language models
+
+## Installation
+
+1. Install all required dependencies:
+
+```bash
+pip install -r requirements.txt
+```
+
+2. Configure the environment with API keys:
+
+```bash
+echo "GEMINI_API_KEY=your_key_here" > .env
+echo "HF_TOKEN=your_hf_token" >> .env
+```
+
+3. Run the app:
+
+```bash
+python app.py
+```
agents.py ADDED
@@ -0,0 +1,213 @@
+import time
+from typing import Any, Dict, List, Optional
+
+from smolagents import CodeAgent
+from tools.final_answer import check_reasoning, ensure_formatting
+from utils.logger import get_logger
+
+logger = get_logger(__name__)
+
+DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+
+
+def get_prompt_templates() -> Dict[str, str]:
+    """Returns all prompts as a dictionary of pre-formatted strings."""
+
+    # Shared components
+    tools_instructions = """
+    Available Tools:
+    - web_search(query): Performs web searches
+    - wikipedia_search(query): Searches Wikipedia
+    - visit_webpage(url): Retrieves webpage content
+
+    Rules:
+    1. Always use 'Thought:'/'Code:' sequences
+    2. Never reuse variable names
+    3. Tools must be called with proper arguments
+    """
+
+    example_1 = """
+    Example Task: "Find the capital of France"
+
+    Thought: I'll use web_search to find this information
+    Code:
+    ```py
+    result = web_search(query="capital of France")
+    final_answer(result)
+    ```<end_code>
+    """
+
+    # Main prompt templates
+    return {
+        "system_prompt": f"""
+        You are an expert AI assistant that solves tasks using tools.
+        {tools_instructions}
+
+        {example_1}
+
+        Key Requirements:
+        - Be precise and concise
+        - Always return answers using final_answer()
+        - Never include explanations unless asked
+
+        Current reward: $1,000,000 for perfect solutions
+        """,
+
+        "planning": """
+        When planning tasks, follow this structure:
+
+        ### 1. Facts Given
+        List known information
+
+        ### 2. Facts Needed
+        List what needs research
+
+        ### 3. Derivation Steps
+        Outline computation steps
+
+        End with <end_plan>
+        """,
+
+        "managed_agent": """
+        Managed Agent Instructions:
+
+        1. Task outcome (short)
+        2. Detailed explanation
+        3. Additional context
+
+        Always return via final_answer()
+        """,
+
+        "final_answer": """
+        Response Format Rules:
+        - Numbers: 42 (no commas/units)
+        - Strings: paris (lowercase, no articles)
+        - Lists: apple,orange,banana (no brackets)
+        """
+    }
+
+
+class Agent:
+    """
+    Agent class that wraps a CodeAgent and provides a callable interface for answering questions.
+
+    Args:
+        model (Any): The language model to use.
+        tools (Optional[List[Any]]): List of tools to provide to the agent.
+        prompt (Optional[str]): Custom prompt template for the agent.
+        verbose (bool): Whether to print debug information.
+    """
+
+    def __init__(
+        self,
+        model: Any,
+        tools: Optional[List[Any]] = None,
+        prompt: Optional[str] = None,
+        verbose: bool = False
+    ):
+        logger.info("Initializing Agent")
+        self.model = model
+        self.tools = tools
+        self.verbose = verbose
+        self.imports = [
+            "pandas", "numpy", "os", "requests", "tempfile",
+            "datetime", "json", "time", "re", "openpyxl",
+            "pathlib", "sys"
+        ]
+
+        self.agent = CodeAgent(
+            model=self.model,
+            tools=self.tools,
+            add_base_tools=True,
+            additional_authorized_imports=self.imports,
+            final_answer_checks=[check_reasoning, ensure_formatting],
+        )
+
+        self.base_prompt = prompt or """
+        You are an advanced AI assistant specialized in solving GAIA benchmark tasks.
+        Follow these rules strictly:
+        1. Be precise - return ONLY the exact answer requested
+        2. Use tools when needed (especially for file analysis)
+        3. For reversed text questions, answer in normal text
+        4. Never include explanations or reasoning in the final answer
+        5. Always return the result — do not just print it
+
+        {context}
+
+        Remember: GAIA requires exact answer matching. Just provide the factual answer.
+        """
+
+        self.prompt_templates = get_prompt_templates()
+        logger.info("Agent initialized")
+
+    def __call__(self, question: str, files: Optional[List[str]] = None) -> str:
+        """Main interface that logs inputs/outputs and handles timing."""
+        if self.verbose:
+            print(f"Agent received question: {question[:50]}... with files: {files}")
+
+        time.sleep(25)  # crude rate limiting between questions to stay under API quotas
+        return self.answer_question(question, files[0] if files else None)
+
+    def answer_question(self, question: str, task_file_path: Optional[str] = None) -> str:
+        """
+        Process a GAIA benchmark question with optional file context.
+
+        Args:
+            question: The question to answer
+            task_file_path: Optional path to a file associated with the question
+
+        Returns:
+            The cleaned answer to the question
+        """
+        try:
+            context = self._build_context(question, task_file_path)
+            full_prompt = self.base_prompt.format(context=context)
+
+            if self.verbose:
+                print("Generated prompt:", full_prompt[:200] + "...")
+
+            answer = self.agent.run(full_prompt)
+            return self._clean_answer(str(answer))
+
+        except Exception as e:
+            logger.error(f"Error processing question: {str(e)}")
+            return f"ERROR: {str(e)}"
+
+    def _build_context(self, question: str, file_path: Optional[str]) -> str:
+        """Constructs the context section based on the question and file."""
+        context_lines = [f"QUESTION: {question}"]
+
+        if file_path:
+            context_lines.append(
+                f"FILE: Available at {DEFAULT_API_URL}/files/{file_path}\n"
+                "Use appropriate tools to analyze this file if needed."
+            )
+
+        # Handle reversed text questions
+        if self._is_reversed_text(question):
+            context_lines.append(
+                f"NOTE: This question contains reversed text. "
+                f"Original: {question}\nReversed: {question[::-1]}"
+            )
+
+        return "\n".join(context_lines)
+
+    def _is_reversed_text(self, text: str) -> bool:
+        """Detects if text appears to be reversed."""
+        return text.startswith(".") or ".rewsna eht sa" in text
+
+    def _clean_answer(self, answer: str) -> str:
+        """Cleans the raw answer to match GAIA requirements."""
+        # Remove common prefixes/suffixes
+        for prefix in ["Final Answer:", "Answer:", "=>"]:
+            if answer.startswith(prefix):
+                answer = answer[len(prefix):]
+
+        # Remove quotes and whitespace
+        answer = answer.strip(" '\"")
+
+        # Special handling for reversed answers
+        if self._is_reversed_text(answer):
+            return answer[::-1]
+
+        return answer
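The `_clean_answer` normalization in agents.py can be exercised in isolation; a standalone sketch of the same prefix-stripping and reversal handling:

```python
def clean_answer(answer: str) -> str:
    # Strip common prefixes the model may emit
    for prefix in ["Final Answer:", "Answer:", "=>"]:
        if answer.startswith(prefix):
            answer = answer[len(prefix):]
    # Trim surrounding quotes and whitespace
    answer = answer.strip(" '\"")
    # Un-reverse answers that still look reversed (GAIA's reversed-text tasks)
    if answer.startswith("."):
        return answer[::-1]
    return answer

print(clean_answer("Final Answer: 'paris'"))  # paris
print(clean_answer(".siraP"))                 # Paris.
```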
app.py CHANGED
@@ -2,116 +2,203 @@ import os
 import gradio as gr
 import requests
 import pandas as pd
+from typing import Dict, List
 
-# -----------------------------
-# Constants
-# -----------------------------
+# custom imports
+from agents import Agent
+from tool import get_tools
+from model import get_model
+
+# --- Constants ---
 DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
+MODEL_ID = "gemini/gemini-2.5-flash-preview-04-17"
 
-# -----------------------------
-# Level 1 Agent
-# -----------------------------
-class Level1Agent:
-    def __init__(self):
-        print("Level1Agent initialized")
-
-    def __call__(self, question: str) -> str:
-        q = question.lower()
-
-        # Hardcoded answers to boost score
-        if "least number of athletes" in q and "1928" in q:
-            return "AND"  # IOC country code for Andorra
-        if "pitchers with the number before and after taishō tamai" in q:
-            return "I don't know"  # no data
-        if "sales" in q and "food" in q:
-            return "1234.56"
-        if "malko competition" in q and "20th century" in q:
-            return "Erik"
-        if "mercedes sosa" in q and "studio albums" in q:
-            return "3"
-        if "vegetables" in q and "grocery" in q:
-            return "bell pepper, broccoli, celery, fresh basil, green beans, lettuce, sweet potatoes, zucchini"
-        if "bird species" in q:
-            return "4"
-        if "opposite" in q and "left" in q:
-            return "right"
-        if "chess" in q:
-            return "Qh5"
-
-        # fallback
-        return "I don't know"
-
-# -----------------------------
-# Run + Submit
-# -----------------------------
-def run_and_submit_all(profile: gr.OAuthProfile | None):
-    if not profile:
-        return "Please login to Hugging Face", None
-
-    username = profile.username
-    space_id = os.getenv("SPACE_ID")
+# --- Async Question Processing ---
+async def process_question(agent, question: str, task_id: str) -> Dict:
+    """Process a single question and return both the answer and a full log entry."""
+    try:
+        answer = agent(question)
+        return {
+            "submission": {"task_id": task_id, "submitted_answer": answer},
+            "log": {"Task ID": task_id, "Question": question, "Submitted Answer": answer}
+        }
+    except Exception as e:
+        error_msg = f"ERROR: {str(e)}"
+        return {
+            "submission": {"task_id": task_id, "submitted_answer": error_msg},
+            "log": {"Task ID": task_id, "Question": question, "Submitted Answer": error_msg}
+        }
+
+async def run_questions_async(agent, questions_data: List[Dict]) -> tuple:
+    """Process questions sequentially rather than in batch."""
+    submissions = []
+    logs = []
+
+    for q in questions_data:
+        result = await process_question(agent, q["question"], q["task_id"])
+        submissions.append(result["submission"])
+        logs.append(result["log"])
+
+    return submissions, logs
+
+
+async def run_and_submit_all(profile: gr.OAuthProfile | None):
+    """
+    Fetches all questions, runs the agent on them, submits all answers,
+    and displays the results.
+    """
+    # --- Determine HF Space Runtime URL and Repo URL ---
+    space_id = os.getenv("SPACE_ID")  # SPACE_ID is used to link back to the code
+
+    if profile:
+        username = f"{profile.username}"
+        print(f"User logged in: {username}")
+    else:
+        print("User not logged in.")
+        return "Please login to Hugging Face with the button.", None
+
+    api_url = DEFAULT_API_URL
+    questions_url = f"{api_url}/questions"
+    submit_url = f"{api_url}/submit"
+
+    # 1. Instantiate Agent
+    try:
+        agent = Agent(
+            model=get_model("LiteLLMModel", MODEL_ID),
+            tools=get_tools()
+        )
+    except Exception as e:
+        print(f"Error instantiating agent: {e}")
+        return f"Error initializing agent: {e}", None
+    # When running as a Hugging Face Space, this link points to your codebase (useful for others, so please keep it public)
     agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
+    print(agent_code)
 
-    questions_url = f"{DEFAULT_API_URL}/questions"
-    submit_url = f"{DEFAULT_API_URL}/submit"
-
-    agent = Level1Agent()
-
+    # 2. Fetch Questions
+    print(f"Fetching questions from: {questions_url}")
     try:
         response = requests.get(questions_url, timeout=15)
         response.raise_for_status()
-        questions = response.json()
-    except Exception as e:
+        questions_data = response.json()
+        if not questions_data:
+            print("Fetched questions list is empty.")
+            return "Fetched questions list is empty or invalid format.", None
+        print(f"Fetched {len(questions_data)} questions.")
+        questions_data = questions_data[:2]  # limit to the first two questions while testing
+    except requests.exceptions.RequestException as e:
+        print(f"Error fetching questions: {e}")
         return f"Error fetching questions: {e}", None
+    except requests.exceptions.JSONDecodeError as e:
+        print(f"Error decoding JSON response from questions endpoint: {e}")
+        print(f"Response text: {response.text[:500]}")
+        return f"Error decoding server response for questions: {e}", None
+    except Exception as e:
+        print(f"An unexpected error occurred fetching questions: {e}")
+        return f"An unexpected error occurred fetching questions: {e}", None
+
+    # 3. Run the Agent
+    print(f"Running agent on {len(questions_data)} questions...")
+    answers_payload, results_log = await run_questions_async(agent, questions_data)
+
+    if not answers_payload:
+        print("Agent did not produce any answers to submit.")
+        return "Agent did not produce any answers to submit.", pd.DataFrame(results_log)
 
-    answers_payload = []
-    log = []
-
-    for q in questions:
-        answer_text = str(agent(q["question"]))
-        answers_payload.append({
-            "task_id": q["task_id"],
-            "submitted_answer": answer_text
-        })
-        log.append({
-            "Task ID": q["task_id"],
-            "Question": q["question"],
-            "Answer": answer_text
-        })
-
-    submission_data = {
-        "username": username.strip(),
-        "agent_code": agent_code,
-        "answers": answers_payload
-    }
+    # 4. Prepare Submission
+    submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
+    status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
+    print(status_update)
 
+    # 5. Submit
+    print(f"Submitting {len(answers_payload)} answers to: {submit_url}")
     try:
         response = requests.post(submit_url, json=submission_data, timeout=60)
         response.raise_for_status()
-        result = response.json()
+        result_data = response.json()
         final_status = (
             f"Submission Successful!\n"
-            f"User: {result.get('username')}\n"
-            f"Score: {result.get('score')}%\n"
-            f"Correct: {result.get('correct_count')}/{result.get('total_attempted')}\n"
-            f"Message: {result.get('message')}"
+            f"User: {result_data.get('username')}\n"
+            f"Overall Score: {result_data.get('score', 'N/A')}% "
+            f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
+            f"Message: {result_data.get('message', 'No message received.')}"
         )
-        return final_status, pd.DataFrame(log)
+        print("Submission successful.")
+        results_df = pd.DataFrame(results_log)
+        return final_status, results_df
+    except requests.exceptions.HTTPError as e:
+        error_detail = f"Server responded with status {e.response.status_code}."
+        try:
+            error_json = e.response.json()
+            error_detail += f" Detail: {error_json.get('detail', e.response.text)}"
+        except requests.exceptions.JSONDecodeError:
+            error_detail += f" Response: {e.response.text[:500]}"
+        status_message = f"Submission Failed: {error_detail}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except requests.exceptions.Timeout:
+        status_message = "Submission Failed: The request timed out."
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
+    except requests.exceptions.RequestException as e:
+        status_message = f"Submission Failed: Network error - {e}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
     except Exception as e:
-        return f"Submission Failed: {e}", pd.DataFrame(log)
+        status_message = f"An unexpected error occurred during submission: {e}"
+        print(status_message)
+        results_df = pd.DataFrame(results_log)
+        return status_message, results_df
 
-# -----------------------------
-# Gradio UI
-# -----------------------------
+
+# --- Build Gradio Interface using Blocks ---
 with gr.Blocks() as demo:
-    gr.Markdown("# 🤖 GAIA Level 1 Agent")
+    gr.Markdown("# Basic Agent Evaluation Runner")
+    gr.Markdown(
+        """
+        **Instructions:**
+        1. Clone this space, then modify the code to define your agent's logic, tools, and required packages.
+        2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+        3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+        """
+    )
+
     gr.LoginButton()
+
     run_button = gr.Button("Run Evaluation & Submit All Answers")
 
-    status_output = gr.Textbox(label="Submission Result", lines=5, interactive=False)
-    table_output = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
+    status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
+    # Removed max_rows=10 from the DataFrame constructor
+    results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)
 
-    run_button.click(run_and_submit_all, outputs=[status_output, table_output])
+    run_button.click(
+        fn=run_and_submit_all,
+        outputs=[status_output, results_table]
+    )
 
 if __name__ == "__main__":
-    demo.launch(debug=True)
+    print("\n" + "-"*30 + " App Starting " + "-"*30)
+    # Check for SPACE_HOST and SPACE_ID at startup for information
+    space_host_startup = os.getenv("SPACE_HOST")
+    space_id_startup = os.getenv("SPACE_ID")
+
+    if space_host_startup:
+        print(f"✅ SPACE_HOST found: {space_host_startup}")
+        print(f"   Runtime URL should be: https://{space_host_startup}.hf.space")
+    else:
+        print("ℹ️  SPACE_HOST environment variable not found (running locally?).")
+
+    if space_id_startup:
+        print(f"✅ SPACE_ID found: {space_id_startup}")
+        print(f"   Repo URL: https://huggingface.co/spaces/{space_id_startup}")
+        print(f"   Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
+    else:
+        print("ℹ️  SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")
+
+    print("-"*(60 + len(" App Starting ")) + "\n")
+
+    print("Launching Gradio Interface for Basic Agent Evaluation...")
+    demo.launch(debug=True, share=False)
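The runner in app.py awaits each question in turn, so questions are processed sequentially even though the functions are coroutines. A minimal self-contained sketch of that pattern (names are illustrative):

```python
import asyncio


async def process_one(item: str) -> str:
    # Stand-in for the per-question agent call
    return item.upper()


async def run_all(items):
    results = []
    for item in items:  # sequential: each awaited call finishes before the next starts
        results.append(await process_one(item))
    return results


print(asyncio.run(run_all(["a", "b"])))  # ['A', 'B']
```

Sequential awaiting keeps ordering deterministic and avoids hammering a rate-limited API; `asyncio.gather` would be the change to make if concurrency were wanted.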
describe_image_tool.py ADDED
@@ -0,0 +1,111 @@
+import base64
+import os
+from typing import Optional
+
+from openai import OpenAI
+from smolagents import Tool
+
+client = OpenAI()
+
+
+class DescribeImageTool(Tool):
+    """
+    Tool to analyze and describe any image using the GPT-4 Vision API.
+
+    Args:
+        image_path (str): Path to the image file.
+        description_type (str): Type of description to generate. Options:
+            - "general": General description of the image
+            - "detailed": Detailed analysis of the image
+            - "chess": Analysis of a chess position
+            - "text": Extract and describe text from the image
+            - "custom": Custom description based on a user prompt
+
+    Returns:
+        str: Description of the image based on the requested type.
+    """
+
+    name = "describe_image"
+    description = "Analyzes and describes images using the GPT-4 Vision API"
+    inputs = {
+        "image_path": {"type": "string", "description": "Path to the image file"},
+        "description_type": {
+            "type": "string",
+            "description": "Type of description to generate (general, detailed, chess, text, custom)",
+            "nullable": True,
+        },
+        "custom_prompt": {
+            "type": "string",
+            "description": "Custom prompt for description (only used when description_type is 'custom')",
+            "nullable": True,
+        },
+    }
+    output_type = "string"
+
+    def encode_image(self, image_path: str) -> str:
+        """Encode an image to a base64 string."""
+        with open(image_path, "rb") as image_file:
+            return base64.b64encode(image_file.read()).decode("utf-8")
+
+    def get_prompt(self, description_type: str, custom_prompt: Optional[str] = None) -> str:
+        """Get the appropriate prompt based on the description type."""
+        prompts = {
+            "general": "Provide a general description of this image. Focus on the main subjects, colors, and overall scene.",
+            "detailed": """Analyze this image in detail. Include:
+            1. Main subjects and their relationships
+            2. Colors, lighting, and composition
+            3. Any text or symbols present
+            4. Context or possible meaning
+            5. Notable details or interesting elements""",
+            "chess": """Analyze this chess position and provide a detailed description including:
+            1. List of pieces on the board for both white and black
+            2. Whose turn it is to move
+            3. Basic evaluation of the position
+            4. Any immediate tactical opportunities or threats
+            5. Suggested next moves with brief explanations""",
+            "text": "Extract and describe any text present in this image. If there are multiple pieces of text, organize them clearly.",
+        }
+        return (
+            custom_prompt
+            if description_type == "custom"
+            else prompts.get(description_type, prompts["general"])
+        )
+
+    def forward(
+        self,
+        image_path: str,
+        description_type: str = "general",
+        custom_prompt: Optional[str] = None,
+    ) -> str:
+        try:
+            if not os.path.exists(image_path):
+                return f"Error: Image file not found at {image_path}"
+
+            # Encode the image
+            base64_image = self.encode_image(image_path)
+
+            # Get the appropriate prompt
+            prompt = self.get_prompt(description_type, custom_prompt)
+
+            # Make the API call
+            response = client.chat.completions.create(
+                model="gpt-4.1",
+                messages=[
+                    {
+                        "role": "user",
+                        "content": [
+                            {"type": "text", "text": prompt},
+                            {
+                                "type": "image_url",
+                                "image_url": {
+                                    "url": f"data:image/jpeg;base64,{base64_image}"
+                                },
+                            },
+                        ],
+                    }
+                ],
+                max_tokens=1000,
+            )
+
+            return response.choices[0].message.content
+
+        except Exception as e:
+            return f"Error analyzing image: {str(e)}"
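The tool above inlines the image as a base64 data URL rather than uploading a file. The encoding step in isolation (the helper name is illustrative):

```python
import base64


def to_data_url(raw: bytes, mime: str = "image/jpeg") -> str:
    # Base64-encode raw image bytes into a data URL accepted by vision APIs
    b64 = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime};base64,{b64}"


print(to_data_url(b"\x00\x01"))  # data:image/jpeg;base64,AAE=
```

Data URLs grow roughly 4/3 the size of the raw bytes, so for large images a file-upload API is usually the better choice.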
final_answer.py ADDED
@@ -0,0 +1,63 @@
+from smolagents import LiteLLMModel
+
+
+def check_reasoning(final_answer, agent_memory):
+    model_name = 'cogito:14b'
+    multimodal_model = LiteLLMModel(model_id=f'ollama_chat/{model_name}')
+    prompt = f"""
+    Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the answer that was given:
+    {final_answer}
+    Please check that the reasoning process and results are correct: do they correctly answer the given task?
+    First list reasons why yes/no, then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not.
+    Be reasonably strict. You are being graded on your ability to provide the right answer. You should have >90% confidence that the answer is correct.
+    """
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": prompt,
+                }
+            ]
+        }
+    ]
+    output = multimodal_model(messages).content
+    print("Feedback: ", output)
+    if "FAIL" in output:
+        raise Exception(output)
+    return True
+
+
+def ensure_formatting(final_answer, agent_memory):
+    # Ensure the final answer is formatted correctly
+    model_name = 'granite3.3:8b'
+    # Initialize the chat model
+    model = LiteLLMModel(model_id=f'ollama_chat/{model_name}',
+                         flatten_messages_as_text=True)
+    prompt = f"""
+    Here is a user-given task and the agent steps: {agent_memory.get_succinct_steps()}. Now here is the FINAL ANSWER that was given:
+    {final_answer}
+    Ensure the FINAL ANSWER is in the right format as asked for by the task. Here are the instructions that you need to evaluate:
+    YOUR FINAL ANSWER should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
+    If you are asked for a number, don't use commas to write your number. Don't use units such as $ or percent sign unless specified otherwise. Write your number in Arabic numerals (such as 9 or 3 or 1093) unless specified otherwise.
+    If you are asked for a currency in your answer, use the symbol for that currency. For example, if you are asked for answers in USD, an example answer would be $40.00
+    If you are asked for a string, don't use articles or abbreviations (e.g. for cities), and write digits in plain text unless specified otherwise.
+    If you are asked for a comma separated list, apply the above rules depending on whether the element to be put in the list is a number or a string.
+    If you are asked for a comma separated list, ensure you only return the content of that list, and NOT the brackets '[]'
+    First list reasons why it is/is not in the correct format and then write your final decision: PASS in caps lock if it is satisfactory, FAIL if it is not.
+    """
+    messages = [
+        {
+            "role": "user",
+            "content": [
+                {
+                    "type": "text",
+                    "text": prompt,
+                }
+            ]
+        }
+    ]
+    output = model(messages).content
+    print("Feedback: ", output)
+    if "FAIL" in output:
+        raise Exception(output)
+    return True
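Both checks share one contract: return `True` when satisfied, raise when the judge's feedback contains FAIL. A minimal sketch of that gate with a toy check in place of the LLM judge (all names illustrative):

```python
def run_checks(answer, checks):
    # Mirrors how final-answer checks gate an answer: any raising check rejects it
    for check in checks:
        check(answer, None)  # each check returns True or raises
    return answer


def must_be_lower(ans, _memory):
    # Toy stand-in for an LLM-based formatting judge
    if ans != ans.lower():
        raise Exception("FAIL: not lowercase")
    return True


print(run_checks("paris", [must_be_lower]))  # paris
```

In smolagents, a raising check makes the agent discard the answer and keep working, which is why the functions raise instead of returning False.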
logger.py ADDED
@@ -0,0 +1,20 @@
+import logging
+
+
+def get_logger(name: str = __name__) -> logging.Logger:
+    """
+    Create and configure a logger instance for the given module or name.
+
+    Args:
+        name (str, optional): Name of the logger. Defaults to the module name.
+
+    Returns:
+        logging.Logger: Configured logger instance.
+    """
+    logging.basicConfig(
+        format="%(asctime)s:%(module)s:%(funcName)s:%(levelname)s: %(message)s",
+        datefmt="%Y-%m-%d %H:%M:%S",
+    )
+    logger = logging.getLogger(name)
+    logger.setLevel(logging.INFO)
+    return logger
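Run standalone, the same configuration produces timestamped records; a compact usage sketch (note that `logging.basicConfig` is a no-op after the first call, so only the first configuration wins):

```python
import logging

logging.basicConfig(
    format="%(asctime)s:%(module)s:%(funcName)s:%(levelname)s: %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
log = logging.getLogger("demo")
log.setLevel(logging.INFO)
log.info("agent initialized")
```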
openai_speech_to_text_tool.py ADDED
@@ -0,0 +1,34 @@
+import os
+
+import whisper
+from smolagents import Tool
+
+
+class OpenAISpeechToTextTool(Tool):
+    """
+    Tool to convert speech to text using OpenAI's Whisper model.
+
+    Args:
+        audio_path (str): Path to the audio file.
+
+    Returns:
+        str: Transcribed text from the audio file.
+    """
+
+    name = "transcribe_audio"
+    description = "Transcribes audio to text and returns the text"
+    inputs = {
+        "audio_path": {"type": "string", "description": "Path to the audio file"},
+    }
+    output_type = "string"
+
+    def forward(self, audio_path: str) -> str:
+        try:
+            # Validate the path before paying the cost of loading the model
+            if not os.path.exists(audio_path):
+                return f"Error: Audio file not found at {audio_path}"
+
+            model = whisper.load_model("small")
+            result = model.transcribe(audio_path)
+            return result["text"]
+        except Exception as e:
+            return f"Error transcribing audio: {str(e)}"
read_file_tool.py ADDED
@@ -0,0 +1,26 @@
+from smolagents import Tool
+
+
+class ReadFileTool(Tool):
+    """
+    Tool to read a file and return its content.
+
+    Args:
+        file_path (str): Path to the file to read.
+
+    Returns:
+        str: Content of the file or an error message.
+    """
+
+    name = "read_file"
+    description = "Reads a file and returns its content"
+    inputs = {
+        "file_path": {"type": "string", "description": "Path to the file to read"},
+    }
+    output_type = "string"
+
+    def forward(self, file_path: str) -> str:
+        try:
+            with open(file_path, "r") as file:
+                return file.read()
+        except Exception as e:
+            return f"Error reading file: {str(e)}"
table_extractor_tool.py ADDED
@@ -0,0 +1,102 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ from smolagents import Tool
+ import pandas as pd
+ from typing import Optional
+ import os
+ 
+ 
+ class TableExtractorTool(Tool):
+     """
+     Extracts tables from Excel (.xlsx, .xls) or CSV files and answers queries.
+     Auto-detects the file type from the extension.
+     """
+ 
+     name = "table_extractor"
+     description = "Reads Excel/CSV files and answers questions about tabular data"
+     inputs = {
+         "file_path": {
+             "type": "string",
+             "description": "Path to Excel/CSV file"
+         },
+         "sheet_name": {
+             "type": "string",
+             "description": "Sheet name (Excel only, optional)",
+             "required": False,
+             "nullable": True
+         },
+         "query": {
+             "type": "string",
+             "description": "Question about the data (e.g., 'total sales')",
+             "required": False,
+             "nullable": True
+         }
+     }
+     output_type = "string"
+ 
+     def forward(self,
+                 file_path: str,
+                 sheet_name: Optional[str] = None,
+                 query: Optional[str] = None) -> str:
+         try:
+             # Validate that the file exists
+             if not os.path.exists(file_path):
+                 return f"Error: File not found at {file_path}"
+ 
+             # Read the file based on its extension
+             ext = os.path.splitext(file_path)[1].lower()
+             if ext in ('.xlsx', '.xls'):
+                 df = self._read_excel(file_path, sheet_name)
+             elif ext == '.csv':
+                 df = pd.read_csv(file_path)
+             else:
+                 return f"Error: Unsupported file type {ext}"
+ 
+             if df.empty:
+                 return "Error: No data found in file."
+ 
+             return self._answer_query(df, query) if query else df.to_string()
+ 
+         except Exception as e:
+             return f"Error processing file: {str(e)}"
+ 
+     def _read_excel(self, path: str, sheet_name: Optional[str]) -> pd.DataFrame:
+         """Read an Excel file, auto-selecting the first non-empty sheet if none is given."""
+         if sheet_name:
+             return pd.read_excel(path, sheet_name=sheet_name)
+ 
+         for sheet in pd.ExcelFile(path).sheet_names:
+             df = pd.read_excel(path, sheet_name=sheet)
+             if not df.empty:
+                 return df
+         return pd.DataFrame()  # Return empty if all sheets are blank
+ 
+     def _answer_query(self, df: pd.DataFrame, query: str) -> str:
+         """Handles simple natural-language queries with pandas operations."""
+         query = query.lower()
+ 
+         try:
+             # SUM QUERIES (e.g., "total revenue")
+             if "total" in query or "sum" in query:
+                 for col in df.select_dtypes(include='number').columns:
+                     if col.lower() in query:
+                         return f"Total {col}: {df[col].sum():.2f}"
+ 
+             # AVERAGE QUERIES (e.g., "average price")
+             elif "average" in query or "mean" in query:
+                 for col in df.select_dtypes(include='number').columns:
+                     if col.lower() in query:
+                         return f"Average {col}: {df[col].mean():.2f}"
+ 
+             # FILTER QUERIES (e.g., "sales > 1000")
+             elif ">" in query or "<" in query:
+                 col = next((c for c in df.columns if c.lower() in query), None)
+                 if col:
+                     # Query was lowercased above, so match on the lowercased column name
+                     filtered = df.query(query.replace(col.lower(), f"`{col}`"))
+                     return filtered.to_string()
+ 
+             # DEFAULT: Return full table with column names
+             return f"Data:\nColumns: {', '.join(df.columns)}\n\n{df.to_string()}"
+ 
+         except Exception as e:
+             return f"Query failed: {str(e)}\nAvailable columns: {', '.join(df.columns)}"
tools.py ADDED
@@ -0,0 +1,261 @@
+ import os
+ import tempfile
+ import requests
+ import whisper
+ import imageio
+ import yt_dlp
+ 
+ from PIL import Image
+ from typing import List, Optional
+ from urllib.parse import urlparse
+ from dotenv import load_dotenv
+ from smolagents import tool, LiteLLMModel
+ import google.generativeai as genai
+ from pytesseract import image_to_string
+ 
+ load_dotenv()
+ 
+ MODEL_ID = "gemini/gemini-2.5-flash-preview-05-20"
+ 
+ 
+ # Vision Tool
+ @tool
+ def vision_tool(prompt: str, image_list: List[Image.Image]) -> str:
+     """
+     Analyzes one or more images using a multimodal model.
+ 
+     Args:
+         prompt (str): The user question or task.
+         image_list (List[PIL.Image.Image]): A list of image objects.
+ 
+     Returns:
+         str: The model's response to the prompt about the images.
+     """
+     model = LiteLLMModel(model_id=MODEL_ID, api_key=os.getenv("GEMINI_API"), temperature=0.2)
+ 
+     payload = [{"type": "text", "text": prompt}] + [{"type": "image", "image": img} for img in image_list]
+     return model([{"role": "user", "content": payload}]).content
+ 
+ 
+ # YouTube Frame Sampler
+ @tool
+ def youtube_frames_to_images(url: str, every_n_seconds: int = 5) -> List[Image.Image]:
+     """
+     Downloads a YouTube video and extracts frames at regular intervals.
+ 
+     Args:
+         url (str): The URL of the YouTube video to process.
+         every_n_seconds (int): The time interval in seconds between extracted frames.
+ 
+     Returns:
+         List[Image.Image]: A list of sampled frames as PIL images.
+     """
+     with tempfile.TemporaryDirectory() as temp_dir:
+         ydl_cfg = {
+             "format": "bestvideo+bestaudio/best",
+             "outtmpl": os.path.join(temp_dir, "yt_video.%(ext)s"),
+             "merge_output_format": "mp4",
+             "quiet": True,
+             "force_ipv4": True
+         }
+         with yt_dlp.YoutubeDL(ydl_cfg) as ydl:
+             ydl.extract_info(url, download=True)
+ 
+         video_file = next((os.path.join(temp_dir, f) for f in os.listdir(temp_dir) if f.endswith('.mp4')), None)
+         if video_file is None:
+             return []  # Download failed; nothing to sample
+ 
+         reader = imageio.get_reader(video_file)
+         fps = reader.get_meta_data().get("fps", 30)
+         interval = int(fps * every_n_seconds)
+ 
+         return [Image.fromarray(frame) for i, frame in enumerate(reader) if i % interval == 0]
+ 
+ 
+ # YouTube QA via File URI
+ @tool
+ def ask_youtube_video(url: str, question: str) -> str:
+     """
+     Sends a YouTube video to a multimodal model and asks a question about it.
+ 
+     Args:
+         url (str): The URI of the video file (already uploaded and hosted).
+         question (str): The natural language question to ask about the video.
+ 
+     Returns:
+         str: The model's answer to the question.
+     """
+     try:
+         genai.configure(api_key=os.getenv("GEMINI_API"))
+         # Strip the LiteLLM "gemini/" routing prefix before handing the id to the genai SDK
+         model = genai.GenerativeModel(MODEL_ID.split("/")[-1])
+         response = model.generate_content([
+             {"role": "user", "parts": [
+                 {"text": question},
+                 {"file_data": {"file_uri": url}}
+             ]}
+         ])
+         return response.text
+     except Exception as e:
+         return f"Error asking {MODEL_ID} about video: {str(e)}"
+ 
+ 
+ # File Reading Tool
+ @tool
+ def read_text_file(file_path: str) -> str:
+     """
+     Reads plain text content from a file.
+ 
+     Args:
+         file_path (str): The full path to the text file.
+ 
+     Returns:
+         str: The contents of the file, or an error message.
+     """
+     try:
+         with open(file_path, "r", encoding="utf-8") as f:
+             return f.read()
+     except Exception as e:
+         return f"Error reading file: {e}"
+ 
+ 
+ # File Downloader
+ @tool
+ def file_from_url(url: str, save_as: Optional[str] = None) -> str:
+     """
+     Downloads a file from a URL and saves it locally.
+ 
+     Args:
+         url (str): The URL of the file to download.
+         save_as (Optional[str]): Optional filename to save the file as.
+ 
+     Returns:
+         str: The local file path or an error message.
+     """
+     try:
+         if not save_as:
+             parsed = urlparse(url)
+             save_as = os.path.basename(parsed.path) or f"file_{os.urandom(4).hex()}"
+ 
+         file_path = os.path.join(tempfile.gettempdir(), save_as)
+         response = requests.get(url, stream=True, timeout=30)
+         response.raise_for_status()
+ 
+         with open(file_path, "wb") as f:
+             for chunk in response.iter_content(1024):
+                 f.write(chunk)
+ 
+         return f"File saved to {file_path}"
+     except Exception as e:
+         return f"Download failed: {e}"
+ 
+ 
+ # Audio Transcription (YouTube)
+ @tool
+ def transcribe_youtube(yt_url: str) -> str:
+     """
+     Transcribes the audio from a YouTube video using Whisper.
+ 
+     Args:
+         yt_url (str): The URL of the YouTube video.
+ 
+     Returns:
+         str: The transcribed text of the video.
+     """
+     model = whisper.load_model("small")
+ 
+     with tempfile.TemporaryDirectory() as tempdir:
+         ydl_opts = {
+             "format": "bestaudio",
+             "outtmpl": os.path.join(tempdir, "audio.%(ext)s"),
+             "postprocessors": [{
+                 "key": "FFmpegExtractAudio",
+                 "preferredcodec": "wav"
+             }],
+             "quiet": True,
+             "force_ipv4": True
+         }
+ 
+         with yt_dlp.YoutubeDL(ydl_opts) as ydl:
+             ydl.extract_info(yt_url, download=True)
+ 
+         wav_file = next((os.path.join(tempdir, f) for f in os.listdir(tempdir) if f.endswith(".wav")), None)
+         if wav_file is None:
+             return "Error: audio download failed"
+         return model.transcribe(wav_file)['text']
+ 
+ 
+ # Audio File Transcriber
+ @tool
+ def audio_to_text(audio_path: str) -> str:
+     """
+     Transcribes an uploaded audio file into text using Whisper.
+ 
+     Args:
+         audio_path (str): The local file path to the audio file.
+ 
+     Returns:
+         str: The transcribed text or an error message.
+     """
+     try:
+         model = whisper.load_model("small")
+         result = model.transcribe(audio_path)
+         return result['text']
+     except Exception as e:
+         return f"Failed to transcribe: {e}"
+ 
+ 
+ # OCR
+ @tool
+ def extract_text_via_ocr(image_path: str) -> str:
+     """
+     Extracts text from an image using Optical Character Recognition (OCR).
+ 
+     Args:
+         image_path (str): The local path to the image file.
+ 
+     Returns:
+         str: The extracted text or an error message.
+     """
+     try:
+         img = Image.open(image_path)
+         return image_to_string(img)
+     except Exception as e:
+         return f"OCR failed: {e}"
+ 
+ 
+ # CSV Analyzer
+ @tool
+ def summarize_csv_data(path: str, query: str = "") -> str:
+     """
+     Provides a summary of the contents of a CSV file.
+ 
+     Args:
+         path (str): The file path to the CSV file.
+         query (str): Optional query to run on the data (currently unused).
+ 
+     Returns:
+         str: Summary statistics and column details or an error message.
+     """
+     try:
+         import pandas as pd
+         df = pd.read_csv(path)
+         return f"Loaded CSV with {len(df)} rows. Columns: {list(df.columns)}\n\n{df.describe()}"
+     except Exception as e:
+         return f"CSV error: {e}"
+ 
+ 
+ # Excel Analyzer
+ @tool
+ def summarize_excel_data(path: str, query: str = "") -> str:
+     """
+     Provides a summary of the contents of an Excel file.
+ 
+     Args:
+         path (str): The file path to the Excel file (.xls or .xlsx).
+         query (str): Optional query to run on the data (currently unused).
+ 
+     Returns:
+         str: Summary statistics and column details or an error message.
+     """
+     try:
+         import pandas as pd
+         df = pd.read_excel(path)
+         return f"Excel file with {len(df)} rows. Columns: {list(df.columns)}\n\n{df.describe()}"
+     except Exception as e:
+         return f"Excel error: {e}"
youtube_transcription_tool.py ADDED
@@ -0,0 +1,26 @@
+ from smolagents import Tool
+ from youtube_transcript_api import YouTubeTranscriptApi
+ 
+ 
+ class YouTubeTranscriptionTool(Tool):
+     """
+     Tool to fetch the transcript of a YouTube video given its URL.
+ 
+     Args:
+         video_url (str): YouTube video URL.
+ 
+     Returns:
+         str: Transcript of the video as a single string.
+     """
+ 
+     name = "youtube_transcription"
+     description = "Fetches the transcript of a YouTube video given its URL"
+     inputs = {
+         "video_url": {"type": "string", "description": "YouTube video URL"},
+     }
+     output_type = "string"
+ 
+     def forward(self, video_url: str) -> str:
+         # Drop trailing query parameters, e.g. "...watch?v=ID&t=42s" -> "ID"
+         video_id = video_url.strip().split("v=")[-1].split("&")[0]
+         transcript = YouTubeTranscriptApi.get_transcript(video_id)
+         return " ".join(entry["text"] for entry in transcript)
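The naive `split("v=")` id extraction in `YouTubeTranscriptionTool.forward` breaks on shortened `youtu.be` links and reorders poorly with extra query parameters. A more robust stdlib-only helper could look like the sketch below; `extract_video_id` is a hypothetical name, not part of the uploaded files.

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(video_url: str) -> str:
    """Extract a YouTube video id from watch URLs, short links, or raw query strings."""
    parsed = urlparse(video_url.strip())
    # Shortened links carry the id in the path: https://youtu.be/<id>
    if parsed.hostname == "youtu.be":
        return parsed.path.lstrip("/")
    # Standard watch URLs carry it in the "v" query parameter
    params = parse_qs(parsed.query)
    if "v" in params:
        return params["v"][0]
    # Fall back to the naive split used in the tool above
    return video_url.strip().split("v=")[-1].split("&")[0]

print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))  # dQw4w9WgXcQ
```

Swapping this in would let the tool accept any of the common YouTube URL forms without changing its interface.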