Commit
·
f2f8c38
1
Parent(s):
81917a3
Completed Final Assignment Agent V1
Browse files- .env.tpl +16 -0
- .gitattributes +3 -0
- .gitignore +7 -0
- README.md +54 -2
- agents.py +338 -0
- app.py +22 -5
- exploration/chess.ipynb +3 -0
- exploration/data/tasks/1f975693-876d-457b-a649-393859e79bf3/1f975693-876d-457b-a649-393859e79bf3.mp3 +3 -0
- exploration/data/tasks/7bd855d8-463d-4ed5-93ca-5fe35145f733/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx +0 -0
- exploration/data/tasks/99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3/99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3.mp3 +3 -0
- exploration/data/tasks/answers.agents_attempt_1.yaml +0 -0
- exploration/data/tasks/answers.yaml +0 -0
- exploration/data/tasks/answers_llm_only.yaml +0 -0
- exploration/data/tasks/cca530fc-4052-43b2-b130-b30968d8aa44/cca530fc-4052-43b2-b130-b30968d8aa44.png +3 -0
- exploration/data/tasks/f918266a-b3e0-4914-865d-4faa564f1aef/f918266a-b3e0-4914-865d-4faa564f1aef.py +35 -0
- exploration/data/tasks/tasks.yaml +118 -0
- exploration/data_analyst.ipynb +3 -0
- exploration/information_retrieval.ipynb +3 -0
- exploration/main.ipynb +3 -0
- exploration/multi_agent.ipynb +3 -0
- exploration/speech_recognition.ipynb +3 -0
- exploration/youtube_exploration.ipynb +3 -0
- performance_agent_v1.png +3 -0
- tools/__init__.py +19 -0
- tools/chess_tools.py +126 -0
- tools/classifier_tool.py +89 -0
- tools/content_retriever_tool.py +89 -0
- tools/get_attachment_tool.py +77 -0
- tools/google_search_tools.py +90 -0
- tools/speech_recognition_tool.py +113 -0
- tools/youtube_video_tool.py +383 -0
.env.tpl
ADDED
|
@@ -0,0 +1,16 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
### Environment variables required for agent
|
| 2 |
+
HF_TOKEN=
|
| 3 |
+
OPENAI_API_KEY=
|
| 4 |
+
# Web search uses Google Custom Search API.
|
| 5 |
+
# See: https://developers.google.com/custom-search/v1/overview
|
| 6 |
+
# API key with access to Custom Search API.
|
| 7 |
+
# See: https://developers.google.com/custom-search/v1/introduction#identify_your_application_to_google_with_api_key
|
| 8 |
+
GOOGLE_SEARCH_API_KEY=
|
| 9 |
+
# CX parameter from Control Panel.
|
| 10 |
+
# See: https://developers.google.com/custom-search/v1/using_rest#make_a_request
|
| 11 |
+
GOOGLE_SEARCH_ENGINE_ID=
|
| 12 |
+
|
| 13 |
+
### Environment variable required for notebooks
|
| 14 |
+
ANTHROPIC_API_KEY=
|
| 15 |
+
GOOGLE_API_KEY=
|
| 16 |
+
GEMINI_API_KEY=
|
.gitattributes
CHANGED
|
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
|
|
|
|
|
|
|
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
*.ipynb filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
*.mp3 filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
*.png filter=lfs diff=lfs merge=lfs -text
|
.gitignore
ADDED
|
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
.env
|
| 2 |
+
.venv*
|
| 3 |
+
exploration/data/gaia-validation-metadata.jsonl
|
| 4 |
+
|
| 5 |
+
__pycache__
|
| 6 |
+
|
| 7 |
+
.DS_Store
|
README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
colorFrom: indigo
|
| 5 |
colorTo: indigo
|
| 6 |
sdk: gradio
|
|
@@ -12,4 +12,56 @@ hf_oauth: true
|
|
| 12 |
hf_oauth_expiration_minutes: 480
|
| 13 |
---
|
| 14 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 15 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
|
| 1 |
---
|
| 2 |
+
title: Final Assignment Agent by Volodymyr Kublytskyi
|
| 3 |
+
emoji: 🧑💻
|
| 4 |
colorFrom: indigo
|
| 5 |
colorTo: indigo
|
| 6 |
sdk: gradio
|
|
|
|
| 12 |
hf_oauth_expiration_minutes: 480
|
| 13 |
---
|
| 14 |
|
| 15 |
+
# Agent Performance
|
| 16 |
+
|
| 17 |
+

|
| 18 |
+
|
| 19 |
+
# Implementation Highlights
|
| 20 |
+
|
| 21 |
+
* Uses `gpt-4.1` (main driver) and `gpt-4.1-mini` (selected tools).
|
| 22 |
+
* Agents and tools implemented with [smolagents framework](https://huggingface.co/docs/smolagents/en/index).
|
| 23 |
+
* Agents orchestration is done with [LangGraph](https://www.langchain.com/langgraph).
|
| 24 |
+
* Output tailored specifically for [GAIA Benchmark](https://huggingface.co/spaces/gaia-benchmark/leaderboard).
|
| 25 |
+
* Tools:
|
| 26 |
+
* [`openai/whisper-large-v3-turbo`](https://huggingface.co/openai/whisper-large-v3-turbo) for speech recognition,
|
| 27 |
+
* [`docling`](https://docling-project.github.io/docling/) for documents comprehension (web search, PDFs)
|
| 28 |
+
* [`sentence-transformers/all-MiniLM-L6-v2`](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) for document semantic chunks embedding,
|
| 29 |
+
* custom documents chunks retrieval with cosine similarities and softmax,
|
| 30 |
+
* YouTube video processing with `gpt-4.1` inspired by the [OpenAI cookbook](https://cookbook.openai.com/).
|
| 31 |
+
* Chess problem solving with FEN recognition (`gpt-4.1`) and [`stockfish`](https://stockfishchess.org/)
|
| 32 |
+
* web search via [Google Custom Search API](https://developers.google.com/custom-search/v1/overview),
|
| 33 |
+
* attachments downloaded by agent decision.
|
| 34 |
+
|
| 35 |
+
# Agents List
|
| 36 |
+
|
| 37 |
+
`general_assistant`: Answers questions to the best of its knowledge and common reasoning, grounded on already known information. Can understand multimedia including audio and video files and YouTube. ToolCallingAgent.
|
| 38 |
+
|
| 39 |
+
`web_researcher`: Answers questions that require grounding in unknown information through search on web sites and other online resources. ToolCallingAgent.
|
| 40 |
+
|
| 41 |
+
`data_analyst`: Data analyst with advanced skills in statistics, handling tabular data and related Python packages. CodeAgent.
|
| 42 |
+
|
| 43 |
+
`chess_player`: Chess grandmaster empowered by chess engine. Always thinks at least 100 steps ahead. CodeAgent.
|
| 44 |
+
|
| 45 |
+
# Custom Tools
|
| 46 |
+
|
| 47 |
+
`GetAttachmentTool`: Retrieves attachment for current task in specified format. Supported formats are `URL`, `DATA_URL`, `LOCAL_FILE_PATH`, `TEXT`.
|
| 48 |
+
|
| 49 |
+
`GoogleSearchTool`: Performs a Google web search for query then returns top search results in markdown format.
|
| 50 |
+
|
| 51 |
+
`GoogleSiteSearchTool`: Performs a Google search within the website for query then returns top search results in markdown format.
|
| 52 |
+
|
| 53 |
+
`ContentRetrieverTool`: Retrieve the content of a webpage or document in markdown format. Supports PDF, DOCX, XLSX, HTML, images, and more.
|
| 54 |
+
|
| 55 |
+
`SpeechRecognitionTool`: Transcribes speech from audio.
|
| 56 |
+
|
| 57 |
+
`YoutubeVideoTool`: Process the video and return the requested information from it.
|
| 58 |
+
|
| 59 |
+
`ClassifierTool`: Classifies given items into given categories from perspective of specific knowledge area.
|
| 60 |
+
|
| 61 |
+
`ImageToChessBoardFENTool`: Convert a chessboard image to board part of the FEN.
|
| 62 |
+
|
| 63 |
+
`chess_engine_locator`: Get the path to the chess engine binary. Can be used with `chess.engine.SimpleEngine.popen_uci` function from `chess.engine` Python module.
|
| 64 |
+
|
| 65 |
+
# Space Configuration
|
| 66 |
+
|
| 67 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
agents.py
ADDED
|
@@ -0,0 +1,338 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from typing import TypedDict, Optional
|
| 2 |
+
from langgraph.graph import StateGraph, START, END
|
| 3 |
+
from langchain_openai import ChatOpenAI
|
| 4 |
+
from langchain_core.messages import HumanMessage
|
| 5 |
+
from rich.console import Console
|
| 6 |
+
from smolagents import (
|
| 7 |
+
CodeAgent,
|
| 8 |
+
ToolCallingAgent,
|
| 9 |
+
OpenAIServerModel,
|
| 10 |
+
AgentLogger,
|
| 11 |
+
LogLevel,
|
| 12 |
+
Panel,
|
| 13 |
+
Text,
|
| 14 |
+
)
|
| 15 |
+
from tools import (
|
| 16 |
+
GetAttachmentTool,
|
| 17 |
+
GoogleSearchTool,
|
| 18 |
+
GoogleSiteSearchTool,
|
| 19 |
+
ContentRetrieverTool,
|
| 20 |
+
YoutubeVideoTool,
|
| 21 |
+
SpeechRecognitionTool,
|
| 22 |
+
ClassifierTool,
|
| 23 |
+
ImageToChessBoardFENTool,
|
| 24 |
+
chess_engine_locator,
|
| 25 |
+
)
|
| 26 |
+
import openai
|
| 27 |
+
import backoff
|
| 28 |
+
|
| 29 |
+
def create_general_ai_agent(verbosity: int = LogLevel.INFO):
|
| 30 |
+
get_attachment_tool = GetAttachmentTool()
|
| 31 |
+
speech_recognition_tool = SpeechRecognitionTool()
|
| 32 |
+
env_tools = [
|
| 33 |
+
get_attachment_tool,
|
| 34 |
+
]
|
| 35 |
+
model = OpenAIServerModel(model_id="gpt-4.1")
|
| 36 |
+
console = Console(record=True)
|
| 37 |
+
logger = AgentLogger(level=verbosity, console=console)
|
| 38 |
+
steps_buffer = []
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
def capture_step_log(agent) -> None:
|
| 42 |
+
steps_buffer.append(console.export_text(clear=True))
|
| 43 |
+
|
| 44 |
+
|
| 45 |
+
agents = {
|
| 46 |
+
agent.name: agent
|
| 47 |
+
for agent in [
|
| 48 |
+
ToolCallingAgent(
|
| 49 |
+
name="general_assistant",
|
| 50 |
+
description="Answers questions for best of knowledge and common reasoning grounded on already known information. Can understand multimedia including audio and video files and YouTube.",
|
| 51 |
+
model=model,
|
| 52 |
+
tools=env_tools
|
| 53 |
+
+ [
|
| 54 |
+
speech_recognition_tool,
|
| 55 |
+
YoutubeVideoTool(
|
| 56 |
+
client=model.client,
|
| 57 |
+
speech_recognition_tool=speech_recognition_tool,
|
| 58 |
+
frames_interval=3,
|
| 59 |
+
chunk_duration=60,
|
| 60 |
+
debug=True,
|
| 61 |
+
),
|
| 62 |
+
ClassifierTool(
|
| 63 |
+
client=model.client,
|
| 64 |
+
model_id="gpt-4.1-mini",
|
| 65 |
+
),
|
| 66 |
+
],
|
| 67 |
+
logger=logger,
|
| 68 |
+
step_callbacks=[capture_step_log],
|
| 69 |
+
),
|
| 70 |
+
ToolCallingAgent(
|
| 71 |
+
name="web_researcher",
|
| 72 |
+
description="Answers questions that require grounding in unknown information through search on web sites and other online resources.",
|
| 73 |
+
tools=env_tools
|
| 74 |
+
+ [
|
| 75 |
+
GoogleSearchTool(),
|
| 76 |
+
GoogleSiteSearchTool(),
|
| 77 |
+
ContentRetrieverTool(),
|
| 78 |
+
],
|
| 79 |
+
model=model,
|
| 80 |
+
planning_interval=3,
|
| 81 |
+
max_steps=9,
|
| 82 |
+
logger=logger,
|
| 83 |
+
step_callbacks=[capture_step_log],
|
| 84 |
+
),
|
| 85 |
+
CodeAgent(
|
| 86 |
+
name="data_analyst",
|
| 87 |
+
description="Data analyst with advanced skills in statistic, handling tabular data and related Python packages.",
|
| 88 |
+
tools=env_tools,
|
| 89 |
+
additional_authorized_imports=[
|
| 90 |
+
"numpy",
|
| 91 |
+
"pandas",
|
| 92 |
+
"tabulate",
|
| 93 |
+
"matplotlib",
|
| 94 |
+
"seaborn",
|
| 95 |
+
],
|
| 96 |
+
model=model,
|
| 97 |
+
logger=logger,
|
| 98 |
+
step_callbacks=[capture_step_log],
|
| 99 |
+
),
|
| 100 |
+
CodeAgent(
|
| 101 |
+
name="chess_player",
|
| 102 |
+
description="Chess grandmaster empowered by chess engine. Always thinks at least 100 steps ahead.",
|
| 103 |
+
tools=env_tools
|
| 104 |
+
+ [
|
| 105 |
+
ImageToChessBoardFENTool(client=model.client),
|
| 106 |
+
chess_engine_locator,
|
| 107 |
+
],
|
| 108 |
+
additional_authorized_imports=[
|
| 109 |
+
"chess",
|
| 110 |
+
"chess.engine",
|
| 111 |
+
],
|
| 112 |
+
model=model,
|
| 113 |
+
logger=logger,
|
| 114 |
+
step_callbacks=[capture_step_log],
|
| 115 |
+
),
|
| 116 |
+
]
|
| 117 |
+
}
|
| 118 |
+
|
| 119 |
+
|
| 120 |
+
class GAIATask(TypedDict, total=False):
    """LangGraph state for a single GAIA benchmark task.

    Only ``question`` (and usually ``task_id``) is present when the graph
    starts; the remaining fields are produced by graph nodes as partial
    state updates, so the dict is declared ``total=False``.

    Note: the original declared ``= None`` / ``= []`` defaults on the
    fields.  TypedDict does not support runtime defaults (type checkers
    reject them, and ``= []`` created a shared mutable class attribute),
    and ``Optional[str | None]`` is redundant — both fixed here.
    """

    # Benchmark task identifier; used by read_question to fetch attachments.
    task_id: Optional[str]
    # Raw GAIA question text.
    question: str
    # Human-readable log of steps taken while solving the task.
    steps: list[str]
    # Name of the selected sub-agent, or None to answer directly.
    agent: Optional[str]
    # Verbose answer from the agent or the one-shot LLM call.
    raw_answer: Optional[str]
    # Answer reformatted according to GAIA submission rules.
    final_answer: Optional[str]
|
| 127 |
+
|
| 128 |
+
|
| 129 |
+
llm = ChatOpenAI(model="gpt-4.1")
|
| 130 |
+
logger = AgentLogger(level=verbosity)
|
| 131 |
+
|
| 132 |
+
|
| 133 |
+
@backoff.on_exception(backoff.expo, openai.RateLimitError, max_time=60, max_tries=6)
def llm_invoke_with_retry(messages):
    """Invoke the shared LLM, retrying with exponential backoff on rate limits."""
    return llm.invoke(messages)
|
| 137 |
+
|
| 138 |
+
|
| 139 |
+
def read_question(state: GAIATask):
|
| 140 |
+
logger.log_task(
|
| 141 |
+
content=state["question"].strip(),
|
| 142 |
+
subtitle=f"LangGraph with {type(llm).__name__} - {llm.model_name}",
|
| 143 |
+
level=LogLevel.INFO,
|
| 144 |
+
title="Final Assignment Agent for Hugging Face Agents Course",
|
| 145 |
+
)
|
| 146 |
+
get_attachment_tool.attachment_for(state["task_id"])
|
| 147 |
+
|
| 148 |
+
return {
|
| 149 |
+
"steps": [],
|
| 150 |
+
"agent": None,
|
| 151 |
+
"raw_answer": None,
|
| 152 |
+
"final_answer": None,
|
| 153 |
+
}
|
| 154 |
+
|
| 155 |
+
|
| 156 |
+
def select_agent(state: GAIATask):
|
| 157 |
+
agents_description = "\n\n".join(
|
| 158 |
+
[
|
| 159 |
+
f"AGENT NAME: {a.name}\nAGENT DESCRIPTION: {a.description}"
|
| 160 |
+
for a in agents.values()
|
| 161 |
+
]
|
| 162 |
+
)
|
| 163 |
+
|
| 164 |
+
prompt = f"""\
|
| 165 |
+
You are a general AI assistant.
|
| 166 |
+
|
| 167 |
+
I will provide you a question and a list of agents with their descriptions.
|
| 168 |
+
Your task is to select the most appropriate agent to answer the question.
|
| 169 |
+
You can select one of the agents or decide that no agent is needed.
|
| 170 |
+
|
| 171 |
+
If question has attachment only agent can answer it.
|
| 172 |
+
|
| 173 |
+
QUESTION:
|
| 174 |
+
{state["question"]}
|
| 175 |
+
|
| 176 |
+
{agents_description}
|
| 177 |
+
|
| 178 |
+
Now, return the name of the agent you selected or "no agent needed" if you think that no agent is needed.
|
| 179 |
+
"""
|
| 180 |
+
|
| 181 |
+
response = llm_invoke_with_retry([HumanMessage(content=prompt)])
|
| 182 |
+
agent_name = response.content.strip()
|
| 183 |
+
|
| 184 |
+
if agent_name in agents:
|
| 185 |
+
logger.log(
|
| 186 |
+
f"Agent {agent_name} selected for solving the task.",
|
| 187 |
+
level=LogLevel.DEBUG,
|
| 188 |
+
)
|
| 189 |
+
return {
|
| 190 |
+
"agent": agent_name,
|
| 191 |
+
"steps": state.get("steps", [])
|
| 192 |
+
+ [
|
| 193 |
+
f"Agent '{agent_name}' selected for task execution.",
|
| 194 |
+
],
|
| 195 |
+
}
|
| 196 |
+
elif agent_name == "no agent needed":
|
| 197 |
+
logger.log(
|
| 198 |
+
"No appropriate agent found in the list. No agent will be used.",
|
| 199 |
+
level=LogLevel.DEBUG,
|
| 200 |
+
)
|
| 201 |
+
return {
|
| 202 |
+
"agent": None,
|
| 203 |
+
"steps": state.get("steps", [])
|
| 204 |
+
+ [
|
| 205 |
+
"A decision is made to solve the task directly without invoking any agent.",
|
| 206 |
+
],
|
| 207 |
+
}
|
| 208 |
+
else:
|
| 209 |
+
logger.log(
|
| 210 |
+
f"[bold red]Warning to user: Unexpected agent name '{agent_name}' selected. No agent will be used.[/bold red]",
|
| 211 |
+
level=LogLevel.INFO,
|
| 212 |
+
)
|
| 213 |
+
return {
|
| 214 |
+
"agent": None,
|
| 215 |
+
"steps": state.get("steps", [])
|
| 216 |
+
+ [
|
| 217 |
+
f"Attempt to select non-existing agent '{agent_name}'. No agent will be used.",
|
| 218 |
+
],
|
| 219 |
+
}
|
| 220 |
+
|
| 221 |
+
|
| 222 |
+
def delegate_to_agent(state: GAIATask):
|
| 223 |
+
agent_name = state.get("agent", None)
|
| 224 |
+
if not agent_name:
|
| 225 |
+
raise ValueError("Agent not selected.")
|
| 226 |
+
if agent_name not in agents:
|
| 227 |
+
raise ValueError(f"Agent '{agent_name}' is not available.")
|
| 228 |
+
|
| 229 |
+
logger.log(
|
| 230 |
+
Panel(Text(f"Calling agent: {agent_name}.")),
|
| 231 |
+
level=LogLevel.INFO,
|
| 232 |
+
)
|
| 233 |
+
|
| 234 |
+
agent = agents[agent_name]
|
| 235 |
+
agent_answer = agent.run(task=state["question"])
|
| 236 |
+
steps = [f"Agent '{agent_name}' step:\n{s}" for s in steps_buffer]
|
| 237 |
+
steps_buffer.clear()
|
| 238 |
+
return {
|
| 239 |
+
"raw_answer": agent_answer,
|
| 240 |
+
"steps": state.get("steps", []) + steps,
|
| 241 |
+
}
|
| 242 |
+
|
| 243 |
+
|
| 244 |
+
def one_shot_answering(state: GAIATask):
    """Answer the question with a single direct LLM call, bypassing all agents."""
    reply = llm_invoke_with_retry([HumanMessage(content=state.get("question"))])
    # Append the raw reply to the step log so the trace stays complete.
    history = state.get("steps", []) + [
        f"One-shot answer:\n{reply.content}",
    ]
    return {
        "raw_answer": reply.content,
        "steps": history,
    }
|
| 253 |
+
|
| 254 |
+
|
| 255 |
+
def refine_answer(state: GAIATask):
|
| 256 |
+
question = state.get("question")
|
| 257 |
+
answer = state.get("raw_answer", None)
|
| 258 |
+
if not answer:
|
| 259 |
+
return {"final_answer": "No answer."}
|
| 260 |
+
|
| 261 |
+
prompt = f"""\
|
| 262 |
+
You are a general AI assistant.
|
| 263 |
+
|
| 264 |
+
I will provide you a question and correct answer to it. Answer is correct but may be too verbose or not follow the rules below.
|
| 265 |
+
Your task is to rephrase answer according to rules below.
|
| 266 |
+
|
| 267 |
+
Answer should be a number OR as few words as possible OR a comma separated list of numbers and/or strings.
|
| 268 |
+
|
| 269 |
+
If you are asked for a number, don't use comma to write your number neither use units such as $ or percent sign unless specified otherwise.
|
| 270 |
+
If you are asked for a string, don't use articles, neither abbreviations (e.g. for cities), and write the digits in plain text unless specified otherwise.
|
| 271 |
+
If you are asked for a comma separated list, apply the above rules depending of whether the element to be put in the list is a number or a string.
|
| 272 |
+
|
| 273 |
+
If you are asked for a comma separated list, use space after comma and before next element of the list unless other directly specified in a question.
|
| 274 |
+
Check question context to define if letters case matters. Do not change case if not prescribed by other rules or question.
|
| 275 |
+
If you are not asked for the list, capitalize the first letter of the answer unless it changes meaning of the answer.
|
| 276 |
+
If answer is number, use digits only not words unless other directly specified in a question.
|
| 277 |
+
If answer is not full sentence, do not add period at the end.
|
| 278 |
+
|
| 279 |
+
Preserve all items if the answer is a list.
|
| 280 |
+
|
| 281 |
+
QUESTION:
|
| 282 |
+
{question}
|
| 283 |
+
|
| 284 |
+
ANSWER:
|
| 285 |
+
{answer}
|
| 286 |
+
"""
|
| 287 |
+
response = llm_invoke_with_retry([HumanMessage(content=prompt)])
|
| 288 |
+
refined_answer = response.content.strip()
|
| 289 |
+
logger.log(
|
| 290 |
+
Text(f"GAIA final answer: {refined_answer}", style="bold #d4b702"),
|
| 291 |
+
level=LogLevel.INFO,
|
| 292 |
+
)
|
| 293 |
+
return {
|
| 294 |
+
"final_answer": refined_answer,
|
| 295 |
+
"steps": state.get("steps", [])
|
| 296 |
+
+ [
|
| 297 |
+
"Refining the answer according to GAIA benchmark rules.",
|
| 298 |
+
f"FINAL ANSWER: {response.content}",
|
| 299 |
+
],
|
| 300 |
+
}
|
| 301 |
+
|
| 302 |
+
|
| 303 |
+
def route_task(state: GAIATask) -> str:
    """Choose the next graph edge: delegate when a known agent was selected."""
    chosen = state.get("agent")
    return "agent selected" if chosen in agents else "no agent matched"
|
| 308 |
+
|
| 309 |
+
|
| 310 |
+
# Create the graph
|
| 311 |
+
gaia_graph = StateGraph(GAIATask)
|
| 312 |
+
|
| 313 |
+
# Add nodes
|
| 314 |
+
gaia_graph.add_node("read_question", read_question)
|
| 315 |
+
gaia_graph.add_node("select_agent", select_agent)
|
| 316 |
+
gaia_graph.add_node("delegate_to_agent", delegate_to_agent)
|
| 317 |
+
gaia_graph.add_node("one_shot_answering", one_shot_answering)
|
| 318 |
+
gaia_graph.add_node("refine_answer", refine_answer)
|
| 319 |
+
|
| 320 |
+
# Start the edges
|
| 321 |
+
gaia_graph.add_edge(START, "read_question")
|
| 322 |
+
# Add edges - defining the flow
|
| 323 |
+
gaia_graph.add_edge("read_question", "select_agent")
|
| 324 |
+
|
| 325 |
+
# Add conditional branching from select_agent
|
| 326 |
+
gaia_graph.add_conditional_edges(
|
| 327 |
+
"select_agent",
|
| 328 |
+
route_task,
|
| 329 |
+
{"agent selected": "delegate_to_agent", "no agent matched": "one_shot_answering"},
|
| 330 |
+
)
|
| 331 |
+
|
| 332 |
+
# Add the final edges
|
| 333 |
+
gaia_graph.add_edge("delegate_to_agent", "refine_answer")
|
| 334 |
+
gaia_graph.add_edge("one_shot_answering", "refine_answer")
|
| 335 |
+
gaia_graph.add_edge("refine_answer", END)
|
| 336 |
+
|
| 337 |
+
gaia = gaia_graph.compile()
|
| 338 |
+
return gaia
|
app.py
CHANGED
|
@@ -3,6 +3,7 @@ import gradio as gr
|
|
| 3 |
import requests
|
| 4 |
import inspect
|
| 5 |
import pandas as pd
|
|
|
|
| 6 |
|
| 7 |
# (Keep Constants as is)
|
| 8 |
# --- Constants ---
|
|
@@ -12,12 +13,16 @@ DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
|
|
| 12 |
# ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
|
| 13 |
class BasicAgent:
|
| 14 |
def __init__(self):
|
|
|
|
| 15 |
print("BasicAgent initialized.")
|
| 16 |
-
def __call__(self, question: str) -> str:
|
| 17 |
print(f"Agent received question (first 50 chars): {question[:50]}...")
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
|
|
|
|
|
|
|
|
|
| 21 |
|
| 22 |
def run_and_submit_all( profile: gr.OAuthProfile | None):
|
| 23 |
"""
|
|
@@ -33,6 +38,18 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
|
|
| 33 |
else:
|
| 34 |
print("User not logged in.")
|
| 35 |
return "Please Login to Hugging Face with the button.", None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 36 |
|
| 37 |
api_url = DEFAULT_API_URL
|
| 38 |
questions_url = f"{api_url}/questions"
|
|
@@ -80,7 +97,7 @@ def run_and_submit_all( profile: gr.OAuthProfile | None):
|
|
| 80 |
print(f"Skipping item with missing task_id or question: {item}")
|
| 81 |
continue
|
| 82 |
try:
|
| 83 |
-
submitted_answer = agent(question_text)
|
| 84 |
answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
|
| 85 |
results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
|
| 86 |
except Exception as e:
|
|
|
|
| 3 |
import requests
|
| 4 |
import inspect
|
| 5 |
import pandas as pd
|
| 6 |
+
import agents
|
| 7 |
|
| 8 |
# (Keep Constants as is)
|
| 9 |
# --- Constants ---
|
|
|
|
| 13 |
# ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
|
| 14 |
class BasicAgent:
|
| 15 |
def __init__(self):
|
| 16 |
+
self.gaia = agents.create_general_ai_agent(verbosity=0)
|
| 17 |
print("BasicAgent initialized.")
|
| 18 |
+
def __call__(self, task_id: str, question: str) -> str:
|
| 19 |
print(f"Agent received question (first 50 chars): {question[:50]}...")
|
| 20 |
+
task = self.gaia.invoke({
|
| 21 |
+
"task_id": task_id,
|
| 22 |
+
"question": question,
|
| 23 |
+
})
|
| 24 |
+
print(f"Agent returning fixed answer: {task["final_answer"]}")
|
| 25 |
+
return task["final_answer"]
|
| 26 |
|
| 27 |
def run_and_submit_all( profile: gr.OAuthProfile | None):
|
| 28 |
"""
|
|
|
|
| 38 |
else:
|
| 39 |
print("User not logged in.")
|
| 40 |
return "Please Login to Hugging Face with the button.", None
|
| 41 |
+
|
| 42 |
+
# --- Allow only space owner to run agent to avoid misuse ---
|
| 43 |
+
if not space_id.startswith(username.strip()):
|
| 44 |
+
print("User is not an owner of the space. Please duplicate space and configure OPENAI_API_KEY, HF_TOKEN, GOOGLE_SEARCH_API_KEY, and GOOGLE_SEARCH_ENGINE_ID environment variables.")
|
| 45 |
+
return "Please duplicate space to your account to run the agent.", None
|
| 46 |
+
|
| 47 |
+
# --- Check for required environment variables ---
|
| 48 |
+
required_env_vars = ["OPENAI_API_KEY", "HF_TOKEN", "GOOGLE_SEARCH_API_KEY", "GOOGLE_SEARCH_ENGINE_ID"]
|
| 49 |
+
missing_env_vars = [var for var in required_env_vars if not os.getenv(var)]
|
| 50 |
+
if missing_env_vars:
|
| 51 |
+
print(f"Missing environment variables: {', '.join(missing_env_vars)}")
|
| 52 |
+
return f"Missing environment variables: {', '.join(missing_env_vars)}", None
|
| 53 |
|
| 54 |
api_url = DEFAULT_API_URL
|
| 55 |
questions_url = f"{api_url}/questions"
|
|
|
|
| 97 |
print(f"Skipping item with missing task_id or question: {item}")
|
| 98 |
continue
|
| 99 |
try:
|
| 100 |
+
submitted_answer = agent(task_id=task_id, question=question_text)
|
| 101 |
answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
|
| 102 |
results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
|
| 103 |
except Exception as e:
|
exploration/chess.ipynb
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:3aaf7b47141fb12d48fd268b9ebb8d03863e570256b35e3753df247736f13473
|
| 3 |
+
size 489015
|
exploration/data/tasks/1f975693-876d-457b-a649-393859e79bf3/1f975693-876d-457b-a649-393859e79bf3.mp3
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:200f767e732b49efef5c05d128903ee4d2c34e66fdce7f5593ac123b2e637673
|
| 3 |
+
size 280868
|
exploration/data/tasks/7bd855d8-463d-4ed5-93ca-5fe35145f733/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx
ADDED
|
Binary file (5.29 kB). View file
|
|
|
exploration/data/tasks/99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3/99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3.mp3
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:b218c951c1f888f0bbe6f46c080f57afc7c9348fffc7ba4da35749ff1e2ac40f
|
| 3 |
+
size 179304
|
exploration/data/tasks/answers.agents_attempt_1.yaml
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
exploration/data/tasks/answers.yaml
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
exploration/data/tasks/answers_llm_only.yaml
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|
exploration/data/tasks/cca530fc-4052-43b2-b130-b30968d8aa44/cca530fc-4052-43b2-b130-b30968d8aa44.png
ADDED
|
Git LFS Details
|
exploration/data/tasks/f918266a-b3e0-4914-865d-4faa564f1aef/f918266a-b3e0-4914-865d-4faa564f1aef.py
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from random import randint
|
| 2 |
+
import time
|
| 3 |
+
|
| 4 |
+
class UhOh(Exception):
|
| 5 |
+
pass
|
| 6 |
+
|
| 7 |
+
class Hmm:
|
| 8 |
+
def __init__(self):
|
| 9 |
+
self.value = randint(-100, 100)
|
| 10 |
+
|
| 11 |
+
def Yeah(self):
|
| 12 |
+
if self.value == 0:
|
| 13 |
+
return True
|
| 14 |
+
else:
|
| 15 |
+
raise UhOh()
|
| 16 |
+
|
| 17 |
+
def Okay():
|
| 18 |
+
while True:
|
| 19 |
+
yield Hmm()
|
| 20 |
+
|
| 21 |
+
def keep_trying(go, first_try=True):
|
| 22 |
+
maybe = next(go)
|
| 23 |
+
try:
|
| 24 |
+
if maybe.Yeah():
|
| 25 |
+
return maybe.value
|
| 26 |
+
except UhOh:
|
| 27 |
+
if first_try:
|
| 28 |
+
print("Working...")
|
| 29 |
+
print("Please wait patiently...")
|
| 30 |
+
time.sleep(0.1)
|
| 31 |
+
return keep_trying(go, first_try=False)
|
| 32 |
+
|
| 33 |
+
if __name__ == "__main__":
|
| 34 |
+
go = Okay()
|
| 35 |
+
print(f"{keep_trying(go)}")
|
exploration/data/tasks/tasks.yaml
ADDED
|
@@ -0,0 +1,118 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
- task_id: 8e867cd7-cff9-4e6c-867a-ff5ddc2550be
|
| 2 |
+
question: |-
|
| 3 |
+
How many studio albums were published by Mercedes Sosa between 2000 and 2009 (included)? You can use the latest 2022 version of english wikipedia.
|
| 4 |
+
attachment: null
|
| 5 |
+
- task_id: a1e91b78-d3d8-4675-bb8d-62741b4b68a6
|
| 6 |
+
question: |-
|
| 7 |
+
In the video https://www.youtube.com/watch?v=L1vXCYZAYYM, what is the highest number of bird species to be on camera simultaneously?
|
| 8 |
+
attachment: null
|
| 9 |
+
- task_id: 2d83110e-a098-4ebb-9987-066c06fa42d0
|
| 10 |
+
question: |-
|
| 11 |
+
.rewsna eht sa "tfel" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI
|
| 12 |
+
attachment: null
|
| 13 |
+
- task_id: cca530fc-4052-43b2-b130-b30968d8aa44
|
| 14 |
+
question: |-
|
| 15 |
+
Review the chess position provided in the image. It is black's turn. Provide the correct next move for black which guarantees a win. Please provide your response in algebraic notation.
|
| 16 |
+
attachment:
|
| 17 |
+
url: https://agents-course-unit4-scoring.hf.space/files/cca530fc-4052-43b2-b130-b30968d8aa44
|
| 18 |
+
local_path: file://./cca530fc-4052-43b2-b130-b30968d8aa44/cca530fc-4052-43b2-b130-b30968d8aa44.png
|
| 19 |
+
mime_type: image/png
|
| 20 |
+
- task_id: 4fc2f1ae-8625-45b5-ab34-ad4433bc21f8
|
| 21 |
+
question: |-
|
| 22 |
+
Who nominated the only Featured Article on English Wikipedia about a dinosaur that was promoted in November 2016?
|
| 23 |
+
attachment: null
|
| 24 |
+
- task_id: 6f37996b-2ac7-44b0-8e68-6d28256631b4
|
| 25 |
+
question: |-
|
| 26 |
+
Given this table defining * on the set S = {a, b, c, d, e}
|
| 27 |
+
|
| 28 |
+
|*|a|b|c|d|e|
|
| 29 |
+
|---|---|---|---|---|---|
|
| 30 |
+
|a|a|b|c|b|d|
|
| 31 |
+
|b|b|c|a|e|c|
|
| 32 |
+
|c|c|a|b|b|a|
|
| 33 |
+
|d|b|e|b|e|d|
|
| 34 |
+
|e|d|b|a|d|c|
|
| 35 |
+
|
| 36 |
+
provide the subset of S involved in any possible counter-examples that prove * is not commutative. Provide your answer as a comma separated list of the elements in the set in alphabetical order.
|
| 37 |
+
attachment: null
|
| 38 |
+
- task_id: 9d191bce-651d-4746-be2d-7ef8ecadb9c2
|
| 39 |
+
question: |-
|
| 40 |
+
Examine the video at https://www.youtube.com/watch?v=1htKBjuUWec.
|
| 41 |
+
|
| 42 |
+
What does Teal'c say in response to the question "Isn't that hot?"
|
| 43 |
+
attachment: null
|
| 44 |
+
- task_id: cabe07ed-9eca-40ea-8ead-410ef5e83f91
|
| 45 |
+
question: |-
|
| 46 |
+
What is the surname of the equine veterinarian mentioned in 1.E Exercises from the chemistry materials licensed by Marisa Alviar-Agnew & Henry Agnew under the CK-12 license in LibreText's Introductory Chemistry materials as compiled 08/21/2023?
|
| 47 |
+
attachment: null
|
| 48 |
+
- task_id: 3cef3a44-215e-4aed-8e3b-b1e3f08063b7
|
| 49 |
+
question: |-
|
| 50 |
+
I'm making a grocery list for my mom, but she's a professor of botany and she's a real stickler when it comes to categorizing things. I need to add different foods to different categories on the grocery list, but if I make a mistake, she won't buy anything inserted in the wrong category. Here's the list I have so far:
|
| 51 |
+
|
| 52 |
+
milk, eggs, flour, whole bean coffee, Oreos, sweet potatoes, fresh basil, plums, green beans, rice, corn, bell pepper, whole allspice, acorns, broccoli, celery, zucchini, lettuce, peanuts
|
| 53 |
+
|
| 54 |
+
I need to make headings for the fruits and vegetables. Could you please create a list of just the vegetables from my list? If you could do that, then I can figure out how to categorize the rest of the list into the appropriate categories. But remember that my mom is a real stickler, so make sure that no botanical fruits end up on the vegetable list, or she won't get them when she's at the store. Please alphabetize the list of vegetables, and place each item in a comma separated list.
|
| 55 |
+
attachment: null
|
| 56 |
+
- task_id: 99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3
|
| 57 |
+
question: |-
|
| 58 |
+
Hi, I'm making a pie but I could use some help with my shopping list. I have everything I need for the crust, but I'm not sure about the filling. I got the recipe from my friend Aditi, but she left it as a voice memo and the speaker on my phone is buzzing so I can't quite make out what she's saying. Could you please listen to the recipe and list all of the ingredients that my friend described? I only want the ingredients for the filling, as I have everything I need to make my favorite pie crust. I've attached the recipe as Strawberry pie.mp3.
|
| 59 |
+
|
| 60 |
+
In your response, please only list the ingredients, not any measurements. So if the recipe calls for "a pinch of salt" or "two cups of ripe strawberries" the ingredients on the list would be "salt" and "ripe strawberries".
|
| 61 |
+
|
| 62 |
+
Please format your response as a comma separated list of ingredients. Also, please alphabetize the ingredients.
|
| 63 |
+
attachment:
|
| 64 |
+
url: https://agents-course-unit4-scoring.hf.space/files/99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3
|
| 65 |
+
local_path: file://./99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3/99c9cc74-fdc8-46c6-8f8d-3ce2d3bfeea3.mp3
|
| 66 |
+
mime_type: audio/mpeg
|
| 67 |
+
- task_id: 305ac316-eef6-4446-960a-92d80d542f82
|
| 68 |
+
question: |-
|
| 69 |
+
Who did the actor who played Ray in the Polish-language version of Everybody Loves Raymond play in Magda M.? Give only the first name.
|
| 70 |
+
attachment: null
|
| 71 |
+
- task_id: f918266a-b3e0-4914-865d-4faa564f1aef
|
| 72 |
+
question: |-
|
| 73 |
+
What is the final numeric output from the attached Python code?
|
| 74 |
+
attachment:
|
| 75 |
+
url: https://agents-course-unit4-scoring.hf.space/files/f918266a-b3e0-4914-865d-4faa564f1aef
|
| 76 |
+
local_path: file://./f918266a-b3e0-4914-865d-4faa564f1aef/f918266a-b3e0-4914-865d-4faa564f1aef.py
|
| 77 |
+
mime_type: text/x-python; charset=utf-8
|
| 78 |
+
- task_id: 3f57289b-8c60-48be-bd80-01f8099ca449
|
| 79 |
+
question: |-
|
| 80 |
+
How many at bats did the Yankee with the most walks in the 1977 regular season have that same season?
|
| 81 |
+
attachment: null
|
| 82 |
+
- task_id: 1f975693-876d-457b-a649-393859e79bf3
|
| 83 |
+
question: |-
|
| 84 |
+
Hi, I was out sick from my classes on Friday, so I'm trying to figure out what I need to study for my Calculus mid-term next week. My friend from class sent me an audio recording of Professor Willowbrook giving out the recommended reading for the test, but my headphones are broken :(
|
| 85 |
+
|
| 86 |
+
Could you please listen to the recording for me and tell me the page numbers I'm supposed to go over? I've attached a file called Homework.mp3 that has the recording. Please provide just the page numbers as a comma-delimited list. And please provide the list in ascending order.
|
| 87 |
+
attachment:
|
| 88 |
+
url: https://agents-course-unit4-scoring.hf.space/files/1f975693-876d-457b-a649-393859e79bf3
|
| 89 |
+
local_path: file://./1f975693-876d-457b-a649-393859e79bf3/1f975693-876d-457b-a649-393859e79bf3.mp3
|
| 90 |
+
mime_type: audio/mpeg
|
| 91 |
+
- task_id: 840bfca7-4f7b-481a-8794-c560c340185d
|
| 92 |
+
question: |-
|
| 93 |
+
On June 6, 2023, an article by Carolyn Collins Petersen was published in Universe Today. This article mentions a team that produced a paper about their observations, linked at the bottom of the article. Find this paper. Under what NASA award number was the work performed by R. G. Arendt supported by?
|
| 94 |
+
attachment: null
|
| 95 |
+
- task_id: bda648d7-d618-4883-88f4-3466eabd860e
|
| 96 |
+
question: |-
|
| 97 |
+
Where were the Vietnamese specimens described by Kuznetzov in Nedoshivina's 2010 paper eventually deposited? Just give me the city name without abbreviations.
|
| 98 |
+
attachment: null
|
| 99 |
+
- task_id: cf106601-ab4f-4af9-b045-5295fe67b37d
|
| 100 |
+
question: |-
|
| 101 |
+
What country had the least number of athletes at the 1928 Summer Olympics? If there's a tie for a number of athletes, return the first in alphabetical order. Give the IOC country code as your answer.
|
| 102 |
+
attachment: null
|
| 103 |
+
- task_id: a0c07678-e491-4bbc-8f0b-07405144218f
|
| 104 |
+
question: "Who are the pitchers with the number before and after Taish\u014D Tamai's\
|
| 105 |
+
\ number as of July 2023? Give them to me in the form Pitcher Before, Pitcher\
|
| 106 |
+
\ After, use their last names only, in Roman characters."
|
| 107 |
+
attachment: null
|
| 108 |
+
- task_id: 7bd855d8-463d-4ed5-93ca-5fe35145f733
|
| 109 |
+
question: |-
|
| 110 |
+
The attached Excel file contains the sales of menu items for a local fast-food chain. What were the total sales that the chain made from food (not including drinks)? Express your answer in USD with two decimal places.
|
| 111 |
+
attachment:
|
| 112 |
+
url: https://agents-course-unit4-scoring.hf.space/files/7bd855d8-463d-4ed5-93ca-5fe35145f733
|
| 113 |
+
local_path: file://./7bd855d8-463d-4ed5-93ca-5fe35145f733/7bd855d8-463d-4ed5-93ca-5fe35145f733.xlsx
|
| 114 |
+
mime_type: application/octet-stream
|
| 115 |
+
- task_id: 5a0c1adf-205e-4841-a666-7c3ef95def9d
|
| 116 |
+
question: |-
|
| 117 |
+
What is the first name of the only Malko Competition recipient from the 20th Century (after 1977) whose nationality on record is a country that no longer exists?
|
| 118 |
+
attachment: null
|
exploration/data_analyst.ipynb
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:e264ea36751ebf653b01f256b41f1169ad07aada1838c08cf66c2d7b94d2fc30
|
| 3 |
+
size 90059
|
exploration/information_retrieval.ipynb
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:9a1cfe31ee37560db13f49544065ea29d2e530ddd1be280460e38075b02f9649
|
| 3 |
+
size 1986575
|
exploration/main.ipynb
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:ab439266c76330f855d257963675df6a06233d39ce410e2fe3e730943b01193d
|
| 3 |
+
size 1708503
|
exploration/multi_agent.ipynb
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:8b5c790c3436ccfffd85ac6d3d9d3b73df7dcb41d8f681313b361041d63e8a9a
|
| 3 |
+
size 1071534
|
exploration/speech_recognition.ipynb
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:1123c2d76f235f9458ced0a7b8ba516a40cdc28406daa8d0ce7fa8d7bfa685ed
|
| 3 |
+
size 105242
|
exploration/youtube_exploration.ipynb
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
version https://git-lfs.github.com/spec/v1
|
| 2 |
+
oid sha256:527d7c45e9144921cfab812c4b2eeb8c4451d93dc1c5da6573c9536740c5c0fc
|
| 3 |
+
size 33011286
|
performance_agent_v1.png
ADDED
|
Git LFS Details
|
tools/__init__.py
ADDED
|
@@ -0,0 +1,19 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Public tool exports for the agent's tool package.

Re-exports every tool class (and the chess-engine locator helper) so callers
can simply ``from tools import <ToolName>``.
"""

from .get_attachment_tool import GetAttachmentTool
from .google_search_tools import GoogleSearchTool, GoogleSiteSearchTool
from .content_retriever_tool import ContentRetrieverTool
from .speech_recognition_tool import SpeechRecognitionTool
from .youtube_video_tool import YoutubeVideoTool
from .classifier_tool import ClassifierTool
from .chess_tools import ImageToChessBoardFENTool, chess_engine_locator

__all__ = [
    "GetAttachmentTool",
    "GoogleSearchTool",
    "GoogleSiteSearchTool",
    "ContentRetrieverTool",
    "SpeechRecognitionTool",
    "YoutubeVideoTool",
    "ClassifierTool",
    "ImageToChessBoardFENTool",
    "chess_engine_locator",
]
|
tools/chess_tools.py
ADDED
|
@@ -0,0 +1,126 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from smolagents import Tool, tool
|
| 2 |
+
from openai import OpenAI
|
| 3 |
+
import shutil
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
@tool
def chess_engine_locator() -> str | None:
    """
    Locate the chess engine binary installed on this system.

    The returned path can be passed to chess.engine.SimpleEngine.popen_uci
    from the chess.engine Python module.

    Returns:
        str: Path to the chess engine, or None when no engine is found on PATH.
    """
    engine_path = shutil.which("stockfish")
    if not engine_path:
        return None
    return engine_path
|
| 15 |
+
|
| 16 |
+
|
| 17 |
+
class ImageToChessBoardFENTool(Tool):
    """Derive the piece-placement (board) field of a FEN string from a chessboard image.

    Two-pass approach: the first LLM call produces a free-form description of
    the board from the image; the second call continues that conversation and
    asks for per-square piece codes, which are then assembled into the FEN
    board field locally (no LLM involved in the final assembly).
    """

    name = "image_to_chess_board_fen"
    description = """Convert a chessboard image to board part of the FEN."""
    inputs = {
        "image_url": {
            "type": "string",
            "description": "Public URL of the image (preferred) or base64 encoded image in data URL format.",
        }
    }
    output_type = "string"

    def __init__(self, client: OpenAI | None = None, **kwargs):
        # Accept an injected client (useful for testing); otherwise build a
        # default one from the environment (OPENAI_API_KEY).
        self.client = client if client is not None else OpenAI()
        super().__init__(**kwargs)

    def attachment_for(self, task_id: str | None):
        # Stores the current task id; not read elsewhere in this class —
        # presumably kept for interface parity with other tools. TODO confirm.
        self.task_id = task_id

    def forward(self, image_url: str) -> str:
        """
        Convert a chessboard image to board part of the FEN.
        Args:
            image_url (str): Public URL of the image (preferred) or base64 encoded image in data URL format.
        Returns:
            str: Board part of the FEN.
        """
        client = self.client

        # Pass 1: have the model describe the board from the image.
        response = client.responses.create(
            model="gpt-4.1",
            input=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "input_text",
                            "text": "Describe the position of the pieces on the chessboard from the image. Please, nothing else but description.",
                        },
                        {"type": "input_image", "image_url": image_url},
                    ],
                }
            ],
        )

        # Pass 2: replay the conversation (original question + model's answer
        # via response.output) and ask for one piece/square per line.
        # Note: `response` is evaluated inside the arguments before being
        # reassigned, so pass 1's output is what gets spliced in.
        response = client.responses.create(
            model="gpt-4.1",
            input=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "input_text",
                            "text": "Describe the position of the pieces on the chessboard from the image. Please, nothing else but description.",
                        },
                    ],
                }
            ]
            + response.output
            + [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "input_text",
                            "text": """\
Write down all positions with known pieces.
Use a standard one-letter code to name pieces.

It is important to use the correct case for piece code. Use upper case for white and lower case for black.
It is important to include information about all the mentioned positions.

Describe each position in a new line.
Follow format: <piece><position> (piece first, than position, no spaces)
Return nothing but lines with positions.
""",
                        },
                    ],
                }
            ],
        )
        board_pos = response.output_text

        # Parse "<piece><file><rank>" lines (e.g. "Qd1") into square -> piece.
        # Lines that are not exactly 3 characters are ignored as noise.
        pos_dict = {}
        for pos_str in board_pos.splitlines():
            pos_str = pos_str.strip()
            if len(pos_str) != 3:
                continue
            piece = pos_str[0]
            pos = pos_str[1:3]
            pos_dict[pos] = piece

        # Assemble FEN board field: ranks 8 down to 1, files a..h, runs of
        # empty squares collapsed to digits, ranks separated by "/".
        board_fen = ""
        for rank in range(8, 0, -1):
            empty = 0
            for file_c in range(ord("a"), ord("h") + 1):
                file = chr(file_c)
                square = file + str(rank)
                if square in pos_dict:
                    if empty > 0:
                        board_fen += str(empty)
                        empty = 0
                    board_fen += pos_dict[square]
                else:
                    empty += 1
            if empty > 0:
                board_fen += str(empty)
            if rank != 1:
                board_fen += "/"

        return board_fen
|
tools/classifier_tool.py
ADDED
|
@@ -0,0 +1,89 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from smolagents import Tool
|
| 2 |
+
from openai import OpenAI
|
| 3 |
+
|
| 4 |
+
|
| 5 |
+
class ClassifierTool(Tool):
    """LLM-backed classifier: sorts items into categories from the perspective
    of a given knowledge area, within a short environment/context hint."""

    name = "open_classifier"
    description = """Classifies given items into given categories from perspective of specific knowledge area."""
    inputs = {
        "knowledge_area": {
            "type": "string",
            "description": "The knowledge area that should be used for classification.",
        },
        "environment": {  # context make models too verbose
            "type": "string",
            "description": "Couple words that describe environment or location in which items should be classified in case of plural meaning or if only part of item relevant for classification.",
        },
        "categories": {
            "type": "string",
            "description": "Comma separated list of categories to distribute objects.",
        },
        "items": {
            "type": "string",
            "description": "Comma separated list of items to be classified. Please include adjectives if available.",
        },
    }
    output_type = "string"

    def __init__(
        self,
        client: OpenAI | None = None,
        model_id: str = "gpt-4.1-mini",
        **kwargs,
    ):
        """Create the tool.

        Args:
            client: Pre-configured OpenAI client; a default one (built from the
                environment) is used when omitted.
            model_id: OpenAI model used for classification.
        """
        self.client = client or OpenAI()
        self.model_id = model_id

        super().__init__(**kwargs)

    def forward(
        self, knowledge_area: str, environment: str, categories: str, items: str
    ) -> str:
        """Classify `items` into `categories`; returns the model's reasoning
        followed by one `Category: items` line per category."""
        response = self.client.responses.create(
            model=self.model_id,
            input=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "input_text",
                            "text": self._prompt(
                                knowledge_area=knowledge_area,
                                context=environment,
                                categories=categories,
                                items=items,
                            ),
                        },
                    ],
                }
            ],
        )
        answer = response.output_text
        return answer

    def _prompt(
        self, knowledge_area: str, context: str, categories: str, items: str
    ) -> str:
        # Build the classification prompt. Grammar fixed vs. the original:
        # "from from the" -> "from the", and "Do not allow {context}
        # influence" -> "... to influence".
        return f"""\
You are {knowledge_area} classifier located in {context} context.
I will provide you a list of items and a list of categories and context in which items should be considered.

Your task is to classify the items into the categories.
Use context to determine the meaning of the items and decide if you need to classify entire item or only part of it.

Do not miss any item and do not add any item to the list of categories.
Use highest probability category for each item.
You can add category "Other" if you are not sure about the classification.

Use only considerations from the {knowledge_area} perspective.
Explain your reasoning from {knowledge_area} perspective in {context} context and then provide final answer.
Important: Do not allow {context} to influence your judgment for classification.

ITEMS: {items}
CATEGORIES: {categories}

Now provide your reasoning and finalize it with the classification in the following format:
Category 1: items list
Category 2: items list
Other (if needed): items list
"""
|
tools/content_retriever_tool.py
ADDED
|
@@ -0,0 +1,89 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from smolagents import Tool
|
| 2 |
+
from docling.document_converter import DocumentConverter
|
| 3 |
+
from docling.chunking import HierarchicalChunker
|
| 4 |
+
from sentence_transformers import SentenceTransformer, util
|
| 5 |
+
import torch
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
class ContentRetrieverTool(Tool):
    """Fetch a webpage/document, chunk it, and return the chunks most relevant
    to a query.

    Relevance is scored with sentence-transformer embeddings against both each
    chunk's text and its structural context (headings etc. added by the
    chunker's contextualize step).
    """

    name = "retrieve_content"
    description = """Retrieve the content of a webpage or document in markdown format. Supports PDF, DOCX, XLSX, HTML, images, and more."""
    inputs = {
        "url": {
            "type": "string",
            "description": "The URL or local path of the webpage or document to retrieve.",
        },
        "query": {
            "type": "string",
            "description": "The subject on the page you are looking for. The shorter the more relevant content is returned.",
        },
    }
    output_type = "string"

    def __init__(
        self,
        model_name: str | None = None,
        threshold: float = 0.2,
        **kwargs,
    ):
        # threshold: cumulative softmax probability mass collected per query
        # term before chunk selection stops (see forward()).
        self.threshold = threshold
        self._document_converter = DocumentConverter()
        self._model = SentenceTransformer(
            model_name if model_name is not None else "all-MiniLM-L6-v2"
        )
        self._chunker = HierarchicalChunker()

        super().__init__(**kwargs)

    def forward(self, url: str, query: str) -> str:
        """Return the query-relevant chunks of the document at `url`, joined by
        blank lines, most relevant last."""
        document = self._document_converter.convert(url).document

        chunks = list(self._chunker.chunk(dl_doc=document))
        if len(chunks) == 0:
            return "No content found."

        chunks_text = [chunk.text for chunk in chunks]
        chunks_with_context = [self._chunker.contextualize(chunk) for chunk in chunks]
        # Context = contextualized chunk minus its own text (i.e. the
        # surrounding headings/captions only).
        chunks_context = [
            chunks_with_context[i].replace(chunks_text[i], "").strip()
            for i in range(len(chunks))
        ]

        chunk_embeddings = self._model.encode(chunks_text, convert_to_tensor=True)
        context_embeddings = self._model.encode(chunks_context, convert_to_tensor=True)
        # The query may contain several comma-separated terms; each is
        # embedded and matched independently.
        query_embedding = self._model.encode(
            [term.strip() for term in query.split(",") if term.strip()],
            convert_to_tensor=True,
        )

        selected_indices = []  # aggregate indexes across chunks and context matches and for all queries
        for embeddings in [
            context_embeddings,
            chunk_embeddings,
        ]:
            # Compute cosine similarities (returns 1D tensor)
            for cos_scores in util.pytorch_cos_sim(query_embedding, embeddings):
                # Convert to softmax probabilities
                probabilities = torch.nn.functional.softmax(cos_scores, dim=0)
                # Sort by probability descending
                sorted_indices = torch.argsort(probabilities, descending=True)
                # Accumulate until total probability reaches threshold

                cumulative = 0.0
                for i in sorted_indices:
                    cumulative += probabilities[i].item()
                    selected_indices.append(i.item())
                    if cumulative >= self.threshold:
                        break

        selected_indices = list(
            dict.fromkeys(selected_indices)
        )  # remove duplicates and preserve order
        selected_indices = selected_indices[
            ::-1
        ]  # make most relevant items last for better focus

        if len(selected_indices) == 0:
            return "No content found."

        return "\n\n".join([chunks_with_context[idx] for idx in selected_indices])
|
tools/get_attachment_tool.py
ADDED
|
@@ -0,0 +1,77 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from smolagents import Tool
|
| 2 |
+
import requests
|
| 3 |
+
from urllib.parse import urljoin
|
| 4 |
+
import base64
|
| 5 |
+
import tempfile
|
| 6 |
+
|
| 7 |
+
|
| 8 |
+
class GetAttachmentTool(Tool):
    """Retrieve the attachment of the current task from the agent evaluation
    API, in one of several representations (URL, data URL, temp file, text)."""

    name = "get_attachment"
    description = """Retrieves attachment for current task in specified format."""
    inputs = {
        "fmt": {
            "type": "string",
            "description": "Format to retrieve attachment. Options are: URL (preferred), DATA_URL, LOCAL_FILE_PATH, TEXT. URL returns the URL of the file, DATA_URL returns a base64 encoded data URL, LOCAL_FILE_PATH returns a local file path to the downloaded file, and TEXT returns the content of the file as text.",
            "nullable": True,
            "default": "URL",
        }
    }
    output_type = "string"

    # Formats accepted by forward(); validated explicitly (not with `assert`)
    # so the check survives `python -O`, which strips assert statements.
    _SUPPORTED_FORMATS = ("URL", "DATA_URL", "LOCAL_FILE_PATH", "TEXT")

    def __init__(
        self,
        agent_evaluation_api: str | None = None,
        task_id: str | None = None,
        **kwargs,
    ):
        """Create the tool.

        Args:
            agent_evaluation_api: Base URL of the evaluation API; defaults to
                the public agents-course scoring space.
            task_id: Task whose attachment is served; may be set later via
                attachment_for().
        """
        self.agent_evaluation_api = (
            agent_evaluation_api
            if agent_evaluation_api is not None
            else "https://agents-course-unit4-scoring.hf.space/"
        )
        self.task_id = task_id
        super().__init__(**kwargs)

    def attachment_for(self, task_id: str | None):
        # Point the tool at the attachment of the given task (None clears it).
        self.task_id = task_id

    def forward(self, fmt: str = "URL") -> str:
        """Return the current task's attachment in the requested format.

        Raises:
            ValueError: if `fmt` is unsupported, or if TEXT is requested for a
                non-text MIME type.
        """
        fmt = fmt.upper()
        if fmt not in self._SUPPORTED_FORMATS:
            raise ValueError(
                f"Unsupported format: {fmt}. Supported formats are URL, DATA_URL, LOCAL_FILE_PATH, and TEXT."
            )

        # No task selected -> no attachment to return.
        if not self.task_id:
            return ""

        file_url = urljoin(self.agent_evaluation_api, f"files/{self.task_id}")
        if fmt == "URL":
            return file_url

        response = requests.get(
            file_url,
            headers={
                "Content-Type": "application/json",
                "Accept": "application/json",
            },
        )
        # A 4xx response means the task has no attachment; report as empty
        # rather than raising.
        if 400 <= response.status_code < 500:
            return ""

        response.raise_for_status()
        mime = response.headers.get("content-type", "text/plain")
        if fmt == "TEXT":
            if mime.startswith("text/"):
                return response.text
            raise ValueError(
                f"Content of file type {mime} cannot be retrieved as TEXT."
            )
        if fmt == "DATA_URL":
            return f"data:{mime};base64,{base64.b64encode(response.content).decode('utf-8')}"
        # Only LOCAL_FILE_PATH remains after the up-front validation.
        with tempfile.NamedTemporaryFile(delete=False) as tmp_file:
            tmp_file.write(response.content)
            return tmp_file.name
|
tools/google_search_tools.py
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from smolagents import Tool
|
| 2 |
+
from googleapiclient.discovery import build
|
| 3 |
+
import os
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
class GoogleSearchTool(Tool):
    """Web search backed by the Google Custom Search JSON API, returning
    markdown-formatted results."""

    name = "web_search"
    description = """Performs a google web search for query then returns top search results in markdown format."""
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform search.",
        },
    }
    output_type = "string"

    # forward() takes *args/**kwargs so subclasses can declare extra inputs.
    skip_forward_signature_validation = True

    def __init__(
        self,
        api_key: str | None = None,
        search_engine_id: str | None = None,
        num_results: int = 10,
        **kwargs,
    ):
        """Resolve credentials (falling back to environment variables) and
        build the Custom Search client."""
        if api_key is None:
            api_key = os.getenv("GOOGLE_SEARCH_API_KEY")
        if not api_key:
            raise ValueError(
                "Please set the GOOGLE_SEARCH_API_KEY environment variable."
            )

        if search_engine_id is None:
            search_engine_id = os.getenv("GOOGLE_SEARCH_ENGINE_ID")
        if not search_engine_id:
            raise ValueError(
                "Please set the GOOGLE_SEARCH_ENGINE_ID environment variable."
            )

        self.cse = build("customsearch", "v1", developerKey=api_key).cse()
        self.cx = search_engine_id
        self.num = num_results
        super().__init__(**kwargs)

    def _collect_params(self) -> dict:
        # Extension hook: subclasses return extra Custom Search parameters.
        return {}

    def forward(self, query: str, *args, **kwargs) -> str:
        """Run the search and format each hit as `[title](link)` plus snippet."""
        base_params = {
            "q": query,
            "cx": self.cx,
            "fields": "items(title,link,snippet)",
            "num": self.num,
        }
        request_params = {**base_params, **self._collect_params(*args, **kwargs)}

        response = self.cse.list(**request_params).execute()
        if "items" not in response:
            return "No results found."

        formatted_hits = [
            f"[{item['title']}]({item['link']})\n{item['snippet']}"
            for item in response["items"]
        ]
        return "\n\n".join(formatted_hits)
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
class GoogleSiteSearchTool(GoogleSearchTool):
    """Variant of GoogleSearchTool that restricts results to one domain."""

    name = "site_search"
    description = """Performs a google search within the website for query then returns top search results in markdown format."""
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform search.",
        },
        "site": {
            "type": "string",
            "description": "The domain of the site on which to search.",
        },
    }

    def _collect_params(self, site: str) -> dict:
        # siteSearchFilter "i" = include-only: return results from `site` only.
        extra = {"siteSearch": site}
        extra["siteSearchFilter"] = "i"
        return extra
|
tools/speech_recognition_tool.py
ADDED
|
@@ -0,0 +1,113 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from smolagents import Tool
|
| 2 |
+
import torch
|
| 3 |
+
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline, logging
|
| 4 |
+
import warnings
|
| 5 |
+
|
| 6 |
+
|
| 7 |
+
class SpeechRecognitionTool(Tool):
    """smolagents tool that transcribes audio with Whisper large-v3-turbo."""

    name = "speech_to_text"
    description = """Transcribes speech from audio."""

    inputs = {
        "audio": {
            "type": "string",
            "description": "Path to the audio file to transcribe.",
        },
        "with_time_markers": {
            "type": "boolean",
            "description": "Whether to include timestamps in the transcription output. Each timestamp appears on its own line in the format [float, float], indicating the number of seconds elapsed from the start of the audio.",
            "nullable": True,
            "default": False,
        },
    }
    output_type = "string"

    # Whisper window size in seconds; also used by _normalize_chunks to rebase
    # per-window timestamps into absolute positions in the full audio.
    chunk_length_s = 30

    # Shared ASR pipeline, built lazily on first instantiation (class-level so
    # every instance reuses the same loaded model).
    pipe = None

    def __new__(cls, *args, **kwargs):
        # Build the (expensive) Whisper pipeline only once per process.
        # Previously this ran unconditionally, reloading the full model and
        # rebuilding the pipeline on every instantiation.
        if cls.pipe is None:
            device = "cuda:0" if torch.cuda.is_available() else "cpu"
            torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

            model_id = "openai/whisper-large-v3-turbo"
            model = AutoModelForSpeechSeq2Seq.from_pretrained(
                model_id,
                torch_dtype=torch_dtype,
                low_cpu_mem_usage=True,
                use_safetensors=True,
            )
            model.to(device)
            processor = AutoProcessor.from_pretrained(model_id)

            # Silence noisy transformers output and a known deprecation warning
            # triggered internally by the pipeline.
            logging.set_verbosity_error()
            warnings.filterwarnings(
                "ignore",
                category=FutureWarning,
                message=r".*The input name `inputs` is deprecated.*",
            )
            cls.pipe = pipeline(
                "automatic-speech-recognition",
                model=model,
                tokenizer=processor.tokenizer,
                feature_extractor=processor.feature_extractor,
                torch_dtype=torch_dtype,
                device=device,
                chunk_length_s=cls.chunk_length_s,
                return_timestamps=True,
            )

        return super().__new__(cls, *args, **kwargs)

    def forward(self, audio: str, with_time_markers: bool = False) -> str:
        """
        Transcribes speech from audio.

        Args:
            audio (str): Path to the audio file to transcribe.
            with_time_markers (bool): Whether to include timestamps in the
                transcription output. When True, each caption is wrapped by two
                lines of the form [start] and [end] (seconds from the start of
                the audio, two decimal places).

        Returns:
            str: The transcribed text.
        """
        result = self.pipe(audio)
        if not with_time_markers:
            return result["text"].strip()

        txt = ""
        for chunk in self._normalize_chunks(result["chunks"]):
            txt += f"[{chunk['start']:.2f}]\n{chunk['text']}\n[{chunk['end']:.2f}]\n"
        return txt.strip()

    def transcribe(self, audio, **kwargs):
        """Transcribe `audio` and return structured captions.

        Args:
            audio: Anything the ASR pipeline accepts (path, array, ...).
            **kwargs: Extra arguments forwarded to the pipeline call.

        Returns:
            list[dict]: Items with absolute "start"/"end" seconds and "text".
        """
        result = self.pipe(audio, **kwargs)
        return self._normalize_chunks(result["chunks"])

    def _normalize_chunks(self, chunks):
        """Convert per-window Whisper timestamps into absolute positions.

        Whisper reports timestamps relative to each `chunk_length_s` window, so
        they reset (wrap around) at window boundaries. Each time a timestamp
        moves backwards we assume a new window started and add one window length
        to the running offset.
        """
        chunk_length_s = self.chunk_length_s
        absolute_offset = 0.0  # accumulated window offset, in seconds
        chunk_offset = 0.0  # last timestamp seen, used to detect wrap-around
        normalized = []

        for chunk in chunks:
            timestamp_start = chunk["timestamp"][0]
            timestamp_end = chunk["timestamp"][1]
            # Start moved backwards -> new window began before this chunk.
            if timestamp_start < chunk_offset:
                absolute_offset += chunk_length_s
            chunk_offset = timestamp_start
            absolute_start = absolute_offset + timestamp_start

            # End moved backwards relative to start -> window boundary falls
            # inside this chunk; the end belongs to the next window.
            if timestamp_end < timestamp_start:
                absolute_offset += chunk_length_s
            absolute_end = absolute_offset + timestamp_end
            chunk_offset = timestamp_end

            chunk_text = chunk["text"].strip()
            if chunk_text:  # drop empty/whitespace-only captions
                normalized.append(
                    {
                        "start": absolute_start,
                        "end": absolute_end,
                        "text": chunk_text,
                    }
                )

        return normalized
|
tools/youtube_video_tool.py
ADDED
|
@@ -0,0 +1,383 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from smolagents import Tool
|
| 2 |
+
from openai import OpenAI
|
| 3 |
+
from .speech_recognition_tool import SpeechRecognitionTool
|
| 4 |
+
from io import BytesIO
|
| 5 |
+
import yt_dlp
|
| 6 |
+
import av
|
| 7 |
+
import torchaudio
|
| 8 |
+
import subprocess
|
| 9 |
+
import requests
|
| 10 |
+
import base64
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
class YoutubeVideoTool(Tool):
    """smolagents tool that answers a question about a YouTube video.

    The video is split into fixed-length chunks; for each chunk the video
    metadata, captions and sampled frames are sent to an OpenAI vision model,
    carrying the aggregated answer forward so later chunks can refine it.
    """

    name = "youtube_video"
    description = """Process the video and return the requested information from it."""
    inputs = {
        "url": {
            "type": "string",
            "description": "The URL of the YouTube video.",
        },
        "query": {
            "type": "string",
            "description": "The question to answer.",
        },
    }
    output_type = "string"

    def __init__(
        self,
        video_quality: int = 360,
        frames_interval: int | float | None = 2,
        chunk_duration: int | float | None = 20,
        speech_recognition_tool: SpeechRecognitionTool | None = None,
        client: OpenAI | None = None,
        model_id: str = "gpt-4.1-mini",
        debug: bool = False,
        **kwargs,
    ):
        """
        Args:
            video_quality (int): Preferred video height in pixels (e.g. 360).
            frames_interval (int | float | None): Seconds between captured
                frames; None captures every decoded frame.
            chunk_duration (int | float | None): Chunk length in seconds; None
                processes the whole video as a single chunk.
            speech_recognition_tool (SpeechRecognitionTool | None): Fallback
                ASR used when the video has no downloadable captions.
            client (OpenAI | None): OpenAI client; created with defaults if omitted.
            model_id (str): Vision-capable model used to answer the query.
            debug (bool): When True, print each chunk prompt and answer.
        """
        self.video_quality = video_quality
        self.speech_recognition_tool = speech_recognition_tool
        self.frames_interval = frames_interval
        self.chunk_duration = chunk_duration

        self.client = client or OpenAI()
        self.model_id = model_id

        self.debug = debug

        super().__init__(**kwargs)

    def forward(self, url: str, query: str):
        """
        Process the video and return the requested information.

        Args:
            url (str): The URL of the YouTube video.
            query (str): The question to answer.

        Returns:
            str: Answer to the query (empty string if nothing was found).
        """
        answer = ""
        for chunk in self._split_video_into_chunks(url):
            prompt = self._prompt(
                chunk,
                query,
                answer,
            )
            response = self.client.responses.create(
                # Use the configured model; this was previously hard-coded to
                # "gpt-4.1-mini", silently ignoring the `model_id` parameter.
                model=self.model_id,
                input=[
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "input_text",
                                "text": prompt,
                            },
                            *[
                                {
                                    "type": "input_image",
                                    "image_url": f"data:image/jpeg;base64,{frame}",
                                }
                                for frame in self._base64_frames(chunk["frames"])
                            ],
                        ],
                    }
                ],
            )
            answer = response.output_text
            if self.debug:
                print(
                    f"CHUNK {chunk['start']} - {chunk['end']}:\n\n{prompt}\n\nANSWER:\n{answer}"
                )

        # The sentinel means no chunk produced evidence for an answer.
        if answer.strip() == "I need to keep watching":
            answer = ""
        return answer

    def _prompt(self, chunk, query, aggregated_answer):
        """Build the per-chunk prompt, folding in the answer from prior chunks."""
        prompt = [
            f"""\
These are some frames of a video that I want to upload.
I will ask a question about the entire video, but I will only show the last part of it.
Aggregate an answer about the entire video, use information about previous parts but do not reference the previous parts in the answer directly.

Ground your answer based on video title, description, captions, video frames or answer from previous parts.
If no evidence is presented just say "I need to keep watching".

VIDEO TITLE:
{chunk["title"]}

VIDEO DESCRIPTION:
{chunk["description"]}

FRAMES SUBTITLES:
{chunk["captions"]}"""
        ]

        if aggregated_answer:
            prompt.append(f"""\
Here is the answer to the same question based on the previous video parts:

BASED ON PREVIOUS PARTS:
{aggregated_answer}""")

        prompt.append(f"""\

QUESTION:
{query}""")

        return "\n\n".join(prompt)

    def _split_video_into_chunks(
        self, url: str, with_captions: bool = True, with_frames: bool = True
    ):
        """Yield consecutive `chunk_duration`-second chunks of the video."""
        video = self._process_video(
            url, with_captions=with_captions, with_frames=with_frames
        )
        video_duration = video["duration"]
        # chunk_duration=None -> single chunk covering the whole video.
        chunk_duration = self.chunk_duration or video_duration

        chunk_start = 0.0
        while chunk_start < video_duration:
            chunk_end = min(chunk_start + chunk_duration, video_duration)
            chunk = self._get_video_chunk(video, chunk_start, chunk_end)
            yield chunk
            chunk_start += chunk_duration

    def _get_video_chunk(self, video, start, end):
        """Select the captions and frames overlapping [start, end] seconds."""
        chunk_captions = [
            c for c in video["captions"] if c["start"] <= end and c["end"] >= start
        ]
        chunk_frames = [
            f
            for f in video["frames"]
            if f["timestamp"] >= start and f["timestamp"] <= end
        ]

        return {
            "title": video["title"],
            "description": video["description"],
            "start": start,
            "end": end,
            "captions": "\n".join([c["text"] for c in chunk_captions]),
            "frames": chunk_frames,
        }

    def _process_video(
        self, url: str, with_captions: bool = True, with_frames: bool = True
    ):
        """Fetch metadata, captions and frames for the video at `url`."""
        lang = "en"
        info = self._get_video_info(url, lang)

        if with_captions:
            captions = self._extract_captions(
                lang, info.get("subtitles", {}), info.get("automatic_captions", {})
            )
            # Fall back to local speech recognition when no caption track exists.
            if not captions and self.speech_recognition_tool:
                audio_url = self._select_audio_format(info["formats"])
                audio = self._capture_audio(audio_url)
                waveform, sample_rate = torchaudio.load(audio)
                # _capture_audio asks ffmpeg for 16 kHz output; anything else
                # would silently break the Whisper transcription.
                if sample_rate != 16000:
                    raise ValueError(f"Expected 16 kHz audio, got {sample_rate} Hz")
                waveform_np = waveform.squeeze().numpy()
                captions = self.speech_recognition_tool.transcribe(waveform_np)
        else:
            captions = []

        if with_frames:
            # Use the configured quality; this was previously hard-coded to 360.
            video_url = self._select_video_format(info["formats"], self.video_quality)["url"]
            frames = self._capture_video_frames(video_url, self.frames_interval)
        else:
            frames = []

        return {
            "id": info["id"],
            "title": info["title"],
            "description": info["description"],
            "duration": info["duration"],
            "captions": captions,
            "frames": frames,
        }

    def _get_video_info(self, url: str, lang: str):
        """Extract video metadata (no download) via yt-dlp."""
        ydl_opts = {
            "quiet": True,
            "skip_download": True,
            # Constrain format selection to the configured quality ceiling.
            "format": f"bestvideo[ext=mp4][height<={self.video_quality}]+bestaudio[ext=m4a]/best[height<={self.video_quality}]",
            "forceurl": True,
            "noplaylist": True,
            "writesubtitles": True,
            "writeautomaticsub": True,
            "subtitlesformat": "vtt",
            "subtitleslangs": [lang],
        }

        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            info = ydl.extract_info(url, download=False)

        return info

    def _extract_captions(self, lang, subtitles, auto_captions):
        """Download and parse the first SRT or VTT caption track for `lang`.

        Manual subtitles are preferred over automatic captions. Returns a list
        of dicts with "start"/"end" seconds and "text".
        """
        caption_tracks = subtitles.get(lang) or auto_captions.get(lang) or []

        structured_captions = []

        srt_track = next(
            (track for track in caption_tracks if track["ext"] == "srt"), None
        )
        vtt_track = next(
            (track for track in caption_tracks if track["ext"] == "vtt"), None
        )

        if srt_track:
            import pysrt

            response = requests.get(srt_track["url"])
            response.raise_for_status()
            srt_data = response.content.decode("utf-8")

            def to_sec(t):
                # pysrt SubRipTime -> float seconds
                return (
                    t.hours * 3600 + t.minutes * 60 + t.seconds + t.milliseconds / 1000
                )

            structured_captions = [
                {
                    "start": to_sec(sub.start),
                    "end": to_sec(sub.end),
                    "text": sub.text.strip(),
                }
                for sub in pysrt.from_str(srt_data)
            ]
        if vtt_track:
            import webvtt
            from io import StringIO

            response = requests.get(vtt_track["url"])
            response.raise_for_status()
            vtt_data = response.text

            vtt_file = StringIO(vtt_data)

            def to_sec(t):
                """Convert 'HH:MM:SS.mmm' to float seconds"""
                h, m, s = t.split(":")
                s, ms = s.split(".")
                return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

            for caption in webvtt.read_buffer(vtt_file):
                structured_captions.append(
                    {
                        "start": to_sec(caption.start),
                        "end": to_sec(caption.end),
                        "text": caption.text.strip(),
                    }
                )
        return structured_captions

    def _select_video_format(self, formats, video_quality):
        """Return the first format with video at exactly `video_quality` height.

        Raises StopIteration if no format matches — TODO confirm desired
        behavior when the requested quality is unavailable.
        """
        video_format = next(
            f
            for f in formats
            if f.get("vcodec") != "none" and f.get("height") == video_quality
        )
        return video_format

    def _capture_video_frames(self, video_url, capture_interval_sec=None):
        """Stream the video through ffmpeg and sample frames as PIL images.

        Args:
            video_url: Direct media URL to decode.
            capture_interval_sec: Minimum seconds between captured frames;
                None captures every decoded frame.

        Returns:
            list[dict]: Items with "timestamp" (seconds) and "image" (PIL image).
        """
        ffmpeg_cmd = [
            "ffmpeg",
            "-i",
            video_url,
            "-f",
            "matroska",  # container format
            "-",
        ]

        process = subprocess.Popen(
            ffmpeg_cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL
        )

        container = av.open(process.stdout)
        stream = container.streams.video[0]
        time_base = stream.time_base

        frames = []
        next_capture_time = 0
        for frame in container.decode(stream):
            if frame.pts is None:
                continue

            timestamp = float(frame.pts * time_base)
            if capture_interval_sec is None or timestamp >= next_capture_time:
                frames.append(
                    {
                        "timestamp": timestamp,
                        "image": frame.to_image(),  # PIL image
                    }
                )
                if capture_interval_sec is not None:
                    next_capture_time += capture_interval_sec

        process.terminate()
        # Reap the child so it does not linger as a zombie process.
        process.wait()
        return frames

    def _base64_frames(self, frames):
        """JPEG-encode each captured frame and return base64 strings."""
        base64_frames = []
        for f in frames:
            buffered = BytesIO()
            f["image"].save(buffered, format="JPEG")
            encoded = base64.b64encode(buffered.getvalue()).decode("utf-8")
            base64_frames.append(encoded)
        return base64_frames

    def _select_audio_format(self, formats):
        """Pick the best audio-only format URL (m4a > webm, highest bitrate)."""
        audio_formats = [
            f
            for f in formats
            if f.get("vcodec") == "none"
            and f.get("acodec")
            and f.get("acodec") != "none"
        ]

        if not audio_formats:
            raise ValueError("No valid audio-only formats found.")

        # Prefer m4a > webm, highest abr first
        preferred_exts = ["m4a", "webm"]

        def sort_key(f):
            ext_score = (
                preferred_exts.index(f["ext"]) if f["ext"] in preferred_exts else 99
            )
            abr = f.get("abr") or 0
            return (ext_score, -abr)

        audio_formats.sort(key=sort_key)
        return audio_formats[0]["url"]

    def _capture_audio(self, audio_url) -> BytesIO:
        """Download `audio_url` via ffmpeg as 16 kHz mono PCM WAV in memory.

        Returns:
            BytesIO: WAV bytes positioned at the start of the buffer.

        Raises:
            RuntimeError: If ffmpeg exits with a non-zero status.
        """
        ffmpeg_audio_cmd = [
            "ffmpeg",
            "-i",
            audio_url,
            "-f",
            "wav",
            "-acodec",
            "pcm_s16le",  # Whisper prefers PCM
            "-ac",
            "1",  # Mono
            "-ar",
            "16000",  # 16kHz for Whisper
            "-",
        ]

        result = subprocess.run(
            ffmpeg_audio_cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE
        )
        if result.returncode != 0:
            raise RuntimeError("ffmpeg failed:\n" + result.stderr.decode())

        audio_buffer = BytesIO(result.stdout)
        audio_buffer.seek(0)
        return audio_buffer
|