final_gaia_agent_hf_course

Build error

App Files Files Community

final_gaia_agent_hf_course / README.md

serverdaun

populate README.md

f3b7523 10 months ago

preview code

raw

history blame contribute delete

3.71 kB

A newer version of the Gradio SDK is available: 6.9.0

Upgrade

metadata

title: GAIA Agent
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Final Agent HF Course

This project is part of the Hugging Face Agents Course. For more information about the course, syllabus, and certification process, visit the course introduction page.

You can find and try the agent in my Hugging Face Space here: serverdaun/final_gaia_agent_hf_course.

GAIA Benchmark Target

This agent is designed to participate in the GAIA benchmark for General AI Assistants. GAIA is a comprehensive benchmark for evaluating the capabilities of general AI agents across a wide range of tasks. The benchmark is maintained by the Hugging Face community and features a public leaderboard for submissions and results.

For more information about GAIA, its datasets, and the leaderboard, visit the GAIA organization page.

Agent Logic Overview

Architecture

This project implements a modular agent using LangGraph and LangChain frameworks. The agent is orchestrated as a state graph, where each node represents a step in the reasoning or tool-use process. The core LLM is accessed via Azure OpenAI, and the agent is designed to invoke a variety of tools to solve complex tasks.

Tools

The agent is equipped with a rich set of tools, including:

Search Tools: Wikipedia, Tavily, and Arxiv search for retrieving information from the web and scientific literature.
Math Tools: Arithmetic operations, power, square root, modulus, and group theory utilities (commutativity, associativity, identity, inverses).
Web Scraping: Extracts main content from arbitrary web pages.
Image Analysis: Uses Azure OpenAI's vision capabilities to answer questions about images.
Audio Transcription: Transcribes audio files using Whisper.
Code Execution: Runs code files in various languages (Python, JS, TS, Bash, Ruby, PHP, Go) and returns output/errors.
Tabular Data Tools: Summarizes, filters, and manipulates CSV, Excel, and Parquet files.

Agent Workflow

Initialization: The agent is built using a state graph, with nodes for the LLM and tool invocation. The LLM is bound to the available tools.
Receiving Questions: The Gradio app fetches a set of questions (some with associated files) from a remote API.
Processing: For each question, the agent constructs a message history (including a system prompt and the user question/file path) and invokes the LLM. If the LLM decides a tool is needed, the appropriate tool is called and the result is fed back into the conversation.
Answer Extraction: The agent's final answer is parsed and submitted back to the evaluation server.
Submission: All answers are submitted in batch, and the results (including score and feedback) are displayed in the Gradio interface.

Extending the Agent

Adding Tools: Implement a new function in tools.py and decorate it with @tool. Add it to the TOOLS list in agent.py.
Modifying Logic: Adjust the state graph in agent.py or the agent invocation logic in app.py as needed.