---
title: GAIA Agent
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
## Final Agent for the Hugging Face Agents Course
This project is the final assignment of the [Hugging Face Agents Course](https://huggingface.co/learn/agents-course/unit0/introduction); the course introduction page covers the syllabus and the certification process.
You can find and try the agent in my Hugging Face Space here: [serverdaun/final_gaia_agent_hf_course](https://huggingface.co/spaces/serverdaun/final_gaia_agent_hf_course).
---
## GAIA Benchmark Target
This agent is designed to participate in the [GAIA benchmark for General AI Assistants](https://huggingface.co/gaia-benchmark). GAIA evaluates general-purpose AI assistants on real-world questions that require reasoning, multimodal understanding, web browsing, and tool use. The benchmark is hosted on the Hugging Face Hub and features a public [leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard) for submissions and results.
For more information about GAIA, its datasets, and the leaderboard, visit the [GAIA organization page](https://huggingface.co/gaia-benchmark).
## Agent Logic Overview
### Architecture
This project implements a modular agent using [LangGraph](https://github.com/langchain-ai/langgraph) and [LangChain](https://github.com/langchain-ai/langchain) frameworks. The agent is orchestrated as a state graph, where each node represents a step in the reasoning or tool-use process. The core LLM is accessed via Azure OpenAI, and the agent is designed to invoke a variety of tools to solve complex tasks.
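The graph alternates between an LLM node and a tool node until the LLM stops requesting tools. The control flow can be sketched in plain Python like this (a dependency-free illustration of the pattern; the project itself builds the graph with LangGraph's `StateGraph`, and the names below are stand-ins):

```python
# Minimal sketch of the agent loop: an "LLM node" that may request a
# tool, and a "tool node" that executes it and feeds the result back.
# Illustrative only; the real project wires this up with LangGraph.

def run_agent(llm, tools, messages):
    """Loop: call the LLM; if it requests a tool, run it and continue."""
    while True:
        reply = llm(messages)                  # LLM node
        messages.append(reply)
        if not reply.get("tool_call"):         # conditional edge: finished
            return reply["content"]
        name, args = reply["tool_call"]        # tool node
        messages.append({"role": "tool", "content": tools[name](args)})

# Stub LLM: requests the 'add' tool once, then produces a final answer.
def stub_llm(messages):
    if not any(m.get("role") == "tool" for m in messages):
        return {"content": "", "tool_call": ("add", (2, 3))}
    return {"content": f"The sum is {messages[-1]['content']}", "tool_call": None}

tools = {"add": lambda args: args[0] + args[1]}
answer = run_agent(stub_llm, tools, [{"role": "user", "content": "What is 2+3?"}])
```

In the real graph, the "conditional edge" check corresponds to inspecting the LLM message for tool calls and routing either to the tool node or to the end state.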
### Tools
The agent is equipped with a rich set of tools, including:
- **Search Tools**: Wikipedia, Tavily, and Arxiv search for retrieving information from the web and scientific literature.
- **Math Tools**: Arithmetic operations, power, square root, modulus, and group theory utilities (commutativity, associativity, identity, inverses).
- **Web Scraping**: Extracts main content from arbitrary web pages.
- **Image Analysis**: Uses Azure OpenAI's vision capabilities to answer questions about images.
- **Audio Transcription**: Transcribes audio files using Whisper.
- **Code Execution**: Runs code files in various languages (Python, JS, TS, Bash, Ruby, PHP, Go) and returns output/errors.
- **Tabular Data Tools**: Summarizes, filters, and manipulates CSV, Excel, and Parquet files.
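As an illustration of the group theory utilities, a commutativity check over a finite operation table might look like this (a hypothetical stand-in, not the project's exact implementation):

```python
def is_commutative(table: dict) -> bool:
    """Check whether a binary operation, given as a full Cayley table
    {(a, b): result}, satisfies a*b == b*a for every pair."""
    return all(table[(a, b)] == table[(b, a)] for (a, b) in table)

# Addition mod 2 is commutative; subtraction mod 3 is not.
add_mod2 = {(a, b): (a + b) % 2 for a in range(2) for b in range(2)}
sub_mod3 = {(a, b): (a - b) % 3 for a in range(3) for b in range(3)}
```

The associativity, identity, and inverse checks follow the same shape: iterate over the finite table and verify the defining axiom pairwise.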
### Agent Workflow
1. **Initialization**: The agent is built using a state graph, with nodes for the LLM and tool invocation. The LLM is bound to the available tools.
2. **Receiving Questions**: The Gradio app fetches a set of questions (some with associated files) from a remote API.
3. **Processing**: For each question, the agent constructs a message history (including a system prompt and the user question/file path) and invokes the LLM. If the LLM decides a tool is needed, the appropriate tool is called and the result is fed back into the conversation.
4. **Answer Extraction**: The agent's final answer is parsed and submitted back to the evaluation server.
5. **Submission**: All answers are submitted in batch, and the results (including score and feedback) are displayed in the Gradio interface.
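The answer-extraction step (step 4) can be sketched as follows, assuming the system prompt asks the LLM to end its reply with a `FINAL ANSWER:` marker (the marker format is an assumption for illustration, not necessarily the project's exact convention):

```python
def extract_final_answer(text: str) -> str:
    """Pull the answer after a 'FINAL ANSWER:' marker; if the marker is
    absent, fall back to the whole (stripped) reply."""
    marker = "FINAL ANSWER:"
    if marker in text:
        return text.split(marker, 1)[1].strip()
    return text.strip()
```

A marker-based format like this makes batch submission robust: the evaluation server receives only the short answer string, not the agent's intermediate reasoning.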
### Extending the Agent
- **Adding Tools**: Implement a new function in `tools.py` and decorate it with `@tool`. Add it to the `TOOLS` list in `agent.py`.
- **Modifying Logic**: Adjust the state graph in `agent.py` or the agent invocation logic in `app.py` as needed.
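A new tool is just a documented function registered in the `TOOLS` list. Sketched here without the LangChain `@tool` decorator so it runs standalone (the function itself is hypothetical; in `tools.py` it would carry `@tool` from `langchain_core.tools` so the LLM can call it by name):

```python
def reverse_string(text: str) -> str:
    """Return the input string reversed.

    Hypothetical example tool. In tools.py this function would be
    decorated with @tool, and its docstring becomes the description
    the LLM sees when deciding whether to call it.
    """
    return text[::-1]

TOOLS = [reverse_string]  # in agent.py, append to the existing TOOLS list
```

Because the LLM is bound to `TOOLS` at graph-construction time, no other change is needed for the agent to start using the new tool.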