Spaces:
Build error
Build error
| title: GAIA Agent | |
| emoji: 🕵🏻♂️ | |
| colorFrom: indigo | |
| colorTo: indigo | |
| sdk: gradio | |
| sdk_version: 5.25.2 | |
| app_file: app.py | |
| pinned: false | |
| hf_oauth: true | |
| # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes. | |
| hf_oauth_expiration_minutes: 480 | |
| ### Final Agent HF Course | |
| This project is part of the [Hugging Face Agents Course](https://huggingface.co/learn/agents-course/unit0/introduction). For more information about the course, syllabus, and certification process, visit the [course introduction page](https://huggingface.co/learn/agents-course/unit0/introduction). | |
| You can find and try the agent in my Hugging Face Space here: [serverdaun/final_gaia_agent_hf_course](https://huggingface.co/spaces/serverdaun/final_gaia_agent_hf_course). | |
| --- | |
| ## GAIA Benchmark Target | |
| This agent is designed to participate in the [GAIA benchmark for General AI Assistants](https://huggingface.co/gaia-benchmark). GAIA is a comprehensive benchmark for evaluating the capabilities of general AI agents across a wide range of tasks. The benchmark is maintained by the Hugging Face community and features a public [leaderboard](https://huggingface.co/spaces/gaia-benchmark/leaderboard) for submissions and results. | |
| For more information about GAIA, its datasets, and the leaderboard, visit the [GAIA organization page](https://huggingface.co/gaia-benchmark). | |
| ## Agent Logic Overview | |
| ### Architecture | |
| This project implements a modular agent using [LangGraph](https://github.com/langchain-ai/langgraph) and [LangChain](https://github.com/langchain-ai/langchain) frameworks. The agent is orchestrated as a state graph, where each node represents a step in the reasoning or tool-use process. The core LLM is accessed via Azure OpenAI, and the agent is designed to invoke a variety of tools to solve complex tasks. | |
| ### Tools | |
| The agent is equipped with a rich set of tools, including: | |
| - **Search Tools**: Wikipedia, Tavily, and Arxiv search for retrieving information from the web and scientific literature. | |
| - **Math Tools**: Arithmetic operations, power, square root, modulus, and group theory utilities (commutativity, associativity, identity, inverses). | |
| - **Web Scraping**: Extracts main content from arbitrary web pages. | |
| - **Image Analysis**: Uses Azure OpenAI's vision capabilities to answer questions about images. | |
| - **Audio Transcription**: Transcribes audio files using Whisper. | |
| - **Code Execution**: Runs code files in various languages (Python, JS, TS, Bash, Ruby, PHP, Go) and returns output/errors. | |
| - **Tabular Data Tools**: Summarizes, filters, and manipulates CSV, Excel, and Parquet files. | |
| ### Agent Workflow | |
| 1. **Initialization**: The agent is built using a state graph, with nodes for the LLM and tool invocation. The LLM is bound to the available tools. | |
| 2. **Receiving Questions**: The Gradio app fetches a set of questions (some with associated files) from a remote API. | |
| 3. **Processing**: For each question, the agent constructs a message history (including a system prompt and the user question/file path) and invokes the LLM. If the LLM decides a tool is needed, the appropriate tool is called and the result is fed back into the conversation. | |
| 4. **Answer Extraction**: The agent's final answer is parsed and submitted back to the evaluation server. | |
| 5. **Submission**: All answers are submitted in batch, and the results (including score and feedback) are displayed in the Gradio interface. | |
| ### Extending the Agent | |
| - **Adding Tools**: Implement a new function in `tools.py` and decorate it with `@tool`. Add it to the `TOOLS` list in `agent.py`. | |
| - **Modifying Logic**: Adjust the state graph in `agent.py` or the agent invocation logic in `app.py` as needed. |