GAIA_Agent

nikhmr1235

update gemini LLM model used for core-reasoning

2803c39 verified 4 months ago

metadata

title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Leaderboard Achievement

This project earned a top position on the official Hugging Face Agents Course Leaderboard, achieving 75% accuracy on the GAIA Level 1 benchmark.

You can view the full, live leaderboard here.

Note on Live Demo

The live demo on Hugging Face Spaces is currently only partially functional because dependencies in the original course environment have changed. The core agent logic and the code that achieved the leaderboard result remain fully available in this repository.

GAIA Agent: A Fact-Based Reasoning Agent

This Hugging Face Space hosts an AI agent for fact-based question answering, inspired by the reasoning challenges of the GAIA benchmark. The agent combines a language model (Google's Gemini 2.5 Flash) with a suite of tools to answer questions that may require multi-step reasoning, web searches, data parsing, and analysis of various file types.

About the Agent

The core of this project is a ReAct-style agent built with LangChain. It's designed to be robust, resilient, and versatile. The agent can:

Reason and plan: Break down complex questions into a sequence of smaller, manageable steps.
Use tools: Seamlessly use a variety of tools to gather and process information.
Handle different data formats: Work with text, images, audio, and structured data like Excel files.
Recover from errors: Retry API calls and handle tool errors gracefully.
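
The reason/act/observe cycle described above can be illustrated with a small, library-free loop. This is only a sketch of the general ReAct pattern, not the LangChain AgentExecutor this project uses: the `run_react` function, the regular expressions, and the `Action:` / `Action Input:` / `Final Answer:` markers are assumptions chosen for illustration, and the real format is defined in prompt.py.

```python
import re
from typing import Callable, Dict

# Hypothetical step markers; the project's actual format lives in prompt.py.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def run_react(question: str,
              llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 5) -> str:
    """Alternate between model output and tool calls until a final answer appears."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = llm(scratchpad)
        scratchpad += reply + "\n"

        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(reply)
        if action is None:
            # Tell the model its output was malformed and let it try again.
            scratchpad += "Observation: could not parse an action.\n"
            continue

        name = action.group(1).strip()
        tool_input = action.group(2).strip()
        tool = tools.get(name)
        if tool is None:
            observation = f"unknown tool '{name}'"
        else:
            try:
                observation = tool(tool_input)
            except Exception as exc:
                # Report tool failures back to the model instead of crashing.
                observation = f"tool error: {exc}"
        scratchpad += f"Observation: {observation}\n"
    return "No answer within step limit"
```

Passing the model and the tools in as plain callables keeps the loop testable with a scripted stand-in for the LLM.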

How It Works

The agent's architecture consists of several key components:

Gradio UI (app.py): A simple web interface to run the agent against a predefined set of evaluation questions. It handles user authentication, question fetching, answer submission, and displays the final score.
Agent Logic (agent.py): The BasicAgent class wraps a LangChain AgentExecutor. It includes logic for retrying API calls with exponential backoff in case of rate limits or temporary server issues.
Language Model (llm_rotator.py): The agent runs on a Google Gemini model (Gemini 2.5 Flash, as named in the overview). The ApiKeyRotator manages a pool of Google API keys to distribute load and reduce the chance of hitting rate limits.
Prompt Template (prompt.py): A highly detailed prompt template guides the agent's reasoning process. It enforces a strict ReAct format, a tool usage strategy, and extensive error handling protocols.
Tools (tools.py): A collection of tools that the agent can use to interact with the outside world and process information.
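
To make the retry and key-rotation ideas above concrete, here is a minimal sketch using only the standard library. It is not the code from agent.py or llm_rotator.py: `KeyRotator`, `TransientError`, `call_with_backoff`, and the default delays are hypothetical names and values, and round-robin rotation is an assumed strategy.

```python
import itertools
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class KeyRotator:
    """Round-robin over a pool of API keys (hypothetical stand-in for ApiKeyRotator)."""

    def __init__(self, keys: Sequence[str]) -> None:
        if not keys:
            raise ValueError("at least one API key is required")
        self._cycle = itertools.cycle(keys)

    def next_key(self) -> str:
        return next(self._cycle)


class TransientError(Exception):
    """Placeholder for rate-limit or temporary server errors."""


def call_with_backoff(fn: Callable[[str], T],
                      rotator: KeyRotator,
                      max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(key), switching keys and doubling the wait after each transient failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        key = rotator.next_key()
        try:
            return fn(key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")
```

The `sleep` parameter is injectable so the waiting behaviour can be checked without real delays.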

How to Use

Login: Click the "Login with Hugging Face" button to authenticate. Your HF username is used to submit your answers for evaluation.
Run Evaluation: Click the "Run Evaluation & Submit All Answers" button. This will trigger the following process:
- The application fetches a set of questions from the evaluation server.
- The agent processes each question one by one. This may take some time, as the agent might perform web searches, download files, or other time-intensive tasks.
- The agent's answers are collected and submitted to the evaluation server.
- The final score and a summary of the agent's answers are displayed.
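
The evaluation steps above can be sketched as a single function. This is an illustration under assumptions, not the code in app.py: `run_evaluation` is a hypothetical name, the fetch and submit steps are passed in as callables because the evaluation server's endpoints are not given here, and the field names `task_id`, `question`, and `submitted_answer` are assumed.

```python
from typing import Any, Callable, Dict, List


def run_evaluation(username: str,
                   fetch_questions: Callable[[], List[Dict[str, str]]],
                   agent: Callable[[str], str],
                   submit: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
    """Fetch questions, answer them one by one, and submit the collected answers."""
    questions = fetch_questions()
    answers = []
    for item in questions:
        try:
            answer = agent(item["question"])
        except Exception as exc:
            # One failing question should not abort the whole run.
            answer = f"AGENT ERROR: {exc}"
        answers.append({"task_id": item["task_id"], "submitted_answer": answer})
    # The server's response (assumed to contain the score) is returned for display.
    return submit({"username": username, "answers": answers})
```

Recording a placeholder answer on failure keeps the submission complete, so a single tool error costs one question and not the whole run.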

Required Setup

To run this Space, you need to provide the following API keys as Hugging Face secrets:

GOOGLE_API_KEYS: A comma-separated list of Google AI Studio API keys.
SERP_API_KEY: Your API key from SerpApi.
TAVILY_API_KEY: Your API key from Tavily AI.
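
Hugging Face secrets are exposed to the running Space as environment variables, so the three values above could be read and validated at startup roughly as follows. The `load_secrets` helper and its return shape are hypothetical; only the three variable names come from this README.

```python
import os
from typing import Any, Dict, Mapping

REQUIRED = ("GOOGLE_API_KEYS", "SERP_API_KEY", "TAVILY_API_KEY")


def load_secrets(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Read the required secrets and split the comma-separated Google key list."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError(f"missing secrets: {', '.join(missing)}")

    # Tolerate spaces and trailing commas in the key list.
    google_keys = [k.strip() for k in env["GOOGLE_API_KEYS"].split(",") if k.strip()]
    if not google_keys:
        raise RuntimeError("GOOGLE_API_KEYS contains no usable keys")

    return {
        "google_api_keys": google_keys,
        "serp_api_key": env["SERP_API_KEY"].strip(),
        "tavily_api_key": env["TAVILY_API_KEY"].strip(),
    }
```

Failing fast with the names of the missing secrets is usually easier to debug than an authentication error in the middle of a run.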

Available Tools

The agent has access to the following tools:

python_repl: Executes Python code for calculations, data manipulation, and scripting.
tavily_search: A precise search engine for real-time, factual information.
serpapi_Google_Search: A broader Google search tool for more comprehensive results.
wikipedia_search_tool2: Searches Wikipedia for encyclopedic knowledge.
file_saver: Downloads files from URLs to the local environment.
audio_transcriber_tool: Transcribes audio files into text.
gemini_multimodal_tool: Analyzes the content of images to answer questions.
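
As an illustration of what a tool in this list looks like from the agent's side (a string in, a string out), here is a minimal sketch of a python_repl-style tool. It is an assumption-laden stand-in, not the implementation in tools.py: it runs code with the built-in `exec` and captures standard output, with no sandboxing, so it should only ever be given trusted input.

```python
import contextlib
import io


def python_repl(code: str) -> str:
    """Run a snippet of Python and return whatever it printed, or the error text."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            # Fresh globals per call so snippets do not leak state into each other.
            exec(code, {"__name__": "__main__"})
    except Exception as exc:
        # Errors are returned as text so the agent can read them and retry.
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue()
```

Returning the error message as the tool's output, instead of raising, gives the agent a chance to correct its code on the next step.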