# Multi-Model Agentic AI: GAIA Benchmark Solver
This project was developed as part of the Hugging Face Agents Course. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the GAIA (General AI Assistants) benchmark (Level 1).
The agent leverages the Re-Act (Reasoning + Acting) framework via LangGraph to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
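The Thought-Action-Observation loop that Re-Act prescribes can be sketched in plain Python. This is an illustrative stand-in, not the LangGraph implementation in `react_agent.py`; the `llm` callable and the tool signatures are hypothetical:

```python
def react_loop(question, llm, tools, max_steps=5):
    """Minimal Thought-Action-Observation loop (a sketch, not the
    LangGraph graph used in react_agent.py).

    `llm` is a hypothetical callable: scratchpad -> (thought, action, arg).
    The action "final" ends the loop; any other action names a tool.
    """
    scratchpad = f"Question: {question}"
    for _ in range(max_steps):
        thought, action, arg = llm(scratchpad)
        scratchpad += f"\nThought: {thought}\nAction: {action}({arg})"
        if action == "final":
            return arg  # the agent's final answer
        observation = tools[action](arg)  # run the chosen tool
        scratchpad += f"\nObservation: {observation}"
    return None  # step budget exhausted
```

Each iteration appends the model's reasoning and the tool's observation to a growing scratchpad, which is exactly what makes the loop traceable in LangFuse.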
## 🚀 Key Features
- Hybrid Multi-Model Orchestration: To overcome the rate limits of free-tier plans, the system implements a robust fallback mechanism. It primarily uses Gemini 2.0 Pro, with automated failover to Gemini 2.0 Flash, Mistral Large, and various models on Groq (Llama 3.3, DeepSeek R1, Qwen).
- Advanced Toolset:
  - Web Semantic Search: Intelligent web browsing and information extraction.
  - Data Manipulation: Tools for processing and analyzing Excel/CSV spreadsheets.
  - Audio & Video Analysis: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
- Custom RAG: A Retrieval-Augmented Generation pipeline using ChromaDB for efficient context injection.
- Observability: Integrated with LangFuse (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
- User Interface: A clean, interactive UI built with Gradio and hosted on Hugging Face Spaces.
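The retrieval step of the Custom RAG feature can be illustrated with a pure-Python stand-in for ChromaDB's nearest-neighbor query. The bag-of-words `embed` below is a deliberate toy; the real pipeline in `web_semantic_search_tool.py` uses ChromaDB with proper embeddings:

```python
from collections import Counter
import math

def embed(text):
    # Toy "embedding": bag-of-words term counts (stand-in for a real model).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]
```

The retrieved chunks are then injected into the agent's context, which is the "context injection" the feature list refers to.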
## 🏗️ Architecture & Project Structure
The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
### File Map
- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
- `react_agent.py`: Contains the core logic for the LangGraph agent and the Re-Act prompt engineering.
- `custom_tools.py`: Definitions of the high-level tools available to the agent.
- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
## 🛠️ Technical Challenges & Solutions
### 1. Free-Plan Resilience
The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.
Solution: I implemented a recursive retry strategy in app.py. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
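That failover loop could look roughly like this. The `providers` list and `build_agent` factory are illustrative stand-ins; the real logic lives in `app.py`:

```python
def run_evaluation(questions, providers, build_agent):
    """Resume the evaluation run from the failing task after switching provider.

    `build_agent` is a hypothetical factory: provider name -> object with an
    .answer(question) method. On a quota/server error (modeled here as
    RuntimeError, standing in for 429/500 responses) we rebuild the agent
    with the next provider and retry the SAME question, rather than
    restarting the whole 20-question run.
    """
    answers = []
    p = 0
    agent = build_agent(providers[p])
    i = 0
    while i < len(questions):
        try:
            answers.append(agent.answer(questions[i]))
            i += 1  # only advance on success
        except RuntimeError:
            p += 1
            if p >= len(providers):
                raise  # every provider exhausted
            agent = build_agent(providers[p])  # re-instantiate, keep index i
    return answers
```

Keeping the question index outside the try/except is what lets the run continue "from the same task" instead of from scratch.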
### 2. Video Analysis Without Video APIs
Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in utils.py that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
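In outline, the pipeline reduces a video to per-frame descriptions plus a transcript and lets a text-only model do the final reasoning. The function names here are hypothetical sketches of the logic in `utils.py`:

```python
def video_to_insight(frame_paths, transcript, describe_frame, summarize):
    """Sketch of a "Video-to-Insight" pipeline (names are illustrative).

    Instead of a dedicated video API, the video is reduced to:
      (a) sampled frames, each described by an image-capable model
          via the `describe_frame` callable, and
      (b) the audio transcript.
    A final text-only call (`summarize`) merges both into an answer.
    """
    descriptions = [describe_frame(p) for p in frame_paths]
    context = "\n".join(descriptions) + "\nTranscript:\n" + transcript
    return summarize(context)
```

The key idea is decomposition: each step is something a standard multimodal or text LLM can already do, so no specialized (and costly) video endpoint is needed.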
### 3. Tool Optimization
To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in utils.py.
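The pattern can be shown with a hypothetical spreadsheet tool: the agent-facing wrapper stays thin and well-documented, while the heavy logic would sit in `utils.py`:

```python
def analyze_spreadsheet_impl(path, question):
    # "Fat utility": in the real project, complex parsing, cleaning, and
    # aggregation would happen here (this body is a placeholder).
    return f"answer to {question!r} from {path}"

def spreadsheet_tool(path: str, question: str) -> str:
    """Answer a question about an Excel/CSV file.

    "Thin tool": this docstring is what the agent 'sees'. It stays short
    and precise so a small set of tools does not dilute the agent's focus.
    """
    return analyze_spreadsheet_impl(path, question)
```

Because the agent only ever reasons over the short tool signatures and docstrings, five "smart" tools keep its decision space small while the utilities behind them can grow arbitrarily complex.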
## 🚦 Getting Started
### Prerequisites
- Python 3.10+
- API Keys for: Google (AI Studio), Mistral AI, and Groq.
- A local LangFuse instance (optional, for tracing).
### Installation
- Clone the repository:
  `git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]`
  `cd [YOUR_SPACE_NAME]`
- Install dependencies:
  `pip install -r requirements.txt`
- Run the app:
  `python app.py`
## 🎓 Certification
This project was completed for the Hugging Face Agents Course, covering:
- Theory: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
- Practice: Building and deploying a functional agent capable of autonomous tool use.
---
title: Template Final Assignment
emoji: 🕵🏻♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference