# Multi-Model Agentic AI: GAIA Benchmark Solver

This project was developed as part of the **Hugging Face Agents Course**. It features an autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1). The agent uses the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to choose tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.

## 🚀 Key Features

- **Hybrid Multi-Model Orchestration**: To work around the rate limits of free-tier plans, the system implements a fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
- **Advanced Toolset**:
  - **Web Semantic Search**: Web browsing and information extraction.
  - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
  - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
  - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.

---

## 🏗️ Architecture & Project Structure

The project separates the agent logic from the core utility functions, so the agent doesn't get "confused" by an over-saturated toolset.

### File Map

- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
- `custom_tools.py`: Definitions of the high-level tools available to the agent.
- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.

---

## 🛠️ Technical Challenges & Solutions

### 1. The "Free Plan" Resilience

The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.

**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.

### 2. Video Analysis Without Video APIs

Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.

### 3. Tool Optimization

To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.

---

## 🚦 Getting Started

### Prerequisites

- Python 3.10+
- API Keys for: Google (AI Studio), Mistral AI, and Groq.
- A local LangFuse instance (optional, for tracing).

### Installation

1. Clone the repository:
   ```bash
   git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
   cd [YOUR_SPACE_NAME]
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the app:
   ```bash
   python app.py
   ```

---

## 🎓 Certification

This project was completed for the **Hugging Face Agents Course**, covering:

- **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
- **Practice**: Building and deploying a functional agent capable of autonomous tool use.

---
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
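
The provider failover described in the "Free Plan" Resilience section can be illustrated with a minimal sketch. This is not the code in `app.py`: `ProviderError`, `answer_with_fallback`, and the two stub providers are hypothetical names introduced here, and the sketch uses a plain loop over providers rather than recursion. It only assumes what the section states, namely that a 429 or 500 response triggers a switch to the next provider for the same task.

```python
from typing import Callable, Sequence

# Status codes that trigger a switch to the next provider (rate limit, server error).
RETRYABLE_STATUS = {429, 500}


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed model call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def answer_with_fallback(
    question: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each (name, ask) pair in order and return (provider_name, answer)."""
    last_error = None
    for name, ask in providers:
        try:
            return name, ask(question)
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUS:
                raise  # other failures are not hidden by the failover
            last_error = err  # remember the failure and move to the next provider
    raise RuntimeError("all providers exhausted") from last_error


# Stub providers standing in for real SDK calls (assumptions for illustration only).
def rate_limited_stub(question: str) -> str:
    raise ProviderError(429)


def working_stub(question: str) -> str:
    return f"stub answer to: {question}"


if __name__ == "__main__":
    chain = [("gemini", rate_limited_stub), ("mistral", working_stub)]
    print(answer_with_fallback("What is GAIA?", chain))
```

Because the question is passed unchanged to each provider in turn, the evaluation run continues from the same task instead of restarting, which is the behavior the section describes.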
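
The "Thin Tool, Fat Utility" pattern from the Tool Optimization section can be sketched in the same spirit. The function names below (`load_rows`, `numeric_column`, `summarize_column`, `spreadsheet_tool`) are hypothetical and are not taken from `custom_tools.py` or `utils.py`; the sketch only shows the shape of the split, with one small agent-facing function delegating to heavier helper logic.

```python
import csv
import io
from statistics import mean

# "Fat" utility layer: parsing and cleaning logic the agent never sees directly.

def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text with a header row into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def numeric_column(rows: list[dict[str, str]], column: str) -> list[float]:
    """Collect the values of one column, skipping missing or non-numeric cells."""
    values = []
    for row in rows:
        try:
            values.append(float(row[column]))
        except (KeyError, TypeError, ValueError):
            continue
    return values


def summarize_column(csv_text: str, column: str) -> dict[str, float]:
    """Compute count, sum, and mean for one numeric column."""
    values = numeric_column(load_rows(csv_text), column)
    if not values:
        raise ValueError(f"no numeric values in column {column!r}")
    return {"count": len(values), "sum": sum(values), "mean": mean(values)}


# "Thin" tool layer: a single entry point with a short docstring for the agent.

def spreadsheet_tool(csv_text: str, column: str) -> str:
    """Summarize a numeric column of a CSV table as a one-line text report."""
    try:
        stats = summarize_column(csv_text, column)
    except ValueError as err:
        return f"error: {err}"  # return errors as text so the agent can react
    return (
        f"{column}: count={stats['count']}, "
        f"sum={stats['sum']:g}, mean={stats['mean']:g}"
    )


if __name__ == "__main__":
    table = "item,price\napple,2\npear,4\nplum,n/a\n"
    print(spreadsheet_tool(table, "price"))
```

The agent only has to choose between a handful of entry points like `spreadsheet_tool`, while the cleaning and aggregation steps can grow in the utility layer without enlarging the toolset.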