| # Multi-Model Agentic AI: GAIA Benchmark Solver | |
| This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1). | |
| The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video. | |
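The Thought-Action-Observation cycle behind this framework can be illustrated with a minimal sketch in plain Python. It does not use LangGraph and is not the code in `react_agent.py`; the `decide` callback stands in for the LLM, and `react_loop`, `scripted_policy`, and `multiply` are hypothetical names chosen for illustration.

```python
from typing import Callable

Tool = Callable[[str], str]
# A decision is ("act", tool_name, tool_input) or ("finish", answer, "").
Decision = tuple[str, str, str]


def react_loop(
    question: str,
    decide: Callable[[list[str]], Decision],
    tools: dict[str, Tool],
    max_steps: int = 5,
) -> str:
    """Alternate between choosing an action and observing its result."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        kind, name_or_answer, tool_input = decide(history)
        if kind == "finish":
            return name_or_answer
        tool = tools.get(name_or_answer)
        observation = tool(tool_input) if tool else f"unknown tool: {name_or_answer}"
        history.append(f"Action: {name_or_answer}({tool_input})")
        history.append(f"Observation: {observation}")
    return "no answer within step budget"


def multiply(expr: str) -> str:
    """Toy tool: multiply two integers written as 'a*b'."""
    left, right = expr.split("*")
    return str(int(left) * int(right))


def scripted_policy(history: list[str]) -> Decision:
    """Stand-in for the model: call the tool once, then report what it observed."""
    if not any(entry.startswith("Observation:") for entry in history):
        return ("act", "multiply", "6*7")
    return ("finish", history[-1].removeprefix("Observation: "), "")


answer = react_loop("What is 6 times 7?", scripted_policy, {"multiply": multiply})
print(answer)
```

In a real agent the scripted policy would be replaced by a model call that reads the history and emits the next action, and the step budget guards against loops that never terminate.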
| ## 🚀 Key Features | |
| - **Hybrid Multi-Model Orchestration**: To work around the rate limits of free-tier plans, the system implements a robust fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automatic failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen). | |
| - **Advanced Toolset**: | |
| - **Web Semantic Search**: Intelligent web browsing and information extraction. | |
| - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets. | |
| - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs. | |
| - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection. | |
| - **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops. | |
| - **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**. | |
| --- | |
| ## 🏗️ Architecture & Project Structure | |
| The project separates the agent logic from the core utility functions, so the agent sees only a small set of tools and doesn't get "confused" by an over-saturated toolset. | |
| ### File Map | |
| - `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner. | |
| - `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering. | |
| - `custom_tools.py`: Definitions of the high-level tools available to the agent. | |
| - `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools. | |
| - `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB. | |
| - `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs. | |
| - `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components. | |
| --- | |
| ## 🛠️ Technical Challenges & Solutions | |
| ### 1. The "Free Plan" Resilience | |
| The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas. | |
| **Solution:** I implemented a recursive retry strategy in `app.py`. If a provider (e.g., Google) returns an HTTP 429 (rate limit) or 500 (server error) response, the agent is automatically re-instantiated with a different provider (Mistral or Groq) and resumes from the same task. | |
| ### 2. Video Analysis Without Video APIs | |
| Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process. | |
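The actual pipeline in `utils.py` is not shown here, so the following is only a sketch of one plausible decomposition: pick a handful of evenly spaced timestamps, describe the frame at each one with an image-capable model, and merge those descriptions with the audio transcript into a text prompt. The function names and the prompt layout are assumptions, and frame extraction and captioning are left out.

```python
def sample_timestamps(duration_s: float, n_frames: int) -> list[float]:
    """Return n_frames timestamps (seconds) at the midpoints of equal-length segments."""
    if duration_s <= 0 or n_frames <= 0:
        return []
    segment = duration_s / n_frames
    return [round((i + 0.5) * segment, 3) for i in range(n_frames)]


def build_video_prompt(
    question: str,
    timestamps: list[float],
    captions: list[str],
    transcript: str,
) -> str:
    """Combine per-frame captions and the transcript into one text prompt."""
    frame_lines = [f"[{t:.1f}s] {caption}" for t, caption in zip(timestamps, captions)]
    return "\n".join(
        ["Frame descriptions:", *frame_lines, "Audio transcript:", transcript, "Question:", question]
    )


timestamps = sample_timestamps(60, 4)
# Captions would come from an image-capable model; these are placeholders.
captions = ["a bird on a branch", "two birds", "three birds", "an empty branch"]
prompt = build_video_prompt("How many birds appear at most?", timestamps, captions, "(no speech)")
print(prompt)
```

Sampling at segment midpoints avoids the often uninformative first and last frames of a clip.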
| ### 3. Tool Optimization | |
| To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`. | |
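As an illustration of the pattern, the sketch below exposes a single tool that dispatches on file extension to larger helper functions. The names `analyze_file`, `summarize_csv`, and `summarize_text` are hypothetical and do not come from `custom_tools.py` or `utils.py`.

```python
import csv
import io
from pathlib import Path


# "Fat" utilities: in the layout described above these would live in utils.py.
def summarize_csv(text: str) -> str:
    """Report the row count and column names of CSV content."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return "empty table"
    return f"{len(rows) - 1} data rows, columns: {', '.join(rows[0])}"


def summarize_text(text: str) -> str:
    """Report the word count of plain text."""
    return f"{len(text.split())} words"


_HANDLERS = {".csv": summarize_csv, ".txt": summarize_text}


# "Thin" tool: the only function the agent would see.
def analyze_file(filename: str, content: str) -> str:
    """Describe the contents of a file, choosing the handler by extension."""
    handler = _HANDLERS.get(Path(filename).suffix.lower())
    if handler is None:
        return f"unsupported file type: {filename}"
    return handler(content)


print(analyze_file("sales.csv", "item,price\napple,1\npear,2\n"))
print(analyze_file("notes.txt", "one two three"))
```

The agent's prompt then needs to describe one tool instead of one per file type, while new formats can be added by registering another handler.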
| --- | |
| ## 🚦 Getting Started | |
| ### Prerequisites | |
| - Python 3.10+ | |
| - API Keys for: Google (AI Studio), Mistral AI, and Groq. | |
| - A local LangFuse instance (optional, for tracing). | |
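Before launching the app, it can help to confirm that the API keys are visible to the process. The environment variable names below are assumptions based on common conventions, not names confirmed by this repository; adjust them to whatever `app.py` actually reads.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Assumed variable names; the project may use different ones.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "MISTRAL_API_KEY", "GROQ_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required keys that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys are set.")
```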
| ### Installation | |
| 1. Clone the repository: | |
| ```bash | |
| git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME] | |
| cd [YOUR_SPACE_NAME] | |
| ``` | |
| 2. Install dependencies: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| 3. Run the app: | |
| ```bash | |
| python app.py | |
| ``` | |
| --- | |
| ## 🎓 Certification | |
| This project was completed for the **Hugging Face Agents Course**, covering: | |
| - **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA). | |
| - **Practice**: Building and deploying a functional agent capable of autonomous tool use. | |
| --- | |
| title: Template Final Assignment | |
| emoji: 🕵🏻‍♂️ | |
| colorFrom: indigo | |
| colorTo: indigo | |
| sdk: gradio | |
| sdk_version: 5.25.2 | |
| app_file: app.py | |
| pinned: false | |
| hf_oauth: true | |
| # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes. | |
| hf_oauth_expiration_minutes: 480 | |
| --- | |
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference |