# Multi-Model Agentic AI: GAIA Benchmark Solver
This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).
The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
## 🚀 Key Features
- **Hybrid Multi-Model Orchestration**: To work around the rate limits of free-tier plans, the system implements a fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automatic failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
- **Advanced Toolset**:
  - **Web Semantic Search**: Intelligent web browsing and information extraction.
  - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
  - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
  - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
---
## 🏗️ Architecture & Project Structure
The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
### File Map
- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
- `custom_tools.py`: Definitions of the high-level tools available to the agent.
- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
---
## 🛠️ Technical Challenges & Solutions
### 1. The "Free Plan" Resilience
The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.
**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
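The fallback idea can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: `ProviderError`, `answer_with_fallback`, and the shape of the `providers` list are hypothetical names chosen for the example, and real SDKs raise their own exception types.

```python
from typing import Callable, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned HTTP {status}")
        self.status = status


# Status codes treated as "switch provider" (429 = quota, 500 = server error).
RETRYABLE = {429, 500}


def answer_with_fallback(
    question: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    start: int = 0,
) -> Tuple[str, str]:
    """Return (provider_name, answer), moving down the list on retryable errors."""
    if start >= len(providers):
        raise RuntimeError("all providers exhausted")
    name, ask = providers[start]
    try:
        return name, ask(question)
    except ProviderError as err:
        if err.status not in RETRYABLE:
            raise  # unexpected failure: surface it instead of masking it
        # Recurse with the next provider, keeping the same question.
        return answer_with_fallback(question, providers, start + 1)
```

Each entry in `providers` pairs a label with a callable that sends the question to one backend, so the same task is retried unchanged on the next provider in the list.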
### 2. Video Analysis Without Video APIs
Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
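One way to picture such a decomposition is shown below. This is only a sketch under assumptions: the README does not describe the actual steps in `utils.py`, so the evenly spaced frame sampling and the helper names `sample_timestamps` and `build_video_prompt` are hypothetical. Frame extraction, image description, and transcription are left out; the example covers only the planning and prompt-assembly steps.

```python
from typing import List, Tuple


def sample_timestamps(duration_s: float, max_frames: int) -> List[float]:
    """Pick max_frames evenly spaced timestamps, one at the midpoint of each segment."""
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]


def build_video_prompt(
    question: str,
    frame_notes: List[Tuple[float, str]],
    transcript: str,
) -> str:
    """Merge per-frame descriptions and the audio transcript into one text prompt."""
    lines = [f"[{ts:.2f}s] {note}" for ts, note in frame_notes]
    return (
        "Frame descriptions:\n" + "\n".join(lines)
        + "\n\nTranscript:\n" + transcript
        + "\n\nQuestion: " + question
    )
```

Because the final prompt is plain text, any of the fallback text models can answer it once the frames have been described.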
### 3. Tool Optimization
To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
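The split between a thin tool and a fat utility might look like the sketch below. The function names and the CSV example are hypothetical and are not taken from `custom_tools.py` or `utils.py`; the point is only that the agent-facing function stays small while the parsing logic lives elsewhere.

```python
import csv
import io
from typing import Dict


# "Fat" utility: under the README's layout this kind of logic would sit in utils.py.
def summarize_csv(text: str) -> dict:
    """Parse CSV text and compute the sum of every fully numeric column."""
    rows = list(csv.DictReader(io.StringIO(text)))
    totals: Dict[str, float] = {}
    for column in (rows[0].keys() if rows else []):
        try:
            totals[column] = sum(float(row[column]) for row in rows)
        except (TypeError, ValueError):
            continue  # non-numeric or incomplete column: skip it
    return {"rows": len(rows), "totals": totals}


# "Thin" tool: the only function the agent would see, with a short docstring.
def spreadsheet_tool(csv_text: str) -> str:
    """Summarize a CSV table: row count and the sum of each numeric column."""
    summary = summarize_csv(csv_text)
    parts = [f"{name}={value:g}" for name, value in summary["totals"].items()]
    return f"{summary['rows']} rows; " + ", ".join(parts)
```

Keeping the tool's signature and docstring short reduces what the model has to read when choosing a tool, while the utility can grow without changing the agent's view.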
---
## 🚦 Getting Started
### Prerequisites
- Python 3.10+
- API Keys for: Google (AI Studio), Mistral AI, and Groq.
- A local LangFuse instance (optional, for tracing).
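
Before launching, it can help to confirm that the keys are visible to the process. The snippet below is a small sketch; the environment variable names are assumptions (the README does not state which names `app.py` reads), so adjust them to match the code.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names; check app.py for the names it actually reads.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "MISTRAL_API_KEY", "GROQ_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required keys that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    absent = missing_keys()
    if absent:
        print("Missing API keys:", ", ".join(absent))
```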
### Installation
1. Clone the repository:
```bash
git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
cd [YOUR_SPACE_NAME]
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the app:
```bash
python app.py
```
---
## 🎓 Certification
This project was completed for the **Hugging Face Agents Course**, covering:
- **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
- **Practice**: Building and deploying a functional agent capable of autonomous tool use.
---
title: Template Final Assignment
emoji: 🕵🏻♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference