Update README.md
README.md
CHANGED
@@ -1,4 +1,92 @@
| 1 |
---
| 2 |
title: Template Final Assignment
|
| 3 |
emoji: 🕵🏻♂️
|
| 4 |
colorFrom: indigo
|
|
|
|
| 1 |
+
# Multi-Model Agentic AI: GAIA Benchmark Solver
|
| 2 |
+
|
| 3 |
+
This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).
|
| 4 |
+
|
| 5 |
+
The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
|
| 6 |
+
|
| 7 |
+
## 🚀 Key Features
|
| 8 |
+
|
| 9 |
+
- **Hybrid Multi-Model Orchestration**: To work around the rate limits of free-tier plans, the system implements a fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
|
| 10 |
+
- **Advanced Toolset**:
|
| 11 |
+
- **Web Semantic Search**: Intelligent web browsing and information extraction.
|
| 12 |
+
- **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
|
| 13 |
+
- **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
|
| 14 |
+
- **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
|
| 15 |
+
- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
|
| 16 |
+
- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
|
| 17 |
+
|
| 18 |
+
---
|
| 19 |
+
|
| 20 |
+
## 🏗️ Architecture & Project Structure
|
| 21 |
+
|
| 22 |
+
The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
|
| 23 |
+
|
| 24 |
+
### File Map
|
| 25 |
+
|
| 26 |
+
- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
|
| 27 |
+
- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
|
| 28 |
+
- `custom_tools.py`: Definitions of the high-level tools available to the agent.
|
| 29 |
+
- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
|
| 30 |
+
- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
|
| 31 |
+
- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
|
| 32 |
+
- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
|
| 33 |
+
|
| 34 |
---
|
| 35 |
+
|
| 36 |
+
## 🛠️ Technical Challenges & Solutions
|
| 37 |
+
|
| 38 |
+
### 1. The "Free Plan" Resilience
|
| 39 |
+
|
| 40 |
+
The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.
|
| 41 |
+
**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
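
The fallback idea can be illustrated with a minimal sketch. This is not the code from `app.py`; `ProviderError`, `run_with_fallback`, and the stand-in provider functions are hypothetical names used only for illustration, and real SDKs raise their own exception types.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying an HTTP-like status code from a provider."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned {status}")
        self.status = status


# Statuses treated as "switch provider" signals (the two named above).
RETRYABLE = {429, 500}


def run_with_fallback(
    task: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
    start: int = 0,
) -> tuple[str, str]:
    """Try providers[start:] in order and return (provider_name, answer)."""
    if start >= len(providers):
        raise RuntimeError(f"all providers exhausted for task: {task!r}")
    name, call = providers[start]
    try:
        return name, call(task)
    except ProviderError as err:
        if err.status not in RETRYABLE:
            raise  # unknown failure: surface it instead of hiding it
        # Recurse onto the next provider, keeping the same task.
        return run_with_fallback(task, providers, start + 1)


# Stand-in providers for demonstration only.
def rate_limited_provider(task: str) -> str:
    raise ProviderError(429)


def healthy_provider(task: str) -> str:
    return f"answer to {task}"


if __name__ == "__main__":
    chain = [("google", rate_limited_provider), ("mistral", healthy_provider)]
    print(run_with_fallback("question 7", chain))
```

Because the recursion passes the same `task` forward, a quota error on one provider does not skip the question; only the model behind it changes.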
|
| 42 |
+
|
| 43 |
+
### 2. Video Analysis Without Video APIs
|
| 44 |
+
|
| 45 |
+
Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
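
As a rough illustration of how a video task can be reduced to image and text steps, the sketch below picks a bounded set of frame indices and then merges per-frame descriptions with a transcript into one text prompt. The actual pipeline in `utils.py` is not shown here; the function names, the two-second sampling interval, and the frame cap are all assumptions.

```python
def sample_frame_indices(
    total_frames: int,
    fps: float,
    every_seconds: float = 2.0,
    max_frames: int = 30,
) -> list[int]:
    """Pick one frame every `every_seconds`, capped at `max_frames` frames."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0 or max_frames <= 0:
        return []
    step = max(1, round(fps * every_seconds))
    indices = list(range(0, total_frames, step))
    if len(indices) > max_frames:
        # Thin the list evenly so the whole video stays covered.
        stride = len(indices) / max_frames
        indices = [indices[int(i * stride)] for i in range(max_frames)]
    return indices


def build_video_prompt(
    question: str,
    frame_notes: dict[int, str],
    transcript: str,
) -> str:
    """Merge frame descriptions and a transcript into a text-only prompt."""
    lines = [f"Question: {question}", "Frame descriptions:"]
    for idx in sorted(frame_notes):
        lines.append(f"- frame {idx}: {frame_notes[idx]}")
    lines.append("Transcript:")
    lines.append(transcript.strip() or "(none)")
    return "\n".join(lines)


if __name__ == "__main__":
    # A 10-second clip at 30 fps, sampled every 2 seconds.
    print(sample_frame_indices(total_frames=300, fps=30.0))
    print(build_video_prompt("How many birds appear?", {0: "two birds on a wire"}, ""))
```

Each sampled frame would be described by an image-capable model and the audio track transcribed separately; the combined prompt can then go to any text LLM.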
|
| 46 |
+
|
| 47 |
+
### 3. Tool Optimization
|
| 48 |
+
|
| 49 |
+
To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
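
The split can be sketched as follows, using a spreadsheet summary as the example. The function names are hypothetical and the real tools in `custom_tools.py` may look quite different; the point is only that the tool exposes one small, well-documented entry point while the parsing logic stays in the utility layer.

```python
import csv
import io


# "Fat" utility layer (the kind of logic that would live in utils.py).
def summarize_csv(text: str) -> dict:
    """Parse CSV text and compute row count, columns, and numeric totals."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = list(reader.fieldnames or [])
    totals: dict[str, float] = {}
    if rows:
        for col in columns:
            try:
                totals[col] = sum(float(row[col]) for row in rows)
            except (TypeError, ValueError):
                continue  # non-numeric or incomplete column: skip it
    return {"rows": len(rows), "columns": columns, "totals": totals}


# "Thin" tool layer (the kind of wrapper that would live in custom_tools.py).
def spreadsheet_tool(csv_text: str) -> str:
    """Summarize a CSV: row count, column names, and totals of numeric columns."""
    summary = summarize_csv(csv_text)
    return (
        f"{summary['rows']} rows; "
        f"columns: {', '.join(summary['columns'])}; "
        f"totals: {summary['totals']}"
    )


if __name__ == "__main__":
    print(spreadsheet_tool("item,price\napple,1.5\npear,2.5\n"))
```

The agent only ever sees the short docstring and a single string argument, which keeps the tool list small and easy to choose from.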
|
| 50 |
+
|
| 51 |
+
---
|
| 52 |
+
|
| 53 |
+
## 🚦 Getting Started
|
| 54 |
+
|
| 55 |
+
### Prerequisites
|
| 56 |
+
|
| 57 |
+
- Python 3.10+
|
| 58 |
+
- API Keys for: Google (AI Studio), Mistral AI, and Groq.
|
| 59 |
+
- A local LangFuse instance (optional, for tracing).
|
| 60 |
+
|
| 61 |
+
### Installation
|
| 62 |
+
|
| 63 |
+
1. Clone the repository:
|
| 64 |
+
```bash
|
| 65 |
+
git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
|
| 66 |
+
cd [YOUR_SPACE_NAME]
|
| 67 |
+
```
|
| 68 |
+
|
| 69 |
+
2. Install dependencies:
|
| 70 |
+
```bash
|
| 71 |
+
pip install -r requirements.txt
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
3. Run the app:
|
| 75 |
+
```bash
|
| 76 |
+
python app.py
|
| 77 |
+
```
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
+
|
| 81 |
+
## 🎓 Certification
|
| 82 |
+
|
| 83 |
+
This project was completed for the **Hugging Face Agents Course**, covering:
|
| 84 |
+
|
| 85 |
+
- **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
|
| 86 |
+
- **Practice**: Building and deploying a functional agent capable of autonomous tool use.
|
| 87 |
+
|
| 88 |
+
---
|
| 89 |
+
|
| 90 |
title: Template Final Assignment
|
| 91 |
emoji: 🕵🏻♂️
|
| 92 |
colorFrom: indigo
|