# Multi-Model Agentic AI: GAIA Benchmark Solver
This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).
The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
## 🚀 Key Features
- **Hybrid Multi-Model Orchestration**: To overcome the rate limits of free-tier plans, the system implements a robust fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
- **Advanced Toolset**:
  - **Web Semantic Search**: Intelligent web browsing and information extraction.
  - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
  - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
  - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
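The retrieval step of the RAG pipeline listed above can be sketched as follows. This is a minimal, self-contained illustration using a toy bag-of-words representation and cosine similarity in place of ChromaDB and a real embedding model; all function names here are illustrative, not the project's actual API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (the retrieval step of RAG)."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "GAIA is a benchmark for general AI assistants.",
    "Gradio builds interactive machine learning demos.",
]
print(retrieve("what is the GAIA benchmark", docs))
```

In the real pipeline, the retrieved chunks are injected into the agent's context before it reasons about the question.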
---
## 🏗️ Architecture & Project Structure
The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
### File Map
- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
- `custom_tools.py`: Definitions of the high-level tools available to the agent.
- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
---
## 🛠️ Technical Challenges & Solutions
### 1. Resilience on Free-Tier Plans
The biggest challenge was completing the 20-question GAIA evaluation without crashing due to API quota limits.
**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 (rate limit) or 500 error, the agent automatically re-instantiates with a different provider (Mistral or Groq) and resumes the same task.
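A minimal sketch of this fallback loop (the provider list and the `run_task` callable are illustrative stand-ins; the real implementation lives in `app.py`):

```python
# Hypothetical sketch of the multi-provider fallback loop.
# Provider names and `run_task` are illustrative, not the project's real API.
PROVIDERS = ["gemini-2.0-pro", "gemini-2.0-flash", "mistral-large", "groq/llama-3.3"]

def run_with_fallback(task: str, run_task) -> str:
    """Try each provider in order; on a quota/server error, retry with the next one."""
    last_error = None
    for provider in PROVIDERS:
        try:
            return run_task(provider, task)   # e.g. rebuild the LangGraph agent here
        except RuntimeError as err:           # stand-in for 429/500 API errors
            last_error = err                  # remember the error, fall through to next provider
    raise last_error

# Usage: a fake runner whose first provider is always rate-limited.
def flaky_runner(provider: str, task: str) -> str:
    if provider == "gemini-2.0-pro":
        raise RuntimeError("429 quota exceeded")
    return f"{provider} answered: {task}"

print(run_with_fallback("Q1", flaky_runner))
```

The key design point is that the retry happens at the task level, so a failed provider does not lose progress on the overall evaluation run.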
### 2. Video Analysis Without Video APIs
Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
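The decomposition idea can be sketched like this: the video question is reduced to per-frame image analysis plus an audio transcript, then answered with a single text-only call. All names below are hypothetical; the actual logic lives in `utils.py`.

```python
# Illustrative "Video-to-Insight" decomposition: a video task becomes image
# and text sub-steps that standard LLMs can process. Names are hypothetical.
def video_to_insight(video_frames, transcript, question, analyze_frame, answer):
    """Reduce a video task to image + text steps."""
    frame_notes = [analyze_frame(f, question) for f in video_frames]  # per-frame descriptions
    context = "\n".join(frame_notes + [f"Transcript: {transcript}"])  # merge into one text context
    return answer(context, question)                                  # final text-only call

# Usage with stub "models":
result = video_to_insight(
    ["frame0", "frame1"],
    "hello world",
    "what is said?",
    analyze_frame=lambda f, q: f"saw {f}",
    answer=lambda ctx, q: ctx.splitlines()[-1],
)
print(result)
```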
### 3. Tool Optimization
To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
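The pattern can be illustrated with a short sketch: the tool the agent sees is a thin wrapper with a focused docstring, while the heavy logic sits in a separate utility function. Names here are illustrative, not the project's real tool signatures.

```python
# "Thin Tool, Fat Utility": the agent-facing tool is a short, well-described
# wrapper; the heavy lifting lives in a utility module. Names are illustrative.

def _process_spreadsheet(path: str, question: str) -> str:
    """'Fat' utility: in the real project this would load the file with pandas,
    clean it, and compute the answer. Here it is a stub."""
    return f"answer for {question} from {path}"

def spreadsheet_tool(path: str, question: str) -> str:
    """Answer a question about an Excel/CSV file.

    The docstring is what the agent reads when selecting tools, so it stays
    short and focused; all complexity is hidden in the utility above.
    """
    return _process_spreadsheet(path, question)

print(spreadsheet_tool("data.xlsx", "total sales?"))
```

Keeping the tool surface small reduces the chance the agent picks the wrong tool or mis-formats its arguments.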
---
## 🚦 Getting Started
### Prerequisites
- Python 3.10+
- API Keys for: Google (AI Studio), Mistral AI, and Groq.
- A local LangFuse instance (optional, for tracing).
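The API keys are typically supplied via environment variables. A minimal example (the exact variable names are assumptions; check `app.py` for the names the code actually reads):

```shell
# Illustrative: variable names may differ; check app.py for the exact names.
export GOOGLE_API_KEY="your-key"
export MISTRAL_API_KEY="your-key"
export GROQ_API_KEY="your-key"
```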
### Installation
1. Clone the repository:
```bash
git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
cd [YOUR_SPACE_NAME]
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the app:
```bash
python app.py
```
---
## 🎓 Certification
This project was completed for the **Hugging Face Agents Course**, covering:
- **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
- **Practice**: Building and deploying a functional agent capable of autonomous tool use.
---
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference