# Multi-Model Agentic AI: GAIA Benchmark Solver

This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).

The agent uses the **ReAct (Reasoning + Acting)** framework via **LangGraph** to choose and call tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
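
For readers new to the pattern, the Thought-Action-Observation cycle can be sketched in plain Python. This is only an illustration and does not use LangGraph or reflect the code in `react_agent.py`: `react_loop`, `scripted_policy`, and `multiply` are hypothetical names, and the scripted policy stands in for a real LLM call.

```python
from typing import Callable

Step = tuple[str, str, str]  # (tool name, tool input, observation)


def react_loop(
    question: str,
    policy: Callable[[str, list[Step]], tuple[str, str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate between deciding (policy) and acting (tools) until an answer is produced.

    policy returns ("act", tool_name, tool_input) or ("finish", answer, "").
    """
    history: list[Step] = []
    for _ in range(max_steps):
        kind, name, arg = policy(question, history)
        if kind == "finish":
            return name  # for "finish", the second field holds the answer
        # Run the chosen tool and record the observation for the next decision.
        observation = tools[name](arg) if name in tools else f"unknown tool: {name}"
        history.append((name, arg, observation))
    return "no answer within step budget"


def multiply(expr: str) -> str:
    """Toy tool: multiply two integers written as 'a*b'."""
    a, b = expr.split("*")
    return str(int(a) * int(b))


def scripted_policy(question: str, history: list[Step]) -> tuple[str, str, str]:
    """Stand-in for the LLM: call the tool once, then answer with its observation."""
    if not history:
        return ("act", "multiply", "6*7")
    return ("finish", history[-1][2], "")


answer = react_loop("What is 6*7?", scripted_policy, {"multiply": multiply})
```

The step budget (`max_steps`) matters in practice: it keeps a confused model from looping over tool calls indefinitely.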

## 🚀 Key Features

- **Hybrid Multi-Model Orchestration**: To overcome the rate limits of free-tier plans, the system implements a fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
- **Advanced Toolset**:
  - **Web Semantic Search**: Intelligent web browsing and information extraction.
  - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
  - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
  - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.

---

## 🏗️ Architecture & Project Structure

The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.

### File Map

- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the ReAct prompt engineering.
- `custom_tools.py`: Definitions of the high-level tools available to the agent.
- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.

---

## 🛠️ Technical Challenges & Solutions

### 1. The "Free Plan" Resilience

The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.
**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
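
The failover idea can be sketched as follows. This is a simplified illustration, not the code in `app.py`: it assumes each provider is wrapped in a callable that raises an error carrying the HTTP status, and it uses a loop instead of recursion. `ProviderError`, `run_with_fallback`, and the stub providers are hypothetical names.

```python
from typing import Callable, Sequence

RETRYABLE_STATUS = {429, 500}  # quota exhausted / server error


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status a provider returned."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(message or f"provider failed with status {status}")
        self.status = status


def run_with_fallback(
    question: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, answer)."""
    last_error = None
    for name, ask in providers:
        try:
            return name, ask(question)
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUS:
                raise  # not a quota or server error, so surface it
            last_error = err  # remember the failure and move to the next provider
    raise RuntimeError("all providers failed") from last_error


# Stub providers standing in for real SDK calls (assumptions for illustration).
def gemini_stub(question: str) -> str:
    raise ProviderError(429)


def mistral_stub(question: str) -> str:
    return f"answer to: {question}"


used, answer = run_with_fallback(
    "What is GAIA?", [("gemini", gemini_stub), ("mistral", mistral_stub)]
)
```

Because the same question is passed to the next provider, the evaluation run resumes at the task that failed rather than restarting from the first question.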

### 2. Video Analysis Without Video APIs

Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
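
One way such a decomposition could look is sketched below. It is an assumption about the general approach, not the pipeline in `utils.py`: `sample_timestamps` picks evenly spaced moments to grab frames from, and `build_video_prompt` merges per-frame descriptions with the audio transcript into a text-only prompt. Frame extraction and captioning themselves (for example with ffmpeg or an image-capable model) are left out, and both function names are hypothetical.

```python
def sample_timestamps(duration_s: float, max_frames: int) -> list[float]:
    """Return up to max_frames evenly spaced timestamps (seconds), at segment midpoints."""
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    return [round(step * (i + 0.5), 2) for i in range(max_frames)]


def build_video_prompt(
    question: str,
    captions: list[tuple[float, str]],
    transcript: str,
) -> str:
    """Combine frame descriptions and the transcript into one text prompt."""
    lines = [f"Question: {question}", "Frame descriptions:"]
    lines += [f"- [{t:.1f}s] {text}" for t, text in captions]
    lines += ["Audio transcript:", transcript or "(none)"]
    return "\n".join(lines)


timestamps = sample_timestamps(60, 4)
prompt = build_video_prompt(
    "How many birds are visible?",
    [(timestamps[0], "two birds on a branch")],
    "",
)
```

Sampling at segment midpoints avoids the title cards and fade-outs that often sit at the very start and end of a clip.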

### 3. Tool Optimization

To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
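
The split can be illustrated with a small spreadsheet example. This is a sketch under stated assumptions, not the project's actual tools: `column_stats` plays the role of a "fat" function in `utils.py`, and `analyze_spreadsheet` plays the role of a "thin" tool in `custom_tools.py`. Both names are hypothetical, and the tool is a plain function rather than a LangChain tool object.

```python
import csv
import io
import statistics


# "Fat" utility: parsing and number crunching live here.
def column_stats(csv_text: str, column: str) -> dict[str, float]:
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [
        float(row[column])
        for row in reader
        if (row.get(column) or "").strip()  # skip missing or empty cells
    ]
    if not values:
        raise ValueError(f"no numeric data in column {column!r}")
    return {"count": len(values), "sum": sum(values), "mean": statistics.fmean(values)}


# "Thin" tool: a short wrapper with a clear docstring, which is all the agent sees.
def analyze_spreadsheet(csv_text: str, column: str) -> str:
    """Return count, sum, and mean of a numeric column in CSV text."""
    try:
        stats = column_stats(csv_text, column)
    except ValueError as err:
        return f"Error: {err}"  # return errors as text so the agent can react
    return ", ".join(f"{key}={value:g}" for key, value in stats.items())


result = analyze_spreadsheet("item,price\na,2\nb,4\n", "price")
```

Returning errors as strings instead of raising keeps the agent loop alive: the model reads the message as an observation and can retry with a different column name.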

---

## 🚦 Getting Started

### Prerequisites

- Python 3.10+
- API Keys for: Google (AI Studio), Mistral AI, and Groq.
- A local LangFuse instance (optional, for tracing).

### Installation

1. Clone the repository:
```bash
git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
cd [YOUR_SPACE_NAME]
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the app:
```bash
python app.py
```

---

## 🎓 Certification

This project was completed for the **Hugging Face Agents Course**, covering:

- **Theory**: LLM Mechanics, ReAct, LangGraph, RAG, and Benchmarking (GAIA).
- **Practice**: Building and deploying a functional agent capable of autonomous tool use.

---

title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference