felixmortas committed on
Commit 2957155 · verified · 1 Parent(s): 5aeeef7

Update README.md

Files changed (1)
  1. README.md +88 -0
README.md CHANGED
@@ -1,4 +1,92 @@
  ---
  title: Template Final Assignment
  emoji: 🕵🏻‍♂️
  colorFrom: indigo
+ # Multi-Model Agentic AI: GAIA Benchmark Solver
+
+ This project was developed as part of the **Hugging Face Agents Course**. It features an autonomous agent designed to solve complex, multi-step tasks from Level 1 of the **GAIA (General AI Assistants) benchmark**.
+
+ The agent uses the **ReAct (Reasoning + Acting)** framework, implemented with **LangGraph**, to select and call tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
+
+ ## 🚀 Key Features
+
+ - **Hybrid Multi-Model Orchestration**: To work around the rate limits of free-tier plans, the system implements a fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
+ - **Advanced Toolset**:
+ - **Web Semantic Search**: Intelligent web browsing and information extraction.
+ - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
+ - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
+ - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
+ - **Observability**: Integrated with **Langfuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
+ - **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
+
+ ---
+
+ ## 🏗️ Architecture & Project Structure
+
+ The project separates the agent logic from the core utility functions, so the agent is not "confused" by an oversized toolset.
+
+ ### File Map
+
+ - `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
+ - `react_agent.py`: Contains the core logic for the **LangGraph** agent and the ReAct prompt engineering.
+ - `custom_tools.py`: Definitions of the high-level tools available to the agent.
+ - `utils.py`: The "engine room" containing the complex functions (video analysis logic, audio transcription, file processing) called by the tools.
+ - `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
+ - `requirements.txt`: List of dependencies, including `langgraph`, `chromadb`, `gradio`, and the model SDKs.
+ - `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
+
  ---
+
+ ## 🛠️ Technical Challenges & Solutions
+
+ ### 1. The "Free Plan" Resilience
+
+ The biggest challenge was getting through the 20-question GAIA evaluation without crashing on API quotas.
+ **Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent is automatically re-instantiated with a different provider (Mistral or Groq) and resumes from the same task.
+
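The recursive fallback described above could look roughly like the sketch below. It is not the code from `app.py`: `ProviderError`, `answer_with_fallback`, and the two stand-in providers are hypothetical names invented for illustration, and real SDKs raise their own exception types that would need to be mapped to a status code.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned HTTP {status}")
        self.status = status


# Statuses treated as "switch provider" signals (429 and 500, as in the text).
RETRYABLE_STATUSES = {429, 500}

# A provider is a display name plus a callable that answers one question.
Provider = tuple[str, Callable[[str], str]]


def answer_with_fallback(
    question: str, providers: Sequence[Provider], start: int = 0
) -> tuple[str, int]:
    """Try providers[start]; on a retryable error, recurse to the next one.

    Returns the answer and the index of the provider that produced it, so the
    caller can stay on that provider for the following task.
    """
    if start >= len(providers):
        raise RuntimeError("all providers exhausted")
    _name, ask = providers[start]
    try:
        return ask(question), start
    except ProviderError as err:
        if err.status not in RETRYABLE_STATUSES:
            raise  # unknown failure: surface it instead of hiding it
        return answer_with_fallback(question, providers, start + 1)


# Stand-in providers (assumptions): the first is always rate limited.
def rate_limited(_question: str) -> str:
    raise ProviderError(429)


def healthy(question: str) -> str:
    return f"answer to: {question}"


answer, used = answer_with_fallback(
    "What is 2 + 2?", [("google", rate_limited), ("mistral", healthy)]
)
print(answer, used)
```

Returning the provider index alongside the answer is one way to "continue from the same task" without starting again at the provider that just hit its quota.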
+ ### 2. Video Analysis Without Video APIs
+
+ Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks video tasks down into image and text analysis steps that standard LLMs can process.
+
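The README does not spell out how the pipeline splits a video, so the following is only one plausible shape, under the assumption that it samples a few frames, describes each one with an image-capable model, and merges those notes with an audio transcript. Both function names are hypothetical, and the frame extraction and model calls are deliberately left out.

```python
def sample_timestamps(duration_s: float, max_frames: int) -> list[float]:
    """Return max_frames timestamps (seconds) at the centers of equal segments."""
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]


def build_video_context(frame_notes: dict[float, str], transcript: str) -> str:
    """Merge per-frame descriptions and the transcript into one text block."""
    lines = [f"[{t:.1f}s] {note}" for t, note in sorted(frame_notes.items())]
    return "Frames:\n" + "\n".join(lines) + "\n\nTranscript:\n" + transcript


# Example: a 60-second clip sampled at 4 points.
timestamps = sample_timestamps(60, 4)
print(timestamps)  # [7.5, 22.5, 37.5, 52.5]

# The notes would come from an image model; here they are made-up placeholders.
notes = {t: f"placeholder description of frame at {t}s" for t in timestamps}
print(build_video_context(notes, "placeholder transcript"))
```

The resulting text block is something a text-only LLM can reason over, which is the point of avoiding a dedicated video API.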
+ ### 3. Tool Optimization
+
+ To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools backed by complex logic hidden in `utils.py`.
+
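As an illustration of the "Thin Tool, Fat Utility" split, here is a small sketch using only the standard library. The function names are hypothetical and the spreadsheet logic is a toy stand-in for what `utils.py` might hold; in the real project the thin function would additionally be registered as an agent tool.

```python
import csv
import io
import statistics


# "Fat" utility: parsing and analysis live here (stand-in for utils.py logic).
def summarize_numeric_columns(csv_text: str) -> dict[str, dict[str, float]]:
    """Return sum and mean for every column whose values all parse as numbers."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary: dict[str, dict[str, float]] = {}
    for column in (rows[0].keys() if rows else []):
        try:
            values = [float(row[column]) for row in rows]
        except (TypeError, ValueError):
            continue  # skip columns that are not fully numeric
        summary[column] = {"sum": sum(values), "mean": statistics.fmean(values)}
    return summary


# "Thin" tool: a small, clearly described entry point the agent can call.
def spreadsheet_tool(csv_text: str) -> str:
    """Summarize the numeric columns of a CSV table and return plain text."""
    summary = summarize_numeric_columns(csv_text)
    if not summary:
        return "No numeric columns found."
    return "\n".join(
        f"{name}: sum={stats['sum']:g}, mean={stats['mean']:g}"
        for name, stats in summary.items()
    )


print(spreadsheet_tool("item,price\napple,2\npear,4\n"))  # price: sum=6, mean=3
```

The agent only sees one tool with a one-line description, while the branching and error handling stay in the utility where they can be tested on their own.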
+ ---
+
+ ## 🚦 Getting Started
+
+ ### Prerequisites
+
+ - Python 3.10+
+ - API keys for Google (AI Studio), Mistral AI, and Groq.
+ - A local Langfuse instance (optional, for tracing).
+
+ ### Installation
+
+ 1. Clone the repository:
+ ```bash
+ git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
+ cd [YOUR_SPACE_NAME]
+ ```
+
+ 2. Install dependencies:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. Run the app:
+ ```bash
+ python app.py
+ ```
+
+ ---
+
+ ## 🎓 Certification
+
+ This project was completed for the **Hugging Face Agents Course**, covering:
+
+ - **Theory**: LLM mechanics, ReAct, LangGraph, RAG, and benchmarking (GAIA).
+ - **Practice**: Building and deploying a functional agent capable of autonomous tool use.
+
+ ---
+
  title: Template Final Assignment
  emoji: 🕵🏻‍♂️
  colorFrom: indigo