felixmortas committed · Commit f79be97 (verified) · 1 Parent(s): 2957155

Create README.md

  1. README.md +12 -92
README.md CHANGED
@@ -1,103 +1,23 @@
1
- # Multi-Model Agentic AI: GAIA Benchmark Solver
2
 
3
- This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).
4
 
5
- The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
6
 
7
- ## 🚀 Key Features
8
 
9
- - **Hybrid Multi-Model Orchestration**: To overcome rate limits of free-tier plans, the system implements a robust fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
10
- - **Advanced Toolset**:
11
- - **Web Semantic Search**: Intelligent web browsing and information extraction.
12
- - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
13
- - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
14
- - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
15
- - **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
16
- - **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
17
 
18
- ---
19
 
20
- ## 🏗️ Architecture & Project Structure
21
 
22
- The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
23
 
24
- ### File Map
25
 
26
- - `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
27
- - `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
28
- - `custom_tools.py`: Definitions of the high-level tools available to the agent.
29
- - `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
30
- - `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
31
- - `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
32
- - `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
33
 
34
- ---
35
 
36
- ## 🛠️ Technical Challenges & Solutions
37
-
38
- ### 1. The "Free Plan" Resilience
39
-
40
- The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.
41
- **Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
42
-
43
- ### 2. Video Analysis Without Video APIs
44
-
45
- Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
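One way such a pipeline might be structured is sketched below, assuming evenly spaced frame sampling. The function names are hypothetical and do not come from `utils.py`; the actual frame decoding and the per-frame model call are left out, so only the scheduling and the merging of per-frame descriptions into a text timeline are shown.

```python
def sample_timestamps(duration_s: float, max_frames: int) -> list[float]:
    """Return timestamps (in seconds) at the centre of max_frames equal segments."""
    if duration_s <= 0 or max_frames <= 0:
        return []
    step = duration_s / max_frames
    return [round(step * (i + 0.5), 3) for i in range(max_frames)]


def frame_prompt(question: str, timestamp: float) -> str:
    """Build the text sent alongside one extracted frame (assumed prompt wording)."""
    return f"[t={timestamp:.1f}s] Describe what is visible that helps answer: {question}"


def timeline_from_descriptions(descriptions: dict[float, str]) -> str:
    """Merge per-frame descriptions into a chronological text timeline."""
    lines = [f"{ts:.1f}s: {text}" for ts, text in sorted(descriptions.items())]
    return "\n".join(lines)


stamps = sample_timestamps(60, 4)
timeline = timeline_from_descriptions({22.5: "a bird lands", 7.5: "empty feeder"})
```

The resulting timeline is plain text, so a text-only model can reason about the order of events without ever receiving the video itself.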
46
-
47
- ### 3. Tool Optimization
48
-
49
- To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
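The pattern can be illustrated with a small sketch, assuming a spreadsheet tool as the example. `load_rows`, `column_total`, and `spreadsheet_tool` are hypothetical names chosen for illustration; the real tools in `custom_tools.py` may look quite different.

```python
import csv
import io


# Utility layer: the detailed logic, in the role the README gives to utils.py.
def load_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def column_total(rows: list[dict[str, str]], column: str) -> float:
    """Sum a numeric column, skipping empty or missing cells."""
    return sum(float(row[column]) for row in rows if (row.get(column) or "").strip())


# Tool layer: one agent-facing entry point that dispatches to the utilities.
def spreadsheet_tool(csv_text: str, operation: str, column: str = "") -> str:
    """Single tool exposed to the agent; returns a short text observation."""
    rows = load_rows(csv_text)
    if operation == "sum":
        return f"{column_total(rows, column):.2f}"
    if operation == "count":
        return str(len(rows))
    return f"Unsupported operation: {operation!r}"


sample = "item,price\napple,1.50\npear,2.25\n"
total = spreadsheet_tool(sample, "sum", "price")
count = spreadsheet_tool(sample, "count")
```

The agent only has to choose one tool and an operation name, which keeps the tool list in the prompt short while the utilities can grow independently.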
50
-
51
- ---
52
-
53
- ## 🚦 Getting Started
54
-
55
- ### Prerequisites
56
-
57
- - Python 3.10+
58
- - API Keys for: Google (AI Studio), Mistral AI, and Groq.
59
- - A local LangFuse instance (optional, for tracing).
60
-
61
- ### Installation
62
-
63
- 1. Clone the repository:
64
- ```bash
65
- git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
66
- cd [YOUR_SPACE_NAME]
67
- ```
68
-
69
- 2. Install dependencies:
70
- ```bash
71
- pip install -r requirements.txt
72
- ```
73
-
74
- 3. Run the app:
75
- ```bash
76
- python app.py
77
- ```
78
-
79
- ---
80
-
81
- ## 🎓 Certification
82
-
83
- This project was completed for the **Hugging Face Agents Course**, covering:
84
-
85
- - **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
86
- - **Practice**: Building and deploying a functional agent capable of autonomous tool use.
87
-
88
- ---
89
-
90
- title: Template Final Assignment
91
- emoji: 🕵🏻‍♂️
92
- colorFrom: indigo
93
- colorTo: indigo
94
- sdk: gradio
95
- sdk_version: 5.25.2
96
- app_file: app.py
97
- pinned: false
98
- hf_oauth: true
99
- # optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
100
- hf_oauth_expiration_minutes: 480
101
- ---
102
-
103
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
+ # 🤖 Autonomous Agentic System: GAIA Benchmark Solver
2
 
3
+ Final project for the **Hugging Face Agents Course**. I developed a high-level autonomous agent capable of solving complex, multi-step tasks from the **GAIA Benchmark** (General AI Assistants), involving real-world tool usage and multimodal reasoning.
4
 
5
+ **The concept:** A robust agentic workflow built with **LangGraph** that follows a Thought-Action-Observation cycle to decompose 20 validation queries into executable steps, navigating through technical constraints like API rate limits and data extraction challenges.
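The Thought-Action-Observation cycle can be sketched in plain Python as below. This is an assumption-laden illustration that does not use LangGraph: `run_cycle`, `scripted_decide`, and `multiply` are hypothetical, and the scripted decision function stands in for the language model.

```python
from typing import Callable

Decision = tuple[str, str, str]  # (thought, action name, action input)


def run_cycle(
    question: str,
    decide: Callable[[str, list[str]], Decision],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[str, list[Decision]]:
    """Alternate between deciding and acting until the model finishes."""
    trace: list[Decision] = []
    observations: list[str] = []
    for _ in range(max_steps):
        thought, action, action_input = decide(question, observations)
        if action == "finish":
            trace.append((thought, action, action_input))
            return action_input, trace
        tool = tools.get(action)
        observation = tool(action_input) if tool else f"unknown tool: {action}"
        observations.append(observation)
        trace.append((thought, action, observation))
    return "no answer within step budget", trace


def multiply(expression: str) -> str:
    """Toy tool: multiply two integers written as 'a*b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


def scripted_decide(question: str, observations: list[str]) -> Decision:
    """Stand-in for the model: call the tool once, then finish."""
    if not observations:
        return ("I need to compute the product.", "multiply", "6*7")
    return ("The tool result answers the question.", "finish", observations[-1])


answer, trace = run_cycle("What is 6 times 7?", scripted_decide, {"multiply": multiply})
```

The step budget (`max_steps`) is one simple guard against a model that never emits a final answer.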
6
 
7
+ **Technical highlights:**
8
 
9
+ - **Resilient Model Orchestration:** Implemented a **fallback & routing strategy** using Gemini 2.5 Pro as the primary brain, with automatic switching to Gemini Flash, Mistral, or Groq-hosted models to bypass free-tier rate limits without interrupting the execution flow.
10
 
11
+ - **Advanced Tool Engineering:** Instead of overloading the context window with many small tools, I developed a `utils.py` library of complex functions. The agent uses a refined set of "Super-Tools" (Web Search, Excel manipulation, Audio Transcription, API interaction) that handle internal logic complexity autonomously.
12
 
13
+ - **Multimodal Innovation:** Engineered a **custom Video Analysis sub-agent**. Since no free direct video-to-text API was available, I built a pipeline that intelligently extracts frames and metadata to reconstruct temporal context for the LLM.
14
 
15
+ - **Custom RAG Architecture:** Integrated **ChromaDB** with a specialized retrieval algorithm optimized for the specific nuances of the GAIA dataset, ensuring the agent retrieves only the most relevant context for its reasoning steps.
16
 
17
+ - **Observability & Evaluation:** Self-hosted **LangFuse** to monitor traces, evaluate agent costs, and debug the Re-Act (Reasoning + Acting) loops without incurring cloud platform fees.
18
 
19
+ - **Full-Stack Deployment:** Interface built with **Gradio** and hosted on Hugging Face Spaces, managed via Git for version control and CI/CD.
20
 
21
+ **Results:** Successfully validated 16 "Level 1" GAIA tasks, demonstrating a high degree of autonomy in tool selection and the ability to maintain long-term state across multiple reasoning cycles.
22
 
23
+ [View certification](https://cas-bridge.xethub.hf.co/xet-bridge-us/6800ea554845e4edbca48825/5348431f62a3761b560f14e536cde6005f7dcd9eeda8ac8c7d5835edebe00c15?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20260118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260118T175600Z&X-Amz-Expires=3600&X-Amz-Signature=27ccefa0283d59c99512a9117a28a66f52bfb9e73c32ffe509ae1a9dfefc4504&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=65c927db2ba32c95416eb25d&response-content-disposition=inline%3B+filename*%3DUTF-8%27%272025-07-06.png%3B+filename%3D%222025-07-06.png%22%3B&response-content-type=image%2Fpng&x-id=GetObject&Expires=1768762560&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2ODc2MjU2MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82ODAwZWE1NTQ4NDVlNGVkYmNhNDg4MjUvNTM0ODQzMWY2MmEzNzYxYjU2MGYxNGU1MzZjZGU2MDA1ZjdkY2Q5ZWVkYThhYzhjN2Q1ODM1ZWRlYmUwMGMxNSoifV19&Signature=S5%7EtuLDo36TB8V5mk8x03P2Pqo5NIOqCLS2XlFkJglZGz%7EOx6ePM8d0he166d%7E6s-KzLXenUv86%7EdSfJ8VWhDpZc7hpsrNsFqltLFYMGXAcmnflST0sZcReTqC3qx3gUlJ1H7%7Ea8geI55JvmcF36RiU-N5fQyBb-oFkOv8A47WjgEngEwSDMrGxq8FmYnKT3vDMu98HNSVQJoVDoBQG5uQxzYn2KmGTLwzWUqVHmRAMMXPoqxwCtRLsu7ZdyP1H0qQDJkD0TvTAegl3fLC2m0I1S0kSW3MQhT2SzOTOFHKKtn10lrPG7GG4iDmW487sZ7g-gU1rFoaGVezvc-W63dw__&Key-Pair-Id=K2L8F4GPSG1IFC)