Spaces: felixmortas/Hf_Agent_Course_Final_Assignment

felixmortas committed on Apr 28

Commit f79be97 (verified) · 1 Parent(s): 2957155

Create README.md

Files changed (1)

README.md +12 -92

@@ -1,103 +1,23 @@
-# Multi-Model Agentic AI: GAIA Benchmark Solver
-This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).
-The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
-## 🚀 Key Features
-- **Hybrid Multi-Model Orchestration**: To overcome rate limits of free-tier plans, the system implements a robust fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
-- **Advanced Toolset**:
-  - **Web Semantic Search**: Intelligent web browsing and information extraction.
-  - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
-  - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
-  - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
-- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
-- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
----
-## 🏗️ Architecture & Project Structure
-The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
-### File Map
-- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
-- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
-- `custom_tools.py`: Definitions of the high-level tools available to the agent.
-- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
-- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
-- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
-- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
----
-## 🛠️ Technical Challenges & Solutions
-### 1. The "Free Plan" Resilience
-The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.
-**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
-### 2. Video Analysis Without Video APIs
-Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
-### 3. Tool Optimization
-To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
----
-## 🚦 Getting Started
-### Prerequisites
-- Python 3.10+
-- API Keys for: Google (AI Studio), Mistral AI, and Groq.
-- A local LangFuse instance (optional, for tracing).
-### Installation
-1. Clone the repository:
-```bash
-git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
-cd [YOUR_SPACE_NAME]
-```
-2. Install dependencies:
-```bash
-pip install -r requirements.txt
-```
-3. Run the app:
-```bash
-python app.py
-```
----
-## 🎓 Certification
-This project was completed for the **Hugging Face Agents Course**, covering:
-- **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
-- **Practice**: Building and deploying a functional agent capable of autonomous tool use.
----
-title: Template Final Assignment
-emoji: 🕵🏻‍♂️
-colorFrom: indigo
-colorTo: indigo
-sdk: gradio
-sdk_version: 5.25.2
-app_file: app.py
-pinned: false
-hf_oauth: true
-# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
-hf_oauth_expiration_minutes: 480
----
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🤖 Autonomous Agentic System – GAIA Benchmark Solver
+Final project for the **Hugging Face Agents Course**. I developed a high-level autonomous agent capable of solving complex, multi-step tasks from the **GAIA Benchmark** (General AI Assistants), involving real-world tool usage and multimodal reasoning.
+**The concept:** A robust agentic workflow built with **LangGraph** that follows a Thought-Action-Observation cycle to decompose 20 validation queries into executable steps, navigating through technical constraints like API rate limits and data extraction challenges.
+**Technical highlights:**
+- **Resilient Model Orchestration:** Implemented a **fallback & routing strategy** using Gemini 2.5 Pro as the primary brain, with automatic switching to Gemini Flash, Mistral, or Groq-hosted models to bypass free-tier rate limits without interrupting the execution flow.
+- **Advanced Tool Engineering:** Instead of overloading the context window with many small tools, I developed a `utils.py` library of complex functions. The agent uses a refined set of "Super-Tools" (Web Search, Excel manipulation, Audio Transcription, API interaction) that handle internal logic complexity autonomously.
+- **Multimodal Innovation:** Engineered a **custom Video Analysis sub-agent**. Since no free direct video-to-text API was available, I built a pipeline that intelligently extracts frames and metadata to reconstruct temporal context for the LLM.
+- **Custom RAG Architecture:** Integrated **ChromaDB** with a specialized retrieval algorithm optimized for the specific nuances of the GAIA dataset, ensuring the agent retrieves only the most relevant context for its reasoning steps.
+- **Observability & Evaluation:** Ran a self-hosted **LangFuse** instance to monitor traces, evaluate agent costs, and debug the Reasoning + Acting (Re-Act) loops without incurring cloud platform fees.
+- **Full-Stack Deployment:** Interface built with **Gradio** and hosted on Hugging Face Spaces, managed via Git for version control and CI/CD.
+**Results:** Successfully validated 16 "Level 1" GAIA tasks, demonstrating a high degree of autonomy in tool selection and the ability to maintain long-term state across multiple reasoning cycles.
+[View certification](https://cas-bridge.xethub.hf.co/xet-bridge-us/6800ea554845e4edbca48825/5348431f62a3761b560f14e536cde6005f7dcd9eeda8ac8c7d5835edebe00c15?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20260118%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20260118T175600Z&X-Amz-Expires=3600&X-Amz-Signature=27ccefa0283d59c99512a9117a28a66f52bfb9e73c32ffe509ae1a9dfefc4504&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=65c927db2ba32c95416eb25d&response-content-disposition=inline%3B+filename*%3DUTF-8%27%272025-07-06.png%3B+filename%3D%222025-07-06.png%22%3B&response-content-type=image%2Fpng&x-id=GetObject&Expires=1768762560&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2ODc2MjU2MH19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82ODAwZWE1NTQ4NDVlNGVkYmNhNDg4MjUvNTM0ODQzMWY2MmEzNzYxYjU2MGYxNGU1MzZjZGU2MDA1ZjdkY2Q5ZWVkYThhYzhjN2Q1ODM1ZWRlYmUwMGMxNSoifV19&Signature=S5%7EtuLDo36TB8V5mk8x03P2Pqo5NIOqCLS2XlFkJglZGz%7EOx6ePM8d0he166d%7E6s-KzLXenUv86%7EdSfJ8VWhDpZc7hpsrNsFqltLFYMGXAcmnflST0sZcReTqC3qx3gUlJ1H7%7Ea8geI55JvmcF36RiU-N5fQyBb-oFkOv8A47WjgEngEwSDMrGxq8FmYnKT3vDMu98HNSVQJoVDoBQG5uQxzYn2KmGTLwzWUqVHmRAMMXPoqxwCtRLsu7ZdyP1H0qQDJkD0TvTAegl3fLC2m0I1S0kSW3MQhT2SzOTOFHKKtn10lrPG7GG4iDmW487sZ7g-gU1rFoaGVezvc-W63dw__&Key-Pair-Id=K2L8F4GPSG1IFC)
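
Illustrative note on the fallback strategy: both the removed and the added README text describe switching to another provider when the current one returns a rate-limit or server error (429 or 500). The following is a minimal sketch of that idea, not the code in `app.py`. `ProviderError`, `RETRYABLE_STATUSES`, `run_with_fallback`, and the provider callables are hypothetical names introduced here, and the sketch uses a plain loop where the removed README describes a recursive retry.

```python
from typing import Callable, Optional, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error carrying an HTTP-like status code from a model provider."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned {status}")
        self.status = status


# Status codes the README names as triggers for switching providers.
RETRYABLE_STATUSES = {429, 500}

# A provider is a (name, callable) pair; the callable maps a task to an answer.
Provider = Tuple[str, Callable[[str], str]]


def run_with_fallback(task: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, answer).

    Retryable failures move on to the next provider with the same task;
    any other ProviderError is re-raised immediately.
    """
    last_error: Optional[ProviderError] = None
    for name, call in providers:
        try:
            return name, call(task)
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUSES:
                raise
            last_error = err  # remember the failure and fall through to the next provider
    raise RuntimeError("all providers failed") from last_error
```

In use, the provider list would be ordered by preference (for example the primary Gemini model first, then the Mistral and Groq wrappers), so that the same task is retried on the next entry whenever a quota error occurs.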