Spaces:

felixmortas
/

Hf_Agent_Course_Final_Assignment

felixmortas committed on Jan 18

Commit

2957155

1 Parent(s): 5aeeef7

Update README.md

README.md +88 -0

@@ -1,4 +1,92 @@
 ---
 title: Template Final Assignment
 emoji: 🕵🏻‍♂️
 colorFrom: indigo

+# Multi-Model Agentic AI: GAIA Benchmark Solver
+This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).
+The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
+## 🚀 Key Features
+- **Hybrid Multi-Model Orchestration**: To work around the rate limits of free-tier plans, the system implements a fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
+- **Advanced Toolset**:
+  - **Web Semantic Search**: Intelligent web browsing and information extraction.
+  - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
+  - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
+  - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
+- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
+- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
+---
+## 🏗️ Architecture & Project Structure
+The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
+### File Map
+- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
+- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
+- `custom_tools.py`: Definitions of the high-level tools available to the agent.
+- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
+- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
+- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
+- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
 ---
+## 🛠️ Technical Challenges & Solutions
+### 1. The "Free Plan" Resilience
+The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas.
+**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
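A minimal sketch of this kind of provider fallback, written as a plain loop rather than recursion. `ProviderError`, `answer_with_fallback`, and the stub providers are hypothetical names used for illustration only; the actual code in `app.py` may be structured differently.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error carrying the HTTP status returned by a provider."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned HTTP {status}")
        self.status = status


# Status codes treated as "try the next provider" (quota or server error).
RETRYABLE_STATUSES = {429, 500}


def answer_with_fallback(
    question: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, answer)."""
    last_error = None
    for name, ask in providers:
        try:
            return name, ask(question)
        except ProviderError as err:
            if err.status not in RETRYABLE_STATUSES:
                raise  # e.g. a bad API key: switching providers would hide the bug
            last_error = err
    raise RuntimeError("all providers failed") from last_error


# Stub providers standing in for real model clients (assumptions, not real SDK calls).
def flaky_google(question: str) -> str:
    raise ProviderError(429)


def stub_mistral(question: str) -> str:
    return f"answer to: {question}"


if __name__ == "__main__":
    chain = [("google", flaky_google), ("mistral", stub_mistral)]
    print(answer_with_fallback("What is GAIA?", chain))
```

Because the same question is passed to the next provider, the evaluation run resumes on the task that failed instead of restarting from the first question.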
+### 2. Video Analysis Without Video APIs
+Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
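The contents of `utils.py` are not shown here, so the following is only one way such a pipeline could be sketched: choose a handful of timestamps at which to grab frames, then merge per-frame descriptions and the audio transcript into a single text prompt. The function names, the sampling interval, and the frame cap are all assumptions.

```python
def sample_timestamps(
    duration_s: float, every_s: float = 5.0, max_frames: int = 12
) -> list[float]:
    """Pick evenly spaced timestamps (in seconds) at which to grab frames."""
    if duration_s <= 0 or every_s <= 0 or max_frames <= 0:
        return []
    # At least one frame, at most max_frames, roughly one per every_s seconds.
    count = min(max_frames, max(1, int(duration_s // every_s)))
    step = duration_s / count
    # Use the midpoint of each segment to avoid blank first or last frames.
    return [round(step * (i + 0.5), 2) for i in range(count)]


def build_video_prompt(
    question: str, frame_notes: list[tuple[float, str]], transcript: str
) -> str:
    """Combine frame descriptions and the transcript into one text prompt."""
    lines = [f"Question: {question}", "Frame descriptions:"]
    lines += [f"- t={t:.1f}s: {note}" for t, note in frame_notes]
    lines += ["Audio transcript:", transcript.strip() or "(none)"]
    return "\n".join(lines)


if __name__ == "__main__":
    stamps = sample_timestamps(60, every_s=10)
    notes = [(t, "placeholder description") for t in stamps]
    print(build_video_prompt("How many birds appear?", notes, ""))
```

Capping the number of frames keeps the number of image-model calls bounded, which matters on free-tier quotas.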
+### 3. Tool Optimization
+To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
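The pattern can be illustrated with a small, framework-free sketch: the tool the agent sees is a thin wrapper with a short docstring, while the parsing logic lives in a utility function. `summarize_csv`, `spreadsheet_tool`, and `TOOLS` are hypothetical names, not the project's actual definitions.

```python
import csv
import io


def summarize_csv(text: str) -> dict[str, float]:
    """'Fat' utility: parse CSV text and total every numeric column."""
    totals: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            try:
                totals[column] = totals.get(column, 0.0) + float(value)
            except (TypeError, ValueError):
                continue  # skip non-numeric or missing cells
    return totals


def spreadsheet_tool(csv_text: str) -> str:
    """Return the total of each numeric column in a CSV document."""
    # 'Thin' tool: one call into the utility plus formatting for the LLM.
    totals = summarize_csv(csv_text)
    formatted = ", ".join(f"{name}={total:g}" for name, total in totals.items())
    return formatted or "no numeric columns"


# The agent only ever sees this small registry, not the utilities behind it.
TOOLS = {"spreadsheet_tool": spreadsheet_tool}


if __name__ == "__main__":
    print(TOOLS["spreadsheet_tool"]("item,price,qty\napple,1.5,2\npear,2.5,3\n"))
```

Keeping the registry small means the model chooses among a few clearly described tools, while the complexity stays out of the prompt.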
+---
+## 🚦 Getting Started
+### Prerequisites
+- Python 3.10+
+- API Keys for: Google (AI Studio), Mistral AI, and Groq.
+- A local LangFuse instance (optional, for tracing).
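Before launching the app, it can help to confirm that the API keys are set. The sketch below assumes the keys are provided through environment variables named `GOOGLE_API_KEY`, `MISTRAL_API_KEY`, and `GROQ_API_KEY`; these names are an assumption, so adjust them to whatever `app.py` actually reads.

```python
import os
from typing import Mapping, Optional

# Assumed variable names; the project may use different ones.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "MISTRAL_API_KEY", "GROQ_API_KEY")


def missing_api_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_api_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All API keys are set.")
```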
+### Installation
+1. Clone the repository:
+```bash
+git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
+cd [YOUR_SPACE_NAME]
+```
+2. Install dependencies:
+```bash
+pip install -r requirements.txt
+```
+3. Run the app:
+```bash
+python app.py
+```
+---
+## 🎓 Certification
+This project was completed for the **Hugging Face Agents Course**, covering:
+- **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
+- **Practice**: Building and deploying a functional agent capable of autonomous tool use.
+---
 title: Template Final Assignment
 emoji: 🕵🏻‍♂️
 colorFrom: indigo