# Multi-Model Agentic AI: GAIA Benchmark Solver
This project was developed as part of the **Hugging Face Agents Course**. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the **GAIA (General AI Assistants) benchmark** (Level 1).
The agent leverages the **Re-Act (Reasoning + Acting)** framework via **LangGraph** to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
## 🚀 Key Features
- **Hybrid Multi-Model Orchestration**: To overcome the rate limits of free-tier plans, the system implements a robust fallback mechanism. It primarily uses **Gemini 2.0 Pro**, with automated failover to **Gemini 2.0 Flash**, **Mistral Large**, and various models on **Groq** (Llama 3.3, DeepSeek R1, Qwen).
- **Advanced Toolset**:
  - **Web Semantic Search**: Intelligent web browsing and information extraction.
  - **Data Manipulation**: Tools for processing and analyzing Excel/CSV spreadsheets.
  - **Audio & Video Analysis**: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
  - **Custom RAG**: A Retrieval-Augmented Generation pipeline using **ChromaDB** for efficient context injection.
- **Observability**: Integrated with **LangFuse** (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
- **User Interface**: A clean, interactive UI built with **Gradio** and hosted on **Hugging Face Spaces**.
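The retrieval step of the RAG pipeline listed above can be sketched as follows. This is a minimal, self-contained illustration using a toy bag-of-words representation and cosine similarity in place of ChromaDB and a real embedding model; all function names here are illustrative, not the project's actual API.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query (the retrieval step of RAG)."""
    q = embed(query)
    return sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "GAIA is a benchmark for general AI assistants.",
    "Gradio builds interactive machine learning demos.",
]
print(retrieve("what is the GAIA benchmark", docs))
```

In the real pipeline, the retrieved chunks are injected into the agent's context before it reasons about the question.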
---
## 🏗️ Architecture & Project Structure
The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.
### File Map
- `app.py`: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
- `react_agent.py`: Contains the core logic for the **LangGraph** agent and the Re-Act prompt engineering.
- `custom_tools.py`: Definitions of the high-level tools available to the agent.
- `utils.py`: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
- `web_semantic_search_tool.py`: Specialized module for RAG and semantic web queries using ChromaDB.
- `requirements.txt`: List of dependencies including `langgraph`, `chromadb`, `gradio`, and model SDKs.
- `*.ipynb`: Testing sandboxes for Mistral, LangChain, and agent components.
---
## 🛠️ Technical Challenges & Solutions
### 1. Resilience on Free-Tier Plans
The biggest challenge was completing the 20-question GAIA evaluation without crashing due to API quota limits.
**Solution:** I implemented a recursive retry strategy in `app.py`. If one provider (e.g., Google) returns a 429 (rate limit) or 500 error, the agent automatically re-instantiates with a different provider (Mistral or Groq) and resumes the same task.
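A minimal sketch of this fallback loop (the provider list and the `run_task` callable are illustrative stand-ins; the real implementation lives in `app.py`):

```python
# Hypothetical sketch of the multi-provider fallback loop.
# Provider names and `run_task` are illustrative, not the project's real API.
PROVIDERS = ["gemini-2.0-pro", "gemini-2.0-flash", "mistral-large", "groq/llama-3.3"]

def run_with_fallback(task: str, run_task) -> str:
    """Try each provider in order; on a quota/server error, retry with the next one."""
    last_error = None
    for provider in PROVIDERS:
        try:
            return run_task(provider, task)   # e.g. rebuild the LangGraph agent here
        except RuntimeError as err:           # stand-in for 429/500 API errors
            last_error = err                  # remember the error, fall through to next provider
    raise last_error

# Usage: a fake runner whose first provider is always rate-limited.
def flaky_runner(provider: str, task: str) -> str:
    if provider == "gemini-2.0-pro":
        raise RuntimeError("429 quota exceeded")
    return f"{provider} answered: {task}"

print(run_with_fallback("Q1", flaky_runner))
```

The key design point is that the retry happens at the task level, so a failed provider does not lose progress on the overall evaluation run.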
### 2. Video Analysis Without Video APIs
Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in `utils.py` that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
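The decomposition idea can be sketched like this: the video question is reduced to per-frame image analysis plus an audio transcript, then answered with a single text-only call. All names below are hypothetical; the actual logic lives in `utils.py`.

```python
# Illustrative "Video-to-Insight" decomposition: a video task becomes image
# and text sub-steps that standard LLMs can process. Names are hypothetical.
def video_to_insight(video_frames, transcript, question, analyze_frame, answer):
    """Reduce a video task to image + text steps."""
    frame_notes = [analyze_frame(f, question) for f in video_frames]  # per-frame descriptions
    context = "\n".join(frame_notes + [f"Transcript: {transcript}"])  # merge into one text context
    return answer(context, question)                                  # final text-only call

# Usage with stub "models":
result = video_to_insight(
    ["frame0", "frame1"],
    "hello world",
    "what is said?",
    analyze_frame=lambda f, q: f"saw {f}",
    answer=lambda ctx, q: ctx.splitlines()[-1],
)
print(result)
```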
### 3. Tool Optimization
To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in `utils.py`.
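The pattern can be illustrated with a short sketch: the tool the agent sees is a thin wrapper with a focused docstring, while the heavy logic sits in a separate utility function. Names here are illustrative, not the project's real tool signatures.

```python
# "Thin Tool, Fat Utility": the agent-facing tool is a short, well-described
# wrapper; the heavy lifting lives in a utility module. Names are illustrative.

def _process_spreadsheet(path: str, question: str) -> str:
    """'Fat' utility: in the real project this would load the file with pandas,
    clean it, and compute the answer. Here it is a stub."""
    return f"answer for {question} from {path}"

def spreadsheet_tool(path: str, question: str) -> str:
    """Answer a question about an Excel/CSV file.

    The docstring is what the agent reads when selecting tools, so it stays
    short and focused; all complexity is hidden in the utility above.
    """
    return _process_spreadsheet(path, question)

print(spreadsheet_tool("data.xlsx", "total sales?"))
```

Keeping the tool surface small reduces the chance the agent picks the wrong tool or mis-formats its arguments.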
---
## 🚦 Getting Started
### Prerequisites
- Python 3.10+
- API Keys for: Google (AI Studio), Mistral AI, and Groq.
- A local LangFuse instance (optional, for tracing).
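The API keys are typically supplied via environment variables. A minimal example (the exact variable names are assumptions; check `app.py` for the names the code actually reads):

```shell
# Illustrative: variable names may differ; check app.py for the exact names.
export GOOGLE_API_KEY="your-key"
export MISTRAL_API_KEY="your-key"
export GROQ_API_KEY="your-key"
```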
### Installation
1. Clone the repository:
```bash
git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
cd [YOUR_SPACE_NAME]
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the app:
```bash
python app.py
```
---
## 🎓 Certification
This project was completed for the **Hugging Face Agents Course**, covering:
- **Theory**: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
- **Practice**: Building and deploying a functional agent capable of autonomous tool use.
---
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference