
Multi-Model Agentic AI: GAIA Benchmark Solver

This project was developed as part of the Hugging Face Agents Course. It features an advanced autonomous agent designed to solve complex, multi-step tasks from the GAIA (General AI Assistants) benchmark (Level 1).

The agent leverages the Re-Act (Reasoning + Acting) framework via LangGraph to navigate through tools, manage long-form reasoning, and handle diverse data formats including web content, spreadsheets, audio, and video.
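The Thought-Action-Observation cycle at the heart of Re-Act can be sketched in a few lines. This is a hedged illustration only: the model call, the `search` tool, and the string-based action format are stand-ins, not the project's actual LangGraph code.

```python
# Minimal sketch of a Re-Act (Thought-Action-Observation) loop.
# `fake_model` and `search` are stubs standing in for an LLM and a real tool.

def fake_model(question: str, history: list[str]) -> str:
    """Stand-in for an LLM: emits an action, then a final answer."""
    if not history:
        return "Action: search[capital of France]"
    return "Final Answer: Paris"

def search(query: str) -> str:
    """Stand-in tool."""
    return "Paris is the capital of France."

def react_loop(question: str, max_steps: int = 5) -> str:
    history: list[str] = []
    for _ in range(max_steps):
        thought = fake_model(question, history)          # Thought
        if thought.startswith("Final Answer:"):
            return thought.removeprefix("Final Answer:").strip()
        if thought.startswith("Action: search["):
            query = thought[len("Action: search["):-1]   # Action
            observation = search(query)                  # Observation
            history.append(observation)
    return "No answer within step budget"
```

In the real agent, LangGraph manages this loop as a graph of nodes, and the observations come from the tools described below.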

🚀 Key Features

  • Hybrid Multi-Model Orchestration: To overcome the rate limits of free-tier plans, the system implements a robust fallback mechanism. It primarily uses Gemini 2.0 Pro, with automated failover to Gemini 2.0 Flash, Mistral Large, and various models on Groq (Llama 3.3, DeepSeek R1, Qwen).
  • Advanced Toolset:
    • Web Semantic Search: Intelligent web browsing and information extraction.
    • Data Manipulation: Tools for processing and analyzing Excel/CSV spreadsheets.
    • Audio & Video Analysis: Custom-built logic to transcribe audio and analyze video content without relying on expensive, dedicated video APIs.
    • Custom RAG: A Retrieval-Augmented Generation pipeline using ChromaDB for efficient context injection.
  • Observability: Integrated with LangFuse (hosted locally) to monitor agent traces, evaluate performance, and debug the Thought-Action-Observation loops.
  • User Interface: A clean, interactive UI built with Gradio and hosted on Hugging Face Spaces.
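The retrieval step of the RAG pipeline can be illustrated with a deliberately simplified toy. In the project, ChromaDB handles embedding storage and vector similarity; here a bag-of-words overlap score stands in for real embeddings so the retrieve-then-inject flow is visible end to end.

```python
# Toy illustration of RAG retrieval: rank documents against a query and
# inject the best match into the prompt. Word overlap replaces the real
# vector similarity that ChromaDB would compute.

def score(query: str, doc: str) -> float:
    """Crude stand-in for embedding similarity: fraction of shared words."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

docs = [
    "GAIA is a benchmark for general AI assistants.",
    "Gradio builds quick web interfaces in Python.",
]
context = retrieve("what is the GAIA benchmark", docs)
prompt = f"Answer using this context: {context[0]}"
```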

🏗️ Architecture & Project Structure

The project is organized to separate the agent logic from the core utility functions, ensuring the agent doesn't get "confused" by an over-saturated toolset.

File Map

  • app.py: The entry point. Manages the Gradio UI, Hugging Face OAuth, and the multi-model fallback loop for the evaluation runner.
  • react_agent.py: Contains the core logic for the LangGraph agent and the Re-Act prompt engineering.
  • custom_tools.py: Definitions of the high-level tools available to the agent.
  • utils.py: The "engine room" containing complex functions (video analysis logic, audio transcription, file processing) called by the tools.
  • web_semantic_search_tool.py: Specialized module for RAG and semantic web queries using ChromaDB.
  • requirements.txt: List of dependencies including langgraph, chromadb, gradio, and model SDKs.
  • *.ipynb: Testing sandboxes for Mistral, LangChain, and agent components.

🛠️ Technical Challenges & Solutions

1. "Free Plan" Resilience

The biggest challenge was maintaining execution during the 20-question GAIA evaluation without crashing due to API quotas. Solution: I implemented a recursive retry strategy in app.py. If one provider (e.g., Google) returns a 429 or 500 error, the agent automatically re-instantiates using a different provider (Mistral or Groq) and continues from the same task.
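The fallback idea can be sketched as follows. The stub functions below stand in for the real Gemini, Mistral, and Groq SDK calls, and the exception type is illustrative; the point is that a quota error from one provider hands the same task to the next provider in the chain.

```python
# Sketch of provider fallback: on a rate-limit (429) or server (500) error,
# re-run the same task with the next provider. All clients here are stubs.

class RateLimitError(Exception):
    pass

def gemini(task: str) -> str:   raise RateLimitError("429")   # stub: quota hit
def mistral(task: str) -> str:  raise RateLimitError("429")   # stub: quota hit
def groq(task: str) -> str:     return f"answer to {task!r}"  # stub: succeeds

PROVIDERS = [("gemini-2.0-pro", gemini),
             ("mistral-large", mistral),
             ("llama-3.3-groq", groq)]

def run_with_fallback(task: str) -> str:
    last_err: Exception | None = None
    for name, call in PROVIDERS:
        try:
            return call(task)      # same task, different provider
        except RateLimitError as e:
            last_err = e           # exhausted quota: try the next one
    raise RuntimeError(f"all providers exhausted: {last_err}")
```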

2. Video Analysis Without Video APIs

Since free video analysis tools are scarce, I developed a custom "Video-to-Insight" pipeline in utils.py that breaks down video tasks into manageable image and text analysis steps that standard LLMs can process.
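The decomposition can be sketched like this. Everything here is a stub with illustrative names: real frame extraction would use a tool such as ffmpeg or OpenCV, and the two model calls would go to a vision-capable model and a text model respectively.

```python
# Sketch of a "Video-to-Insight" pipeline: sample frames, describe each
# frame with an image model, then answer the question from the descriptions.

def sample_frames(video_path: str, every_n_seconds: int = 10) -> list[str]:
    """Stub: real code would extract frames with ffmpeg/OpenCV."""
    return [f"{video_path}@{t}s" for t in range(0, 30, every_n_seconds)]

def describe_frame(frame: str) -> str:
    """Stub for a vision-model call on a single frame."""
    return f"description of {frame}"

def summarize(descriptions: list[str], question: str) -> str:
    """Stub for a text-LLM call that answers from the frame notes."""
    return f"{question} -> based on {len(descriptions)} frames"

def video_to_insight(video_path: str, question: str) -> str:
    frames = sample_frames(video_path)
    notes = [describe_frame(f) for f in frames]
    return summarize(notes, question)
```

The key design point is that each step is something a standard multimodal LLM can already do; no dedicated video API is needed.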

3. Tool Optimization

To prevent the agent from losing focus, I followed the "Thin Tool, Fat Utility" pattern. Instead of giving the agent 20 simple tools, I gave it 5 powerful, "smart" tools that utilize complex logic hidden in utils.py.
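The pattern can be sketched with one hypothetical tool (the names below are illustrative, not the project's actual function signatures): the surface the agent sees is a single function with a one-line docstring, while the branching and heavy lifting live in a utility the agent never has to reason about.

```python
# Sketch of the "Thin Tool, Fat Utility" split. The agent only ever sees
# `analyze_spreadsheet`; all complexity is hidden in the utility function.

def _process_spreadsheet(path: str, question: str) -> str:
    """'Fat' utility: format detection, parsing, aggregation, etc."""
    fmt = "csv" if path.endswith(".csv") else "excel"
    # ... a real implementation would load the file and compute an answer ...
    return f"[{fmt}] answer for {question!r}"

def analyze_spreadsheet(path: str, question: str) -> str:
    """Answer a question about an Excel or CSV file."""  # thin tool surface
    return _process_spreadsheet(path, question)
```

Keeping the tool signatures small means the model's tool-selection prompt stays short and unambiguous, while the utilities can grow arbitrarily complex.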


🚦 Getting Started

Prerequisites

  • Python 3.10+
  • API Keys for: Google (AI Studio), Mistral AI, and Groq.
  • A local LangFuse instance (optional, for tracing).

Installation

  1. Clone the repository:
     git clone https://huggingface.co/spaces/[YOUR_USERNAME]/[YOUR_SPACE_NAME]
     cd [YOUR_SPACE_NAME]
  2. Install dependencies:
     pip install -r requirements.txt
  3. Run the app:
     python app.py

🎓 Certification

This project was completed for the Hugging Face Agents Course, covering:

  • Theory: LLM Mechanics, Re-Act, LangGraph, RAG, and Benchmarking (GAIA).
  • Practice: Building and deploying a functional agent capable of autonomous tool use.

---
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
# optional, default duration is 8 hours/480 minutes. Max duration is 30 days/43200 minutes.
hf_oauth_expiration_minutes: 480
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference