@woai
Add HybridGAIAAgent and clean up project structure
04ffb15

metadata
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

GAIA Hybrid Agent

This repository contains a hybrid GAIA agent implementation combining universal LLM capabilities with multimodal processing.

Features

Hybrid Agent (hybrid_agent.py)

  • Universal LLM Approach: Simplified logic that trusts LLM capabilities over hardcoded rules
  • Multimodal Processing: Integrated Gemini API for handling various content types
  • Smart File Detection: Automatically detects and processes file references in questions
  • YouTube Integration: Processes YouTube videos with metadata and transcript extraction
  • Multiple Search Sources: Web, Wikipedia, and ArXiv search capabilities
  • Question Type Analysis: Categorizes each question to choose a processing strategy

Supported File Types

  • Images: .jpg, .png, .gif, .bmp, .webp, .tiff
  • Audio: .mp3, .wav, .m4a, .aac, .ogg, .flac
  • Video: .mp4, .avi, .mov, .mkv, .webm, .wmv
  • Documents: .pdf, .txt, .docx
  • Spreadsheets: .xlsx, .xls, .csv
  • Code: .py, .js, .html, .css, .java, .cpp, .c
  • YouTube URLs: Full video processing with transcripts
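The extension lists above amount to a lookup table. As a minimal sketch (the `FILE_CATEGORIES` table and the `classify_file` helper are hypothetical and not taken from `hybrid_agent.py`), a filename could be mapped to its category like this:

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the "Supported File Types" list.
# The category names and this helper are illustrative, not the project's code.
FILE_CATEGORIES = {
    "image": {".jpg", ".png", ".gif", ".bmp", ".webp", ".tiff"},
    "audio": {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac"},
    "video": {".mp4", ".avi", ".mov", ".mkv", ".webm", ".wmv"},
    "document": {".pdf", ".txt", ".docx"},
    "spreadsheet": {".xlsx", ".xls", ".csv"},
    "code": {".py", ".js", ".html", ".css", ".java", ".cpp", ".c"},
}


def classify_file(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if the extension is unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive: "Chart.PNG" -> ".png"
    for category, extensions in FILE_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


print(classify_file("chart.PNG"))  # image
print(classify_file("data.xlsx"))  # spreadsheet
print(classify_file("notes.md"))   # None
```

Returning `None` for unknown extensions leaves the fallback decision (for example, text-only handling) to the caller.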

Core Components

Search Tools (search_tools.py)

  • Wikipedia search via LangChain
  • Web search via Tavily API
  • ArXiv search for academic papers
  • Unified interface for all search operations
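One way to put a unified interface over several search backends is a small registry that routes a query to one named source or fans it out to all of them. The sketch below uses hypothetical names (`UnifiedSearch`, `register`, and the `fake_*` stubs); the stubs stand in for the real Wikipedia (LangChain), Tavily, and ArXiv calls in `search_tools.py`, whose exact signatures are not shown in this README.

```python
from typing import Callable, Dict, List

# A backend takes (query, max_results) and returns a list of text snippets.
SearchFn = Callable[[str, int], List[str]]


class UnifiedSearch:
    """Route a query to one named backend, or fan it out to every registered backend."""

    def __init__(self) -> None:
        self._backends: Dict[str, SearchFn] = {}

    def register(self, name: str, fn: SearchFn) -> None:
        self._backends[name] = fn

    def search(self, query: str, source: str = "all", max_results: int = 3) -> Dict[str, List[str]]:
        if source == "all":
            names = list(self._backends)
        elif source in self._backends:
            names = [source]
        else:
            raise ValueError(f"Unknown search source: {source!r}")
        results: Dict[str, List[str]] = {}
        for name in names:
            try:
                results[name] = self._backends[name](query, max_results)
            except Exception as exc:  # a failing backend is reported, not fatal
                results[name] = [f"[{name} error: {exc}]"]
        return results


# Hypothetical stubs standing in for the Wikipedia, Tavily, and ArXiv wrappers.
def fake_wikipedia(query: str, k: int) -> List[str]:
    return [f"wikipedia: {query}"][:k]


def fake_web(query: str, k: int) -> List[str]:
    raise TimeoutError("network unavailable")  # simulate an outage


def fake_arxiv(query: str, k: int) -> List[str]:
    return [f"arxiv: {query} #{i}" for i in range(k)]


search = UnifiedSearch()
search.register("wikipedia", fake_wikipedia)
search.register("web", fake_web)
search.register("arxiv", fake_arxiv)
print(search.search("GAIA benchmark", max_results=2))
```

Catching exceptions per backend keeps one failing source (here the simulated web timeout) from discarding results from the others.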

YouTube Tools (youtube_tools.py)

  • Video metadata extraction
  • Transcript extraction and processing
  • yt-dlp integration for comprehensive video analysis
  • Fallback mechanisms for various video types
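Transcript and metadata tools usually need a bare video ID rather than a full URL, and YouTube links come in several shapes. As a hedged sketch of one such normalization step (the `extract_video_id` helper is hypothetical and not from `youtube_tools.py`), using only the standard library:

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")  # YouTube video IDs are 11 URL-safe characters


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from common YouTube URL forms, or None if none is found."""
    parsed = urlparse(url.strip())
    host = parsed.hostname or ""  # urlparse already lowercases the hostname
    for prefix in ("www.", "m."):
        if host.startswith(prefix):
            host = host[len(prefix):]
    candidate = None
    if host == "youtu.be":                      # https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com":
        if parsed.path == "/watch":             # https://www.youtube.com/watch?v=<id>
            candidate = parse_qs(parsed.query).get("v", [""])[0]
        else:                                   # /shorts/<id>, /embed/<id>, /live/<id>
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in {"shorts", "embed", "live"}:
                candidate = parts[1]
    if candidate and _ID_RE.fullmatch(candidate):
        return candidate
    return None


print(extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ&t=42s"))  # dQw4w9WgXcQ
print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))                     # dQw4w9WgXcQ
print(extract_video_id("https://example.com/watch?v=dQw4w9WgXcQ"))          # None
```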

LLM Integration (llm.py)

  • Gemini 2.0 Flash model integration
  • Retry logic for API reliability
  • Optimized generation settings for accuracy
  • Image processing capabilities
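This README does not specify the retry policy used in `llm.py`. A common choice for transient API errors is exponential backoff with jitter; the sketch below assumes that policy, and `call_with_retry` and the `flaky_generate` stand-in for a Gemini request are hypothetical names. The `sleep` parameter is injectable so the backoff schedule can be inspected without actually waiting.

```python
import random
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on retry_on exceptions with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: re-raise the last error
            # Wait base, 2*base, 4*base, ... plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay * (1 + random.uniform(0, 0.1)))
    raise AssertionError("unreachable")  # the loop always returns or raises


# Hypothetical stand-in for a Gemini request that fails twice, then succeeds.
calls = {"n": 0}


def flaky_generate() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient API error")
    return "answer"


delays: List[float] = []
print(call_with_retry(flaky_generate, sleep=delays.append))  # answer
print(len(delays))  # 2
```

In a real client, narrowing `retry_on` to the SDK's transient error types would avoid retrying on errors that can never succeed, such as an invalid API key.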

Code Agent (code_agent.py)

  • Code execution and analysis
  • Safe code interpretation
  • Support for various programming languages
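How `code_agent.py` isolates execution is not described in this README. A minimal sketch, assuming Python-only snippets and a hypothetical `run_python_snippet` helper, runs the code in a separate interpreter process with a timeout. A subprocess limits hangs and crashes, but it is not a security sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_python_snippet(code: str, timeout: float = 10.0) -> str:
    """Run Python code in a separate interpreter process and return its output.

    The child process still has the same file and network access as the parent,
    so this guards against runaway code, not against malicious code.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "snippet.py"
        script.write_text(code, encoding="utf-8")
        try:
            result = subprocess.run(
                # -I: isolated mode, ignores PYTHON* env vars and user site-packages
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,  # subprocess.run kills the child when this expires
            )
        except subprocess.TimeoutExpired:
            return f"Error: execution exceeded {timeout} seconds"
    if result.returncode != 0:
        return f"Error (exit {result.returncode}):\n{result.stderr.strip()}"
    return result.stdout.strip()


print(run_python_snippet("print(sum(range(10)))"))   # 45
print(run_python_snippet("1 / 0").splitlines()[-1])  # ZeroDivisionError: division by zero
```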

Image Utils (image_utils.py)

  • Image encoding/decoding utilities
  • Base64 conversion functions
  • Image processing helpers
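Base64 conversion is the usual way to embed image bytes in a JSON request or a `data:` URI. A sketch with the standard library, where the three helper names are hypothetical rather than the actual functions in `image_utils.py`:

```python
import base64
import mimetypes
from pathlib import Path


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes as a base64 ASCII string."""
    return base64.b64encode(data).decode("ascii")


def decode_image_string(encoded: str) -> bytes:
    """Decode a base64 string, or a data: URI, back into raw bytes."""
    if encoded.startswith("data:"):
        encoded = encoded.split(",", 1)[1]  # drop the "data:<mime>;base64," header
    return base64.b64decode(encoded, validate=True)  # reject non-base64 characters


def to_data_uri(path: str) -> str:
    """Read an image file and return a data URI with a MIME type guessed from its name."""
    mime, _ = mimetypes.guess_type(path)
    payload = encode_image_bytes(Path(path).read_bytes())
    return f"data:{mime or 'application/octet-stream'};base64,{payload}"


png_signature = b"\x89PNG\r\n\x1a\n"  # the 8-byte header every PNG file starts with
encoded = encode_image_bytes(png_signature)
print(encoded)  # iVBORw0KGgo=
print(decode_image_string("data:image/png;base64," + encoded) == png_signature)  # True
```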

Usage

Running the Application

  1. Quick Start:
python run_app.py
  2. Direct Launch:
python app.py

Using the Agent Programmatically

from hybrid_agent import HybridGAIAAgent

agent = HybridGAIAAgent()
answer = agent("Your question here")
print(answer)

Environment Setup

  1. Install dependencies:
pip install -r requirements.txt
  2. Set up environment variables:
export GOOGLE_API_KEY="your_gemini_api_key"
export TAVILY_API_KEY="your_tavily_api_key"
export YOUTUBE_API_KEY="your_youtube_api_key"  # Optional
  3. Run the application:
python run_app.py

The Gradio interface will be available at http://127.0.0.1:7860
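Because `GOOGLE_API_KEY` and `TAVILY_API_KEY` are needed while `YOUTUBE_API_KEY` is optional, a pre-flight check can report missing keys before launching the app. This is a hypothetical helper (`check_environment`), not something this README says `run_app.py` does; it accepts an explicit mapping so it can be exercised without touching the real environment.

```python
import os
from typing import List, Mapping, Optional

REQUIRED_KEYS = ("GOOGLE_API_KEY", "TAVILY_API_KEY")
OPTIONAL_KEYS = ("YOUTUBE_API_KEY",)


def check_environment(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return missing required keys and print a note for each missing optional key.

    An empty value counts as missing. Defaults to os.environ when env is None.
    """
    source = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not source.get(key)]
    for key in OPTIONAL_KEYS:
        if not source.get(key):
            print(f"Note: optional variable {key} is not set")
    return missing


# Example with an explicit mapping instead of the real environment.
print(check_environment({"GOOGLE_API_KEY": "abc"}))  # ['TAVILY_API_KEY']
```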

File Structure

├── app.py                 # Main Gradio web interface
├── hybrid_agent.py        # Hybrid GAIA agent implementation
├── search_tools.py        # Search functionality (Wikipedia, Web, ArXiv)
├── youtube_tools.py       # YouTube video processing
├── llm.py                 # LLM integration with Gemini API
├── code_agent.py          # Code execution and analysis
├── image_utils.py         # Image processing utilities
├── run_app.py             # Application launcher
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── YOUTUBE_GUIDE.md       # YouTube integration documentation
└── .gitattributes         # Git configuration

Key Features

  1. Hybrid Architecture: Combines a universal LLM approach with specialized multimodal processing
  2. File Availability Detection: Returns "I don't know" when required files are missing
  3. YouTube Integration: Comprehensive video analysis with metadata and transcripts
  4. Multiple Search Sources: Wikipedia, web search, and academic papers for comprehensive coverage
  5. Question Type Analysis: Intelligent routing based on question characteristics
  6. Robust Error Handling: Graceful fallbacks for various failure scenarios
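A sketch of file availability detection, assuming file references appear in the question as plain filenames with one of the supported extensions and that attachments live in a single directory. `find_file_references` and `missing_file_answer` are hypothetical names; the regex is a heuristic and will, for example, treat a phrase like "Node.js" as a file reference.

```python
import re
from pathlib import Path
from typing import List, Optional

# Filenames ending in one of the supported extensions (heuristic, case-insensitive).
FILE_REF_RE = re.compile(
    r"\b[\w-]+\.(?:jpg|png|gif|bmp|webp|tiff|mp3|wav|m4a|aac|ogg|flac|"
    r"mp4|avi|mov|mkv|webm|wmv|pdf|txt|docx|xlsx|xls|csv|py|js|html|css|java|cpp|c)\b",
    re.IGNORECASE,
)


def find_file_references(question: str) -> List[str]:
    """Return filenames mentioned in the question, in order, without duplicates."""
    found: List[str] = []
    for match in FILE_REF_RE.finditer(question):
        if match.group(0) not in found:
            found.append(match.group(0))
    return found


def missing_file_answer(question: str, search_dir: str = ".") -> Optional[str]:
    """Return "I don't know" if any referenced file is absent from search_dir, else None."""
    for name in find_file_references(question):
        if not (Path(search_dir) / name).is_file():
            return "I don't know"
    return None  # no references, or all present: continue normal processing


print(find_file_references("Compare q1.csv with q2.csv, then re-check q1.csv."))  # ['q1.csv', 'q2.csv']
print(missing_file_answer("Who wrote Hamlet?"))  # None
```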

Performance

The hybrid agent achieves improved performance through:

  • Smart Question Routing: Different strategies for different question types
  • Multimodal Capabilities: Proper handling of images, videos, and documents
  • Search Optimization: Multiple sources for better factual coverage
  • YouTube Processing: Video analysis using metadata and transcripts
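Question routing can be sketched as an ordered series of checks, from the most specific signal to the most general. The `Strategy` categories, regular expressions, and keyword hints below are assumptions chosen for illustration; the actual rules in `hybrid_agent.py` may differ.

```python
import re
from enum import Enum


class Strategy(Enum):
    YOUTUBE = "youtube"  # transcript and metadata tools
    FILE = "file"        # multimodal file processing
    SEARCH = "search"    # Wikipedia / web / ArXiv lookup
    DIRECT = "direct"    # answer from the LLM alone


YOUTUBE_RE = re.compile(r"https?://(?:www\.|m\.)?(?:youtube\.com|youtu\.be)/\S+", re.IGNORECASE)
FILE_RE = re.compile(r"\b[\w-]+\.(?:jpg|png|mp3|wav|mp4|pdf|docx|xlsx|csv|py)\b", re.IGNORECASE)
# Hypothetical keyword hints suggesting the answer needs external facts.
SEARCH_HINTS = ("who ", "when ", "where ", "which ", "how many", "according to", "wikipedia", "published")


def route_question(question: str) -> Strategy:
    """Pick a processing strategy; earlier checks take priority over later ones."""
    if YOUTUBE_RE.search(question):
        return Strategy.YOUTUBE
    if FILE_RE.search(question):
        return Strategy.FILE
    text = question.lower()
    if any(hint in text for hint in SEARCH_HINTS):
        return Strategy.SEARCH
    return Strategy.DIRECT


print(route_question("Who discovered penicillin?"))  # Strategy.SEARCH
```

Checking YouTube links before file references means a question containing both is treated as a video question; reordering the checks changes that priority.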

Documentation

  • YOUTUBE_GUIDE.md - Detailed guide for YouTube integration and video processing
  • Inline code documentation for all major functions
  • Comprehensive logging for debugging and monitoring

Recent Updates

  • ✅ Cleaned up project structure
  • ✅ Removed outdated test files and agents
  • ✅ Consolidated functionality into hybrid agent
  • ✅ Improved documentation and code organization
  • ✅ Enhanced error handling and logging