metadata
title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
GAIA Hybrid Agent
This repository contains a hybrid GAIA agent implementation combining universal LLM capabilities with multimodal processing.
Features
Hybrid Agent (hybrid_agent.py)
- Universal LLM Approach: Simplified logic that trusts LLM capabilities over hardcoded rules
- Multimodal Processing: Integrated Gemini API for handling various content types
- Smart File Detection: Automatically detects and processes file references in questions
- YouTube Integration: Processes YouTube videos with metadata and transcript extraction
- Multiple Search Sources: Web, Wikipedia, and ArXiv search capabilities
- Question Type Analysis: Intelligent categorization for optimal processing strategy
Supported File Types
- Images: .jpg, .png, .gif, .bmp, .webp, .tiff
- Audio: .mp3, .wav, .m4a, .aac, .ogg, .flac
- Video: .mp4, .avi, .mov, .mkv, .webm, .wmv
- Documents: .pdf, .txt, .docx
- Spreadsheets: .xlsx, .xls, .csv
- Code: .py, .js, .html, .css, .java, .cpp, .c
- YouTube URLs: Full video processing with transcripts
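The extension groups listed above map naturally onto a lookup table. The following is a minimal sketch of such a lookup, not the code in hybrid_agent.py; the category names and the `classify_file` helper are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Extension groups copied from the list above.
# The category names are illustrative, not taken from the project.
FILE_CATEGORIES = {
    "image": {".jpg", ".png", ".gif", ".bmp", ".webp", ".tiff"},
    "audio": {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac"},
    "video": {".mp4", ".avi", ".mov", ".mkv", ".webm", ".wmv"},
    "document": {".pdf", ".txt", ".docx"},
    "spreadsheet": {".xlsx", ".xls", ".csv"},
    "code": {".py", ".js", ".html", ".css", ".java", ".cpp", ".c"},
}


def classify_file(filename: str) -> Optional[str]:
    """Return the category for a filename, or None if its extension is not listed."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the last suffix
    for category, extensions in FILE_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None
```

Only the final suffix is inspected, so a name such as archive.tar.gz is treated as unsupported.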
Core Components
Search Tools (search_tools.py)
- Wikipedia search via LangChain
- Web search via Tavily API
- ArXiv search for academic papers
- Unified interface for all search operations
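One way to picture a unified interface over several search sources is a small dispatcher that holds one callable per source. This is a hypothetical sketch only: `UnifiedSearch`, `register`, and `search` are invented names, and the backends are plain callables rather than the real Wikipedia, Tavily, or ArXiv clients.

```python
from typing import Callable, Dict, List, Optional

# A backend takes a query string and returns a list of result snippets.
SearchFn = Callable[[str], List[str]]


class UnifiedSearch:
    """Hypothetical dispatcher; not the actual search_tools.py API."""

    def __init__(self) -> None:
        self._backends: Dict[str, SearchFn] = {}

    def register(self, name: str, backend: SearchFn) -> None:
        """Attach a search backend under a source name such as 'web'."""
        self._backends[name] = backend

    def search(
        self, query: str, sources: Optional[List[str]] = None
    ) -> Dict[str, List[str]]:
        """Query the selected sources (all by default) and collect results per source."""
        names = list(self._backends) if sources is None else sources
        results: Dict[str, List[str]] = {}
        for name in names:
            if name not in self._backends:
                raise ValueError(f"unknown search source: {name}")
            try:
                results[name] = self._backends[name](query)
            except Exception:
                # A failing source yields an empty list instead of aborting the search.
                results[name] = []
        return results
```

Keeping the backends as plain callables makes it easy to swap in stubs for testing without network access.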
YouTube Tools (youtube_tools.py)
- Video metadata extraction
- Transcript extraction and processing
- yt-dlp integration for comprehensive video analysis
- Fallback mechanisms for various video types
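Metadata and transcript lookups usually start from a video ID. The sketch below shows one way to pull that ID out of common YouTube URL shapes using only the standard library; `extract_video_id` is a hypothetical helper, and the set of URL forms it accepts is an assumption rather than what youtube_tools.py supports.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hosts treated as YouTube in this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if the URL is not recognized."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in YOUTUBE_HOSTS:
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Embedded and Shorts links: /embed/<id> or /shorts/<id>
        parts = parsed.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] in {"embed", "shorts"}:
            return parts[1]

    return None
```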
LLM Integration (llm.py)
- Gemini 2.0 Flash model integration
- Retry logic for API reliability
- Optimized generation settings for accuracy
- Image processing capabilities
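Retry logic of the kind mentioned above is often implemented as exponential backoff around the API call. The following is a generic sketch under that assumption; `call_with_retry` and its parameters are hypothetical and do not describe the actual code in llm.py.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter exists so the delay can be replaced in tests; a real caller would wrap the model request in a zero-argument function or lambda and pass it as `fn`.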
Code Agent (code_agent.py)
- Code execution and analysis
- Safe code interpretation
- Support for various programming languages
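A common baseline for running untrusted snippets is to execute them in a separate interpreter process with a timeout. The sketch below illustrates that idea only; `run_python_snippet` is a hypothetical helper, it handles Python alone, and process isolation with a timeout is not a full sandbox. How code_agent.py actually interprets code is not shown in this README.

```python
import subprocess
import sys
from typing import Dict, Union


def run_python_snippet(code: str, timeout_seconds: float = 10.0) -> Dict[str, Union[bool, str]]:
    """Run a Python snippet in a child interpreter and capture its output.

    This limits runtime and keeps the snippet out of the current process,
    but it does not restrict file or network access.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "stderr": "timed out"}
    return {
        "ok": completed.returncode == 0,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }
```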
Image Utils (image_utils.py)
- Image encoding/decoding utilities
- Base64 conversion functions
- Image processing helpers
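Base64 conversion for images can be done with the standard library alone. The helpers below are a minimal sketch; the function names are illustrative and are not the names used in image_utils.py.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes as an ASCII Base64 string."""
    return base64.b64encode(data).decode("ascii")


def decode_image_string(encoded: str) -> bytes:
    """Decode a Base64 string back to raw bytes, rejecting invalid characters."""
    return base64.b64decode(encoded, validate=True)


def to_data_uri(data: bytes, mime_type: str = "image/png") -> str:
    """Wrap image bytes in a data URI, a form many multimodal APIs and HTML accept."""
    return f"data:{mime_type};base64,{encode_image_bytes(data)}"
```

Reading a file with `Path("image.png").read_bytes()` and passing the result to `encode_image_bytes` gives a string that can be embedded in a JSON request body.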
Usage
Running the Application
- Quick Start:
python run_app.py
- Direct Launch:
python app.py
Using the Agent Programmatically
from hybrid_agent import HybridGAIAAgent
agent = HybridGAIAAgent()
answer = agent("Your question here")
print(answer)
Environment Setup
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
export GOOGLE_API_KEY="your_gemini_api_key"
export TAVILY_API_KEY="your_tavily_api_key"
export YOUTUBE_API_KEY="your_youtube_api_key" # Optional
- Run the application:
python run_app.py
The Gradio interface will be available at http://127.0.0.1:7860
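Before launching, it can help to confirm that the API keys from the setup steps are present. This sketch assumes the two keys not marked optional are required; `missing_required_keys` is a hypothetical helper and is not part of the repository.

```python
import os
from typing import List, Mapping

# Assumption: keys not marked "Optional" in the setup steps are required.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "TAVILY_API_KEY")
OPTIONAL_KEYS = ("YOUTUBE_API_KEY",)


def missing_required_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_required_keys()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required API keys are set.")
```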
File Structure
├── app.py # Main Gradio web interface
├── hybrid_agent.py # Hybrid GAIA agent implementation
├── search_tools.py # Search functionality (Wikipedia, Web, ArXiv)
├── youtube_tools.py # YouTube video processing
├── llm.py # LLM integration with Gemini API
├── code_agent.py # Code execution and analysis
├── image_utils.py # Image processing utilities
├── run_app.py # Application launcher
├── requirements.txt # Python dependencies
├── README.md # This file
├── YOUTUBE_GUIDE.md # YouTube integration documentation
└── .gitattributes # Git configuration
Key Features
- Hybrid Architecture: Combines a universal LLM approach with specialized multimodal processing
- File Availability Detection: Returns "I don't know" when required files are missing
- YouTube Integration: Comprehensive video analysis with metadata and transcripts
- Multiple Search Sources: Wikipedia, web search, and academic papers for comprehensive coverage
- Question Type Analysis: Intelligent routing based on question characteristics
- Robust Error Handling: Graceful fallbacks for various failure scenarios
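The file availability behaviour described above can be sketched as a guard that runs before the question reaches the model. Everything here is an assumption for illustration: the regular expression, the helper names, and the `answer_fn` callback are hypothetical, and the real detection logic in hybrid_agent.py may differ.

```python
import os
import re
from typing import Callable, List

# Matches bare filenames with a few of the supported extensions (illustrative subset).
FILE_REFERENCE = re.compile(
    r"\b[\w\-]+\.(?:pdf|txt|docx|xlsx|xls|csv|png|jpg|mp3|wav|mp4|py)\b",
    re.IGNORECASE,
)


def find_file_references(question: str) -> List[str]:
    """Return filenames mentioned in the question text."""
    return FILE_REFERENCE.findall(question)


def answer_or_unknown(
    question: str,
    answer_fn: Callable[[str], str],
    file_exists: Callable[[str], bool] = os.path.isfile,
) -> str:
    """Answer the question, or return "I don't know" if a referenced file is missing."""
    for name in find_file_references(question):
        if not file_exists(name):
            return "I don't know"
    return answer_fn(question)
```

Passing `file_exists` as a parameter keeps the guard independent of where attachments are actually stored.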
Performance
The hybrid agent is designed to improve answer quality through:
- Smart Question Routing: Different strategies for different question types
- Multimodal Capabilities: Proper handling of images, videos, and documents
- Search Optimization: Multiple sources for better factual coverage
- YouTube Processing: Advanced video analysis capabilities
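Question routing can be pictured as a function that maps a question to a strategy label. The rule-based version below is a deliberately simple sketch; the labels and keyword checks are assumptions, and the actual agent may leave most of this decision to the LLM rather than to fixed rules.

```python
import re

# Illustrative subset of the supported file extensions.
FILE_PATTERN = re.compile(
    r"\b[\w\-]+\.(?:png|jpg|pdf|csv|xlsx|mp3|mp4)\b", re.IGNORECASE
)


def route_question(question: str) -> str:
    """Return a hypothetical strategy label for the question."""
    text = question.lower()
    if "youtube.com/" in text or "youtu.be/" in text:
        return "youtube"  # video metadata and transcript path
    if FILE_PATTERN.search(text):
        return "file"  # multimodal file processing path
    if "arxiv" in text:
        return "arxiv"  # academic paper search path
    return "general"  # plain LLM answer, optionally with web or Wikipedia search
```

The checks run in order, so a question that contains both a YouTube link and a filename is routed to the YouTube path first.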
Documentation
- YOUTUBE_GUIDE.md: Detailed guide for YouTube integration and video processing
- Inline code documentation for all major functions
- Comprehensive logging for debugging and monitoring
Recent Updates
- ✅ Cleaned up project structure
- ✅ Removed outdated test files and agents
- ✅ Consolidated functionality into hybrid agent
- ✅ Improved documentation and code organization
- ✅ Enhanced error handling and logging