Final_Assignment_Template_dzianis


@woai

Add HybridGAIAAgent and clean up project structure

04ffb15 11 months ago


title: Template Final Assignment
emoji: 🕵🏻‍♂️
colorFrom: indigo
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

GAIA Hybrid Agent

This repository contains a hybrid GAIA agent implementation combining universal LLM capabilities with multimodal processing.

Features

Hybrid Agent (`hybrid_agent.py`)

Universal LLM Approach: Simplified logic that trusts LLM capabilities over hardcoded rules
Multimodal Processing: Integrated Gemini API for handling various content types
Smart File Detection: Automatically detects and processes file references in questions
YouTube Integration: Processes YouTube videos with metadata and transcript extraction
Multiple Search Sources: Web, Wikipedia, and ArXiv search capabilities
Question Type Analysis: Intelligent categorization for optimal processing strategy
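As a rough illustration of the file detection idea, the sketch below scans a question for tokens that look like file names. The function name `find_file_references` and the regular expression are assumptions made for this example, not the code in `hybrid_agent.py`, and the extension list is only a subset of the supported types.

```python
import re
from typing import List

# Hypothetical subset of the extensions listed under "Supported File Types".
FILE_EXTENSIONS = (
    "jpg", "png", "gif", "mp3", "wav", "mp4", "mov",
    "pdf", "txt", "docx", "xlsx", "xls", "csv", "py",
)

# A word-like name, a dot, then one of the known extensions (case-insensitive).
_FILE_REF = re.compile(
    r"\b[\w-]+\.(?:%s)\b" % "|".join(FILE_EXTENSIONS),
    re.IGNORECASE,
)


def find_file_references(question: str) -> List[str]:
    """Return every token in the question that looks like a file name."""
    return _FILE_REF.findall(question)


if __name__ == "__main__":
    print(find_file_references("What is the total in sales_2023.xlsx?"))
```

A pattern like this only catches explicit file names. Questions that mention "the attached image" without naming a file would need a separate check.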

Supported File Types

Images: .jpg, .png, .gif, .bmp, .webp, .tiff
Audio: .mp3, .wav, .m4a, .aac, .ogg, .flac
Video: .mp4, .avi, .mov, .mkv, .webm, .wmv
Documents: .pdf, .txt, .docx
Spreadsheets: .xlsx, .xls, .csv
Code: .py, .js, .html, .css, .java, .cpp, .c
YouTube URLs: Full video processing with transcripts
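The extension lists above can be read as a lookup table. The sketch below shows one way to map a file name to its category; `FILE_CATEGORIES` and `categorize_file` are hypothetical names used for illustration, and the agent itself may organize this differently.

```python
from pathlib import Path
from typing import Optional

# Categories and extensions copied from the list above.
FILE_CATEGORIES = {
    "image": {".jpg", ".png", ".gif", ".bmp", ".webp", ".tiff"},
    "audio": {".mp3", ".wav", ".m4a", ".aac", ".ogg", ".flac"},
    "video": {".mp4", ".avi", ".mov", ".mkv", ".webm", ".wmv"},
    "document": {".pdf", ".txt", ".docx"},
    "spreadsheet": {".xlsx", ".xls", ".csv"},
    "code": {".py", ".js", ".html", ".css", ".java", ".cpp", ".c"},
}


def categorize_file(name: str) -> Optional[str]:
    """Return the category for a file name, or None if the type is unsupported."""
    suffix = Path(name).suffix.lower()
    for category, extensions in FILE_CATEGORIES.items():
        if suffix in extensions:
            return category
    return None


if __name__ == "__main__":
    for example in ("report.PDF", "clip.mkv", "archive.zip"):
        print(example, "->", categorize_file(example))
```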

Core Components

Search Tools (`search_tools.py`)

Wikipedia search via LangChain
Web search via Tavily API
ArXiv search for academic papers
Unified interface for all search operations
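To make the "unified interface" idea concrete, here is a minimal sketch of a dispatcher that fans a query out to several named backends. The `UnifiedSearch` class and the stub backends are assumptions for illustration only. In the real `search_tools.py` the backends would wrap the Wikipedia, Tavily, and ArXiv clients, which are not reproduced here.

```python
from typing import Callable, Dict, List, Optional

# A backend takes a query string and returns a list of text snippets.
SearchFn = Callable[[str], List[str]]


class UnifiedSearch:
    """Hypothetical dispatcher that queries several search backends by name."""

    def __init__(self) -> None:
        self._backends: Dict[str, SearchFn] = {}

    def register(self, name: str, fn: SearchFn) -> None:
        self._backends[name] = fn

    def search(self, query: str, sources: Optional[List[str]] = None) -> Dict[str, List[str]]:
        """Run the query on the chosen sources (all registered ones by default)."""
        names = sources if sources is not None else list(self._backends)
        results: Dict[str, List[str]] = {}
        for name in names:
            try:
                results[name] = self._backends[name](query)
            except Exception as exc:
                # One failing (or unknown) source should not hide the others.
                results[name] = [f"[{name} search failed: {exc}]"]
        return results


if __name__ == "__main__":
    search = UnifiedSearch()
    # Stub backends standing in for the real API clients.
    search.register("wikipedia", lambda q: [f"wiki result for {q}"])
    search.register("arxiv", lambda q: [f"arxiv result for {q}"])
    print(search.search("graph neural networks"))
```

Catching exceptions per source keeps a single outage from failing the whole lookup.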

YouTube Tools (`youtube_tools.py`)

Video metadata extraction
Transcript extraction and processing
yt-dlp integration for comprehensive video analysis
Fallback mechanisms for various video types
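Metadata and transcript lookups usually start from a video ID. The sketch below extracts the ID from common YouTube URL shapes using only the standard library. The helper name `extract_video_id` is an assumption for this example, and `youtube_tools.py` may handle more URL variants.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if it is not recognized."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if host.startswith("www."):
        host = host[4:]

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host in {"youtube.com", "m.youtube.com"}:
        # Standard links: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        # Embedded players and Shorts: /embed/<id>, /shorts/<id>
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None
    return None


if __name__ == "__main__":
    print(extract_video_id("https://youtu.be/dQw4w9WgXcQ"))
```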

LLM Integration (`llm.py`)

Gemini 2.0 Flash model integration
Retry logic for API reliability
Optimized generation settings for accuracy
Image processing capabilities
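The retry logic can be pictured as a small wrapper with exponential backoff. This is a generic sketch rather than the code in `llm.py`: the name `call_with_retry`, the attempt count, and the delay schedule are all assumptions, and the wrapped callable stands in for the actual Gemini request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Placeholder callable; a real caller would wrap the model request here.
    print(call_with_retry(lambda: "response text"))
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real waiting.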

Code Agent (`code_agent.py`)

Code execution and analysis
Safe code interpretation
Support for various programming languages
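One common way to run untrusted snippets with some isolation is a separate interpreter process with a timeout. The sketch below illustrates that idea for Python code only. `run_python_snippet` is a hypothetical helper, not the API of `code_agent.py`, and a subprocess with a timeout is not a full sandbox: it limits runtime, not file or network access.

```python
import subprocess
import sys


def run_python_snippet(code: str, timeout: float = 5.0) -> str:
    """Run code in a separate Python process and return its output or an error."""
    try:
        completed = subprocess.run(
            # -I runs the interpreter in isolated mode (no user site, no env vars).
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "error: execution timed out"
    if completed.returncode != 0:
        return "error: " + completed.stderr.strip()
    return completed.stdout.strip()


if __name__ == "__main__":
    print(run_python_snippet("print(sum(range(10)))"))
```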

Image Utils (`image_utils.py`)

Image encoding/decoding utilities
Base64 conversion functions
Image processing helpers
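Base64 conversion needs nothing beyond the standard library. The helpers below are a minimal sketch of what such utilities might look like. The function names are assumptions, and `image_utils.py` may expose a different interface.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(data).decode("ascii")


def decode_image_string(encoded: str) -> bytes:
    """Decode a base64 string back into raw image bytes."""
    return base64.b64decode(encoded)


def to_data_url(data: bytes, mime_type: str = "image/png") -> str:
    """Build a data URL, a form often used to pass inline images to APIs."""
    return f"data:{mime_type};base64,{encode_image_bytes(data)}"


if __name__ == "__main__":
    png_signature = b"\x89PNG\r\n\x1a\n"  # First eight bytes of any PNG file.
    encoded = encode_image_bytes(png_signature)
    print(encoded, decode_image_string(encoded) == png_signature)
```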

Usage

Running the Application

Quick Start:

python run_app.py

Direct Launch:

python app.py

Using the Agent Programmatically

from hybrid_agent import HybridGAIAAgent

agent = HybridGAIAAgent()
answer = agent("Your question here")
print(answer)

Environment Setup

Install dependencies:

pip install -r requirements.txt

Set up environment variables:

export GOOGLE_API_KEY="your_gemini_api_key"
export TAVILY_API_KEY="your_tavily_api_key"
export YOUTUBE_API_KEY="your_youtube_api_key"  # Optional

Run the application:

python run_app.py

The Gradio interface will be available at http://127.0.0.1:7860
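A missing API key often surfaces only as a failed request later on, so it can help to check the environment before launching. The snippet below is an optional sketch of such a check. The helper `missing_required_keys` is hypothetical and not part of `run_app.py`; the variable names are the ones listed above.

```python
import os
from typing import List, Mapping

# Keys named in the setup steps; YOUTUBE_API_KEY is documented as optional.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "TAVILY_API_KEY")
OPTIONAL_KEYS = ("YOUTUBE_API_KEY",)


def missing_required_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or empty in the given environment."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    missing = missing_required_keys()
    if missing:
        print("Missing required environment variables:", ", ".join(missing))
    for key in OPTIONAL_KEYS:
        if not os.environ.get(key):
            print(f"Optional variable {key} is not set")
```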

File Structure

├── app.py                 # Main Gradio web interface
├── hybrid_agent.py        # Hybrid GAIA agent implementation
├── search_tools.py        # Search functionality (Wikipedia, Web, ArXiv)
├── youtube_tools.py       # YouTube video processing
├── llm.py                 # LLM integration with Gemini API
├── code_agent.py          # Code execution and analysis
├── image_utils.py         # Image processing utilities
├── run_app.py             # Application launcher
├── requirements.txt       # Python dependencies
├── README.md              # This file
├── YOUTUBE_GUIDE.md       # YouTube integration documentation
└── .gitattributes         # Git configuration

Key Features

Hybrid Architecture: Combines a universal LLM approach with specialized multimodal processing
File Availability Detection: Returns "I don't know" when required files are missing
YouTube Integration: Comprehensive video analysis with metadata and transcripts
Multiple Search Sources: Wikipedia, web search, and academic papers for comprehensive coverage
Question Type Analysis: Intelligent routing based on question characteristics
Robust Error Handling: Graceful fallbacks for various failure scenarios
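The file availability behavior can be sketched as a guard in front of the answering step. Everything named here (`missing_files`, `answer_with_file_check`, the `answer_fn` callback) is an assumption for illustration. Only the "I don't know" fallback string comes from the description above.

```python
from pathlib import Path
from typing import Callable, List

UNKNOWN_ANSWER = "I don't know"


def missing_files(required: List[str], base_dir: Path) -> List[str]:
    """Return the required file names that do not exist under base_dir."""
    return [name for name in required if not (base_dir / name).is_file()]


def answer_with_file_check(
    question: str,
    required: List[str],
    base_dir: Path,
    answer_fn: Callable[[str], str],
) -> str:
    """Answer the question only if every file it depends on is available."""
    if missing_files(required, base_dir):
        return UNKNOWN_ANSWER
    return answer_fn(question)


if __name__ == "__main__":
    # Placeholder answer function standing in for the agent call.
    reply = answer_with_file_check(
        "What is the total in budget.xlsx?",
        ["budget.xlsx"],
        Path("."),
        lambda q: "answer from the model",
    )
    print(reply)
```

Returning a fixed fallback avoids guessing at the contents of a file the agent cannot read.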

Performance

The hybrid agent is designed to improve answer quality through:

Smart Question Routing: Different strategies for different question types
Multimodal Capabilities: Proper handling of images, videos, and documents
Search Optimization: Multiple sources for better factual coverage
YouTube Processing: Video analysis based on metadata and transcripts
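As a simplified picture of question routing, the sketch below classifies a question with a few keyword checks and dispatches it to a handler. The categories, keywords, and function names are assumptions for illustration. The actual agent may rely on the LLM itself for categorization rather than rules like these.

```python
from typing import Callable, Dict


def classify_question(question: str) -> str:
    """Assign a coarse category used to pick a processing strategy."""
    lowered = question.lower()
    if "youtube.com/" in lowered or "youtu.be/" in lowered:
        return "youtube"
    if any(hint in lowered for hint in ("attached", "attachment", "the file", "the image")):
        return "file"
    return "search"


def route(question: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Send the question to the handler for its category, defaulting to search."""
    handler = handlers.get(classify_question(question), handlers["search"])
    return handler(question)


if __name__ == "__main__":
    # Stub handlers standing in for the real processing strategies.
    handlers = {
        "youtube": lambda q: "handled by the YouTube tools",
        "file": lambda q: "handled by the file processor",
        "search": lambda q: "handled by the search tools",
    }
    print(route("Who discovered penicillin?", handlers))
```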

Documentation

YOUTUBE_GUIDE.md - Detailed guide for YouTube integration and video processing
Inline code documentation for all major functions
Comprehensive logging for debugging and monitoring

Recent Updates

✅ Cleaned up project structure
✅ Removed outdated test files and agents
✅ Consolidated functionality into hybrid agent
✅ Improved documentation and code organization
✅ Enhanced error handling and logging