---
title: GAIA Benchmark Agent
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: false
hf_oauth: true
hf_oauth_expiration_minutes: 480
---
# GAIA Benchmark Agent

This Hugging Face Space hosts a GAIA (General AI Assistants) benchmark agent designed to solve certification challenges across various domains of AI and machine learning.

## Features

- Processes questions from the GAIA benchmark
- Uses LangChain and OpenAI's language models
- Analyzes questions and identifies their types
- Retrieves relevant context when needed
- Generates accurate, well-reasoned answers
- Integrates with external information sources:
  - SerpAPI for real-time web search capabilities
  - YouTube for video content search and transcript analysis
  - Tavily for AI-optimized search results
  - Audio processing for speech-to-text conversion and analysis
## Usage

1. Log in to your Hugging Face account using the button
2. Click 'Run Evaluation & Submit All Answers' to:
   - Fetch questions from the GAIA benchmark
   - Run the agent on all questions
   - Submit answers and see your score
## Implementation Details

The agent uses a modular architecture with specialized handlers for different question types:
- Factual knowledge questions
- Technical implementation questions
- Mathematical questions
- Context-based analysis questions
- Ethical/societal impact questions
- Media content questions (videos, podcasts, audio recordings)
- Current events questions
- Categorization questions with enhanced botanical classification
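The handler-per-question-type design described above could be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: the names `classify_question`, `HANDLERS`, and the `handle_*` functions are hypothetical, and the keyword-based type detection stands in for whatever analysis the agent really performs.

```python
import re
from typing import Callable, Dict

# Hypothetical handlers; the real agent's handlers are not shown in this README.
def handle_factual(question: str) -> str:
    return f"[factual] {question}"

def handle_math(question: str) -> str:
    return f"[math] {question}"

def handle_media(question: str) -> str:
    return f"[media] {question}"

# Assumed keyword sets used only for this illustration.
MATH_WORDS = {"calculate", "solve", "sum", "multiply"}
MEDIA_WORDS = {"video", "podcast", "audio", "recording"}

def classify_question(question: str) -> str:
    """Guess a question type from whole-word keyword matches."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    if words & MATH_WORDS:
        return "math"
    if words & MEDIA_WORDS:
        return "media"
    return "factual"

# Registry mapping each question type to its handler.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "factual": handle_factual,
    "math": handle_math,
    "media": handle_media,
}

def answer(question: str) -> str:
    """Route the question to the handler for its detected type."""
    handler = HANDLERS.get(classify_question(question), handle_factual)
    return handler(question)
```

A registry of this shape keeps each question type isolated, so adding a new type means adding one handler and one dictionary entry.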
### Botanical Classification

The agent includes botanical classification capabilities that allow it to:
- Distinguish botanical fruits from vegetables
- Provide detailed explanations of botanical classifications
- Correctly identify commonly misclassified items (tomatoes, bell peppers, cucumbers, etc.)
- Explain the difference between botanical and culinary classifications
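As a rough illustration of the distinction listed above, the sketch below classifies items by the plant part that is eaten: anything that is a seed-bearing fruit counts as a botanical fruit, while roots, leaves, stalks, and flower buds count as vegetables. The lookup table and the function names `classify_botanical` and `split_grocery_list` are assumptions made for this example and do not come from the agent's code.

```python
from typing import Iterable, List, Tuple

# Illustrative lookup of the edible plant part for a few common items.
PLANT_PART = {
    "tomato": "fruit",
    "bell pepper": "fruit",
    "cucumber": "fruit",
    "carrot": "root",
    "lettuce": "leaf",
    "celery": "stalk",
    "broccoli": "flower bud",
}

def classify_botanical(item: str) -> str:
    """Return 'fruit', 'vegetable', or 'unknown' for a single item."""
    part = PLANT_PART.get(item.strip().lower())
    if part is None:
        return "unknown"
    return "fruit" if part == "fruit" else "vegetable"

def split_grocery_list(items: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split items into alphabetized botanical fruits and vegetables."""
    items = list(items)
    fruits = sorted(i for i in items if classify_botanical(i) == "fruit")
    vegetables = sorted(i for i in items if classify_botanical(i) == "vegetable")
    return fruits, vegetables
```

Under this scheme a tomato is a botanical fruit even though it is used as a vegetable in cooking, which is the kind of mismatch the agent is meant to explain.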
### External Information Sources

The agent can access external information to provide more accurate and up-to-date answers:

- **SerpAPI Integration**: Enables real-time web search capabilities for current events and factual information
- **YouTube Integration**:
  - Search for relevant videos on specific topics
  - Extract and analyze video transcripts for information
- **Tavily Search**: AI-optimized search engine that provides relevant results for complex queries
### Audio Processing Capabilities

The agent includes audio processing capabilities that allow it to:
- Transcribe audio files using OpenAI's Whisper API, with Google Speech Recognition as a fallback
- Extract ingredients from recipe audio recordings
- Process and analyze spoken content from various audio formats
- Format responses according to user requests for audio content
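The primary-then-fallback transcription order mentioned above can be expressed as a small loop over backends. The sketch below is only an assumption about how such a fallback might be wired: `transcribe_with_fallback` is a hypothetical helper, and the two stub functions stand in for real Whisper and Google Speech Recognition calls, which are not made here.

```python
from typing import Callable, Optional, Sequence

# A backend takes a path to an audio file and returns its transcript.
Transcriber = Callable[[str], str]

def transcribe_with_fallback(
    audio_path: str, backends: Sequence[Transcriber]
) -> Optional[str]:
    """Try each backend in order and return the first non-empty transcript."""
    for backend in backends:
        try:
            text = backend(audio_path)
        except Exception:
            # This backend failed (network error, bad key, etc.); try the next one.
            continue
        if text and text.strip():
            return text.strip()
    return None  # Every backend failed or returned nothing.

# Stand-in backends for illustration only; they do not call any real service.
def whisper_stub(audio_path: str) -> str:
    raise RuntimeError("simulated Whisper API failure")

def google_stub(audio_path: str) -> str:
    return "  add two cups of flour  "
```

Passing the backends as callables keeps the fallback order explicit and makes the logic testable without network access.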
### API Keys Configuration

To use the external information sources and audio transcription, set the following API keys in your environment:
- `SERPAPI_API_KEY`: For web search capabilities
- `YOUTUBE_API_KEY`: For YouTube video search and transcript analysis
- `TAVILY_API_KEY`: For AI-optimized search results
- `WHISPER_API_KEY`: For audio transcription (falls back to `OPENAI_API_KEY` if not set)
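A minimal sketch of reading these variables, including the `WHISPER_API_KEY` fallback to `OPENAI_API_KEY`, might look like the following. The helper names `load_api_keys` and `missing_keys` are hypothetical and are not taken from the agent's code; only the environment variable names come from the list above.

```python
import os
from typing import Dict, List, Mapping, Optional

def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect the API keys listed above from the given environment mapping."""
    return {
        "SERPAPI_API_KEY": env.get("SERPAPI_API_KEY"),
        "YOUTUBE_API_KEY": env.get("YOUTUBE_API_KEY"),
        "TAVILY_API_KEY": env.get("TAVILY_API_KEY"),
        # Fall back to the OpenAI key when no dedicated Whisper key is set.
        "WHISPER_API_KEY": env.get("WHISPER_API_KEY") or env.get("OPENAI_API_KEY"),
    }

def missing_keys(keys: Mapping[str, Optional[str]]) -> List[str]:
    """Return the names of keys that are unset or empty, sorted alphabetically."""
    return sorted(name for name, value in keys.items() if not value)
```

Calling `missing_keys(load_api_keys())` at startup is one way to report which integrations will be unavailable before any questions are processed.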
## Repository

The code for this agent is available at: https://huggingface.co/derkaal/GAIA-agent

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference