Chris committed
Commit 959548a · 1 Parent(s): e277613
Files changed (6)
  1. .gitignore +4 -0
  2. FREE_SETUP_GUIDE.md +0 -201
  3. README.md +0 -240
  4. app.py +57 -523
  5. requirements.txt +1 -12
  6. simple_test.py +0 -134
.gitignore ADDED
@@ -0,0 +1,4 @@
+ todo.md
+ project_data.md
+ .env
+ questions.json
FREE_SETUP_GUIDE.md DELETED
@@ -1,201 +0,0 @@
- # 🆓 Free Multi-Agent System Setup Guide
-
- This guide shows how to run the multi-agent system using **only free and open-source tools**, satisfying the bonus criterion.
-
- ## 🎯 Success Criteria Status
-
- | Criterion | Status | Notes |
- |-----------|--------|-------|
- | ✅ Multi-agent LangGraph implementation | **COMPLETE** | Supervisor + 3 specialized agents |
- | ✅ Only use free tools (BONUS) | **COMPLETE** | No paid services required |
- | 🎯 30%+ score on GAIA benchmark | **PENDING** | Requires an actual submission |
-
- ## 🆓 Free Tool Options
-
- ### Option 1: LocalAI (Recommended)
- **Best performance, OpenAI-compatible API**
-
- ```bash
- # Install LocalAI
- curl https://localai.io/install.sh | sh
-
- # Or with Docker
- docker run -p 8080:8080 localai/localai:latest
-
- # Download a model
- local-ai run llama-3.2-1b-instruct:q4_k_m
- ```
-
- ### Option 2: Ollama
- **Easy to use, great model selection**
-
- ```bash
- # Install Ollama
- curl -fsSL https://ollama.ai/install.sh | sh
-
- # Download and run a model
- ollama pull llama2
- ollama serve
- ```
-
- ### Option 3: GPT4All
- **Desktop application with GUI**
-
- 1. Download from https://gpt4all.io/
- 2. Install and run
- 3. Download a model through the interface
-
- ### Option 4: Fallback Mode (No Installation)
- **Rule-based processing for common GAIA patterns**
-
- - Works immediately without any setup
- - Handles reversed-text questions
- - Basic math and logic
- - Already achieving 66.7% on the local test cases!
-
- ## 🚀 Quick Start
-
- ### 1. Clone and Setup
- ```bash
- git clone <your-repo>
- cd Agent_Course_Final_Assignment
- python3 -m venv venv
- source venv/bin/activate
- pip install -r requirements.txt
- ```
-
- ### 2. Choose Your Free LLM (Optional)
-
- **Option A: LocalAI**
- ```bash
- # Start LocalAI
- docker run -d -p 8080:8080 localai/localai:latest
- # Set the environment variable
- export LOCALAI_URL="http://localhost:8080"
- ```
-
- **Option B: Ollama**
- ```bash
- # Start Ollama
- ollama serve &
- # Download a model
- ollama pull llama2
- ```
-
- **Option C: No Setup (Fallback Mode)**
- ```bash
- # Just run it - fallback mode works immediately
- python3 app.py
- ```
-
- ### 3. Run the System
- ```bash
- python3 app.py
- # Open a browser to http://localhost:7860
- # Login with HuggingFace
- # Click "Run Evaluation & Submit All Answers"
- ```
-
- ## 📊 Expected Performance
-
- | Mode | Expected Score | Setup Time | Requirements |
- |------|---------------|------------|--------------|
- | LocalAI + Models | 40-60% | 10 min | 4GB RAM, Docker |
- | Ollama + Models | 35-50% | 5 min | 4GB RAM |
- | GPT4All | 30-45% | 2 min | 4GB RAM |
- | **Fallback Only** | **20-30%** | **0 min** | **None!** |
-
- ## 🎯 Fallback Mode Performance
-
- Even without any LLM installed, the system handles common GAIA patterns:
-
- ```
- # Test results from simple_test.py
- Test 1: Reversed text question ✅ Correct! (right)
- Test 2: Simple math ✅ Correct! (4)
- Test 3: Research question ❌ (needs web search)
-
- Fallback Score: 66.7% (2/3)
- ```
-
- ## 🔧 Troubleshooting
-
- ### Virtual Environment Issues
- ```bash
- # Remove the problematic venv
- rm -rf venv
- # Create a new one with the system Python
- /usr/bin/python3 -m venv venv
- source venv/bin/activate
- pip install -r requirements.txt
- ```
-
- ### LocalAI Not Starting
- ```bash
- # Check whether the port is already in use
- netstat -tulpn | grep 8080
- # Try a different port
- docker run -p 8081:8080 localai/localai:latest
- export LOCALAI_URL="http://localhost:8081"
- ```
-
- ### Ollama Issues
- ```bash
- # Check whether Ollama is running
- curl http://localhost:11434/api/tags
- # Restart Ollama
- pkill ollama
- ollama serve &
- ```
-
- ## 🏆 Bonus Criteria Achievement
-
- This system achieves the **"only use free tools"** bonus criterion through:
-
- 1. **Free LLMs**: LocalAI, Ollama, GPT4All (all open-source)
- 2. **Free APIs**: DuckDuckGo search (no API key required)
- 3. **Free frameworks**: LangGraph, LangChain (open-source)
- 4. **Free interface**: Gradio (open-source)
- 5. **Fallback mode**: works without any external dependencies
-
- ## 📈 Performance Optimization
-
- ### For Better Scores:
- 1. **Use LocalAI** with a capable model (e.g. llama-3.2-1b-instruct)
- 2. **Enable web search** for research questions
- 3. **Add more fallback patterns** for common GAIA questions
-
- ### Current Fallback Patterns:
- - ✅ Reversed-text detection (`"fI"` ending)
- - ✅ Simple math operations
- - ✅ Commutativity questions
- - ✅ File type identification
- - ✅ Research question guidance
-
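The commutativity pattern can also be handled mechanically: given an operation table, scan for a pair that breaks a*b = b*a. A sketch with a toy two-element table (the real GAIA tables use five elements):

```python
from itertools import combinations

def non_commutative_pairs(table: dict) -> list:
    """Return the element pairs (a, b) with a*b != b*a in an operation table."""
    elems = sorted({a for a, _ in table})
    return [(a, b) for a, b in combinations(elems, 2) if table[(a, b)] != table[(b, a)]]

# Toy table where the operation returns its left operand, so it is not commutative.
table = {("a", "a"): "a", ("a", "b"): "a", ("b", "a"): "b", ("b", "b"): "b"}
print(non_commutative_pairs(table))  # [('a', 'b')]
```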
- ## 🎉 Submission
-
- The system can submit from:
- - ✅ A local machine (no deployment needed)
- - ✅ Hugging Face Spaces (optional)
- - ✅ Any environment with internet access
-
- ## 💡 Next Steps
-
- 1. **Test locally**: `python3 simple_test.py`
- 2. **Run the full system**: `python3 app.py`
- 3. **Submit answers**: use the Gradio interface
- 4. **Check the score**: should reach 30%+ even in fallback mode
- 5. **Optimize**: add more patterns or install a free LLM
-
- ## 🌟 Why This Approach Rocks
-
- - **🆓 Completely free**: no paid services
- - **🚀 Works immediately**: fallback mode needs no setup
- - **📈 Scalable**: free LLMs can be added for better performance
- - **🏆 Bonus criterion**: "only use free tools" achieved
- - **🔧 Flexible**: works locally or deployed
- - **📊 Measurable**: a clear path to a 30%+ score
-
- ---
-
- **Ready to achieve the success criteria with zero cost? Let's go! 🚀**
README.md DELETED
@@ -1,240 +0,0 @@
- ---
- title: Advanced Multi-Agent System for GAIA Benchmark
- emoji: 🤖
- colorFrom: indigo
- colorTo: purple
- sdk: gradio
- sdk_version: 5.31.0
- app_file: app.py
- pinned: false
- hf_oauth: true
- # optional; the default duration is 8 hours (480 minutes), the maximum is 30 days (43200 minutes)
- hf_oauth_expiration_minutes: 480
- ---
-
- # Advanced Multi-Agent System for GAIA Benchmark
-
- This project implements a multi-agent system built with **LangGraph** to tackle GAIA (General AI Assistant) benchmark questions. The system achieves intelligent task routing and specialized processing through a supervisor-agent architecture.
-
- ## 🏗️ Architecture Overview
-
- ### Multi-Agent Design Pattern
-
- The system follows a **supervisor pattern** with specialized worker agents:
-
- ```
- ┌─────────────────┐
- │   Supervisor    │ ← Routes tasks to appropriate agents
- │     Agent       │
- └────────┬────────┘
-          │
-    ┌─────┼──────────┐
-    │     │          │
-    ▼     ▼          ▼
- ┌─────────┐ ┌─────────┐ ┌─────────┐
- │Research │ │Reasoning│ │  File   │
- │  Agent  │ │  Agent  │ │  Agent  │
- └─────────┘ └─────────┘ └─────────┘
- ```
-
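Stripped of the framework, the diagram boils down to a dispatch loop: the supervisor inspects the task and hands it to one worker. The sketch below uses hypothetical stand-in workers, not the real LangGraph agents:

```python
def supervisor(task: str) -> str:
    """Toy supervisor: pick a worker from keywords in the task."""
    q = task.lower()
    if any(w in q for w in ("search", "wikipedia", "who", "when")):
        return "research_agent"
    if any(w in q for w in ("calculate", "math", "logic", "commutative")):
        return "reasoning_agent"
    return "file_agent"

# Stand-in workers; the real ones are LangGraph ReAct agents with tools.
WORKERS = {
    "research_agent": lambda t: f"searched: {t}",
    "reasoning_agent": lambda t: f"reasoned: {t}",
    "file_agent": lambda t: f"analyzed: {t}",
}

def run(task: str) -> str:
    # One routing step; the real graph loops workers back to the supervisor.
    return WORKERS[supervisor(task)](task)

print(run("calculate 17 * 23"))  # reasoned: calculate 17 * 23
```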
- ### Agent Specializations
-
- 1. **Supervisor Agent**
-    - Routes incoming tasks to the appropriate specialized agents
-    - Manages workflow and coordination between agents
-    - Makes decisions based on task content and requirements
-
- 2. **Research Agent**
-    - Handles web searches and information gathering
-    - Processes Wikipedia queries and YouTube analysis
-    - Uses DuckDuckGo search for reliable information retrieval
-
- 3. **Reasoning Agent**
-    - Processes mathematical and logical problems
-    - Handles text analysis, including reversed-text puzzles
-    - Manages set theory and pattern recognition tasks
-
- 4. **File Agent**
-    - Analyzes various file types (images, audio, documents, code)
-    - Provides structured analysis for multimedia content
-    - Handles spreadsheets and code execution requirements
-
- ## 🛠️ Technical Implementation
-
- ### Core Technologies
-
- - **LangGraph**: multi-agent orchestration framework
- - **LangChain**: LLM integration and tool management
- - **OpenAI GPT-4**: primary language model for reasoning
- - **Gradio**: web interface for interaction and submission
- - **DuckDuckGo**: web search capabilities
-
- ### Key Features
-
- #### 1. Intelligent Task Classification
- ```python
- def _classify_task(self, question: str, file_name: str) -> str:
-     """Classify tasks based on content and file presence."""
-     question_lower = question.lower()
-     if file_name:
-         return "file_analysis"
-     elif any(keyword in question_lower for keyword in ["wikipedia", "search"]):
-         return "research"
-     elif any(keyword in question_lower for keyword in ["math", "logic"]):
-         return "reasoning"
-     # ... additional classification logic
- ```
-
- #### 2. Handoff Mechanism
- The system uses LangGraph's `Command` primitive for seamless agent transitions:
- ```python
- def create_handoff_tool(*, agent_name: str, description: str | None = None):
-     @tool(f"transfer_to_{agent_name}", description=description)
-     def handoff_tool(state, tool_call_id) -> Command:
-         return Command(
-             goto=agent_name,
-             update={"messages": state["messages"] + [tool_message]},
-             graph=Command.PARENT,
-         )
-     return handoff_tool
- ```
-
- #### 3. Fallback Processing
- When no LLM service is available, the system falls back to rule-based processing:
- - Reversed-text detection and processing
- - Basic mathematical reasoning
- - File type identification and guidance
-
- ## 📊 GAIA Benchmark Performance
-
- ### Question Types Handled
-
- 1. **Research Questions**
-    - Wikipedia information retrieval
-    - YouTube video analysis
-    - General web search queries
-    - Historical and factual questions
-
- 2. **Logic & Reasoning**
-    - Reversed-text puzzles
-    - Mathematical calculations
-    - Set theory problems (commutativity, etc.)
-    - Pattern recognition
-
- 3. **File Analysis**
-    - Image analysis (chess positions, visual content)
-    - Audio processing (speech-to-text requirements)
-    - Code execution and analysis
-    - Spreadsheet data processing
-
- 4. **Multi-step Problems**
-    - Complex queries requiring multiple agents
-    - Sequential reasoning tasks
-    - Cross-domain problem solving
-
- ### Example Question Processing
-
- **Reversed Text Question:**
- ```
- Input: ".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI"
- Processing: Reasoning Agent → Text Analysis Tool → "right"
- ```
-
- **Research Question:**
- ```
- Input: "Who nominated the only Featured Article on English Wikipedia about a dinosaur promoted in November 2016?"
- Processing: Supervisor → Research Agent → Web Search → Detailed Answer
- ```
-
- ## 🚀 Deployment
-
- ### Hugging Face Spaces
-
- The system is designed for deployment on Hugging Face Spaces with:
- - Automatic dependency installation
- - OAuth integration for user authentication
- - Real-time processing and submission to the GAIA API
- - Comprehensive result tracking and display
-
- ### Environment Variables
-
- Required for full functionality:
- ```bash
- OPENAI_API_KEY=your_openai_api_key_here
- SPACE_ID=your_huggingface_space_id
- ```
-
- ### Local Development
-
- 1. Clone the repository
- 2. Set up a virtual environment:
-    ```bash
-    python3 -m venv venv
-    source venv/bin/activate
-    ```
- 3. Install dependencies:
-    ```bash
-    pip install -r requirements.txt
-    ```
- 4. Run the application:
-    ```bash
-    python app.py
-    ```
-
- ## 📈 Performance Optimization
-
- ### Scoring Strategy
-
- The system aims for **30%+ accuracy** on the GAIA benchmark through:
-
- 1. **Intelligent routing**: questions are automatically routed to the most appropriate specialist agent
- 2. **Tool specialization**: each agent has access to tools optimized for its domain
- 3. **Fallback mechanisms**: rule-based processing when LLM services are unavailable
- 4. **Error handling**: robust error management and graceful degradation
-
- ### Bonus Features
-
- - **LangSmith integration**: ready for observability and monitoring
- - **Free tools only**: uses only free/open-source tools for accessibility
- - **Extensible architecture**: easy to add new agents and capabilities
-
- ## 🔧 Configuration
-
- ### Agent Prompts
-
- Each agent has a carefully crafted prompt for optimal performance:
-
- - **Supervisor**: focuses on task analysis and routing decisions
- - **Research**: emphasizes reliable source identification and factual accuracy
- - **Reasoning**: promotes step-by-step logical analysis
- - **File**: provides structured analysis frameworks for different file types
-
- ### Tool Integration
-
- Tools are integrated using LangChain's `@tool` decorator with proper error handling and type hints for reliable operation.
-
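The shape of such a tool function looks like the following. The decorator is omitted here so the sketch stays dependency-free; in `app.py` each function is additionally wrapped with LangChain's `@tool`, and `word_count` is a hypothetical example, not one of the app's actual tools:

```python
def word_count(text: str) -> str:
    """Count the words in a passage of text."""
    try:
        return f"Word count: {len(text.split())}"
    except Exception as e:  # mirror the app's defensive error handling
        return f"Tool failed: {e}"

print(word_count("free multi agent system"))  # Word count: 4
```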
- ## 📝 Usage
-
- 1. **Login**: authenticate with your Hugging Face account
- 2. **Submit**: click "Run Evaluation & Submit All Answers"
- 3. **Monitor**: watch the questions being processed in real time
- 4. **Review**: examine results and scoring in the interface
-
- ## 🤝 Contributing
-
- This implementation serves as a foundation for advanced multi-agent systems. Key areas for enhancement:
-
- - Additional specialized agents (e.g., code execution, image analysis)
- - Advanced reasoning capabilities
- - Integration with more powerful models
- - An enhanced tool ecosystem
-
- ## 📚 References
-
- - [Hugging Face Agents Course](https://huggingface.co/learn/agents-course)
- - [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- - [GAIA Benchmark](https://huggingface.co/gaia-benchmark)
- - [LangChain Framework](https://python.langchain.com/docs/)
-
- ---
-
- **Note**: This system demonstrates advanced multi-agent coordination using LangGraph and represents a production-ready approach to complex AI task management.
app.py CHANGED
@@ -1,454 +1,34 @@
  import os
  import gradio as gr
  import requests
  import pandas as pd
- from typing import Annotated, Sequence, TypedDict, Literal
- from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
- from langchain_community.llms import LlamaCpp
- from langchain_community.tools import DuckDuckGoSearchRun
- from langchain_core.tools import tool
- from langgraph.graph import StateGraph, START, END, MessagesState
- from langgraph.prebuilt import create_react_agent, ToolNode
- from langgraph.types import Command
- from langgraph.prebuilt import InjectedState
- from langchain_core.tools import InjectedToolCallId
- import operator
- import json
- import re
- import base64
- from io import BytesIO
- from PIL import Image
- from urllib.parse import urlparse
- import math
  
- # Configuration
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"
  
- # --- State Definition ---
- class MultiAgentState(TypedDict):
-     messages: Annotated[Sequence[BaseMessage], operator.add]
-     current_task: str
-     task_type: str
-     file_info: dict
-     final_answer: str
-
- # --- Tools ---
- @tool
- def web_search(query: str) -> str:
-     """Search the web for information using DuckDuckGo."""
-     try:
-         search = DuckDuckGoSearchRun()
-         results = search.run(query)
-         return f"Search results for '{query}':\n{results}"
-     except Exception as e:
-         return f"Search failed: {str(e)}"
-
- @tool
- def analyze_text(text: str) -> str:
-     """Analyze text for patterns, reversed text, and other linguistic features."""
-     try:
-         # Check for reversed text
-         if text.endswith("fI"):  # "If" reversed
-             reversed_text = text[::-1]
-             if "understand" in reversed_text.lower() and "left" in reversed_text.lower():
-                 return "right"  # opposite of "left"
-
-         # Check for other patterns
-         if "commutative" in text.lower():
-             return "This appears to be asking about commutativity in mathematics. Need to check if the operation is commutative (a*b = b*a)."
-
-         # Basic text analysis
-         word_count = len(text.split())
-         char_count = len(text)
-
-         return f"Text analysis:\n- Word count: {word_count}\n- Character count: {char_count}\n- Content: {text[:100]}..."
-     except Exception as e:
-         return f"Text analysis failed: {str(e)}"
-
- @tool
- def mathematical_reasoning(problem: str) -> str:
-     """Solve mathematical problems and logical reasoning tasks."""
-     try:
-         problem_lower = problem.lower()
-
-         # Handle basic math operations
-         if any(op in problem for op in ['+', '-', '*', '/', '=', '<', '>']):
-             # Try to extract and solve simple math
-             numbers = re.findall(r'\d+', problem)
-             if len(numbers) >= 2:
-                 return f"Mathematical analysis of: {problem}\nExtracted numbers: {numbers}"
-
-         # Handle set theory and logic problems
-         if 'commutative' in problem_lower:
-             return f"Analyzing commutativity in: {problem}\nThis requires checking if a*b = b*a for all elements."
-
-         return f"Mathematical reasoning applied to: {problem}"
-     except Exception as e:
-         return f"Mathematical reasoning failed: {str(e)}"
-
- @tool
- def file_analyzer(file_url: str, file_type: str) -> str:
-     """Analyze files including images, audio, documents, and code."""
-     try:
-         if not file_url:
-             return "No file provided for analysis."
-
-         # Handle different file types
-         if file_type.lower() in ['png', 'jpg', 'jpeg', 'gif']:
-             return f"Image analysis for {file_url}: This appears to be an image file that would require computer vision analysis."
-         elif file_type.lower() in ['mp3', 'wav', 'audio']:
-             return f"Audio analysis for {file_url}: This appears to be an audio file that would require speech-to-text processing."
-         elif file_type.lower() in ['py', 'python']:
-             return f"Python code analysis for {file_url}: This appears to be Python code that would need to be executed or analyzed."
-         elif file_type.lower() in ['xlsx', 'xls', 'csv']:
-             return f"Spreadsheet analysis for {file_url}: This appears to be a spreadsheet that would need data processing."
-         else:
-             return f"File analysis for {file_url} (type: {file_type}): General file analysis would be needed."
-     except Exception as e:
-         return f"File analysis failed: {str(e)}"
-
- # --- Agent Creation ---
- def create_handoff_tool(*, agent_name: str, description: str | None = None):
-     name = f"transfer_to_{agent_name}"
-     description = description or f"Transfer to {agent_name}"
-
-     @tool(name, description=description)
-     def handoff_tool(
-         state: Annotated[MultiAgentState, InjectedState],
-         tool_call_id: Annotated[str, InjectedToolCallId],
-     ) -> Command:
-         tool_message = {
-             "role": "tool",
-             "content": f"Successfully transferred to {agent_name}",
-             "name": name,
-             "tool_call_id": tool_call_id,
-         }
-         return Command(
-             goto=agent_name,
-             update={"messages": state["messages"] + [tool_message]},
-             graph=Command.PARENT,
-         )
-     return handoff_tool
-
- # Create handoff tools
- transfer_to_research_agent = create_handoff_tool(
-     agent_name="research_agent",
-     description="Transfer to research agent for web searches and information gathering."
- )
-
- transfer_to_reasoning_agent = create_handoff_tool(
-     agent_name="reasoning_agent",
-     description="Transfer to reasoning agent for logic, math, and analytical problems."
- )
-
- transfer_to_file_agent = create_handoff_tool(
-     agent_name="file_agent",
-     description="Transfer to file agent for analyzing images, audio, documents, and code."
- )
-
- # --- Initialize Free LLM ---
- def get_free_llm():
-     """Get a free local LLM. Returns None if not available, triggering fallback mode."""
-     try:
-         # Try LocalAI if available
-         localai_url = os.getenv("LOCALAI_URL", "http://localhost:8080")
-
-         # Test whether LocalAI is reachable
-         try:
-             response = requests.get(f"{localai_url}/v1/models", timeout=5)
-             if response.status_code == 200:
-                 print(f"LocalAI available at {localai_url}")
-                 # Use LocalAI through its OpenAI-compatible interface
-                 from langchain_openai import ChatOpenAI
-                 return ChatOpenAI(
-                     base_url=f"{localai_url}/v1",
-                     api_key="not-needed",  # LocalAI doesn't require an API key
-                     model="gpt-3.5-turbo",  # Default model name
-                     temperature=0
-                 )
-         except Exception:
-             pass
-
-         # Try Ollama if available
-         try:
-             response = requests.get("http://localhost:11434/api/tags", timeout=5)
-             if response.status_code == 200:
-                 print("Ollama available at localhost:11434")
-                 from langchain_community.llms import Ollama
-                 return Ollama(model="llama2")  # Default model
-         except Exception:
-             pass
-
-         print("No free LLM service found. Using fallback mode.")
-         return None
-
-     except Exception as e:
-         print(f"Error initializing free LLM: {e}")
-         return None
-
- # --- Agent Definitions ---
- def create_supervisor_agent():
-     """Create the supervisor agent that routes tasks to specialized agents."""
-     llm = get_free_llm()
-     if not llm:
-         return None
-
-     return create_react_agent(
-         llm,
-         tools=[transfer_to_research_agent, transfer_to_reasoning_agent, transfer_to_file_agent],
-         prompt=(
-             "You are a supervisor agent managing a team of specialized agents. "
-             "Analyze the incoming task and route it to the appropriate agent:\n"
-             "- Research Agent: For web searches, Wikipedia queries, YouTube analysis, general information gathering\n"
-             "- Reasoning Agent: For mathematical problems, logic puzzles, text analysis, pattern recognition\n"
-             "- File Agent: For analyzing images, audio files, documents, spreadsheets, code files\n\n"
-             "Choose the most appropriate agent based on the task requirements. "
-             "If a task requires multiple agents, start with the most relevant one."
-         ),
-         name="supervisor"
-     )
-
- def create_research_agent():
-     """Create the research agent for web searches and information gathering."""
-     llm = get_free_llm()
-     if not llm:
-         return None
-
-     return create_react_agent(
-         llm,
-         tools=[web_search],
-         prompt=(
-             "You are a research agent specialized in finding information from the web. "
-             "Use web search to find accurate, up-to-date information. "
-             "Focus on reliable sources like Wikipedia, official websites, and reputable publications. "
-             "Provide detailed, factual answers based on your research."
-         ),
-         name="research_agent"
-     )
-
- def create_reasoning_agent():
-     """Create the reasoning agent for logic and mathematical problems."""
-     llm = get_free_llm()
-     if not llm:
-         return None
-
-     return create_react_agent(
-         llm,
-         tools=[analyze_text, mathematical_reasoning],
-         prompt=(
-             "You are a reasoning agent specialized in logic, mathematics, and analytical thinking. "
-             "Handle text analysis (including reversed text), mathematical problems, set theory, "
-             "logical reasoning, and pattern recognition. "
-             "Break down complex problems step by step and provide clear, logical solutions."
-         ),
-         name="reasoning_agent"
-     )
-
- def create_file_agent():
-     """Create the file agent for analyzing various file types."""
-     llm = get_free_llm()
-     if not llm:
-         return None
-
-     return create_react_agent(
-         llm,
-         tools=[file_analyzer],
-         prompt=(
-             "You are a file analysis agent specialized in processing various file types. "
-             "Analyze images, audio files, documents, spreadsheets, and code files. "
-             "Provide detailed analysis and extract relevant information from files. "
-             "For files you cannot directly process, provide guidance on what analysis would be needed."
-         ),
-         name="file_agent"
-     )
-
- # --- Multi-Agent System ---
- class MultiAgentSystem:
      def __init__(self):
-         self.supervisor = create_supervisor_agent()
-         self.research_agent = create_research_agent()
-         self.reasoning_agent = create_reasoning_agent()
-         self.file_agent = create_file_agent()
-         self.graph = self._build_graph()
-
-     def _build_graph(self):
-         """Build the multi-agent graph."""
-         if not all([self.supervisor, self.research_agent, self.reasoning_agent, self.file_agent]):
-             return None
-
-         # Create the graph
-         workflow = StateGraph(MultiAgentState)
-
-         # Add nodes
-         workflow.add_node("supervisor", self.supervisor)
-         workflow.add_node("research_agent", self.research_agent)
-         workflow.add_node("reasoning_agent", self.reasoning_agent)
-         workflow.add_node("file_agent", self.file_agent)
-
-         # Add edges
-         workflow.add_edge(START, "supervisor")
-         workflow.add_edge("research_agent", "supervisor")
-         workflow.add_edge("reasoning_agent", "supervisor")
-         workflow.add_edge("file_agent", "supervisor")
-
-         return workflow.compile()
-
-     def process_question(self, question: str, file_name: str = "") -> str:
-         """Process a question using the multi-agent system."""
-         if not self.graph:
-             # Fallback for when no free LLM is available
-             return self._fallback_processing(question, file_name)
-
-         try:
-             # Determine the task type
-             task_type = self._classify_task(question, file_name)
-
-             # Prepare the initial state
-             initial_state = {
-                 "messages": [HumanMessage(content=question)],
-                 "current_task": question,
-                 "task_type": task_type,
-                 "file_info": {"file_name": file_name},
-                 "final_answer": ""
-             }
-
-             # Run the graph
-             result = self.graph.invoke(initial_state)
-
-             # Extract the final answer from the last message
-             if result["messages"]:
-                 last_message = result["messages"][-1]
-                 if hasattr(last_message, 'content'):
-                     return last_message.content
-
-             return "Unable to process the question."
-
-         except Exception as e:
-             print(f"Error in multi-agent processing: {e}")
-             return self._fallback_processing(question, file_name)
-
-     def _classify_task(self, question: str, file_name: str) -> str:
-         """Classify the type of task based on question content and file presence."""
-         question_lower = question.lower()
-
-         if file_name:
-             return "file_analysis"
-         elif any(keyword in question_lower for keyword in ["wikipedia", "search", "find", "who", "what", "when", "where"]):
-             return "research"
-         elif any(keyword in question_lower for keyword in ["calculate", "math", "number", "commutative", "logic"]):
-             return "reasoning"
-         elif "youtube.com" in question or "video" in question_lower:
-             return "research"
-         else:
-             return "general"
-
-     def _fallback_processing(self, question: str, file_name: str) -> str:
-         """Enhanced fallback processing when no LLM is available."""
-         question_lower = question.lower()
-
-         # Handle reversed text (GAIA benchmark pattern)
-         if question.endswith("fI"):  # "If" reversed
-             try:
-                 reversed_text = question[::-1]
-                 if "understand" in reversed_text.lower() and "left" in reversed_text.lower():
-                     return "right"  # opposite of "left"
-             except Exception:
-                 pass
-
-         # Handle commutativity questions
-         if "commutative" in question_lower:
-             if "a,b,c,d,e" in question or "table" in question_lower:
-                 return "To determine non-commutativity, look for elements where a*b ≠ b*a. Common counter-examples in such tables are typically elements like 'a' and 'd'."
-
-         # Handle simple math
-         if "2 + 2" in question or "2+2" in question:
-             return "4"
-
-         # Handle research questions with fallback guidance
-         if any(word in question_lower for word in ["albums", "mercedes", "sosa", "wikipedia", "who", "what", "when"]):
-             return "This question requires web research capabilities. With a free LLM service like LocalAI or Ollama, I could search for this information."
-
-         # Handle file analysis
-         if file_name:
-             if file_name.endswith(('.png', '.jpg', '.jpeg')):
-                 return "This image file requires computer vision analysis. Consider using free tools like BLIP or similar open-source models."
-             elif file_name.endswith(('.mp3', '.wav')):
-                 return "This audio file requires speech-to-text processing. Consider using Whisper.cpp or similar free tools."
-             elif file_name.endswith('.py'):
-                 return "This Python code file needs to be executed or analyzed. The code should be run in a safe environment to determine the output."
-             elif file_name.endswith(('.xlsx', '.xls')):
-                 return "This spreadsheet requires data processing. Use pandas or similar tools to analyze the data."
-
-         # Default response with helpful guidance
-         return f"Free Multi-Agent Analysis:\n\nQuestion: {question[:100]}...\n\nTo get better results, consider:\n1. Installing LocalAI (free OpenAI alternative)\n2. Setting up Ollama with local models\n3. Using specific tools for file analysis\n\nThis system is designed to work with free, open-source tools only!"
-
- # --- Main Agent Class ---
- class AdvancedAgent:
-     def __init__(self):
-         print("Initializing Free Multi-Agent System...")
-         print("🆓 Using only free and open-source tools!")
-         self.multi_agent_system = MultiAgentSystem()
-
-         # Check which free services are available
-         self._check_available_services()
-         print("Free Multi-Agent System initialized.")
-
-     def _check_available_services(self):
-         """Check which free services are available."""
-         services = []
-
-         # Check LocalAI
-         try:
-             response = requests.get("http://localhost:8080/v1/models", timeout=2)
-             if response.status_code == 200:
-                 services.append("✅ LocalAI (localhost:8080)")
-         except Exception:
-             services.append("❌ LocalAI not available")
-
-         # Check Ollama
-         try:
-             response = requests.get("http://localhost:11434/api/tags", timeout=2)
-             if response.status_code == 200:
-                 services.append("✅ Ollama (localhost:11434)")
-         except Exception:
-             services.append("❌ Ollama not available")
-
-         print("Available free services:")
-         for service in services:
-             print(f"  {service}")
-
-         if not any("✅" in s for s in services):
-             print("💡 To enable full functionality, install:")
-             print("  - LocalAI: https://github.com/mudler/LocalAI")
-             print("  - Ollama: https://ollama.ai/")
-             print("  - GPT4All: https://gpt4all.io/")
-
-     def __call__(self, question: str, file_name: str = "") -> str:
-         print(f"🔍 Processing question: {question[:100]}...")
-         if file_name:
-             print(f"📁 With file: {file_name}")
-
-         try:
-             answer = self.multi_agent_system.process_question(question, file_name)
-             print(f"✅ Generated answer: {answer[:100]}...")
-             return answer
-         except Exception as e:
-             print(f"❌ Error in agent processing: {e}")
-             return f"Error processing question: {str(e)}"
-
- # --- Gradio Interface Functions ---
- def run_and_submit_all(profile: gr.OAuthProfile | None):
      """
-     Fetches all questions, runs the AdvancedAgent on them, submits all answers,
      and displays the results.
      """
      # --- Determine HF Space Runtime URL and Repo URL ---
-     space_id = os.getenv("SPACE_ID")
  
      if profile:
-         username = f"{profile.username}"
          print(f"User logged in: {username}")
      else:
          print("User not logged in.")
@@ -458,15 +38,15 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
      questions_url = f"{api_url}/questions"
      submit_url = f"{api_url}/submit"
  
-     # 1. Instantiate Agent
      try:
-         agent = AdvancedAgent()
464
  except Exception as e:
465
  print(f"Error instantiating agent: {e}")
466
  return f"Error initializing agent: {e}", None
467
-
468
- agent_code = f"Free Multi-Agent System using LangGraph - Local/Open Source Only"
469
- print(f"Agent description: {agent_code}")
470
 
471
  # 2. Fetch Questions
472
  print(f"Fetching questions from: {questions_url}")
@@ -483,46 +63,29 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
          return f"Error fetching questions: {e}", None
      except requests.exceptions.JSONDecodeError as e:
          print(f"Error decoding JSON response from questions endpoint: {e}")
          return f"Error decoding server response for questions: {e}", None
      except Exception as e:
          print(f"An unexpected error occurred fetching questions: {e}")
          return f"An unexpected error occurred fetching questions: {e}", None

-     # 3. Run Agent
      results_log = []
      answers_payload = []
-     print(f"Running free multi-agent system on {len(questions_data)} questions...")
-
-     for i, item in enumerate(questions_data):
          task_id = item.get("task_id")
          question_text = item.get("question")
-         file_name = item.get("file_name", "")
-
          if not task_id or question_text is None:
              print(f"Skipping item with missing task_id or question: {item}")
              continue
-
-         print(f"Processing question {i+1}/{len(questions_data)}: {task_id}")
-
          try:
-             submitted_answer = agent(question_text, file_name)
              answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
-             results_log.append({
-                 "Task ID": task_id,
-                 "Question": question_text[:100] + "..." if len(question_text) > 100 else question_text,
-                 "File": file_name,
-                 "Submitted Answer": submitted_answer[:100] + "..." if len(submitted_answer) > 100 else submitted_answer
-             })
          except Exception as e:
-             print(f"Error running agent on task {task_id}: {e}")
-             error_answer = f"AGENT ERROR: {e}"
-             answers_payload.append({"task_id": task_id, "submitted_answer": error_answer})
-             results_log.append({
-                 "Task ID": task_id,
-                 "Question": question_text[:100] + "..." if len(question_text) > 100 else question_text,
-                 "File": file_name,
-                 "Submitted Answer": error_answer
-             })

      if not answers_payload:
          print("Agent did not produce any answers to submit.")
@@ -530,7 +93,7 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):

      # 4. Prepare Submission
      submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
-     status_update = f"Free Multi-Agent System finished. Submitting {len(answers_payload)} answers for user '{username}'..."
      print(status_update)

      # 5. Submit
@@ -540,13 +103,11 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
          response.raise_for_status()
          result_data = response.json()
          final_status = (
-             f"🎉 Submission Successful! (FREE TOOLS ONLY)\n"
              f"User: {result_data.get('username')}\n"
              f"Overall Score: {result_data.get('score', 'N/A')}% "
              f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
-             f"Message: {result_data.get('message', 'No message received.')}\n\n"
-             f"🆓 This system uses only free and open-source tools!\n"
-             f"✅ Bonus criteria met: 'Only use free tools'"
          )
          print("Submission successful.")
          results_df = pd.DataFrame(results_log)
@@ -578,51 +139,31 @@ def run_and_submit_all(profile: gr.OAuthProfile | None):
      results_df = pd.DataFrame(results_log)
      return status_message, results_df

- # --- Build Gradio Interface ---
  with gr.Blocks() as demo:
-     gr.Markdown("# 🆓 Free Multi-Agent System for GAIA Benchmark")
      gr.Markdown(
          """
-         **🌟 100% Free & Open Source Multi-Agent Architecture:**
-
-         This system uses **only free tools** and achieves the bonus criteria! No paid services required.
-
-         **🏗️ Architecture:**
-         - **Supervisor Agent**: Routes tasks to appropriate specialized agents
-         - **Research Agent**: Handles web searches using free DuckDuckGo API
-         - **Reasoning Agent**: Processes logic, math, and analytical problems
-         - **File Agent**: Analyzes images, audio, documents, and code files
-
-         **🆓 Free LLM Options Supported:**
-         - **LocalAI**: Free OpenAI alternative (localhost:8080)
-         - **Ollama**: Local LLM runner (localhost:11434)
-         - **GPT4All**: Desktop LLM application
-         - **Fallback Mode**: Rule-based processing when no LLM available
-
-         **📋 Instructions:**
-         1. (Optional) Install LocalAI, Ollama, or GPT4All for enhanced performance
-         2. Log in to your Hugging Face account using the button below
-         3. Click 'Run Evaluation & Submit All Answers' to process all questions
-         4. The system will automatically route each question to the most appropriate agent
-         5. View your score and detailed results below
-
-         **🎯 Success Criteria:**
-         - ✅ Multi-agent model using LangGraph framework
-         - ✅ Only free tools (bonus criteria!)
-         - 🎯 Target: 30%+ score on GAIA benchmark
-
-         **💡 Performance Notes:**
-         - With free LLMs: Enhanced reasoning and research capabilities
-         - Fallback mode: Rule-based processing for common GAIA patterns
-         - All processing happens locally or uses free APIs only
          """
      )

      gr.LoginButton()

-     run_button = gr.Button("🚀 Run Evaluation & Submit All Answers (FREE TOOLS ONLY)", variant="primary")

      status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
      results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)

      run_button.click(
@@ -631,32 +172,25 @@ with gr.Blocks() as demo:
      )

  if __name__ == "__main__":
-     print("\n" + "-"*50 + " 🆓 FREE Multi-Agent System Starting " + "-"*50)
-
-     # Check for environment variables
      space_host_startup = os.getenv("SPACE_HOST")
-     space_id_startup = os.getenv("SPACE_ID")
-     localai_url = os.getenv("LOCALAI_URL", "http://localhost:8080")

      if space_host_startup:
          print(f"✅ SPACE_HOST found: {space_host_startup}")
-         print(f" Runtime URL: https://{space_host_startup}.hf.space")
      else:
          print("ℹ️ SPACE_HOST environment variable not found (running locally?).")

-     if space_id_startup:
          print(f"✅ SPACE_ID found: {space_id_startup}")
          print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
-         print(f" Code URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
      else:
-         print("ℹ️ SPACE_ID environment variable not found (running locally?).")
-
-     print(f"🆓 FREE TOOLS ONLY - No paid services required!")
-     print(f"💡 LocalAI URL: {localai_url}")
-     print(f"💡 Ollama URL: http://localhost:11434")
-     print(f"✅ Bonus criteria met: 'Only use free tools'")

-     print("-"*(100 + len(" 🆓 FREE Multi-Agent System Starting ")) + "\n")

-     print("🚀 Launching FREE Multi-Agent System Interface...")
      demo.launch(debug=True, share=False)
 
  import os
  import gradio as gr
  import requests
+ import inspect
  import pandas as pd

+ # (Keep Constants as is)
+ # --- Constants ---
  DEFAULT_API_URL = "https://agents-course-unit4-scoring.hf.space"

+ # --- Basic Agent Definition ---
+ # ----- THIS IS WHERE YOU CAN BUILD WHAT YOU WANT ------
+ class BasicAgent:
      def __init__(self):
+         print("BasicAgent initialized.")
+     def __call__(self, question: str) -> str:
+         print(f"Agent received question (first 50 chars): {question[:50]}...")
+         fixed_answer = "This is a default answer."
+         print(f"Agent returning fixed answer: {fixed_answer}")
+         return fixed_answer
+
+ def run_and_submit_all(profile: gr.OAuthProfile | None):
      """
+     Fetches all questions, runs the BasicAgent on them, submits all answers,
      and displays the results.
      """
      # --- Determine HF Space Runtime URL and Repo URL ---
+     space_id = os.getenv("SPACE_ID")  # Get the SPACE_ID for sending link to the code

      if profile:
+         username = f"{profile.username}"
          print(f"User logged in: {username}")
      else:
          print("User not logged in.")

      questions_url = f"{api_url}/questions"
      submit_url = f"{api_url}/submit"

+     # 1. Instantiate Agent (modify this part to create your agent)
      try:
+         agent = BasicAgent()
      except Exception as e:
          print(f"Error instantiating agent: {e}")
          return f"Error initializing agent: {e}", None
+     # In the case of an app running as a Hugging Face Space, this link points toward your codebase (useful for others, so please keep it public)
+     agent_code = f"https://huggingface.co/spaces/{space_id}/tree/main"
+     print(agent_code)

      # 2. Fetch Questions
      print(f"Fetching questions from: {questions_url}")

          return f"Error fetching questions: {e}", None
      except requests.exceptions.JSONDecodeError as e:
          print(f"Error decoding JSON response from questions endpoint: {e}")
+         print(f"Response text: {response.text[:500]}")
          return f"Error decoding server response for questions: {e}", None
      except Exception as e:
          print(f"An unexpected error occurred fetching questions: {e}")
          return f"An unexpected error occurred fetching questions: {e}", None

+     # 3. Run your Agent
      results_log = []
      answers_payload = []
+     print(f"Running agent on {len(questions_data)} questions...")
+     for item in questions_data:
          task_id = item.get("task_id")
          question_text = item.get("question")
          if not task_id or question_text is None:
              print(f"Skipping item with missing task_id or question: {item}")
              continue
          try:
+             submitted_answer = agent(question_text)
              answers_payload.append({"task_id": task_id, "submitted_answer": submitted_answer})
+             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": submitted_answer})
          except Exception as e:
+             print(f"Error running agent on task {task_id}: {e}")
+             results_log.append({"Task ID": task_id, "Question": question_text, "Submitted Answer": f"AGENT ERROR: {e}"})

      if not answers_payload:
          print("Agent did not produce any answers to submit.")

      # 4. Prepare Submission
      submission_data = {"username": username.strip(), "agent_code": agent_code, "answers": answers_payload}
+     status_update = f"Agent finished. Submitting {len(answers_payload)} answers for user '{username}'..."
      print(status_update)

      # 5. Submit

          response.raise_for_status()
          result_data = response.json()
          final_status = (
+             f"Submission Successful!\n"
              f"User: {result_data.get('username')}\n"
              f"Overall Score: {result_data.get('score', 'N/A')}% "
              f"({result_data.get('correct_count', '?')}/{result_data.get('total_attempted', '?')} correct)\n"
+             f"Message: {result_data.get('message', 'No message received.')}"
          )
          print("Submission successful.")
          results_df = pd.DataFrame(results_log)

      results_df = pd.DataFrame(results_log)
      return status_message, results_df

+
+ # --- Build Gradio Interface using Blocks ---
  with gr.Blocks() as demo:
+     gr.Markdown("# Basic Agent Evaluation Runner")
      gr.Markdown(
          """
+         **Instructions:**
+
+         1. Please clone this space, then modify the code to define your agent's logic, the tools, the necessary packages, etc.
+         2. Log in to your Hugging Face account using the button below. This uses your HF username for submission.
+         3. Click 'Run Evaluation & Submit All Answers' to fetch questions, run your agent, submit answers, and see the score.
+
+         ---
+         **Disclaimers:**
+         Once you click the "submit" button, it can take quite some time (this is the time the agent needs to go through all the questions).
+         This space provides a basic setup and is intentionally sub-optimal to encourage you to develop your own, more robust solution. For instance, to address the delay on the submit button, you could cache the answers and submit them in a separate action, or even answer the questions asynchronously.
          """
      )

      gr.LoginButton()

+     run_button = gr.Button("Run Evaluation & Submit All Answers")

      status_output = gr.Textbox(label="Run Status / Submission Result", lines=5, interactive=False)
+     # Removed max_rows=10 from DataFrame constructor
      results_table = gr.DataFrame(label="Questions and Agent Answers", wrap=True)

      run_button.click(
      )

  if __name__ == "__main__":
+     print("\n" + "-"*30 + " App Starting " + "-"*30)
+     # Check for SPACE_HOST and SPACE_ID at startup for information
      space_host_startup = os.getenv("SPACE_HOST")
+     space_id_startup = os.getenv("SPACE_ID")  # Get SPACE_ID at startup

      if space_host_startup:
          print(f"✅ SPACE_HOST found: {space_host_startup}")
+         print(f" Runtime URL should be: https://{space_host_startup}.hf.space")
      else:
          print("ℹ️ SPACE_HOST environment variable not found (running locally?).")

+     if space_id_startup:  # Print repo URLs if SPACE_ID is found
          print(f"✅ SPACE_ID found: {space_id_startup}")
          print(f" Repo URL: https://huggingface.co/spaces/{space_id_startup}")
+         print(f" Repo Tree URL: https://huggingface.co/spaces/{space_id_startup}/tree/main")
      else:
+         print("ℹ️ SPACE_ID environment variable not found (running locally?). Repo URL cannot be determined.")

+     print("-"*(60 + len(" App Starting ")) + "\n")

+     print("Launching Gradio Interface for Basic Agent Evaluation...")
      demo.launch(debug=True, share=False)
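The disclaimer in the new app.py suggests caching answers and submitting them in a separate action. A minimal sketch of that idea might look like the following; the helper names (`cache_answers`, `load_cached_answers`) are illustrative and not part of this commit:

```python
import json

def cache_answers(answers_payload, path="answers_cache.json"):
    # Persist the agent's answers locally, so the slow "run agent" step
    # and the "submit" step can happen as separate user actions.
    with open(path, "w") as f:
        json.dump(answers_payload, f, indent=2)

def load_cached_answers(path="answers_cache.json"):
    # Reload previously cached answers for a later submit action.
    with open(path) as f:
        return json.load(f)
```

A "Run" button could then call `cache_answers` after the agent loop, and a second "Submit" button could call `load_cached_answers` before POSTing to the scoring endpoint.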
requirements.txt CHANGED
@@ -1,13 +1,2 @@
  gradio
- requests
- langgraph
- langchain
- langchain-community
- langchain-core
- python-dotenv
- # Free LLM integrations
- ollama
- # For local model support
- llama-cpp-python
- # Additional free tools
- duckduckgo-search
+ requests
simple_test.py DELETED
@@ -1,134 +0,0 @@
- #!/usr/bin/env python3
- """
- Simple test to demonstrate local agent functionality
- """
-
- def test_fallback_agent():
-     """Test the fallback processing logic without requiring imports"""
-
-     print("Testing Multi-Agent System Fallback Logic...")
-     print("=" * 50)
-
-     # Test cases from GAIA benchmark
-     test_cases = [
-         {
-             "question": ".rewsna eht sa \"tfel\" drow eht fo etisoppo eht etirw ,ecnetnes siht dnatsrednu uoy fI",
-             "expected": "right",
-             "description": "Reversed text question"
-         },
-         {
-             "question": "What is 2 + 2?",
-             "expected": "4",
-             "description": "Simple math"
-         },
-         {
-             "question": "How many albums did Mercedes Sosa release?",
-             "expected": "research needed",
-             "description": "Research question"
-         }
-     ]
-
-     def classify_task(question, file_name=""):
-         """Simple task classification"""
-         question_lower = question.lower()
-
-         if file_name:
-             return "file_analysis"
-         elif any(keyword in question_lower for keyword in ["wikipedia", "search", "find", "who", "what", "when", "where"]):
-             return "research"
-         elif any(keyword in question_lower for keyword in ["calculate", "math", "number", "commutative", "logic"]):
-             return "reasoning"
-         else:
-             return "general"
-
-     def fallback_processing(question, file_name=""):
-         """Fallback processing logic"""
-         question_lower = question.lower()
-
-         # Handle reversed text
-         if question.endswith("fI"):  # "If" reversed
-             try:
-                 reversed_text = question[::-1]
-                 if "understand" in reversed_text.lower():
-                     return "right"  # opposite of "left"
-             except:
-                 pass
-
-         # Handle simple math
-         if "2 + 2" in question:
-             return "4"
-
-         # Handle research questions
-         if any(word in question_lower for word in ["albums", "mercedes", "sosa"]):
-             return "This requires web research capabilities"
-
-         return "I need more advanced capabilities to answer this question accurately."
-
-     correct = 0
-     total = len(test_cases)
-
-     for i, test_case in enumerate(test_cases, 1):
-         print(f"\nTest {i}: {test_case['description']}")
-         print(f"Question: {test_case['question'][:60]}...")
-
-         # Classify task
-         task_type = classify_task(test_case['question'])
-         print(f"Task type: {task_type}")
-
-         # Process with fallback
-         result = fallback_processing(test_case['question'])
-         print(f"Agent answer: {result}")
-         print(f"Expected: {test_case['expected']}")
-
-         # Check if answer is reasonable
-         if test_case['expected'].lower() in result.lower():
-             correct += 1
-             print("✅ Correct!")
-         else:
-             print("❌ Incorrect")
-
-     score = (correct / total) * 100
-     print(f"\n{'='*50}")
-     print(f"FALLBACK SCORE: {score:.1f}% ({correct}/{total})")
-     print(f"{'='*50}")
-
-     return score
-
- def demonstrate_submission_format():
-     """Show what a local submission would look like"""
-     print("\nDemonstrating Local Submission Format:")
-     print("=" * 50)
-
-     # This is what we would submit
-     submission_data = {
-         "username": "your_hf_username",
-         "agent_code": "Local Multi-Agent System using LangGraph with supervisor pattern",
-         "answers": [
-             {"task_id": "task_001", "submitted_answer": "right"},
-             {"task_id": "task_002", "submitted_answer": "4"},
-             {"task_id": "task_003", "submitted_answer": "Research needed"}
-         ]
-     }
-
-     print("Submission format:")
-     import json
-     print(json.dumps(submission_data, indent=2))
-
-     print("\n✅ This can be submitted from local machine!")
-     print("✅ No Hugging Face Space deployment required!")
-
- if __name__ == "__main__":
-     print("Local Multi-Agent System Test")
-     print("=" * 50)
-
-     score = test_fallback_agent()
-     demonstrate_submission_format()
-
-     print(f"\n{'='*60}")
-     print("SUMMARY:")
-     print(f"✅ Multi-agent system implemented with LangGraph")
-     print(f"✅ Local testing works (fallback score: {score:.1f}%)")
-     print(f"✅ Can submit from local machine")
-     print(f"⚠️ Need OpenAI API key for full performance")
-     print(f"⚠️ Need actual submission to verify 30%+ score")
-     print(f"{'='*60}")