GAIA-Langgraph

jash0803

docs: update architecture

862a3c5 16 days ago

	# Architecture

	## System Overview

	```mermaid
	graph LR
	User[User via Gradio] --> App[app.py]
	App -->|fetch questions| API[GAIA Scoring API]
	App -->|"run per question (+ file_name)"| Agent[GAIAAgent]
	Agent --> Supervisor["Supervisor (GPT-5-mini)"]
	Supervisor -->|delegate| WebAgent[Web Research Agent]
	Supervisor -->|delegate| CodeAgent[Code Execution Agent]
	Supervisor -->|delegate| FileAgent[File Processing Agent]
	Supervisor -->|delegate| MathAgent[Math Agent]
	Agent -->|"extract FINAL ANSWER"| App
	App -->|submit answers| API
	App -->|save| Log[submission_payload.log]
	```

	## Supervisor Routing

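	The supervisor receives each question and routes it by fixed rules, checked in order: a file marker sends the question to the File Processing Agent, a YouTube URL sends it to the Web Research Agent, and anything else is routed by question type. The `[IMPORTANT CONTEXT: ...]` marker in the prompt is the only signal for file-based routing.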

	```mermaid
	flowchart TD
	Q[Incoming Question] --> Analyze[Supervisor Analyzes Question]
	Analyze --> HasMarker{"Has [IMPORTANT CONTEXT: file...] marker?"}
	HasMarker -->|Yes| FileAgent[File Processing Agent]
	HasMarker -->|No| HasYT{Contains YouTube URL?}
	HasYT -->|Yes| WebAgent[Web Research Agent]
	HasYT -->|No| Classify{Question Type?}
	Classify -->|Facts / Search| WebAgent
	Classify -->|Code / Algorithm| CodeAgent[Code Execution Agent]
	Classify -->|Math / Calculation| MathAgent[Math Agent]
	FileAgent --> NeedMore{Need further processing?}
	NeedMore -->|Yes| Classify
	NeedMore -->|No| Extract["Extract FINAL ANSWER"]
	WebAgent --> Extract
	CodeAgent --> Extract
	MathAgent --> Extract
	Extract --> Return[Return to App]
	```
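
The deterministic part of this routing can be sketched in Python as follows. This is a minimal sketch, not the project's code: the function name `route_question`, the returned agent labels, and the YouTube regex are all assumptions. The question-type classification, which the supervisor model performs, is left as a placeholder return value.

```python
import re

# Assumed pattern for YouTube links; the project's actual check may differ.
YOUTUBE_RE = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)[\w-]+"
)

# Prefix of the file marker described in this section.
FILE_MARKER_PREFIX = "[IMPORTANT CONTEXT:"


def route_question(question: str) -> str:
    """Return the first-hop agent for a question (hypothetical labels)."""
    # Rule 1: the file marker is the only signal for file-based routing.
    if FILE_MARKER_PREFIX in question:
        return "file_agent"
    # Rule 2: questions containing a YouTube URL go to the web research agent.
    if YOUTUBE_RE.search(question):
        return "web_agent"
    # Rule 3: everything else is classified by type (done by the supervisor model).
    return "classify_by_type"
```

The checks are ordered so that a question carrying both a file marker and a YouTube URL still goes to the file agent first, matching the flowchart.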

	## Agent-Tool Mapping

	Each sub-agent is built with `create_agent` and has access to specific tools.

	```mermaid
	graph TD
	subgraph web ["Web Research Agent (GPT-5-mini)"]
	Tavily[Tavily Search]
	Wiki[Wikipedia]
	Gemini["Gemini 2.5 Pro Video"]
	end

	subgraph code ["Code Execution Agent (GPT-5-mini)"]
	PythonREPL1[Python REPL]
	end

	subgraph file ["File Processing Agent (GPT-5-mini)"]
	Download["GAIA File Downloader (HF Dataset)"]
	Excel[Excel/CSV Reader]
	Audio[Whisper Transcription]
	Vision["GPT-5-mini Vision (Responses API)"]
	TextFile[Text File Reader]
	PDF[PDF Reader]
	PythonREPL2[Python REPL]
	end

	subgraph math ["Math Agent (GPT-5-mini)"]
	Calc[Calculator]
	PythonREPL3[Python REPL]
	end
	```
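
For quick reference, the same mapping can be written as a plain Python dictionary. The tool names are copied from the diagram; the dictionary keys, the `AGENT_TOOLS` name, and the helper `agents_with_tool` are illustrative assumptions, not identifiers from the codebase.

```python
# Agent-to-tool mapping, transcribed from the diagram (keys are hypothetical).
AGENT_TOOLS: dict[str, list[str]] = {
    "web": ["Tavily Search", "Wikipedia", "Gemini 2.5 Pro Video"],
    "code": ["Python REPL"],
    "file": [
        "GAIA File Downloader (HF Dataset)",
        "Excel/CSV Reader",
        "Whisper Transcription",
        "GPT-5-mini Vision (Responses API)",
        "Text File Reader",
        "PDF Reader",
        "Python REPL",
    ],
    "math": ["Calculator", "Python REPL"],
}


def agents_with_tool(tool: str) -> list[str]:
    """List the agents that have access to the given tool, sorted by name."""
    return sorted(name for name, tools in AGENT_TOOLS.items() if tool in tools)
```

Written this way, it is easy to see that the Python REPL is shared by the code, file, and math agents, while the web agent has none.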

	## Answer Flow

	All agents use the GAIA answer format prompt: reason through the problem, then output `FINAL ANSWER: [answer]`. The extraction layer strips the prefix before submission.

	```mermaid
	flowchart LR
	Prompt["GAIA Answer Format Prompt"] --> Agent["Sub-Agent Reasons"]
	Agent --> FA["FINAL ANSWER: 42"]
	FA --> Extract["_extract_answer()"]
	Extract --> Clean["42"]
	Clean --> Submit["POST /submit"]
	```
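
A minimal sketch of the extraction step is shown below. It is a stand-in for `_extract_answer()`, whose actual body is not shown in this document: the regex, the case-insensitive match, and the fallback of returning the stripped response when no prefix is found are all assumptions.

```python
import re


def extract_answer(response: str) -> str:
    """Strip the 'FINAL ANSWER:' prefix from an agent response (assumed behavior)."""
    # Capture the rest of the line that follows the prefix.
    match = re.search(r"FINAL ANSWER:\s*(.+)", response, flags=re.IGNORECASE)
    if match is None:
        # Assumed fallback: return the whole response, trimmed.
        return response.strip()
    return match.group(1).strip()
```

For example, `extract_answer("The sum is 40 + 2.\nFINAL ANSWER: 42")` returns `"42"`.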

	## Data Flow — Single Question

	```mermaid
	sequenceDiagram
	participant App as app.py
	participant GA as GAIAAgent
	participant SV as Supervisor
	participant SA as Sub-Agent
	participant Tool as Tool

	App->>GA: question + task_id + file_name
	GA->>GA: Build prompt with file context if file_name present
	GA->>SV: Invoke graph with messages
	SV->>SV: Analyze question, pick agent
	SV->>SA: Delegate with full context
	SA->>Tool: Call tool (search, video, code, file, etc.)
	Tool-->>SA: Tool result
	SA->>SA: Reason and produce FINAL ANSWER
	SA-->>SV: Response with FINAL ANSWER
	SV-->>GA: Relay FINAL ANSWER
	GA->>GA: Extract answer via regex
	GA-->>App: Clean answer string
	```

	## Submission Flow — Full Evaluation

	```mermaid
	sequenceDiagram
	participant User
	participant Gradio as Gradio UI
	participant App as app.py
	participant API as GAIA Scoring API
	participant Agent as GAIAAgent

	User->>Gradio: Click "Run Evaluation"
	Gradio->>App: run_and_submit_all(profile)
	App->>API: GET /questions
	API-->>App: 20 questions with file_name metadata

	loop For each question
	App->>Agent: agent(question, task_id, file_name)
	Agent-->>App: concise answer
	end

	App->>App: Save submission_payload.log
	App->>API: POST /submit (username, agent_code, answers)
	API-->>App: score, correct_count, total_attempted
	App-->>Gradio: Display results table + score
	Gradio-->>User: Show results
	```
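
The per-question loop and payload assembly can be sketched without any network calls. This is an illustration under assumptions: `build_submission` is a hypothetical helper, and the `submitted_answer` key for each answer entry is assumed, since this document only names the top-level fields `username`, `agent_code`, and `answers`.

```python
from typing import Callable


def build_submission(
    questions: list[dict],
    agent: Callable[[str, str, str], str],
    username: str,
    agent_code: str,
) -> dict:
    """Run the agent on every question and assemble the /submit payload."""
    answers = []
    for item in questions:
        # Call order follows the diagram: agent(question, task_id, file_name).
        answer = agent(item["question"], item["task_id"], item.get("file_name", ""))
        # The per-answer key names are an assumption.
        answers.append({"task_id": item["task_id"], "submitted_answer": answer})
    return {"username": username, "agent_code": agent_code, "answers": answers}
```

Passing the agent in as a callable keeps the loop testable with a stub, independent of the scoring API and of the models behind `GAIAAgent`.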

	## File Processing Pipeline

	The file agent downloads from the HuggingFace GAIA dataset (with API fallback) and handles multiple modalities by extension:

	```mermaid
	flowchart LR
	Download["Download from HF Dataset"] --> Detect{File Extension?}
	Detect -->|.xlsx .xls .csv| Pandas[Pandas Reader]
	Detect -->|.mp3 .wav .m4a| Whisper[Whisper API]
	Detect -->|.png .jpg .gif .webp| GPT5V["GPT-5-mini Vision"]
	Detect -->|.pdf| PyPDF2[PyPDF2 Reader]
	Detect -->|.txt .py .json .md| TextReader[Text Reader]
	Pandas --> Analyze["Reason + FINAL ANSWER"]
	Whisper --> Analyze
	GPT5V --> Analyze
	PyPDF2 --> Analyze
	TextReader --> Analyze
	```
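
The extension dispatch in the diagram amounts to a lookup table. The sketch below is illustrative: the reader labels, the `READERS` and `pick_reader` names, the case-insensitive match, and the `"unsupported"` fallback are assumptions, while the extensions themselves are taken from the diagram.

```python
from pathlib import Path

# Extension -> reader label, transcribed from the diagram (labels are hypothetical).
READERS = {
    ".xlsx": "pandas", ".xls": "pandas", ".csv": "pandas",
    ".mp3": "whisper", ".wav": "whisper", ".m4a": "whisper",
    ".png": "vision", ".jpg": "vision", ".gif": "vision", ".webp": "vision",
    ".pdf": "pdf",
    ".txt": "text", ".py": "text", ".json": "text", ".md": "text",
}


def pick_reader(file_name: str) -> str:
    """Choose a reader by file extension, ignoring case."""
    suffix = Path(file_name).suffix.lower()
    # Assumed fallback for extensions the diagram does not list.
    return READERS.get(suffix, "unsupported")
```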

	## Video Analysis Pipeline

	YouTube video questions are handled by the web research agent, which passes the URL directly to Gemini and relies on its native video understanding, so no download is required:

	```mermaid
	flowchart LR
	Question["Question with YouTube URL"] --> WebAgent[Web Research Agent]
	WebAgent --> GeminiTool["analyze_youtube_video tool"]
	GeminiTool -->|"pass URL directly"| Gemini["Gemini 2.5 Pro"]
	Gemini -->|watches video| Response[Video Analysis Result]
	Response --> Answer["Reason + FINAL ANSWER"]
	```