GAIA Final Assignment – Baseline Specification

Objective

This project is developed as part of the Hugging Face AI Agents Course (Unit 4).
The goal is to implement an agent that:

  • Uses ReAct-style reasoning (Thought → Action → Observation → Answer)
  • Employs at least one external tool (e.g., calculator, search)
  • Achieves at least 30% accuracy on Level 1 of the GAIA benchmark
  • Submits answers using the provided GAIA scoring API
  • Runs entirely on a Hugging Face Space (CPU-only)

Strategic Decisions

| Component | Decision |
| --- | --- |
| LLM | Qwen/Qwen1.5-1.8B-Chat |
| Hardware | CPU-only (both locally and in deployment) |
| Development Flow | Code is tested in Google Colab (CPU mode), then ported to the HF Space |
| Submission Interface | Uses the provided endpoints: /questions and /submit |
| UI | Gradio-based interface with OAuth login (gr.LoginButton) |
| Logging / Observability | Step-by-step logging with [REASONING], [ACTION], [OBSERVATION], [ANSWER] blocks |
| Agent Framework (Phase 1) | Manual ReAct implementation with full control for transparency and debugging |
| Agent Framework (Phase 2) | Planned upgrade to smolagents for simplified tool integration and scaling logic |
| Tooling Strategy | Begin with a calculator; add web search, Python code execution, and Wikipedia access incrementally |

GAIA Task Level Alignment

| GAIA Level | Description | Covered in Plan |
| --- | --- | --- |
| Level 1 | ReAct agent with one tool | Included in Phase 1 baseline |
| Level 2 | Robust instruction parsing | Planned via prompt engineering |
| Level 3 | Self-reflection and retry | Planned in Phase 2 and 3 upgrades |
| Level 4 | Tool chaining | Planned in Phase 3 |
| Level 5 | Multimodal or complex tasks | Currently out of scope |

GAIA itself defines three difficulty levels (1 to 3); Levels 4 and 5 above are this plan's own extension tiers rather than benchmark levels.

Agent Implementation Phases

Phase 1 – Manual Agent (Baseline)

  • Implemented using a custom ReAct loop in Python
  • Uses a single tool (calculator)
  • Logs all reasoning steps for transparency
  • Focused on correctness and simplicity
  • Designed to pass at least 30% of Level 1 tasks
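
The Phase 1 loop can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Action: calculator(...)` and `Answer: ...` text formats, the `react_loop` and `calculator` helpers, and the `scripted_llm` stand-in for the model are all assumptions made for the example. The log tags match the [REASONING], [ACTION], [OBSERVATION], [ANSWER] blocks named in the strategy table.

```python
import ast
import operator
import re

# Arithmetic operators the calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> str:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return str(_eval(ast.parse(expression, mode="eval").body))


# Assumed output conventions for the model's replies (hypothetical).
ACTION_RE = re.compile(r"Action:\s*calculator\((.*)\)")
ANSWER_RE = re.compile(r"Answer:\s*(.*)")


def react_loop(question, llm, max_steps=5):
    """Run Thought -> Action -> Observation until the model emits an Answer."""
    transcript = f"Question: {question}\n"
    log = []
    for _ in range(max_steps):
        reply = llm(transcript)
        log.append(f"[REASONING] {reply}")
        answer = ANSWER_RE.search(reply)
        if answer:
            log.append(f"[ANSWER] {answer.group(1).strip()}")
            return answer.group(1).strip(), log
        action = ACTION_RE.search(reply)
        if action:
            log.append(f"[ACTION] calculator({action.group(1)})")
            try:
                observation = calculator(action.group(1))
            except (ValueError, SyntaxError, ZeroDivisionError) as exc:
                observation = f"error: {exc}"
            log.append(f"[OBSERVATION] {observation}")
            transcript += f"{reply}\nObservation: {observation}\n"
        else:
            transcript += f"{reply}\n"
    return None, log  # step budget exhausted without an answer


def scripted_llm(transcript):
    """Hypothetical stand-in for the model: acts once, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator(12 * 7)"
    return "Thought: I have the result.\nAnswer: 84"


answer, log = react_loop("What is 12 times 7?", scripted_llm)
```

In a real run, `scripted_llm` would be replaced by a call into the chosen chat model. Keeping the model behind a plain callable lets the loop be tested on CPU without loading any weights.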

Phase 2 – Upgrade to smolagents

  • Replace the manual loop with a smolagents agent class (e.g., CodeAgent or ToolCallingAgent)
  • Register tools with the @tool decorator
  • Gain a modular reasoning loop and simpler execution
  • Make retry logic, tool chaining, and consistent prompts easier to add
  • Support progression to GAIA Levels 2 and 3
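
As an illustration of tool registration in Phase 2, the sketch below defines a calculator with the smolagents `tool` decorator, which expects type hints on every argument and an `Args:` section in the docstring. The three-argument signature is an assumption made for this example, and the `ImportError` fallback is included only so the snippet runs where the library is not installed.

```python
try:
    from smolagents import tool
except ImportError:
    # Fallback so the sketch still runs without smolagents installed;
    # the function then stays a plain Python callable.
    def tool(func):
        return func


@tool
def calculator(a: float, b: float, op: str) -> float:
    """Apply a basic arithmetic operator to two numbers.

    Args:
        a: Left operand.
        b: Right operand.
        op: One of "+", "-", "*", "/".
    """
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    if op == "/":
        return a / b
    raise ValueError(f"unsupported operator: {op}")
```

The decorated object can then be passed in the `tools` list of a smolagents agent such as `CodeAgent`, together with a model wrapper suited to the chosen LLM.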

Evaluation and Submission Integration
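
The strategy table names /questions and /submit as the submission interface. A minimal sketch of how a client might talk to them is given below. The payload field names (`username`, `agent_code`, `answers`, `task_id`, `submitted_answer`) are assumptions to be checked against the scoring API's own documentation, and the `api_base` parameter is a placeholder for the course-provided base URL.

```python
import json
import urllib.request


def build_submission(username, agent_code_url, answers):
    """Assemble the JSON body for POST /submit (field names are assumptions)."""
    return {
        "username": username,
        "agent_code": agent_code_url,
        "answers": [
            {"task_id": task_id, "submitted_answer": str(answer)}
            for task_id, answer in answers.items()
        ],
    }


def fetch_questions(api_base, timeout=30):
    """GET {api_base}/questions and return the decoded JSON."""
    with urllib.request.urlopen(f"{api_base}/questions", timeout=timeout) as resp:
        return json.load(resp)


def submit(api_base, payload, timeout=60):
    """POST the payload to {api_base}/submit and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{api_base}/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


# Building the payload needs no network access, so it can be checked offline.
payload = build_submission("alice", "https://example.org/code", {"t1": 84})
```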


Initial Agent Requirements (Baseline)

| Feature | Description |
| --- | --- |
| ReAct loop | Simple reasoning + single tool use |
| Tools | Calculator (initial) |
| Output format | Clean, final answers (no trace steps included) |
| Logging | Inline reasoning log to support debugging |
| Model behavior | Deterministic generation (low temperature) |
| Deployment | Fully compatible with HF Space CPU runtime |
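
The "Output format" requirement means the logged trace and the submitted answer must be kept apart. One possible way to do that is sketched below, assuming the trace is a single string that uses the four log tags; the `final_answer` helper is hypothetical.

```python
import re

# Matches any of the four log tags and keeps it in the split result.
TAG_RE = re.compile(r"(\[(?:REASONING|ACTION|OBSERVATION|ANSWER)\])")


def final_answer(trace: str) -> str:
    """Return the text of the last [ANSWER] block, without any trace steps."""
    parts = TAG_RE.split(trace)  # [prefix, tag1, body1, tag2, body2, ...]
    answer = ""
    for tag, body in zip(parts[1::2], parts[2::2]):
        if tag == "[ANSWER]":
            answer = body
    # Collapse whitespace so the submitted string is clean.
    return " ".join(answer.split())


trace = (
    "[REASONING] I need to multiply.\n"
    "[ACTION] calculator(12 * 7)\n"
    "[OBSERVATION] 84\n"
    "[ANSWER]  84 \n"
)
clean = final_answer(trace)
```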

Enhancement Strategy

  1. Establish baseline agent with deterministic tool-based answers
  2. Add retry logic and basic self-reflection (Level 3)
  3. Add tool chaining support for multi-hop reasoning (Level 4)
  4. Introduce structured retrieval tools (Wikipedia, code execution)
  5. Upgrade to the smolagents framework for better structure and extensibility
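
Step 2 (retry logic with basic self-reflection) could take a shape like the sketch below. It is not the planned implementation: the `answer_with_retry` wrapper, the feedback string passed back to the agent, and the `flaky_agent` and `is_numeric` stand-ins are all hypothetical.

```python
def answer_with_retry(question, run_agent, is_acceptable, max_attempts=3):
    """Re-run the agent with feedback until an answer passes the check."""
    feedback = None
    attempts = []
    for attempt in range(1, max_attempts + 1):
        answer = run_agent(question, feedback)
        attempts.append(answer)
        if answer is not None and is_acceptable(answer):
            return answer, attempts
        # Reflection step: tell the next attempt what went wrong.
        feedback = (
            f"Attempt {attempt} produced {answer!r}, which was rejected; "
            "try a different approach."
        )
    # Fall back to the last attempt so something can still be submitted.
    return attempts[-1], attempts


def flaky_agent(question, feedback):
    """Hypothetical agent: gives a malformed answer until it gets feedback."""
    return "eighty-four" if feedback is None else "84"


def is_numeric(answer):
    """Hypothetical acceptance check: the answer must parse as a number."""
    try:
        float(answer)
        return True
    except ValueError:
        return False


result, history = answer_with_retry("What is 12 times 7?", flaky_agent, is_numeric)
```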

Created: May 5, 2025

Maintained by: Igor Pavlov