igorpavlov-mgr committed
Commit 6474e3d · verified · 1 parent: 81917a3

Upload BASELINE.md

BASELINE.md - baselined project vision for future reference

Files changed (1)
  1. BASELINE.md +98 -0
BASELINE.md ADDED
@@ -0,0 +1,98 @@
+ # GAIA Final Assignment – Baseline Specification
+
+ ## Objective
+
+ This project is developed as part of the Hugging Face AI Agents Course (Unit 4).
+ The goal is to implement an agent that:
+
+ - Uses ReAct-style reasoning (Thought → Action → Observation → Answer)
+ - Employs at least one external tool (e.g., calculator, search)
+ - Achieves at least 30% accuracy on Level 1 of the GAIA benchmark
+ - Submits answers using the provided GAIA scoring API
+ - Runs entirely on a Hugging Face Space (CPU-only)
+
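The 30% target can be sanity-checked locally before submitting. The following is a minimal sketch, assuming a hypothetical local list of reference answers; the official score comes from the scoring API, and the exact-match normalisation used here is an assumption, not the benchmark's rule.

```python
from typing import Sequence


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of predictions that match their reference after trimming and lowercasing."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


def meets_target(score: float, target: float = 0.30) -> bool:
    """True when the score reaches the 30% Level 1 threshold named in the objective."""
    return score >= target
```

With three correct answers out of ten, `accuracy` returns 0.3 and `meets_target` returns `True`.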
+ ---
+
+ ## Strategic Decisions
+
+ | Component | Decision |
+ |----------|----------|
+ | LLM | Qwen/Qwen1.5-1.8B-Chat |
+ | Hardware | CPU-only (both locally and in deployment) |
+ | Development Flow | Code is tested in Google Colab (CPU mode), then ported to HF Space |
+ | Submission Interface | Uses provided endpoints: /questions and /submit |
+ | UI | Gradio-based interface with OAuth login (gr.LoginButton) |
+ | Logging / Observability | Step-by-step logging with [REASONING], [ACTION], [OBSERVATION], [ANSWER] blocks |
+ | Agent Framework (Phase 1) | Manual ReAct implementation with full control for transparency and debugging |
+ | Agent Framework (Phase 2) | Planned upgrade to smolagents for simplified tool integration and scaling logic |
+ | Tooling Strategy | Begin with calculator; add web search, Python code execution, and Wikipedia access incrementally |
+
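The step-by-step logging decision in the table above could look roughly like this. It is a sketch only: the helper names `format_step` and `log_step` are hypothetical, and only the four tag names come from the specification.

```python
import logging

# The four block tags named in the Logging / Observability row.
VALID_TAGS = ("REASONING", "ACTION", "OBSERVATION", "ANSWER")


def format_step(tag: str, text: str) -> str:
    """Return one trace line such as '[ACTION] calculator(2 + 2)'."""
    tag = tag.upper()
    if tag not in VALID_TAGS:
        raise ValueError(f"unknown log tag: {tag}")
    return f"[{tag}] {text.strip()}"


def log_step(logger: logging.Logger, tag: str, text: str) -> str:
    """Format a step, send it to the logger, and return the line for reuse."""
    line = format_step(tag, text)
    logger.info(line)
    return line
```

Returning the formatted line lets the caller keep a plain-text trace alongside the logger output.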
+ ---
+
+ ## GAIA Task Level Alignment
+
+ | GAIA Level | Description | Covered in Plan |
+ |------------|-------------|-----------------|
+ | Level 1 | ReAct agent with one tool | Included in Phase 1 baseline |
+ | Level 2 | Robust instruction parsing | Planned via prompt engineering |
+ | Level 3 | Self-reflection and retry | Planned in Phase 2 and 3 upgrades |
+ | Level 4 | Tool chaining | Planned in Phase 3 |
+ | Level 5 | Multimodal or complex tasks | Currently out of scope |
+
+ ---
+
+ ## Agent Implementation Phases
+
+ ### Phase 1 – Manual Agent (Baseline)
+ - Implemented using a custom ReAct loop in Python
+ - Uses a single tool (calculator)
+ - Logs all reasoning steps for transparency
+ - Focused on correctness and simplicity
+ - Designed to pass at least 30% of Level 1 tasks
+
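The Phase 1 bullets above describe a custom ReAct loop with a single calculator tool. The sketch below shows one possible shape for it. The `Thought:` / `Action:` / `Answer:` text format, the function names, and the `scripted_model` stand-in (used here instead of the real LLM so the example runs without a model) are all assumptions, not the project's actual code.

```python
import ast
import operator
import re
from typing import Callable

# Arithmetic operators the calculator tool accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def calculator(expression: str) -> str:
    """Evaluate a basic arithmetic expression without calling eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


ACTION_RE = re.compile(r"Action:\s*calculator\((.*)\)\s*$", re.MULTILINE)
ANSWER_RE = re.compile(r"Answer:\s*(.+)")


def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    """Run Thought -> Action -> Observation rounds until the model emits an Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        answer = ANSWER_RE.search(reply)
        if answer:
            return answer.group(1).strip()
        action = ACTION_RE.search(reply)
        if action:
            try:
                observation = calculator(action.group(1))
            except (ValueError, SyntaxError, ZeroDivisionError) as exc:
                observation = f"error: {exc}"
            transcript += f"Observation: {observation}\n"
    return ""  # no answer within the step budget


def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for the LLM: call the tool once, then answer."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator(12 * (3 + 4))"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: I have the result.\nAnswer: {observation}"
```

Calling `react("What is 12 * (3 + 4)?", scripted_model)` returns `"84"`. Parsing the expression with `ast` rather than `eval` keeps the tool limited to arithmetic, which matters once model output is fed into it.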
+ ### Phase 2 – Upgrade to `smolagents`
+ - Replace the manual loop with a `smolagents` agent class (e.g., `CodeAgent`)
+ - Use @tool decorators for tool registration
+ - Modular reasoning loop and simplified execution
+ - Easier to extend with retry logic, tool chaining, and prompt consistency
+ - Supports progression to GAIA Levels 2 and 3
+
+ ---
+
+ ## Evaluation and Submission Integration
+
+ - HF OAuth Login enabled via Gradio
+ - Agent receives tasks from: https://agents-course-unit4-scoring.hf.space/questions
+ - Submits answers to: https://agents-course-unit4-scoring.hf.space/submit
+ - Submission includes:
+   - username (from login)
+   - agent_code (this Space URL)
+   - answer list (one per task)
+
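The submission fields listed above can be assembled into a request body along these lines. This is a sketch under assumptions: the JSON key `answers` and the per-item keys `task_id` and `submitted_answer` are guesses at the schema and should be checked against the scoring API before use. Only `username`, `agent_code`, and the two endpoint URLs come from the specification.

```python
import json
from typing import Dict

BASE_URL = "https://agents-course-unit4-scoring.hf.space"
QUESTIONS_URL = f"{BASE_URL}/questions"  # tasks are fetched from here
SUBMIT_URL = f"{BASE_URL}/submit"        # answers are posted here


def build_submission(username: str, space_url: str, answers: Dict[str, str]) -> dict:
    """Assemble the request body: who is submitting, which Space, and one answer per task."""
    if not username:
        raise ValueError("username is required (it comes from the OAuth login)")
    return {
        "username": username,
        "agent_code": space_url,
        # Assumed item schema: one object per task, keyed by task id.
        "answers": [
            {"task_id": task_id, "submitted_answer": answer}
            for task_id, answer in answers.items()
        ],
    }


def to_json(payload: dict) -> str:
    """Serialise the payload for an HTTP POST to SUBMIT_URL."""
    return json.dumps(payload, ensure_ascii=False)
```

Keeping payload construction separate from the HTTP call makes it possible to inspect the body offline before anything is sent to the scoring service.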
+ ---
+
+ ## Initial Agent Requirements (Baseline)
+
+ | Feature | Description |
+ |---------|-------------|
+ | ReAct loop | Simple reasoning + single tool use |
+ | Tools | Calculator (initial) |
+ | Output format | Clean, final answers (no trace steps included) |
+ | Logging | Inline reasoning log to support debugging |
+ | Model behavior | Deterministic generation (low temperature) |
+ | Deployment | Fully compatible with HF Space CPU runtime |
+
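Two rows of the table above, the clean output format and deterministic generation, can be illustrated together. This is a sketch under assumptions: `extract_final_answer` is a hypothetical helper that reads the tagged trace format named in this specification, and the generation settings swap the stated low temperature for greedy decoding (`do_sample=False`), its deterministic limit. The `max_new_tokens` value is an arbitrary placeholder.

```python
# Keyword arguments intended for a Hugging Face transformers generate() call.
# Greedy decoding is used here in place of a low temperature.
GENERATION_KWARGS = {"do_sample": False, "max_new_tokens": 256}

ANSWER_TAG = "[ANSWER]"


def extract_final_answer(trace: str) -> str:
    """Return the text after the last [ANSWER] tag, dropping all other trace lines."""
    answer = ""
    for line in trace.splitlines():
        line = line.strip()
        if line.startswith(ANSWER_TAG):
            answer = line[len(ANSWER_TAG):].strip()
    return answer
```

Taking the last `[ANSWER]` line means a retried or corrected answer overrides an earlier one, and a trace with no answer yields an empty string rather than leaking reasoning text into the submission.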
+ ---
+
+ ## Enhancement Strategy
+
+ 1. Establish baseline agent with deterministic tool-based answers
+ 2. Add retry logic and basic self-reflection (Level 3)
+ 3. Add tool chaining support for multi-hop reasoning (Level 4)
+ 4. Introduce structured retrieval tools (Wikipedia, code execution)
+ 5. Upgrade to the smolagents framework for better structure and extensibility
+
+ ---
+
+ ## Created: May 5, 2025
+ Maintained by: Igor Pavlov