VibecoderMcSwaggins committed on
Commit 3753c39 · 1 Parent(s): b9709ff

docs: Add P2 bug reports for first agent turn timeout and duplicate report content


- Introduced documentation for two performance-related bugs:
1. **First Agent Turn Exceeds Workflow Timeout**: Identified root cause as excessive API calls in the search agent's first turn, leading to timeouts. Recommended fixes include reducing results per tool and increasing the workflow timeout.
2. **Duplicate Report Content**: Documented the issue of repeated report content due to lack of deduplication between streamed and final events. Suggested handling final events inline to avoid duplication.

Both issues affect user experience and require immediate attention for resolution.

docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -18,7 +18,20 @@
 
  **Root Cause:** Both `MagenticFinalResultEvent` and `WorkflowOutputEvent` emit the full report content that was already streamed. No deduplication exists.
 
- **Recommended Fix:** Track streamed content length in orchestrator; emit minimal "Research complete." message instead of repeating content.
+ **Recommended Fix:** Handle final events inline in the `run()` loop, where buffer context exists. Track `last_streamed_length`; if > 100 chars, emit "Research complete." instead of the full content.
+
+ ---
+
+ ### P2 - First Agent Turn Exceeds Workflow Timeout
+
+ **File:** `docs/bugs/P2_FIRST_TURN_TIMEOUT.md`
+ **Status:** OPEN - Performance Bug
+
+ **Problem:** The search agent's first turn can exceed the 5-minute workflow timeout, causing `iterations=0` at timeout. Users get partial research results.
+
+ **Root Cause:** The search agent does too much work in a single turn: 3 API searches → 30 results → 30 embedding calls → 30 ChromaDB stores. The timeout is on the WORKFLOW, not individual agent turns.
+
+ **Recommended Fix:** Reduce `max_results_per_tool` from 10 to 5; increase `advanced_timeout` to 600s (10 min).
 
  ---
 
docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md CHANGED
@@ -3,7 +3,7 @@
  **Date**: 2025-12-03
  **Status**: OPEN
  **Severity**: P2 (UX - Duplicate content confuses users)
- **Component**: `src/orchestrators/advanced.py` + `src/app.py`
  **Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)
 
  ---
@@ -25,318 +25,127 @@ The final research report appears **twice** in the UI output:
  1. First as streaming content (with `📡 **STREAMING**:` prefix)
  2. Then again as a complete event (without prefix)
 
- Example:
- ```
- 📡 **STREAMING**:
- ### Summary of Drugs and Mechanisms of Action
- ...
- ### Conclusion
- Post-menopausal women experiencing libido issues can benefit from...
- ### Recommendations
- - Estrogen Therapy: Effective in enhancing...
-
- Based on the information gathered, we have identified...   <-- DUPLICATE STARTS
- ### Summary of Drugs and Mechanisms of Action
- ...
- ### Conclusion
- Post-menopausal women experiencing libido issues can benefit from...
- ### Recommendations
- - Estrogen Therapy: Effective in enhancing...
- ```
-
  ---
 
- ## Root Cause Analysis
-
- ### Event Flow (Current - Buggy)
 
- ```
- 1. Reporter Agent streams content
-    └─ MagenticAgentDeltaEvent × N
-       └─ Each yields AgentEvent(type="streaming", message=delta)
-          └─ app.py: streaming_buffer += event.message
-             └─ User sees: "📡 **STREAMING**: [content building up]"
-
- 2. Reporter Agent completes
-    └─ MagenticAgentMessageEvent
-       └─ Yields truncated completion: "reporter: [first 200 chars]..."
-          └─ app.py: flushes streaming_buffer to response_parts
-
- 3. Workflow ends
-    └─ MagenticFinalResultEvent OR WorkflowOutputEvent
-       └─ Contains FULL report content (same as streaming)
-          └─ Yields AgentEvent(type="complete", message=FULL_CONTENT)
-             └─ app.py: appends event.message to response_parts
-                └─ User sees: [SAME CONTENT AGAIN]
- ```
 
- ### Bug Location
-
- **`src/orchestrators/advanced.py` lines 532-552:**
  ```python
- elif isinstance(event, MagenticFinalResultEvent):
-     text = self._extract_text(event.message) if event.message else "No result"
-     return AgentEvent(
-         type="complete",
-         message=text,  # <-- FULL content, already streamed
-         ...
-     )
-
- elif isinstance(event, WorkflowOutputEvent):
-     if event.data:
-         text = self._extract_text(event.data)
-         return AgentEvent(
-             type="complete",
-             message=text,  # <-- FULL content, already streamed
-             ...
-         )
- ```
 
- **`src/app.py` lines 229-232:**
- ```python
- if event.type == "complete":
-     response_parts.append(event.message)  # <-- Appends duplicate
-     yield "\n\n".join(response_parts)
  ```
 
- ### Why It Happens
-
- 1. **Streaming events** yield the full report character-by-character
- 2. **Final events** (`MagenticFinalResultEvent`, `WorkflowOutputEvent`) contain the same full content
- 3. **No deduplication** exists between streamed content and final event content
- 4. **app.py appends both** to the output
-
  ---
 
- ## Impact
 
- | Aspect | Impact |
- |--------|--------|
- | UX | Report appears twice, looks buggy |
- | Token usage | Renders same content twice |
- | Trust | Users may think system is broken |
 
- ---
 
- ## Proposed Fix Options
 
- ### Option 1: Skip Complete Event if Content Matches Streaming (Recommended)
 
- **Location**: `src/app.py` lines 229-232
 
- ```python
- if event.type == "complete":
-     # Skip if content matches what we already streamed
-     streaming_content = next(
-         (p.replace("📡 **STREAMING**: ", "") for p in response_parts if p.startswith("📡 **STREAMING**:")),
-         None,
-     )
-     if streaming_content and event.message.strip() == streaming_content.strip():
-         continue  # Skip duplicate
-     response_parts.append(event.message)
-     yield "\n\n".join(response_parts)
- ```
-
- **Pros**: Simple, targets exact issue
- **Cons**: String comparison may be fragile
-
- ### Option 2: Track Streamed Content Hash
-
- **Location**: `src/app.py`
 
  ```python
- streaming_hash = None
- ...
- if streaming_buffer:
-     streaming_hash = hash(streaming_buffer.strip())
-     response_parts.append(f"📡 **STREAMING**: {streaming_buffer}")
-     streaming_buffer = ""
- ...
- if event.type == "complete":
-     if streaming_hash and hash(event.message.strip()) == streaming_hash:
-         continue  # Skip duplicate
-     response_parts.append(event.message)
- ```
 
- **Pros**: More robust comparison
- **Cons**: Hash collision possible (unlikely)
 
 
- ### Option 3: Don't Emit Complete Event Content from Orchestrator
 
- **Location**: `src/orchestrators/advanced.py` lines 532-552
 
- Replace full content with summary:
- ```python
- elif isinstance(event, MagenticFinalResultEvent):
-     return AgentEvent(
-         type="complete",
-         message="Research complete.",  # Don't repeat content
-         data={"iterations": iteration},
-         iteration=iteration,
-     )
  ```
 
- **Pros**: Clean separation of streaming vs completion
- **Cons**: Loses fallback if streaming failed
-
- ### Option 4: Flag-Based Deduplication in Orchestrator
-
- **Location**: `src/orchestrators/advanced.py`
 
- Track if substantial streaming occurred:
- ```python
- has_substantial_streaming = len(current_message_buffer) > 100
-
- # In _process_event for final events:
- if has_substantial_streaming:
-     return AgentEvent(
-         type="complete",
-         message="Research complete.",  # Don't repeat
-         ...
-     )
- ```
 
  ---
 
- ## Recommended Fix
 
- **Option 3** is cleanest - the orchestrator should not re-emit content that was already streamed.
-
- **Implementation**:
- 1. Track `streamed_report_length` in the run loop
- 2. If substantial content was streamed (>500 chars), emit minimal complete message
- 3. If no streaming occurred, emit full content as fallback
 
  ---
 
- ## Files Involved
 
- | File | Role |
- |------|------|
- | `src/orchestrators/advanced.py:532-552` | Emits duplicate complete events |
- | `src/app.py:229-232` | Appends duplicate to output |
 
  ---
 
  ## Test Plan
 
- 1. Run Free Tier query: "What drugs improve female libido post-menopause?"
- 2. Verify report appears ONCE (with streaming prefix)
- 3. Verify `complete` event does NOT repeat content
- 4. Verify fallback works if streaming fails
 
  ---
 
- ## Deep Technical Analysis
-
- ### Microsoft Agent Framework Event Types
-
- The framework emits these event types (all inherit from `WorkflowEvent`):
-
- | Event Type | Purpose | Key Attributes |
- |------------|---------|----------------|
- | `MagenticAgentDeltaEvent` | Streaming tokens | `text`, `agent_id` |
- | `MagenticAgentMessageEvent` | Agent turn complete | `message` (ChatMessage), `agent_id` |
- | `MagenticFinalResultEvent` | Workflow final result | `message` (ChatMessage) |
- | `MagenticOrchestratorMessageEvent` | Manager bookkeeping | `message`, `kind`, `orchestrator_id` |
- | `WorkflowOutputEvent` | Workflow output | `data`, `source_executor_id` |
-
- ### Event Flow Trace
-
- ```
- PHASE 1: Agent Streaming (Reporter)
- ─────────────────────────────────────
- MagenticAgentDeltaEvent(text="##", agent_id="reporter") → yields streaming event
- MagenticAgentDeltaEvent(text=" Summary", agent_id="reporter") → yields streaming event
- MagenticAgentDeltaEvent(text="\n", agent_id="reporter") → yields streaming event
- ... (hundreds more delta events)
- MagenticAgentDeltaEvent(text=".", agent_id="reporter") → yields streaming event
-
- → Result: Full report content in streaming_buffer (app.py) and current_message_buffer (orchestrator)
-
- PHASE 2: Agent Completion
- ─────────────────────────────────────
- MagenticAgentMessageEvent(message=ChatMessage(...), agent_id="reporter")
- → _handle_completion_event() yields: "reporter: [first 200 chars]..."
- → Clears current_message_buffer
- → app.py flushes streaming_buffer to response_parts with "📡 **STREAMING**:" prefix
-
- PHASE 3: Workflow Termination (THE BUG)
- ─────────────────────────────────────
- MagenticFinalResultEvent(message=ChatMessage(...)) ← Contains SAME full report!
- OR
- WorkflowOutputEvent(data=ChatMessage(...)) ← Contains SAME full report!
-
- → _process_event() extracts text with _extract_text()
- → Returns AgentEvent(type="complete", message=FULL_REPORT)
- → app.py appends FULL_REPORT to response_parts (NO prefix)
-
- RESULT: Report appears twice:
- 1. "📡 **STREAMING**: [full report]"
- 2. "[full report again]"
- ```
-
- ### Key Code Paths
-
- **`advanced.py` lines 299-345 (main loop):**
- ```python
- # Buffer is cleared HERE (line 337) after MagenticAgentMessageEvent
- current_message_buffer = ""
-
- # But MagenticFinalResultEvent comes AFTER and _process_event has no buffer context!
- agent_event = self._process_event(event, iteration)  # line 341
- if agent_event:
-     yield agent_event  # line 345 - yields duplicate!
- ```
-
- **`advanced.py` lines 532-539 (_process_event):**
- ```python
- elif isinstance(event, MagenticFinalResultEvent):
-     text = self._extract_text(event.message)  # Extracts FULL content
-     return AgentEvent(type="complete", message=text)  # Returns FULL content
- ```
-
- **`app.py` lines 229-232 (UI handling):**
- ```python
- if event.type == "complete":
-     response_parts.append(event.message)  # Appends to existing streamed content!
-     yield "\n\n".join(response_parts)
- ```
-
- ### Why Buffer Clearing Doesn't Help
-
- The `current_message_buffer` is cleared (line 337) BEFORE the final events arrive. So even if we wanted to compare, we've already lost the reference:
-
- ```python
- # Line 327-338: Handle MagenticAgentMessageEvent
- iteration += 1
- comp_event, prog_event = self._handle_completion_event(...)
- yield comp_event
- yield prog_event
- current_message_buffer = ""  # CLEARED!
- continue
-
- # Line 341-345: Handle final events (buffer is empty now!)
- agent_event = self._process_event(event, iteration)  # No buffer context
- ```
-
- ### Potential Edge Cases
-
- 1. **Tool-only turns**: If agent makes tool calls without text, buffer is empty → fallback text used
- 2. **Multiple agents streaming**: Buffer clears on agent switch (line 311-313) → OK
- 3. **Timeout**: Uses `_handle_timeout()` which invokes ReportAgent directly → Different path
- 4. **No final event**: Falls back to "Research completed..." message (line 354-363) → OK
 
- ### Verification Needed
 
- - [ ] Confirm `MagenticFinalResultEvent` vs `WorkflowOutputEvent` - which is emitted?
- - [ ] Confirm bug occurs on both Free and Paid tiers
- - [ ] Measure content length match between streaming and final event
 
  ---
 
  ## Related
 
- - **Not related to model quality** - This is a stack bug, not model limitation
- - P1 Free Tier fix (PR fix/P1-free-tier) enabled streaming, exposing this bug
- - SPEC-17 Accumulator Pattern addressed repr bug but created this side effect
 
  **Date**: 2025-12-03
  **Status**: OPEN
  **Severity**: P2 (UX - Duplicate content confuses users)
+ **Component**: `src/orchestrators/advanced.py`
  **Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)
 
  ---
 
  1. First as streaming content (with `📡 **STREAMING**:` prefix)
  2. Then again as a complete event (without prefix)
 
  ---
 
+ ## Root Cause
 
+ The `_process_event()` method handles final events but has **no access to buffer state**. The buffer was already cleared at line 337 before these events arrive.
 
  ```python
+ # Line 337: Buffer cleared
+ current_message_buffer = ""
+ continue
+
+ # Line 341: Final events processed WITHOUT buffer context
+ agent_event = self._process_event(event, iteration)  # No buffer info!
  ```
 
  ---
 
+ ## The Fix (Consensus: Stateful Orchestrator Logic)
 
+ **Location**: `src/orchestrators/advanced.py` `run()` method
 
+ **Strategy**: Handle final events **inline in the `run()` loop**, where buffer state exists. Track streaming volume to decide whether to re-emit content.
 
+ ### Why This Is Correct
 
+ | Rejected Approach | Why Wrong |
+ |-------------------|-----------|
+ | UI-side string comparison | Wrong layer, fragile, treats symptom |
+ | Stateless `_process_event` fix | No state = can't know if streaming occurred |
+ | **Stateful run() loop** | ✅ Only place with full lifecycle visibility |
 
+ The `run()` loop is the **single source of truth** for the request lifecycle. It "saw" the content stream out. It must decide whether to re-emit.
 
+ ### Implementation
 
  ```python
+ # In run() method, add tracking variable after line 302:
+ last_streamed_length: int = 0
+
+ # Before clearing buffer at line 337, save its length:
+ last_streamed_length = len(current_message_buffer)
+ current_message_buffer = ""
+ continue
+
+ # Replace lines 340-345 with inline handling of final events:
+ if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
+     final_event_received = True
+
+     # DECISION: Did we stream substantial content?
+     if last_streamed_length > 100:
+         # YES: Final event is a SIGNAL, not a payload
+         yield AgentEvent(
+             type="complete",
+             message="Research complete.",
+             data={"iterations": iteration, "streamed_chars": last_streamed_length},
+             iteration=iteration,
+         )
+     else:
+         # NO: Final event must carry the payload (tool-only turn, cache hit)
+         if isinstance(event, MagenticFinalResultEvent):
+             text = self._extract_text(event.message) if event.message else "No result"
+         else:  # WorkflowOutputEvent
+             text = self._extract_text(event.data) if event.data else "Research complete"
+         yield AgentEvent(
+             type="complete",
+             message=text,
+             data={"iterations": iteration},
+             iteration=iteration,
+         )
+     continue
+
+ # Keep existing fallback for other events:
+ agent_event = self._process_event(event, iteration)
  ```
 
+ ### Why Threshold of 100 Chars?
 
+ - `> 0` is too aggressive (might catch single-word streams)
+ - `> 500` is too conservative (might miss short but complete responses)
+ - `> 100` distinguishes "real content was streamed" from "just status messages"
 
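The decision rule can be isolated as a pure function for unit testing. A minimal sketch; `resolve_complete_message` and `STREAM_THRESHOLD` are hypothetical names for illustration, not the actual orchestrator API, where the check lives inline in `run()`:

```python
# Hypothetical helper mirroring the inline decision in run().
STREAM_THRESHOLD = 100  # chars; assumption from the rationale above

def resolve_complete_message(final_text: str, last_streamed_length: int) -> str:
    """Return what the 'complete' event should carry.

    If substantial content was already streamed, the final event is a
    signal, not a payload; otherwise fall back to the full text.
    """
    if last_streamed_length > STREAM_THRESHOLD:
        return "Research complete."
    return final_text
```

This makes the edge-case table below directly assertable: `resolve_complete_message(report, 5000)` yields the short signal, while `resolve_complete_message(report, 0)` and `resolve_complete_message(report, 50)` fall back to the full report.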
  ---
 
+ ## Edge Cases Handled
 
+ | Scenario | `last_streamed_length` | Action |
+ |----------|------------------------|--------|
+ | Normal streaming report | 5000+ | Emit "Research complete." |
+ | Tool call, no text | 0 | Emit full content from final event |
+ | Very short response | 50 | Emit full content (fallback) |
+ | Agent switch mid-stream | Reset on switch | Tracks only final agent |
 
  ---
 
+ ## Files to Modify
 
+ | File | Lines | Change |
+ |------|-------|--------|
+ | `src/orchestrators/advanced.py` | 296-345 | Add `last_streamed_length`, handle final events inline |
+ | `src/orchestrators/advanced.py` | 532-552 | Optional: remove dead code from `_process_event()` |
 
  ---
 
  ## Test Plan
 
+ 1. **Happy Path**: Run query, verify report appears ONCE
+ 2. **Fallback**: Mock tool-only turn (no streaming), verify full content emitted
+ 3. **Both Tiers**: Test Free Tier and Paid Tier
 
  ---
 
+ ## Validation
 
+ This fix was independently validated by two AI agents (Claude and Gemini) analyzing the architecture. Both concluded:
 
+ > "The Stateful Orchestrator Fix is the correct engineering solution. The 'Source of Truth' is the Orchestrator's runtime state."
 
  ---
 
  ## Related
 
+ - **Not related to model quality** - This is a stack bug
+ - P1 Free Tier fix enabled streaming, exposing this bug
+ - SPEC-17 Accumulator Pattern addressed repr bug but created this side effect
docs/bugs/P2_FIRST_TURN_TIMEOUT.md ADDED
@@ -0,0 +1,160 @@
+ # P2 Bug: First Agent Turn Exceeds Workflow Timeout
+
+ **Date**: 2025-12-03
+ **Status**: OPEN
+ **Severity**: P2 (UX - Workflow always times out on complex queries)
+ **Component**: `src/orchestrators/advanced.py` + `src/agents/search_agent.py`
+ **Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)
+
+ ---
+
+ ## Executive Summary
+
+ The search agent's first turn can exceed the 5-minute workflow timeout, causing:
+ 1. `iterations=0` at timeout (no agent completed a turn)
+ 2. `_handle_timeout()` synthesizes from partial evidence
+ 3. Users get incomplete research results
+
+ This is a **performance/architecture bug**, not a model issue.
+
+ ---
+
+ ## Symptom
+
+ ```
+ [warning] Workflow timed out iterations=0
+ ```
+
+ The workflow times out with `iterations=0`, meaning the first agent (the search agent) never completed its turn before the 5-minute timeout.
+
+ ---
+
+ ## Root Cause
+
+ The search agent's first turn is **extremely expensive**:
+
+ ```
+ Search Agent First Turn:
+ ├── Manager assigns task
+ ├── Search agent starts
+ │   ├── Calls PubMed search tool (10 results)
+ │   ├── Calls ClinicalTrials search tool (10 results)
+ │   ├── Calls EuropePMC search tool (10 results)
+ │   └── For EACH result (30 total):
+ │       ├── Generate embedding (OpenAI API call)
+ │       ├── Check for duplicates (ChromaDB query)
+ │       └── Store in ChromaDB
+ │
+ │   TOTAL: 30 results × (embedding + dedup + store) = 90+ API/DB operations
+ │
+ └── Agent turn completes (if timeout hasn't fired)
+ ```
+
+ **The timeout is on the WORKFLOW, not individual agent turns.** A single greedy agent can consume the entire timeout budget.
+
+ ---
+
+ ## Impact
+
+ | Aspect | Impact |
+ |--------|--------|
+ | UX | Queries always time out on first turn |
+ | Research quality | Synthesis happens on partial evidence |
+ | Confusion | `iterations=0` looks like nothing happened |
+
+ ---
+
+ ## The Fix (Consensus)
+
+ **Reduce work per turn + increase timeout budget.**
+
+ ### Implementation
+
+ **1. Reduce results per tool (immediate)**
+
+ `src/agents/search_agent.py` line 70:
+ ```python
+ # Change from 10 to 5
+ result: SearchResult = await self._handler.execute(query, max_results_per_tool=5)
+ ```
+
+ **2. Increase workflow timeout (immediate)**
+
+ `src/utils/config.py`:
+ ```python
+ advanced_timeout: float = Field(
+     default=600.0,  # Was 300.0 (5 min), now 10 min
+     ge=60.0,
+     le=900.0,
+     description="Timeout for Advanced mode in seconds",
+ )
+ ```
+
+ ### Why NOT Per-Turn Timeout
+
+ **DANGER**: The SearchHandler uses `asyncio.gather()`:
+
+ ```python
+ # src/tools/search_handler.py line 163-164
+ results = await asyncio.gather(*tasks, return_exceptions=True)
+ ```
+
+ This is an **all-or-nothing** operation. If you wrap it with `asyncio.timeout()` and the timeout fires, you get **zero results**, not partial results.
+
+ ```python
+ # DON'T DO THIS - yields nothing on timeout
+ async with asyncio.timeout(60):
+     result = await self._handler.execute(query)  # Cancelled = zero results
+ ```
+
+ Per-turn timeout requires `SearchHandler` to support cancellation with partial results. That's a separate architectural change (see Future Work).
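For contrast, `asyncio.wait` with a `timeout` returns whatever tasks finished at the deadline instead of cancelling the whole batch. A minimal sketch under the assumption that each search tool can run as an independent task; `slow_search` and `gather_partial` are illustrative stand-ins, not the real `SearchHandler` API:

```python
import asyncio

async def slow_search(name: str, delay: float) -> str:
    # Stand-in for one search tool call (e.g. PubMed, ClinicalTrials).
    await asyncio.sleep(delay)
    return name

async def gather_partial(timeout: float) -> list[str]:
    # Unlike asyncio.timeout() around gather(), asyncio.wait() returns
    # (done, pending) at the deadline, so finished results survive.
    tasks = [
        asyncio.create_task(slow_search("pubmed", 0.01)),
        asyncio.create_task(slow_search("trials", 0.02)),
        asyncio.create_task(slow_search("europepmc", 5.0)),  # too slow
    ]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # drop the stragglers, keep what finished
    return sorted(t.result() for t in done)

print(asyncio.run(gather_partial(0.5)))  # → ['pubmed', 'trials']
```

Adopting this inside `SearchHandler` would still be the larger architectural change described under Future Work; the sketch only shows that partial results at a deadline are achievable with stdlib primitives.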
+
+ ---
+
+ ## Future Work (Streaming Evidence Ingestion)
+
+ For a proper fix, `SearchHandler.execute()` should:
+ 1. Yield results as they arrive (async generator)
+ 2. Support cancellation with partial results
+ 3. Allow the agent to return "what we have so far" on timeout
+
+ ```python
+ # Future architecture
+ async def execute_streaming(self, query: str) -> AsyncIterator[Evidence]:
+     for tool in self.tools:
+         async for evidence in tool.search_streaming(query):
+             yield evidence  # Can be cancelled at any point
+ ```
+
+ This is out of scope for the immediate fix.
+
+ ---
+
+ ## Test Plan
+
+ 1. Run query with 10-minute timeout
+ 2. Verify first agent turn completes before timeout
+ 3. Verify `iterations >= 1` at workflow end
+
+ ---
+
+ ## Verification Data
+
+ From diagnostic run:
+ ```
+ === RAW FRAMEWORK EVENTS ===
+ MagenticAgentDeltaEvent: 284
+ MagenticOrchestratorMessageEvent: 3
+ ...
+ NO MagenticAgentMessageEvent  ← Agent never completed a turn!
+
+ [warning] Workflow timed out iterations=0
+ ```
+
+ ---
+
+ ## Related
+
+ - P2 Duplicate Report Bug (separate issue, happens after successful completion)
+ - `_handle_timeout()` correctly synthesizes, but with partial evidence
+ - Not related to model quality - this is infrastructure/performance