VibecoderMcSwaggins committed on
Commit
1bfc1df
·
1 Parent(s): 0b27b1c

fix(orchestrator): Force synthesis when ReportAgent doesn't run (P1)

## Problem
The workflow terminates without ReportAgent producing a synthesis report.
Users see search/hypothesis/judge output but get "Research complete." with
no actual research report. This primarily affects Free Tier (HuggingFace 7B
manager model) which doesn't reliably delegate to ReportAgent.

## Solution
1. Track `reporter_ran` flag when ReportAgent produces output
2. On workflow termination, if ReportAgent never ran, force synthesis via
`_force_synthesis()` method (similar to `_handle_timeout()`)
3. Skip duplicate final events (both MagenticFinalResultEvent and
WorkflowOutputEvent were yielding "Research complete.")

## Testing
- 313 unit tests pass
- Linting and type checking pass

Fixes: P1 No Synthesis Free Tier
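
The three steps in the Solution section can be sketched in isolation as follows. This is a minimal sketch, not the orchestrator's code: `Event`, `consume`, and `collect` are hypothetical stand-ins for the framework's event types and the `run()` loop, and `synthesize` stands in for the forced-synthesis call. Only the two flags (`reporter_ran`, `final_event_received`) and the "Research complete." message come from the commit.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable, Iterable, List, Optional


@dataclass
class Event:
    """Hypothetical stand-in for the framework's event classes."""

    kind: str  # "agent_message" or "final"
    agent_id: Optional[str] = None


async def consume(
    events: Iterable[Event],
    synthesize: Callable[[], Awaitable[str]],
) -> AsyncIterator[str]:
    """Yield one final message per workflow, forcing synthesis if needed."""
    reporter_ran = False
    final_event_received = False

    for event in events:
        if event.kind == "agent_message":
            # Step 1: remember whether the report agent ever produced output.
            if "report" in (event.agent_id or "").lower():
                reporter_ran = True
            continue

        if event.kind == "final":
            # Step 3: the framework may emit two final events; handle only one.
            if final_event_received:
                continue
            final_event_received = True

            # Step 2: no report yet, so synthesize one instead of just finishing.
            if not reporter_ran:
                yield await synthesize()
            else:
                yield "Research complete."


async def collect(events: Iterable[Event]) -> List[str]:
    """Drain `consume` with a fake synthesis step (illustration only)."""

    async def fake_synthesis() -> str:
        return "FORCED REPORT"

    return [message async for message in consume(events, fake_synthesis)]
```

For example, `asyncio.run(collect([Event("agent_message", "judge"), Event("final"), Event("final")]))` produces a single forced report, while a stream that includes a `ReportAgent` message produces a single "Research complete." line.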

docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -9,32 +9,6 @@
9
 
10
  ## Currently Active Bugs
11
 
12
- ### P2 - Duplicate Report Content in Output
13
-
14
- **File:** `docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md`
15
- **Status:** OPEN - UX Bug
16
-
17
- **Problem:** The final research report appears twice in the UI - once as streaming content, then again as a complete event. This is a **stack bug**, not a model issue.
18
-
19
- **Root Cause:** Both `MagenticFinalResultEvent` and `WorkflowOutputEvent` emit the full report content that was already streamed. No deduplication exists.
20
-
21
- **Recommended Fix:** Handle final events inline in `run()` loop where buffer context exists. Track `last_streamed_length`; if > 100 chars, emit "Research complete." instead of full content.
22
-
23
- ---
24
-
25
- ### P2 - First Agent Turn Exceeds Workflow Timeout
26
-
27
- **File:** `docs/bugs/P2_FIRST_TURN_TIMEOUT.md`
28
- **Status:** OPEN - Performance Bug
29
-
30
- **Problem:** The search agent's first turn can exceed the 5-minute workflow timeout, causing `iterations=0` at timeout. Users get partial research results.
31
-
32
- **Root Cause:** Search agent does too much work in a single turn: 3 API searches → 30 results → 30 embedding calls → 30 ChromaDB stores. The timeout is on the WORKFLOW, not individual agent turns.
33
-
34
- **Recommended Fix:** Reduce `max_results_per_tool` from 10 to 5; increase `advanced_timeout` to 600s (10 min).
35
-
36
- ---
37
-
38
  ### P3 - Progress Bar Positioning in ChatInterface
39
 
40
  **File:** `docs/bugs/P3_PROGRESS_BAR_POSITIONING.md`
@@ -83,6 +57,7 @@ All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
83
  - **P0 Advanced Mode Timeout No Synthesis** - FIXED, actual synthesis on timeout
84
 
85
  ### P1 Bugs (All FIXED)
 
86
  - **P1 Free Tier Tool Execution Failure** - FIXED in PR fix/P1-free-tier-tool-execution, removed premature marker
87
  - **P1 Gradio Example Click Auto-Submits** - FIXED in PR #120, prevents auto-submit on example click
88
  - **P1 HuggingFace Router 401 Hyperbolic** - FIXED, invalid token was root cause
 
9
 
10
  ## Currently Active Bugs
11

12
  ### P3 - Progress Bar Positioning in ChatInterface
13
 
14
  **File:** `docs/bugs/P3_PROGRESS_BAR_POSITIONING.md`
 
57
  - **P0 Advanced Mode Timeout No Synthesis** - FIXED, actual synthesis on timeout
58
 
59
  ### P1 Bugs (All FIXED)
60
+ - **P1 No Synthesis Free Tier** - FIXED in PR fix/p1-forced-synthesis, forced synthesis safety net when ReportAgent doesn't run
61
  - **P1 Free Tier Tool Execution Failure** - FIXED in PR fix/P1-free-tier-tool-execution, removed premature marker
62
  - **P1 Gradio Example Click Auto-Submits** - FIXED in PR #120, prevents auto-submit on example click
63
  - **P1 HuggingFace Router 401 Hyperbolic** - FIXED, invalid token was root cause
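
The P2 duplicate-report entry removed above recommends tracking `last_streamed_length` and emitting "Research complete." instead of the full content once more than 100 characters have been streamed. A minimal sketch of that rule, assuming a hypothetical `final_message` helper and `STREAMED_THRESHOLD` constant (the 100-character cutoff and the message text come from the entry; the names do not):

```python
STREAMED_THRESHOLD = 100  # characters; cutoff quoted in the P2 entry


def final_message(full_report: str, last_streamed_length: int) -> str:
    """Pick the text for the final event without repeating streamed content."""
    if last_streamed_length > STREAMED_THRESHOLD:
        # The report already reached the UI as streaming chunks.
        return "Research complete."
    # Little or nothing was streamed, so the final event must carry the report.
    return full_report
```

With this rule, a report that was fully streamed is acknowledged with a short completion line, while a report that never streamed is still delivered in the final event.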
docs/bugs/P1_NO_SYNTHESIS_FREE_TIER.md ADDED
@@ -0,0 +1,165 @@
1
+ # P1 Bug: No Synthesis Report in Free Tier (Premature Workflow Termination)
2
+
3
+ **Date**: 2025-12-04
4
+ **Status**: FIXED (PR fix/p1-forced-synthesis)
5
+ **Severity**: P1 (Critical UX - No usable output from research)
6
+ **Component**: `src/orchestrators/advanced.py`
7
+ **Affects**: Free Tier (HuggingFace) primarily, potentially Paid Tier
8
+
9
+ ---
10
+
11
+ ## Executive Summary
12
+
13
+ The workflow terminates without the ReportAgent ever producing a synthesis report. Users see search results and hypotheses streaming, but the final output is just "Research complete." with no actual research report. This is caused by the 7B Manager model failing to properly delegate to ReportAgent before workflow termination.
14
+
15
+ ---
16
+
17
+ ## Symptom
18
+
19
+ ```
20
+ 📚 **SEARCH_COMPLETE**: searcher: [search results]
21
+ ⏱️ **PROGRESS**: Round 1/5 (~3m 0s remaining)
22
+ 🔬 **HYPOTHESIZING**: hypothesizer: [hypotheses]
23
+ ⏱️ **PROGRESS**: Round 2/5 (~2m 15s remaining)
24
+ ✅ **JUDGE_COMPLETE**: judge: [asks for more evidence]
25
+ ⏱️ **PROGRESS**: Round 4/5 (~45s remaining)
26
+ Research complete.
27
+ Research complete. ← NO SYNTHESIS REPORT!
28
+ ```
29
+
30
+ The workflow runs through multiple agents (Search, Hypothesis, Judge) but never reaches the ReportAgent. The user receives no usable research report.
31
+
32
+ ---
33
+
34
+ ## Root Cause Analysis
35
+
36
+ ### Primary Issue: Manager Model Failure
37
+
38
+ The `with_standard_manager()` in Microsoft Agent Framework uses the provided chat client (HuggingFace 7B model) to coordinate agents. The 7B model:
39
+
40
+ 1. **Cannot follow complex multi-step instructions** - The manager prompt instructs: "When JudgeAgent says SUFFICIENT EVIDENCE → delegate to ReportAgent." The 7B model doesn't reliably follow this.
41
+
42
+ 2. **Triggers premature termination** - The framework has `max_stall_count=3` and `max_reset_count=2`. If the manager keeps making the same delegation or gets confused, the workflow terminates.
43
+
44
+ 3. **Emits final event without synthesis** - The framework sends `MagenticFinalResultEvent` or `WorkflowOutputEvent` without ReportAgent ever running.
45
+
46
+ ### Secondary Issue: Duplicate Complete Events
47
+
48
+ Both `MagenticFinalResultEvent` AND `WorkflowOutputEvent` are emitted when the workflow ends. The previous code handled both, yielding "Research complete." twice.
49
+
50
+ ---
51
+
52
+ ## The Fix
53
+
54
+ ### 1. Track ReportAgent Execution (Forced Synthesis)
55
+
56
+ Add a `reporter_ran` flag that tracks whether ReportAgent produced output:
57
+
58
+ ```python
59
+ reporter_ran = False # P1 FIX: Track if ReportAgent produced output
60
+
61
+ # In MagenticAgentMessageEvent handler:
62
+ agent_name = (event.agent_id or "").lower()
63
+ if "report" in agent_name:
64
+     reporter_ran = True
65
+ ```
66
+
67
+ ### 2. Force Synthesis on Final Event
68
+
69
+ If the workflow ends without ReportAgent running, force synthesis:
70
+
71
+ ```python
72
+ if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
73
+     if not reporter_ran:
74
+         logger.warning("ReportAgent never ran - forcing synthesis")
75
+         async for synth_event in self._force_synthesis(iteration):
76
+             yield synth_event
77
+     else:
78
+         yield self._handle_final_event(event, iteration, last_streamed_length)
79
+ ```
80
+
81
+ ### 3. `_force_synthesis()` Method
82
+
83
+ Similar to `_handle_timeout()`, invokes ReportAgent directly:
84
+
85
+ ```python
86
+ async def _force_synthesis(self, iteration: int) -> AsyncGenerator[AgentEvent, None]:
87
+     """Force synthesis when workflow ends without ReportAgent running."""
88
+     state = get_magentic_state()
89
+     evidence_summary = await state.memory.get_context_summary()
90
+     report_agent = create_report_agent(self._chat_client, domain=self.domain)
91
+
92
+     yield AgentEvent(type="synthesizing", message="Synthesizing research findings...")
93
+
94
+     synthesis_result = await report_agent.run(
95
+         f"Synthesize research report from this evidence.\n\n{evidence_summary}"
96
+     )
97
+
98
+     yield AgentEvent(type="complete", message=synthesis_result.text)
99
+ ```
100
+
101
+ ### 4. Skip Duplicate Final Events
102
+
103
+ Prevent "Research complete." appearing twice:
104
+
105
+ ```python
106
+ if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
107
+     if final_event_received:
108
+         continue  # Skip duplicate final events
109
+     final_event_received = True
110
+ ```
111
+
112
+ ---
113
+
114
+ ## Why This Is The Correct Architecture
115
+
116
+ | Alternative | Why Wrong |
117
+ |-------------|-----------|
118
+ | Improve manager prompt | 7B models have fundamental reasoning limitations |
119
+ | Use larger model for manager | Defeats "free tier" purpose |
120
+ | Wait for upstream fix | Framework may never change; we control our code |
121
+ | **Forced synthesis safety net** | ✅ Guarantees output regardless of manager behavior |
122
+
123
+ The `_force_synthesis()` pattern is a **defensive architecture**. It guarantees users always get a research report, even if:
124
+ - The manager model fails to delegate properly
125
+ - The workflow hits stall/reset limits
126
+ - Any unexpected termination occurs
127
+
128
+ ---
129
+
130
+ ## Files Modified
131
+
132
+ | File | Change |
133
+ |------|--------|
134
+ | `src/orchestrators/advanced.py` | Added `reporter_ran` tracking |
135
+ | `src/orchestrators/advanced.py` | Added `_force_synthesis()` method |
136
+ | `src/orchestrators/advanced.py` | Added duplicate final event skipping |
137
+ | `src/orchestrators/advanced.py` | Added forced synthesis in final event handler |
138
+ | `src/orchestrators/advanced.py` | Added forced synthesis in max rounds fallback |
139
+
140
+ ---
141
+
142
+ ## Test Plan
143
+
144
+ 1. **Free Tier**: Run query, verify synthesis report is always generated
145
+ 2. **Paid Tier**: Run query, verify no regression in OpenAI behavior
146
+ 3. **Timeout**: Verify existing timeout synthesis still works
147
+ 4. **Max Rounds**: Verify synthesis happens even at max rounds
148
+
149
+ ---
150
+
151
+ ## Related
152
+
153
+ - P2 Duplicate Report Bug (separate issue, also fixed in this PR)
154
+ - P2 First Turn Timeout Bug (previously fixed)
155
+ - Manager model limitations are fundamental to 7B models
156
+ - OpenAI tier works because GPT-5 follows instructions better
157
+
158
+ ---
159
+
160
+ ## Lessons Learned
161
+
162
+ 1. **Defensive architecture** - Don't trust upstream components to always behave correctly
163
+ 2. **Tracking flags** - Simple boolean flags can enable powerful safety nets
164
+ 3. **AI-native challenges** - When using AI models as infrastructure components, build in fallbacks for model failures
165
+ 4. **Regression prevention** - This bug was likely introduced when we unified the architecture; comprehensive test coverage is critical
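
The `reporter_ran` flag in the bug report above depends on a case-insensitive substring check against `event.agent_id`. A small sketch of that check as a standalone function may make its edge cases easier to see; `is_report_agent` is a hypothetical name used only for illustration, and the agent IDs below are made-up examples rather than IDs confirmed by the framework.

```python
from typing import Optional


def is_report_agent(agent_id: Optional[str]) -> bool:
    """Return True when an agent ID looks like it belongs to the report agent.

    Mirrors the check in the fix: a missing ID is treated as an empty string,
    and the match is a case-insensitive substring test.
    """
    return "report" in (agent_id or "").lower()
```

Because this is a substring test, any agent whose ID merely contains "report" (for example a hypothetical `bug_report_searcher`) would also set the flag and suppress forced synthesis. If agent IDs are stable, comparing against the exact ReportAgent ID would be a stricter alternative.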
src/orchestrators/advanced.py CHANGED
@@ -247,7 +247,58 @@ The final output should be a structured research report."""
247
  iteration=iteration,
248
  )
249
 
250
- async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
251
  """
252
  Run the workflow.
253
 
@@ -295,6 +346,7 @@ The final output should be a structured research report."""
295
 
296
  iteration = 0
297
  final_event_received = False
298
 
299
  # ACCUMULATOR PATTERN: Track streaming content to bypass upstream Repr Bug
300
  # Upstream bug in _magentic.py flattens message.contents and sets message.text
@@ -328,6 +380,11 @@ The final output should be a structured research report."""
328
  if isinstance(event, MagenticAgentMessageEvent):
329
  iteration += 1
330

331
  comp_event, prog_event = self._handle_completion_event(
332
  event, current_message_buffer, iteration
333
  )
@@ -340,10 +397,22 @@ The final output should be a structured research report."""
340
  current_message_buffer = ""
341
  continue
342
 
343
- # 3. Handle Final Events Inline (P2 Duplicate Report Fix)
344
  if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
345
  final_event_received = True
346
- yield self._handle_final_event(event, iteration, last_streamed_length)
347
  continue
348
 
349
  # 4. Handle other events normally
@@ -358,16 +427,21 @@ The final output should be a structured research report."""
358
  "Workflow ended without final event",
359
  iterations=iteration,
360
  )
361
- yield AgentEvent(
362
- type="complete",
363
- message=(
364
- f"Research completed after {iteration} agent rounds. "
365
- "Max iterations reached - results may be partial. "
366
- "Try a more specific query for better results."
367
- ),
368
- data={"iterations": iteration, "reason": "max_rounds_reached"},
369
- iteration=iteration,
370
- )
371
 
372
  except TimeoutError:
373
  async for event in self._handle_timeout(iteration):
 
247
  iteration=iteration,
248
  )
249
 
250
+     async def _force_synthesis(self, iteration: int) -> AsyncGenerator[AgentEvent, None]:
251
+         """Force synthesis when workflow ends without ReportAgent running (P1 Fix).
252
+
253
+         This is a safety net for when the Manager agent (especially 7B models)
254
+         fails to properly delegate to ReportAgent before workflow termination.
255
+         """
256
+         try:
257
+             from src.agents.magentic_agents import create_report_agent
258
+             from src.agents.state import get_magentic_state
259
+
260
+             state = get_magentic_state()
261
+             memory = state.memory
262
+
263
+             # Get evidence summary from memory
264
+             evidence_summary = await memory.get_context_summary()
265
+
266
+             # Create and invoke ReportAgent for synthesis
267
+             report_agent = create_report_agent(self._chat_client, domain=self.domain)
268
+
269
+             yield AgentEvent(
270
+                 type="synthesizing",
271
+                 message="Synthesizing research findings...",
272
+                 iteration=iteration,
273
+             )
274
+
275
+             # Invoke ReportAgent directly
276
+             synthesis_result = await report_agent.run(
277
+                 "Synthesize research report from this evidence. "
278
+                 f"If evidence is sparse, say so.\n\n{evidence_summary}"
279
+             )
280
+
281
+             yield AgentEvent(
282
+                 type="complete",
283
+                 message=synthesis_result.text,
284
+                 data={"reason": "forced_synthesis", "iterations": iteration},
285
+                 iteration=iteration,
286
+             )
287
+         except Exception as synth_error:
288
+             logger.error("Forced synthesis failed", error=str(synth_error))
289
+             yield AgentEvent(
290
+                 type="complete",
291
+                 message=(
292
+                     f"Research completed after {iteration} rounds. "
293
+                     f"Evidence gathered but synthesis failed: {synth_error}"
294
+                 ),
295
+                 data={"reason": "forced_synthesis_failed", "iterations": iteration},
296
+                 iteration=iteration,
297
+             )
298
+
299
+     async def run(  # noqa: PLR0915 - Complex but necessary for event stream handling
300
+         self, query: str
301
+     ) -> AsyncGenerator[AgentEvent, None]:
302
  """
303
  Run the workflow.
304
 
 
346
 
347
  iteration = 0
348
  final_event_received = False
349
+ reporter_ran = False # P1 FIX: Track if ReportAgent produced output
350
 
351
  # ACCUMULATOR PATTERN: Track streaming content to bypass upstream Repr Bug
352
  # Upstream bug in _magentic.py flattens message.contents and sets message.text
 
380
  if isinstance(event, MagenticAgentMessageEvent):
381
  iteration += 1
382
 
383
+ # P1 FIX: Track if ReportAgent produced output
384
+ agent_name = (event.agent_id or "").lower()
385
+ if "report" in agent_name:
386
+ reporter_ran = True
387
+
388
  comp_event, prog_event = self._handle_completion_event(
389
  event, current_message_buffer, iteration
390
  )
 
397
  current_message_buffer = ""
398
  continue
399
 
400
+ # 3. Handle Final Events Inline (P2 Duplicate Report Fix + P1 Forced Synthesis)
401
  if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
402
+ if final_event_received:
403
+ continue # Skip duplicate final events
404
  final_event_received = True
405
+
406
+ # P1 FIX: Force synthesis if ReportAgent never ran
407
+ if not reporter_ran:
408
+ logger.warning(
409
+ "ReportAgent never ran - forcing synthesis",
410
+ iterations=iteration,
411
+ )
412
+ async for synth_event in self._force_synthesis(iteration):
413
+ yield synth_event
414
+ else:
415
+ yield self._handle_final_event(event, iteration, last_streamed_length)
416
  continue
417
 
418
  # 4. Handle other events normally
 
427
  "Workflow ended without final event",
428
  iterations=iteration,
429
  )
430
+ # P1 FIX: Force synthesis if ReportAgent never ran
431
+ if not reporter_ran:
432
+ async for synth_event in self._force_synthesis(iteration):
433
+ yield synth_event
434
+ else:
435
+ yield AgentEvent(
436
+ type="complete",
437
+ message=(
438
+ f"Research completed after {iteration} agent rounds. "
439
+ "Max iterations reached - results may be partial. "
440
+ "Try a more specific query for better results."
441
+ ),
442
+ data={"iterations": iteration, "reason": "max_rounds_reached"},
443
+ iteration=iteration,
444
+ )
445
 
446
  except TimeoutError:
447
  async for event in self._handle_timeout(iteration):
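
The `try`/`except` inside `_force_synthesis()` in this diff means the event stream still ends with a `complete` event when the report model itself fails. The sketch below illustrates that behavior under stated assumptions: plain dicts replace `AgentEvent`, `run_report` is a hypothetical callable standing in for `report_agent.run`, and `force_synthesis` is a free function rather than the orchestrator method. The prompt text and the `reason` values are copied from the diff.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict


async def force_synthesis(
    run_report: Callable[[str], Awaitable[str]],
    evidence_summary: str,
    iteration: int,
) -> AsyncIterator[Dict[str, Any]]:
    """Yield a progress event, then either a report or a failure notice."""
    try:
        yield {
            "type": "synthesizing",
            "message": "Synthesizing research findings...",
            "iteration": iteration,
        }
        report = await run_report(
            "Synthesize research report from this evidence. "
            f"If evidence is sparse, say so.\n\n{evidence_summary}"
        )
        yield {
            "type": "complete",
            "message": report,
            "reason": "forced_synthesis",
            "iteration": iteration,
        }
    except Exception as synth_error:
        # The stream must still terminate with a "complete" event.
        yield {
            "type": "complete",
            "message": (
                f"Research completed after {iteration} rounds. "
                f"Evidence gathered but synthesis failed: {synth_error}"
            ),
            "reason": "forced_synthesis_failed",
            "iteration": iteration,
        }


async def ok(prompt: str) -> str:
    return "REPORT"


async def boom(prompt: str) -> str:
    raise RuntimeError("model unavailable")


async def drain(run_report: Callable[[str], Awaitable[str]]) -> list:
    return [event async for event in force_synthesis(run_report, "evidence", 3)]
```

Draining the generator with `ok` ends in a `forced_synthesis` event, and draining it with `boom` ends in a `forced_synthesis_failed` event whose message includes the error text, so a consumer can distinguish the two outcomes by `reason` alone.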