Spaces: VibecoderMcSwaggins / DeepBoner
VibecoderMcSwaggins committed 11 days ago

Commit 1bfc1df (1 parent: 0b27b1c)

fix(orchestrator): Force synthesis when ReportAgent doesn't run (P1)

## Problem
The workflow terminates without ReportAgent producing a synthesis report.
Users see search/hypothesis/judge output but get "Research complete." with
no actual research report. This primarily affects Free Tier (HuggingFace 7B
manager model) which doesn't reliably delegate to ReportAgent.

## Solution
1. Track `reporter_ran` flag when ReportAgent produces output
2. On workflow termination, if ReportAgent never ran, force synthesis via
`_force_synthesis()` method (similar to `_handle_timeout()`)
3. Skip duplicate final events (both MagenticFinalResultEvent and
WorkflowOutputEvent were yielding "Research complete.")

## Testing
- 313 unit tests pass
- Linting and type checking pass

Fixes: P1 No Synthesis Free Tier
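
The three steps in the Solution section can be sketched outside the framework as a small event loop. This is a minimal illustration, not the orchestrator's code: `Event`, `fake_workflow`, and the stub `force_synthesis` are hypothetical stand-ins for the framework event types and for `_force_synthesis()`; only the `reporter_ran` and `final_event_received` flags mirror names from the commit.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class Event:
    """Hypothetical stand-in for a framework event (not a real framework type)."""

    kind: str  # "agent_message" or "final"
    agent_id: str = ""
    text: str = ""


async def fake_workflow(events: List[Event]) -> AsyncIterator[Event]:
    """Replay a canned list of events as an async stream."""
    for event in events:
        yield event


async def force_synthesis() -> AsyncIterator[str]:
    """Stub for _force_synthesis(); the real method invokes ReportAgent."""
    yield "synthesizing"
    yield "forced report"


async def run(events: List[Event]) -> List[str]:
    reporter_ran = False  # step 1: did ReportAgent produce output?
    final_event_received = False  # step 3: has a final event been handled?
    output: List[str] = []

    async for event in fake_workflow(events):
        if event.kind == "agent_message":
            if "report" in (event.agent_id or "").lower():
                reporter_ran = True
            output.append(f"{event.agent_id}: {event.text}")
            continue

        if event.kind == "final":
            if final_event_received:
                continue  # skip the duplicate final event
            final_event_received = True
            if not reporter_ran:
                # step 2: no report was produced, so synthesize one now
                async for message in force_synthesis():
                    output.append(message)
            else:
                output.append("Research complete.")

    return output
```

Feeding the loop a stream with two final events and no report-agent message yields the forced report once; a stream that includes a report-agent message yields a single "Research complete." instead.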

Files changed (3)

docs/bugs/ACTIVE_BUGS.md +1 -26
docs/bugs/P1_NO_SYNTHESIS_FREE_TIER.md +165 -0
src/orchestrators/advanced.py +87 -13

docs/bugs/ACTIVE_BUGS.md CHANGED

@@ -9,32 +9,6 @@
 ## Currently Active Bugs
-### P2 - Duplicate Report Content in Output
-**File:** `docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md`
-**Status:** OPEN - UX Bug
-**Problem:** The final research report appears twice in the UI - once as streaming content, then again as a complete event. This is a **stack bug**, not a model issue.
-**Root Cause:** Both `MagenticFinalResultEvent` and `WorkflowOutputEvent` emit the full report content that was already streamed. No deduplication exists.
-**Recommended Fix:** Handle final events inline in `run()` loop where buffer context exists. Track `last_streamed_length`; if > 100 chars, emit "Research complete." instead of full content.
----
-### P2 - First Agent Turn Exceeds Workflow Timeout
-**File:** `docs/bugs/P2_FIRST_TURN_TIMEOUT.md`
-**Status:** OPEN - Performance Bug
-**Problem:** The search agent's first turn can exceed the 5-minute workflow timeout, causing `iterations=0` at timeout. Users get partial research results.
-**Root Cause:** Search agent does too much work in a single turn: 3 API searches → 30 results → 30 embedding calls → 30 ChromaDB stores. The timeout is on the WORKFLOW, not individual agent turns.
-**Recommended Fix:** Reduce `max_results_per_tool` from 10 to 5; increase `advanced_timeout` to 600s (10 min).
----
 ### P3 - Progress Bar Positioning in ChatInterface
 **File:** `docs/bugs/P3_PROGRESS_BAR_POSITIONING.md`
@@ -83,6 +57,7 @@ All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
 - **P0 Advanced Mode Timeout No Synthesis** - FIXED, actual synthesis on timeout
 ### P1 Bugs (All FIXED)
 - **P1 Free Tier Tool Execution Failure** - FIXED in PR fix/P1-free-tier-tool-execution, removed premature marker
 - **P1 Gradio Example Click Auto-Submits** - FIXED in PR #120, prevents auto-submit on example click
 - **P1 HuggingFace Router 401 Hyperbolic** - FIXED, invalid token was root cause

 ## Currently Active Bugs
 ### P3 - Progress Bar Positioning in ChatInterface
 **File:** `docs/bugs/P3_PROGRESS_BAR_POSITIONING.md`
 - **P0 Advanced Mode Timeout No Synthesis** - FIXED, actual synthesis on timeout
 ### P1 Bugs (All FIXED)
+- **P1 No Synthesis Free Tier** - FIXED in PR fix/p1-forced-synthesis, forced synthesis safety net when ReportAgent doesn't run
 - **P1 Free Tier Tool Execution Failure** - FIXED in PR fix/P1-free-tier-tool-execution, removed premature marker
 - **P1 Gradio Example Click Auto-Submits** - FIXED in PR #120, prevents auto-submit on example click
 - **P1 HuggingFace Router 401 Hyperbolic** - FIXED, invalid token was root cause
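
The removed P2 duplicate-report entry recommends tracking `last_streamed_length` and emitting a short completion message once more than 100 characters have already been streamed. A minimal sketch of that rule follows; `final_message` and `STREAMED_THRESHOLD` are hypothetical names, and the body of the real `_handle_final_event()` is not shown in this diff, so it may differ.

```python
# Threshold in characters, taken from the bug entry's recommended fix.
STREAMED_THRESHOLD = 100


def final_message(full_report: str, last_streamed_length: int) -> str:
    """Pick the text for the final event without repeating streamed content.

    If a substantial amount of the report was already streamed to the UI,
    return a short completion marker; otherwise return the full report.
    """
    if last_streamed_length > STREAMED_THRESHOLD:
        return "Research complete."
    return full_report
```

With this rule a 500-character report that was fully streamed produces only "Research complete.", while a report that was never streamed is emitted in full.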

docs/bugs/P1_NO_SYNTHESIS_FREE_TIER.md ADDED

@@ -0,0 +1,165 @@

+# P1 Bug: No Synthesis Report in Free Tier (Premature Workflow Termination)
+**Date**: 2025-12-04
+**Status**: FIXED (PR fix/p1-forced-synthesis)
+**Severity**: P1 (Critical UX - No usable output from research)
+**Component**: `src/orchestrators/advanced.py`
+**Affects**: Free Tier (HuggingFace) primarily, potentially Paid Tier
+---
+## Executive Summary
+The workflow terminates without the ReportAgent ever producing a synthesis report. Users see search results and hypotheses streaming, but the final output is just "Research complete." with no actual research report. This is caused by the 7B Manager model failing to properly delegate to ReportAgent before workflow termination.
+---
+## Symptom
+```
+📚 **SEARCH_COMPLETE**: searcher: [search results]
+⏱️ **PROGRESS**: Round 1/5 (~3m 0s remaining)
+🔬 **HYPOTHESIZING**: hypothesizer: [hypotheses]
+⏱️ **PROGRESS**: Round 2/5 (~2m 15s remaining)
+✅ **JUDGE_COMPLETE**: judge: [asks for more evidence]
+⏱️ **PROGRESS**: Round 4/5 (~45s remaining)
+Research complete.
+Research complete.   ← NO SYNTHESIS REPORT!
+```
+The workflow runs through multiple agents (Search, Hypothesis, Judge) but never reaches the ReportAgent. The user receives no usable research report.
+---
+## Root Cause Analysis
+### Primary Issue: Manager Model Failure
+The `with_standard_manager()` in Microsoft Agent Framework uses the provided chat client (HuggingFace 7B model) to coordinate agents. The 7B model:
+1. **Cannot follow complex multi-step instructions** - The manager prompt instructs: "When JudgeAgent says SUFFICIENT EVIDENCE → delegate to ReportAgent." The 7B model doesn't reliably follow this.
+2. **Triggers premature termination** - The framework has `max_stall_count=3` and `max_reset_count=2`. If the manager keeps making the same delegation or gets confused, the workflow terminates.
+3. **Emits final event without synthesis** - The framework sends `MagenticFinalResultEvent` or `WorkflowOutputEvent` without ReportAgent ever running.
+### Secondary Issue: Duplicate Complete Events
+Both `MagenticFinalResultEvent` AND `WorkflowOutputEvent` are emitted when the workflow ends. The previous code handled both, yielding "Research complete." twice.
+---
+## The Fix
+### 1. Track ReportAgent Execution (Forced Synthesis)
+Add a `reporter_ran` flag that tracks whether ReportAgent produced output:
+```python
+reporter_ran = False  # P1 FIX: Track if ReportAgent produced output
+# In MagenticAgentMessageEvent handler:
+agent_name = (event.agent_id or "").lower()
+if "report" in agent_name:
+    reporter_ran = True
+```
+### 2. Force Synthesis on Final Event
+If the workflow ends without ReportAgent running, force synthesis:
+```python
+if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
+    if not reporter_ran:
+        logger.warning("ReportAgent never ran - forcing synthesis")
+        async for synth_event in self._force_synthesis(iteration):
+            yield synth_event
+    else:
+        yield self._handle_final_event(event, iteration, last_streamed_length)
+```
+### 3. `_force_synthesis()` Method
+Similar to `_handle_timeout()`, invokes ReportAgent directly:
+```python
+async def _force_synthesis(self, iteration: int) -> AsyncGenerator[AgentEvent, None]:
+    """Force synthesis when workflow ends without ReportAgent running."""
+    state = get_magentic_state()
+    evidence_summary = await state.memory.get_context_summary()
+    report_agent = create_report_agent(self._chat_client, domain=self.domain)
+    yield AgentEvent(type="synthesizing", message="Synthesizing research findings...")
+    synthesis_result = await report_agent.run(
+        f"Synthesize research report from this evidence.\n\n{evidence_summary}"
+    )
+    yield AgentEvent(type="complete", message=synthesis_result.text)
+```
+### 4. Skip Duplicate Final Events
+Prevent "Research complete." appearing twice:
+```python
+if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
+    if final_event_received:
+        continue  # Skip duplicate final events
+    final_event_received = True
+```
+---
+## Why This Is The Correct Architecture
+| Alternative | Why Wrong |
+|-------------|-----------|
+| Improve manager prompt | 7B models have fundamental reasoning limitations |
+| Use larger model for manager | Defeats "free tier" purpose |
+| Wait for upstream fix | Framework may never change; we control our code |
+| **Forced synthesis safety net** | ✅ Guarantees output regardless of manager behavior |
+The `_force_synthesis()` pattern is a **defensive architecture**. It guarantees users always get a research report, even if:
+- The manager model fails to delegate properly
+- The workflow hits stall/reset limits
+- Any unexpected termination occurs
+---
+## Files Modified
+| File | Change |
+|------|--------|
+| `src/orchestrators/advanced.py` | Added `reporter_ran` tracking |
+| `src/orchestrators/advanced.py` | Added `_force_synthesis()` method |
+| `src/orchestrators/advanced.py` | Added duplicate final event skipping |
+| `src/orchestrators/advanced.py` | Added forced synthesis in final event handler |
+| `src/orchestrators/advanced.py` | Added forced synthesis in max rounds fallback |
+---
+## Test Plan
+1. **Free Tier**: Run query, verify synthesis report is always generated
+2. **Paid Tier**: Run query, verify no regression in OpenAI behavior
+3. **Timeout**: Verify existing timeout synthesis still works
+4. **Max Rounds**: Verify synthesis happens even at max rounds
+---
+## Related
+- P2 Duplicate Report Bug (separate issue, also fixed in this PR)
+- P2 First Turn Timeout Bug (previously fixed)
+- Manager model limitations are fundamental to 7B models
+- OpenAI tier works because GPT-5 follows instructions better
+---
+## Lessons Learned
+1. **Defensive architecture** - Don't trust upstream components to always behave correctly
+2. **Tracking flags** - Simple boolean flags can enable powerful safety nets
+3. **AI-native challenges** - When using AI models as infrastructure components, build in fallbacks for model failures
+4. **Regression prevention** - This bug was likely introduced when we unified the architecture; comprehensive test coverage is critical
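
The `reporter_ran` check described in the bug document relies on a case-insensitive substring match against the agent id. The helper below isolates that check so its edge cases are easy to see. `is_report_agent` is a hypothetical name used only for illustration; the commit performs the match inline.

```python
from typing import Optional


def is_report_agent(agent_id: Optional[str]) -> bool:
    """Return True when the agent id looks like the report agent.

    Mirrors the inline check in the commit: a missing id is treated as an
    empty string, and the match is a case-insensitive substring test.
    """
    return "report" in (agent_id or "").lower()
```

One consequence of substring matching is that any agent whose id contains "report" sets the flag, so the check assumes no other agent in the workflow uses that word in its id.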

src/orchestrators/advanced.py CHANGED

@@ -247,7 +247,58 @@ The final output should be a structured research report."""
                 iteration=iteration,
             )
-    async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
         """
         Run the workflow.
@@ -295,6 +346,7 @@ The final output should be a structured research report."""
         iteration = 0
         final_event_received = False
         # ACCUMULATOR PATTERN: Track streaming content to bypass upstream Repr Bug
         # Upstream bug in _magentic.py flattens message.contents and sets message.text
@@ -328,6 +380,11 @@ The final output should be a structured research report."""
                     if isinstance(event, MagenticAgentMessageEvent):
                         iteration += 1
                         comp_event, prog_event = self._handle_completion_event(
                             event, current_message_buffer, iteration
                         )
@@ -340,10 +397,22 @@ The final output should be a structured research report."""
                         current_message_buffer = ""
                         continue
-                    # 3. Handle Final Events Inline (P2 Duplicate Report Fix)
                     if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
                         final_event_received = True
-                        yield self._handle_final_event(event, iteration, last_streamed_length)
                         continue
                     # 4. Handle other events normally
@@ -358,16 +427,21 @@ The final output should be a structured research report."""
                     "Workflow ended without final event",
                     iterations=iteration,
                 )
-                yield AgentEvent(
-                    type="complete",
-                    message=(
-                        f"Research completed after {iteration} agent rounds. "
-                        "Max iterations reached - results may be partial. "
-                        "Try a more specific query for better results."
-                    ),
-                    data={"iterations": iteration, "reason": "max_rounds_reached"},
-                    iteration=iteration,
-                )
         except TimeoutError:
             async for event in self._handle_timeout(iteration):

                 iteration=iteration,
             )
+    async def _force_synthesis(self, iteration: int) -> AsyncGenerator[AgentEvent, None]:
+        """Force synthesis when workflow ends without ReportAgent running (P1 Fix).
+        This is a safety net for when the Manager agent (especially 7B models)
+        fails to properly delegate to ReportAgent before workflow termination.
+        """
+        try:
+            from src.agents.magentic_agents import create_report_agent
+            from src.agents.state import get_magentic_state
+            state = get_magentic_state()
+            memory = state.memory
+            # Get evidence summary from memory
+            evidence_summary = await memory.get_context_summary()
+            # Create and invoke ReportAgent for synthesis
+            report_agent = create_report_agent(self._chat_client, domain=self.domain)
+            yield AgentEvent(
+                type="synthesizing",
+                message="Synthesizing research findings...",
+                iteration=iteration,
+            )
+            # Invoke ReportAgent directly
+            synthesis_result = await report_agent.run(
+                "Synthesize research report from this evidence. "
+                f"If evidence is sparse, say so.\n\n{evidence_summary}"
+            )
+            yield AgentEvent(
+                type="complete",
+                message=synthesis_result.text,
+                data={"reason": "forced_synthesis", "iterations": iteration},
+                iteration=iteration,
+            )
+        except Exception as synth_error:
+            logger.error("Forced synthesis failed", error=str(synth_error))
+            yield AgentEvent(
+                type="complete",
+                message=(
+                    f"Research completed after {iteration} rounds. "
+                    f"Evidence gathered but synthesis failed: {synth_error}"
+                ),
+                data={"reason": "forced_synthesis_failed", "iterations": iteration},
+                iteration=iteration,
+            )
+    async def run(  # noqa: PLR0915 - Complex but necessary for event stream handling
+        self, query: str
+    ) -> AsyncGenerator[AgentEvent, None]:
         """
         Run the workflow.
         iteration = 0
         final_event_received = False
+        reporter_ran = False  # P1 FIX: Track if ReportAgent produced output
         # ACCUMULATOR PATTERN: Track streaming content to bypass upstream Repr Bug
         # Upstream bug in _magentic.py flattens message.contents and sets message.text
                     if isinstance(event, MagenticAgentMessageEvent):
                         iteration += 1
+                        # P1 FIX: Track if ReportAgent produced output
+                        agent_name = (event.agent_id or "").lower()
+                        if "report" in agent_name:
+                            reporter_ran = True
                         comp_event, prog_event = self._handle_completion_event(
                             event, current_message_buffer, iteration
                         )
                         current_message_buffer = ""
                         continue
+                    # 3. Handle Final Events Inline (P2 Duplicate Report Fix + P1 Forced Synthesis)
                     if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
+                        if final_event_received:
+                            continue  # Skip duplicate final events
                         final_event_received = True
+                        # P1 FIX: Force synthesis if ReportAgent never ran
+                        if not reporter_ran:
+                            logger.warning(
+                                "ReportAgent never ran - forcing synthesis",
+                                iterations=iteration,
+                            )
+                            async for synth_event in self._force_synthesis(iteration):
+                                yield synth_event
+                        else:
+                            yield self._handle_final_event(event, iteration, last_streamed_length)
                         continue
                     # 4. Handle other events normally
                     "Workflow ended without final event",
                     iterations=iteration,
                 )
+                # P1 FIX: Force synthesis if ReportAgent never ran
+                if not reporter_ran:
+                    async for synth_event in self._force_synthesis(iteration):
+                        yield synth_event
+                else:
+                    yield AgentEvent(
+                        type="complete",
+                        message=(
+                            f"Research completed after {iteration} agent rounds. "
+                            "Max iterations reached - results may be partial. "
+                            "Try a more specific query for better results."
+                        ),
+                        data={"iterations": iteration, "reason": "max_rounds_reached"},
+                        iteration=iteration,
+                    )
         except TimeoutError:
             async for event in self._handle_timeout(iteration):
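
The error path of `_force_synthesis()` can be exercised in isolation. The sketch below assumes a simplified shape: events are plain dicts rather than `AgentEvent`, and the report agent is replaced by an injected coroutine. `ok_agent`, `failing_agent`, and `collect` are hypothetical helpers added for the demonstration.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Dict, List

EventDict = Dict[str, object]


async def force_synthesis(
    synthesize: Callable[[str], Awaitable[str]],
    evidence: str,
    iteration: int,
) -> AsyncIterator[EventDict]:
    """Yield a progress event, then either the report or a failure notice."""
    try:
        yield {"type": "synthesizing", "message": "Synthesizing research findings..."}
        report = await synthesize(evidence)
        yield {"type": "complete", "message": report, "reason": "forced_synthesis"}
    except Exception as error:
        # Synthesis failed: still emit a terminal event so the stream always
        # ends with a "complete" message.
        yield {
            "type": "complete",
            "message": (
                f"Research completed after {iteration} rounds. "
                f"Evidence gathered but synthesis failed: {error}"
            ),
            "reason": "forced_synthesis_failed",
        }


async def ok_agent(evidence: str) -> str:
    """Hypothetical agent that always succeeds."""
    return f"Report based on: {evidence}"


async def failing_agent(evidence: str) -> str:
    """Hypothetical agent that always fails."""
    raise RuntimeError("model unavailable")


async def collect(synthesize: Callable[[str], Awaitable[str]]) -> List[EventDict]:
    """Drain the generator into a list for inspection."""
    return [event async for event in force_synthesis(synthesize, "evidence", 3)]
```

In both cases the stream ends with exactly one `complete` event; only the `reason` field distinguishes a forced report from a failed attempt.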