Spaces: VibecoderMcSwaggins/DeepBoner

VibecoderMcSwaggins committed on Dec 5, 2025

Commit efd0997 (unverified) · 1 Parent(s): 02a3c53

fix(orchestrator): P2 Round Counter Semantic Mismatch - Semantic Progress Tracking (#132)

* docs: Add P2 bug doc for round counter semantic mismatch

Discovered during SPEC-18 testing that progress shows "Round 11/5"
which is confusing. Root cause: iteration counts agent completions
(ExecutorCompletedEvent) but display treats it as workflow rounds.

Additional issues found:
- Dead code: _get_progress_message method is defined but never used
- Hardcoded 45 instead of self._EST_SECONDS_PER_ROUND constant
- Time estimate becomes useless ("~0s") once iteration exceeds max_rounds

* docs: Add senior review findings to P2 round counter bug

External review confirmed our analysis and added nuances:
- Manager agent also fires ExecutorCompletedEvent (explains 11 events)
- Time estimation is doubly flawed (wrong unit + wrong calibration)
- API discovery: ORCH_MSG_KIND_USER_TASK could track actual rounds

Review status: CONFIRMED - Ready for implementation

* refactor(orchestrator): implement semantic progress tracking

- Remove misleading 'Round X/Y' counter and time estimates
- Remove dead code (_get_progress_message, _EST_SECONDS_PER_ROUND)
- Implement semantic agent naming (e.g., 'reporter' -> 'ReportAgent')
- Update progress events to show 'Step N: AgentName task completed'
- Update tests to use valid domain agent IDs
- Fix P2_ROUND_COUNTER_SEMANTIC_MISMATCH

* docs: Mark P2 round counter bug as FIXED

Add Resolution section documenting the semantic progress tracking
implementation that replaced the broken "Round X/Y" display with
honest "Step N: AgentName task completed" format.

* style: Address CodeRabbit nitpicks

- Add language specifier to markdown code blocks (MD040)
- Remove duplicate horizontal rule separator
- Use structured logging in fallback synthesis error handler
- Use _smart_truncate for completion messages (avoids unnecessary ellipsis)

* style: Fix markdown lint (MD031/MD032 blank lines)

* fix(deps): Update urllib3 to 2.6.0 for security fixes

- GHSA-gm62-xv2j-4w53 and GHSA-2xpw-w6gg-jr37
- Fix CompiledStateGraph type annotation (langgraph API change)

* fix(types): Ignore CompiledStateGraph type-arg for cross-version compat

Files changed (7)

docs/bugs/P2_ROUND_COUNTER_SEMANTIC_MISMATCH.md +321 -0
pyproject.toml +2 -2
requirements.txt +2 -2
src/agents/graph/workflow.py +1 -1
src/orchestrators/advanced.py +31 -32
tests/unit/orchestrators/test_accumulator_pattern.py +25 -18
uv.lock +4 -4

docs/bugs/P2_ROUND_COUNTER_SEMANTIC_MISMATCH.md ADDED

@@ -0,0 +1,321 @@

+# P2 Bug: Round Counter Semantic Mismatch
+**Status**: ✅ FIXED
+**Discovered**: 2025-12-05
+**Fixed**: 2025-12-05
+**Severity**: P2 (Display bug, confusing UX but not blocking)
+**Component**: `src/orchestrators/advanced.py`
+**Commit**: `40ca236c refactor(orchestrator): implement semantic progress tracking`
+---
+## Symptom
+Progress display shows impossible values like "Round 11/5":
+```text
+⏱️ **PROGRESS**: Round 11/5 (~0s remaining)
+```
+This is confusing to users - how can we be on round 11 when max is 5?
+---
+## Root Cause Analysis
+### The Semantic Mismatch
+Two different concepts are being conflated:
+| Concept | What It Means | Variable |
+|---------|---------------|----------|
+| **Workflow Round** | One orchestration cycle where manager delegates to agents | `self._max_rounds` (5) |
+| **Agent Completion** | One agent finishes its task | `state.iteration` (incremented on each `ExecutorCompletedEvent`) |
+### The Bug
+```python
+# Line 348: Increments on EVERY agent completion
+if isinstance(event, ExecutorCompletedEvent):
+    state.iteration += 1
+# Line 467: Displays as if it's a workflow round
+message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
+```
+### Why It Happens
+In a multi-agent workflow with 4 agents (searcher, hypothesizer, judge, reporter):
+- Each "round" involves the manager delegating to multiple agents
+- Each agent completion fires an `ExecutorCompletedEvent`
+- With 4+ agents, we see 4+ events per workflow round
+**Math**: 5 workflow rounds × 4 agents = 20+ agent completions, displayed as "Round 20/5"
+---
+## Evidence From Logs
+The session showed this progression:
+```text
+Round 1/5   - First agent completed
+Round 2/5   - Second agent completed
+Round 3/5   - Third agent completed
+Round 4/5   - Fourth agent completed
+Round 5/5   - Fifth agent completed (still in workflow round 1!)
+Round 6/5   - Now exceeds max (workflow round 2 starting)
+...
+Round 11/5  - Multiple workflow rounds have passed
+```
+---
+## Impact
+1. **User Confusion**: "Round 11/5" makes no sense
+2. **Time Estimation Wrong**: Once `iteration` reaches `max_rounds`, `rounds_remaining = max(5 - 11, 0) = 0`, so the display stays stuck at "~0s remaining"
+3. **No Actual Bug in Logic**: The workflow still runs correctly; only the display is wrong
+---
+## Proposed Fixes
+### Option A: Rename to "Agent Step" (Quick Fix)
+Change the display to reflect what we're actually counting:
+```python
+# Before
+message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
+# After
+message=f"Agent step {iteration} (Round limit: {self._max_rounds})"
+```
+**Pros**: Accurate, minimal code change
+**Cons**: Still doesn't track actual workflow rounds
+### Option B: Track Actual Workflow Rounds (Proper Fix)
+Track workflow rounds separately from agent completions:
+```python
+@dataclass
+class WorkflowState:
+    iteration: int = 0           # Agent completions (for internal tracking)
+    workflow_round: int = 0      # Actual orchestration rounds
+    current_message_buffer: str = ""
+    # ...
+# Increment workflow_round when manager delegates (different event type)
+# Display workflow_round in progress messages
+```
+**Pros**: Semantically correct, accurate time estimates
+**Cons**: Requires understanding which event signals a new round
+### Option C: Use Estimated Agent Count (Compromise)
+Estimate agents per round and display accordingly:
+```python
+AGENTS_PER_ROUND = 4  # searcher, hypothesizer, judge, reporter
+estimated_round = (iteration // AGENTS_PER_ROUND) + 1
+message=f"Round ~{estimated_round}/{self._max_rounds}"
+```
+**Pros**: Roughly accurate, no API research needed
+**Cons**: Estimation may be off if some agents are skipped
+---
+## Recommendation
+**Short-term**: Apply Option A (rename to "Agent step") - fixes the confusion immediately
+**Long-term**: Investigate Option B - determine which event signals a new workflow round in Microsoft Agent Framework
+---
+## Related Code
+```python
+# src/orchestrators/advanced.py
+# Line 348: Where iteration is incremented
+if isinstance(event, ExecutorCompletedEvent):
+    state.iteration += 1
+# Line 459-467: Where progress message is generated
+rounds_remaining = max(self._max_rounds - iteration, 0)
+est_seconds = rounds_remaining * 45
+progress_event = AgentEvent(
+    type="progress",
+    message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
+    iteration=iteration,
+)
+```
+---
+## Test Case
+```python
+def test_progress_display_never_exceeds_max_rounds():
+    """Progress should show Round X/Y where X <= Y."""
+    # Simulate 20 agent completions across 5 workflow rounds
+    # Assert displayed round never exceeds max_rounds
+    pass
+```
+---
+## Additional Issues Found During Analysis
+### Issue 2: Dead Code - Unused `_get_progress_message` Method
+```python
+# Line 196-205: Method is defined but NEVER called
+def _get_progress_message(self, iteration: int) -> str:
+    """Generate progress message with time estimation."""
+    # ... logic duplicated in _handle_completion_event
+```
+The same logic is duplicated inline in `_handle_completion_event` (lines 458-469).
+**Fix**: Either use the method or delete it.
+### Issue 3: Hardcoded Constant
+```python
+# Line 87: Class constant defined
+_EST_SECONDS_PER_ROUND: int = 45
+# Line 199: Uses constant (correct)
+est_seconds = rounds_remaining * self._EST_SECONDS_PER_ROUND
+# Line 460: Uses hardcoded 45 (inconsistent)
+est_seconds = rounds_remaining * 45
+```
+**Fix**: Use `self._EST_SECONDS_PER_ROUND` consistently.
+### Issue 4: Time Estimate Always Shows "~0s remaining"
+Since `iteration` quickly exceeds `max_rounds`:
+```python
+rounds_remaining = max(self._max_rounds - iteration, 0)
+# When iteration=11, max_rounds=5: rounds_remaining = max(5-11, 0) = 0
+# est_seconds = 0 * 45 = 0
+# Display: "~0s remaining"
+```
+The time estimate becomes useless after the first few agent completions.
+---
+## Complete Fix Recommendation
+1. **Rename display** from "Round X/5" to "Agent step X"
+2. **Delete dead code** - remove unused `_get_progress_message` method
+3. **Use constant** - replace hardcoded `45` with `self._EST_SECONDS_PER_ROUND`
+4. **Fix time estimate** - base it on agent steps, not workflow rounds
+---
+## Senior Review Findings (2025-12-05)
+**Reviewer**: External Gemini CLI Agent
+**Status**: CONFIRMED - Analysis accurate and sufficient
+### Additional Nuances Identified
+1. **Manager Agent Also Fires Events**: The Manager itself is an agent. If `ExecutorCompletedEvent` fires for Manager's turn completion PLUS sub-agents' completions, the count accelerates 2-3x faster per logical round. This explains why we saw 11 events for ~2-3 workflow rounds.
+2. **Time Estimation Doubly Flawed**:
+   - Not just bottoming out at 0
+   - `_EST_SECONDS_PER_ROUND` (45s) is calibrated for a FULL workflow round, not a single agent step
+   - If we counted agent steps correctly: 10 steps × 45s = 450s (way overestimated)
+   - A full round of 4 agents might only take 60s total
+3. **API Discovery - Can Track Actual Rounds**:
+   ```python
+   # These constants exist in agent_framework:
+   ORCH_MSG_KIND_INSTRUCTION = 'instruction'
+   ORCH_MSG_KIND_USER_TASK = 'user_task'
+   ORCH_MSG_KIND_TASK_LEDGER = 'task_ledger'
+   ORCH_MSG_KIND_NOTICE = 'notice'
+   ```
+   Counting `user_task` events from `MagenticOrchestratorMessageEvent` would align iteration with `max_rounds` 1:1, since this signals "Manager is beginning a new evaluation cycle."
+### Reviewer Recommendations
+1. **Option A (Rename)**: APPROVED - Safest, most honest fix
+2. **Option B (Track Workflow Rounds)**: DEFER - Requires verifying framework behavior across versions, risks brittleness
+3. **Remove Denominator**: Display `Agent Step {iteration}` without `/5` to avoid confusion
+4. **Delete Dead Code**: Confirmed `_get_progress_message` is never called
+5. **Fix Constants**: Use `self._EST_SECONDS_PER_ROUND` consistently
+### Review Status: ✅ PASSED - Ready for Implementation
+---
+## Resolution (2025-12-05)
+**Implemented**: Domain-driven semantic progress tracking
+### What Was Done
+1. **Deleted Dead Code**:
+   - Removed unused `_get_progress_message` method
+   - Removed unused `_EST_SECONDS_PER_ROUND` constant
+2. **Added Semantic Agent Mapping** (`_get_agent_semantic_name`):
+   ```python
+   def _get_agent_semantic_name(self, agent_id: str) -> str:
+       """Map internal agent ID to user-facing semantic name."""
+       name = agent_id.lower()
+       if SEARCHER_AGENT_ID in name:
+           return "SearchAgent"
+       if JUDGE_AGENT_ID in name:
+           return "JudgeAgent"
+       if HYPOTHESIZER_AGENT_ID in name:
+           return "HypothesisAgent"
+       if REPORTER_AGENT_ID in name:
+           return "ReportAgent"
+       return "ManagerAgent"
+   ```
+3. **Changed Progress Display**:
+   - Before: `"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"`
+   - After: `"Step {iteration}: {semantic_name} task completed"`
+4. **Changed Initial Thinking Message**:
+   - Before: `"Multi-agent reasoning in progress (5 rounds max)... Estimated time: 3-5 minutes."`
+   - After: `"Multi-agent reasoning in progress (Limit: 5 Manager rounds)... Allocating time for deep research..."`
+5. **Updated Tests**: Changed test mocks to use domain-specific agent IDs (`searcher`, `judge`) instead of arbitrary strings.
+### Result
+- Before: `⏱️ **PROGRESS**: Round 11/5 (~0s remaining)` (confusing, broken math)
+- After: `⏱️ **PROGRESS**: Step 11: ReportAgent task completed` (accurate, professional)
+### Design Decision
+Rather than patching the counter display or trying to track "actual workflow rounds" (which requires deep framework integration), we chose **honest reporting**: Show exactly what happened (which agent completed) without making false promises about progress percentages or time estimates.
+This follows a simple principle: "Don't lie to the user."
+---
+## References
+- SPEC-18: Agent Framework Core Upgrade (where ExecutorCompletedEvent was introduced)
+- Microsoft Agent Framework documentation on workflow rounds vs agent executions
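
The mismatch described in the bug doc can be reproduced outside the orchestrator. The sketch below is illustrative only: `Completed` and `ManagerMessage` are hypothetical stand-ins, not the framework's event classes, and treating a `user_task` message as the start of a workflow round is the reviewer's suggestion from the doc, not verified framework behavior. `old_progress_message` copies the formatting of the removed `_get_progress_message` method.

```python
from dataclasses import dataclass


@dataclass
class Completed:
    """Stand-in for an agent completion event (hypothetical, not the framework class)."""
    executor_id: str


@dataclass
class ManagerMessage:
    """Stand-in for an orchestrator message carrying a kind string (hypothetical)."""
    kind: str


def old_progress_message(iteration: int, max_rounds: int, secs_per_round: int = 45) -> str:
    """Reproduce the removed 'Round X/Y' formatting shown in the diff."""
    rounds_remaining = max(max_rounds - iteration, 0)
    est_seconds = rounds_remaining * secs_per_round
    if est_seconds >= 60:
        est_display = f"{est_seconds // 60}m {est_seconds % 60}s"
    else:
        est_display = f"{est_seconds}s"
    return f"Round {iteration}/{max_rounds} (~{est_display} remaining)"


def tally(events) -> tuple[int, int]:
    """Return (agent_completions, workflow_rounds) for a stream of stand-in events."""
    completions = rounds = 0
    for event in events:
        if isinstance(event, Completed):
            completions += 1
        elif isinstance(event, ManagerMessage) and event.kind == "user_task":
            rounds += 1
    return completions, rounds


# One illustrative round: the manager opens it, then four agents and the manager complete.
one_round = [ManagerMessage("user_task")] + [
    Completed(name) for name in ("searcher", "hypothesizer", "judge", "reporter", "manager")
]
completions, rounds = tally(one_round * 2)
print(completions, rounds)                   # 10 2
print(old_progress_message(completions, 5))  # Round 10/5 (~0s remaining)
```

Two rounds already produce ten completions, so feeding the completion count into a "Round X/5" template overshoots the limit and pins the estimate at zero.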

pyproject.toml CHANGED

@@ -36,8 +36,8 @@ dependencies = [
     "langchain-core>=0.3.21,<1.0",
     "langchain-huggingface>=0.1.2,<1.0",
     "langgraph-checkpoint-sqlite>=3.0.0,<4.0",  # 3.0.0 required for GHSA-wwqv-p2pp-99h5 fix
-    # Security: Pin urllib3 to fix GHSA-48p4-8xcf-vxj5 and GHSA-pq67-6m6q-mj2v
-    "urllib3>=2.5.0",
 ]
 [project.optional-dependencies]

     "langchain-core>=0.3.21,<1.0",
     "langchain-huggingface>=0.1.2,<1.0",
     "langgraph-checkpoint-sqlite>=3.0.0,<4.0",  # 3.0.0 required for GHSA-wwqv-p2pp-99h5 fix
+    # Security: Pin urllib3 to fix GHSA-gm62-xv2j-4w53 and GHSA-2xpw-w6gg-jr37
+    "urllib3>=2.6.0",
 ]
 [project.optional-dependencies]

requirements.txt CHANGED

@@ -42,8 +42,8 @@ langchain-core>=0.3.21,<1.0
 langchain-huggingface>=0.1.2,<1.0
 langgraph-checkpoint-sqlite>=3.0.0,<4.0
-# Security: Pin urllib3 to fix GHSA-48p4-8xcf-vxj5 and GHSA-pq67-6m6q-mj2v
-urllib3>=2.5.0
 # Multi-agent orchestration (Advanced mode) - from [magentic] optional
 agent-framework-core==1.0.0b251204

 langchain-huggingface>=0.1.2,<1.0
 langgraph-checkpoint-sqlite>=3.0.0,<4.0
+# Security: Pin urllib3 to fix GHSA-gm62-xv2j-4w53 and GHSA-2xpw-w6gg-jr37
+urllib3>=2.6.0
 # Multi-agent orchestration (Advanced mode) - from [magentic] optional
 agent-framework-core==1.0.0b251204
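
A quick way to confirm that an environment satisfies the new `urllib3>=2.6.0` floor is sketched below. The helper names are hypothetical, and the parser only handles plain dotted releases such as `2.6.0`; pre-release or post-release suffixes would need a real version parser.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain dotted release such as '2.6.0' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "2.6.0") -> bool:
    """Return True when the installed release is at or above the minimum."""
    return parse_release(installed) >= parse_release(minimum)


print(meets_minimum("2.5.0"))   # False
print(meets_minimum("2.6.0"))   # True
print(meets_minimum("2.10.1"))  # True
```

Tuples compare numerically element by element, so `2.10.1` correctly ranks above `2.6.0`, which a plain string comparison would get wrong. The installed version string can be read with `importlib.metadata.version("urllib3")`.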

src/agents/graph/workflow.py CHANGED

@@ -25,7 +25,7 @@ def create_research_graph(
     llm: BaseChatModel | None = None,
     checkpointer: BaseCheckpointSaver[Any] | None = None,
     embedding_service: EmbeddingServiceProtocol | None = None,
-) -> CompiledStateGraph[Any, Any, Any, Any]:
     """Build the research state graph.
     Args:

     llm: BaseChatModel | None = None,
     checkpointer: BaseCheckpointSaver[Any] | None = None,
     embedding_service: EmbeddingServiceProtocol | None = None,
+) -> CompiledStateGraph[Any]:  # type: ignore[type-arg]
     """Build the research state graph.
     Args:

src/orchestrators/advanced.py CHANGED

@@ -83,9 +83,6 @@ class AdvancedOrchestrator(OrchestratorProtocol):
     - Configurable timeouts and round limits
     """
-    # Estimated seconds per coordination round (for progress UI)
-    _EST_SECONDS_PER_ROUND: int = 45
     def __init__(
         self,
         max_rounds: int = 5,
@@ -193,16 +190,18 @@ Focus on:
 The final output should be a structured research report."""
-    def _get_progress_message(self, iteration: int) -> str:
-        """Generate progress message with time estimation."""
-        rounds_remaining = max(self._max_rounds - iteration, 0)
-        est_seconds = rounds_remaining * self._EST_SECONDS_PER_ROUND
-        if est_seconds >= 60:
-            est_display = f"{est_seconds // 60}m {est_seconds % 60}s"
-        else:
-            est_display = f"{est_seconds}s"
-        return f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
     async def _init_workflow_events(self, query: str) -> AsyncGenerator[AgentEvent, None]:
         """Yield initialization events."""
@@ -219,7 +218,9 @@ The final output should be a structured research report."""
         )
     async def _synthesize_fallback(
-        self, iteration: int, reason: str
     ) -> AsyncGenerator[AgentEvent, None]:
         """
         Unified fallback synthesis for all termination scenarios.
@@ -263,7 +264,7 @@ The final output should be a structured research report."""
                 iteration=iteration,
             )
         except Exception as synth_error:
-            logger.error(f"{reason} synthesis failed", error=str(synth_error))
             yield AgentEvent(
                 type="complete",
                 message=f"Research completed. Synthesis failed: {synth_error}",
@@ -272,7 +273,8 @@ The final output should be a structured research report."""
             )
     async def run(  # noqa: PLR0915 - Complex but necessary for event stream handling
-        self, query: str
     ) -> AsyncGenerator[AgentEvent, None]:
         """
         Run the workflow.
@@ -312,9 +314,8 @@ The final output should be a structured research report."""
         yield AgentEvent(
             type="thinking",
             message=(
-                f"Multi-agent reasoning in progress ({self._max_rounds} rounds max)... "
-                f"Estimated time: {self._max_rounds * 45 // 60}-"
-                f"{self._max_rounds * 60 // 60} minutes."
             ),
             iteration=0,
         )
@@ -434,7 +435,10 @@ The final output should be a structured research report."""
             )
     def _handle_completion_event(
-        self, event: ExecutorCompletedEvent, buffer: str, iteration: int
     ) -> tuple[AgentEvent, AgentEvent]:
         """Handle an agent completion event using the accumulated buffer."""
         # Use buffer if available, otherwise fall back cautiously
@@ -446,25 +450,19 @@ The final output should be a structured research report."""
             # The result is often in event.result or similar, but buffering is safer
             text_content = "Action completed (Tool Call)"
-        agent_name = getattr(event, "executor_id", "unknown") or "unknown"
-        event_type = self._get_event_type_for_agent(agent_name)
         completion_event = AgentEvent(
             type=event_type,
-            message=f"{agent_name}: {text_content[:200]}...",
             iteration=iteration,
         )
-        # Progress update
-        rounds_remaining = max(self._max_rounds - iteration, 0)
-        est_seconds = rounds_remaining * 45
-        est_display = (
-            f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
-        )
         progress_event = AgentEvent(
             type="progress",
-            message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
             iteration=iteration,
         )
@@ -552,7 +550,8 @@ The final output should be a structured research report."""
         return ""
     def _get_event_type_for_agent(
-        self, agent_name: str
     ) -> Literal["search_complete", "judge_complete", "hypothesizing", "synthesizing", "judging"]:
         """Map agent name to appropriate event type.

     - Configurable timeouts and round limits
     """
     def __init__(
         self,
         max_rounds: int = 5,
 The final output should be a structured research report."""
+    def _get_agent_semantic_name(self, agent_id: str) -> str:
+        """Map internal agent ID to user-facing semantic name."""
+        name = agent_id.lower()
+        if SEARCHER_AGENT_ID in name:
+            return "SearchAgent"
+        if JUDGE_AGENT_ID in name:
+            return "JudgeAgent"
+        if HYPOTHESIZER_AGENT_ID in name:
+            return "HypothesisAgent"
+        if REPORTER_AGENT_ID in name:
+            return "ReportAgent"
+        return "ManagerAgent"
     async def _init_workflow_events(self, query: str) -> AsyncGenerator[AgentEvent, None]:
         """Yield initialization events."""
         )
     async def _synthesize_fallback(
+        self,
+        iteration: int,
+        reason: str,
     ) -> AsyncGenerator[AgentEvent, None]:
         """
         Unified fallback synthesis for all termination scenarios.
                 iteration=iteration,
             )
         except Exception as synth_error:
+            logger.error("Fallback synthesis failed", reason=reason, error=str(synth_error))
             yield AgentEvent(
                 type="complete",
                 message=f"Research completed. Synthesis failed: {synth_error}",
             )
     async def run(  # noqa: PLR0915 - Complex but necessary for event stream handling
+        self,
+        query: str,
     ) -> AsyncGenerator[AgentEvent, None]:
         """
         Run the workflow.
         yield AgentEvent(
             type="thinking",
             message=(
+                f"Multi-agent reasoning in progress (Limit: {self._max_rounds} Manager rounds)... "
+                "Allocating time for deep research..."
             ),
             iteration=0,
         )
             )
     def _handle_completion_event(
+        self,
+        event: ExecutorCompletedEvent,
+        buffer: str,
+        iteration: int,
     ) -> tuple[AgentEvent, AgentEvent]:
         """Handle an agent completion event using the accumulated buffer."""
         # Use buffer if available, otherwise fall back cautiously
             # The result is often in event.result or similar, but buffering is safer
             text_content = "Action completed (Tool Call)"
+        agent_id = getattr(event, "executor_id", "unknown") or "unknown"
+        event_type = self._get_event_type_for_agent(agent_id)
+        semantic_name = self._get_agent_semantic_name(agent_id)
         completion_event = AgentEvent(
             type=event_type,
+            message=f"{semantic_name}: {self._smart_truncate(text_content)}",
             iteration=iteration,
         )
         progress_event = AgentEvent(
             type="progress",
+            message=f"Step {iteration}: {semantic_name} task completed",
             iteration=iteration,
         )
         return ""
     def _get_event_type_for_agent(
+        self,
+        agent_name: str,
     ) -> Literal["search_complete", "judge_complete", "hypothesizing", "synthesizing", "judging"]:
         """Map agent name to appropriate event type.

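The new naming and progress format can be exercised in isolation. This is a minimal sketch, not the orchestrator code: the four ID values are assumptions (the diff references the constants but does not show their definitions), the functions are module-level rather than methods, and the lookup is table-driven instead of the `if` chain in the diff, with the same match order.

```python
# Assumed ID values; the diff uses these constants without showing their definitions.
SEARCHER_AGENT_ID = "searcher"
JUDGE_AGENT_ID = "judge"
HYPOTHESIZER_AGENT_ID = "hypothesizer"
REPORTER_AGENT_ID = "reporter"

# Checked in order, mirroring the if-chain in the diff.
_SEMANTIC_NAMES = (
    (SEARCHER_AGENT_ID, "SearchAgent"),
    (JUDGE_AGENT_ID, "JudgeAgent"),
    (HYPOTHESIZER_AGENT_ID, "HypothesisAgent"),
    (REPORTER_AGENT_ID, "ReportAgent"),
)


def get_agent_semantic_name(agent_id: str) -> str:
    """Map an internal agent ID to a user-facing name by substring match."""
    name = agent_id.lower()
    for fragment, label in _SEMANTIC_NAMES:
        if fragment in name:
            return label
    return "ManagerAgent"  # fallback for any ID that matches nothing


def progress_message(step: int, agent_id: str) -> str:
    """Build the 'Step N: AgentName task completed' message from the diff."""
    return f"Step {step}: {get_agent_semantic_name(agent_id)} task completed"


print(progress_message(11, "reporter"))    # Step 11: ReportAgent task completed
print(get_agent_semantic_name("unknown"))  # ManagerAgent
```

Because `_handle_completion_event` defaults a missing `executor_id` to `"unknown"`, such events fall through to `ManagerAgent` under this mapping.
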
tests/unit/orchestrators/test_accumulator_pattern.py CHANGED

@@ -174,10 +174,11 @@ async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
     Input: Updates ("Hello", " World") -> Completed
     Expected: AgentEvent with "Hello World"
     """
     events = [
-        MockAgentRunUpdateEvent("Hello", author_name="ChatBot"),
-        MockAgentRunUpdateEvent(" World", author_name="ChatBot"),
-        MockExecutorCompletedEvent(executor_id="ChatBot"),
     ]
     async def mock_stream(*args, **kwargs):
@@ -192,13 +193,13 @@ async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
-    # Find the completion event for ChatBot (non-streaming)
     chat_events = [
-        e for e in generated_events if "ChatBot" in str(e.message) and e.type != "streaming"
     ]
     assert len(chat_events) >= 1, (
-        f"Expected ChatBot events, got: {[e.message for e in generated_events]}"
     )
     final_event = chat_events[0]
@@ -214,8 +215,9 @@ async def test_accumulator_pattern_scenario_b_tool_call(mock_orchestrator):
     Input: No Deltas -> Completed
     Expected: AgentEvent with fallback text
     """
     events = [
-        MockExecutorCompletedEvent(executor_id="SearchAgent"),
     ]
     async def mock_stream(*args, **kwargs):
@@ -251,11 +253,12 @@ async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
     Verify buffer clears between agents.
     Agent B should NOT inherit Agent A's accumulated text.
     """
     events = [
-        MockAgentRunUpdateEvent("Agent A says hi", author_name="AgentA"),
-        MockExecutorCompletedEvent(executor_id="AgentA"),
-        MockAgentRunUpdateEvent("Agent B responds", author_name="AgentB"),
-        MockExecutorCompletedEvent(executor_id="AgentB"),
     ]
     async def mock_stream(*args, **kwargs):
@@ -272,18 +275,22 @@ async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
     # Find non-streaming events for each agent
     agent_a_events = [
-        e for e in generated_events if "AgentA" in str(e.message) and e.type != "streaming"
     ]
     agent_b_events = [
-        e for e in generated_events if "AgentB" in str(e.message) and e.type != "streaming"
     ]
     # Both should have completion events
-    assert len(agent_a_events) >= 1, f"No AgentA events: {[e.message for e in generated_events]}"
-    assert len(agent_b_events) >= 1, f"No AgentB events: {[e.message for e in generated_events]}"
     # Agent A should have its own text
-    assert "Agent A" in agent_a_events[0].message
     # Agent B should have its own text, NOT Agent A's
-    assert "Agent B" in agent_b_events[0].message
-    assert "Agent A" not in agent_b_events[0].message, "Buffer not cleared between agents!"

     Input: Updates ("Hello", " World") -> Completed
     Expected: AgentEvent with "Hello World"
     """
+    # Use "searcher" to map to "SearchAgent"
     events = [
+        MockAgentRunUpdateEvent("Hello", author_name="searcher"),
+        MockAgentRunUpdateEvent(" World", author_name="searcher"),
+        MockExecutorCompletedEvent(executor_id="searcher"),
     ]
     async def mock_stream(*args, **kwargs):
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
+    # Find the completion event for SearchAgent (non-streaming)
     chat_events = [
+        e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
     ]
     assert len(chat_events) >= 1, (
+        f"Expected SearchAgent events, got: {[e.message for e in generated_events]}"
     )
     final_event = chat_events[0]
     Input: No Deltas -> Completed
     Expected: AgentEvent with fallback text
     """
+    # Use "searcher" to map to "SearchAgent"
     events = [
+        MockExecutorCompletedEvent(executor_id="searcher"),
     ]
     async def mock_stream(*args, **kwargs):
     Verify buffer clears between agents.
     Agent B should NOT inherit Agent A's accumulated text.
     """
+    # Use "searcher" (SearchAgent) and "judge" (JudgeAgent)
     events = [
+        MockAgentRunUpdateEvent("Searcher says hi", author_name="searcher"),
+        MockExecutorCompletedEvent(executor_id="searcher"),
+        MockAgentRunUpdateEvent("Judge responds", author_name="judge"),
+        MockExecutorCompletedEvent(executor_id="judge"),
     ]
     async def mock_stream(*args, **kwargs):
     # Find non-streaming events for each agent
     agent_a_events = [
+        e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
     ]
     agent_b_events = [
+        e for e in generated_events if "JudgeAgent" in str(e.message) and e.type != "streaming"
     ]
     # Both should have completion events
+    assert len(agent_a_events) >= 1, (
+        f"No SearchAgent events: {[e.message for e in generated_events]}"
+    )
+    assert len(agent_b_events) >= 1, (
+        f"No JudgeAgent events: {[e.message for e in generated_events]}"
+    )
     # Agent A should have its own text
+    assert "Searcher" in agent_a_events[0].message
     # Agent B should have its own text, NOT Agent A's
+    assert "Judge" in agent_b_events[0].message
+    assert "Searcher" not in agent_b_events[0].message, "Buffer not cleared between agents!"
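
The behavior these tests check (accumulate streamed deltas, emit on completion, clear the buffer between agents) can be sketched without the orchestrator. `Update` and `Completed` below are hypothetical stand-ins for the mock event classes, and `accumulate` is an illustration of the pattern, not the code under test. The fallback string is the one shown in `_handle_completion_event`.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """Stand-in for a streamed text delta (hypothetical, not the real mock class)."""
    text: str
    author_name: str


@dataclass
class Completed:
    """Stand-in for an agent completion event (hypothetical)."""
    executor_id: str


def accumulate(events) -> list[str]:
    """Return one message per completion, built from deltas seen since the previous one."""
    messages = []
    buffer = ""
    for event in events:
        if isinstance(event, Update):
            buffer += event.text
        elif isinstance(event, Completed):
            # No deltas means the agent only made a tool call, so use the fallback text.
            text = buffer or "Action completed (Tool Call)"
            messages.append(f"{event.executor_id}: {text}")
            buffer = ""  # clear so the next agent does not inherit this text
    return messages


stream = [
    Update("Searcher says hi", "searcher"),
    Completed("searcher"),
    Update("Judge responds", "judge"),
    Completed("judge"),
    Completed("searcher"),
]
print(accumulate(stream))
# ['searcher: Searcher says hi', 'judge: Judge responds', 'searcher: Action completed (Tool Call)']
```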

uv.lock CHANGED

@@ -1169,7 +1169,7 @@ requires-dist = [
     { name = "structlog", specifier = ">=24.1" },
     { name = "tenacity", specifier = ">=8.2" },
     { name = "typer", marker = "extra == 'dev'", specifier = ">=0.9.0" },
-    { name = "urllib3", specifier = ">=2.5.0" },
     { name = "xmltodict", specifier = ">=0.13" },
 ]
 provides-extras = ["dev", "magentic", "rag"]
@@ -6175,11 +6175,11 @@ wheels = [
 [[package]]
 name = "urllib3"
-version = "2.5.0"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/15/22/9ee70a2574a4f4599c47dd506532914ce044817c7752a79b6a51286319bc/urllib3-2.5.0.tar.gz", hash = "sha256:3fc47733c7e419d4bc3f6b3dc2b4f890bb743906a30d56ba4a5bfa4bbff92760", size = 393185 }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/a7/c2/fe1e52489ae3122415c51f387e221dd0773709bad6c6cdaa599e8a2c5185/urllib3-2.5.0-py3-none-any.whl", hash = "sha256:e6b01673c0fa6a13e374b50871808eb3bf7046c4b125b216f6bf1cc604cff0dc", size = 129795 },
 ]
 [[package]]

     { name = "structlog", specifier = ">=24.1" },
     { name = "tenacity", specifier = ">=8.2" },
     { name = "typer", marker = "extra == 'dev'", specifier = ">=0.9.0" },
+    { name = "urllib3", specifier = ">=2.6.0" },
     { name = "xmltodict", specifier = ">=0.13" },
 ]
 provides-extras = ["dev", "magentic", "rag"]
 [[package]]
 name = "urllib3"
+version = "2.6.0"
 source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/1c/43/554c2569b62f49350597348fc3ac70f786e3c32e7f19d266e19817812dd3/urllib3-2.6.0.tar.gz", hash = "sha256:cb9bcef5a4b345d5da5d145dc3e30834f58e8018828cbc724d30b4cb7d4d49f1", size = 432585 }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/56/1a/9ffe814d317c5224166b23e7c47f606d6e473712a2fad0f704ea9b99f246/urllib3-2.6.0-py3-none-any.whl", hash = "sha256:c90f7a39f716c572c4e3e58509581ebd83f9b59cced005b7db7ad2d22b0db99f", size = 131083 },
 ]
 [[package]]