VibecoderMcSwaggins/DeepBoner

VibecoderMcSwaggins committed on Dec 6, 2025

Commit

fc78d2d

1 Parent(s): b6a1a09

fix(orchestrator): P2 - Silence ExecutorCompletedEvent UI noise

## Problem
After report synthesis, extra "JUDGING: ManagerAgent" and "PROGRESS: Step 11"
events appeared in the UI, confusing users. Root cause: we were treating
`ExecutorCompletedEvent` as a UI event when it's actually internal framework
bookkeeping (auto-emitted by MS Agent Framework for every executor).
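
A rough replay of the old behaviour shows where the "JUDGING: ManagerAgent" label came from: the removed `_get_event_type_for_agent` mapping fell back to `"judging"` for any executor it did not recognise, and the manager is such an executor. The sketch below is a simplification under assumptions: the agent ID strings are guessed stand-ins for the real constants, and `old_ui_events` is a hypothetical helper that skips the semantic-name lookup and truncation done by the removed `_handle_completion_event`.

```python
# Assumed values: the real constants live in src/orchestrators/advanced.py
# and their exact strings are not shown in this commit.
SEARCHER_AGENT_ID = "searcher"
JUDGE_AGENT_ID = "judge"
HYPOTHESIZER_AGENT_ID = "hypothesizer"
REPORTER_AGENT_ID = "reporter"


def event_type_for_agent(agent_name: str) -> str:
    """Mirror of the removed mapping: unknown executors default to 'judging'."""
    agent_lower = agent_name.lower()
    if SEARCHER_AGENT_ID in agent_lower:
        return "search_complete"
    if JUDGE_AGENT_ID in agent_lower:
        return "judge_complete"
    if HYPOTHESIZER_AGENT_ID in agent_lower:
        return "hypothesizing"
    if REPORTER_AGENT_ID in agent_lower:
        return "synthesizing"
    return "judging"  # the manager lands here


def old_ui_events(executor_ids: list[str]) -> list[tuple[str, str]]:
    """Hypothetical replay: two UI events per completed executor."""
    events = []
    for step, executor_id in enumerate(executor_ids, start=1):
        events.append((event_type_for_agent(executor_id), executor_id))
        events.append(("progress", f"Step {step}: {executor_id} task completed"))
    return events


# The reporter completes first, then the manager completes after it.
trailing = old_ui_events(["reporter", "ManagerAgent"])
```

In this replay the last two entries of `trailing` belong to the manager, which matches the trailing events seen after the report.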

## Solution
1. **Silence ExecutorCompletedEvent**: Remove UI event emission, keep only
internal state tracking (reporter_ran flag)
2. **Add metadata filtering**: Filter out `task_ledger` and `instruction`
messages from AgentRunUpdateEvent stream
3. **Remove dead code**: Delete unused `_handle_completion_event` and
`_get_event_type_for_agent` methods
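
The first two steps can be sketched in isolation, without importing the framework. `FakeUpdate`, `FakeCompleted`, `fake_stream` and `consume` are hypothetical stand-ins, not framework or project APIs. The metadata keys and values are the ones quoted in the bug document, while the `"reporter"` ID is an assumed value for `REPORTER_AGENT_ID`.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeUpdate:
    """Stand-in for a streaming update carrying text plus metadata."""
    text: str
    additional_properties: dict = field(default_factory=dict)


@dataclass
class FakeCompleted:
    """Stand-in for the framework's per-executor completion event."""
    executor_id: str


ORCHESTRATOR = "orchestrator_message"
INTERNAL_KINDS = ("task_ledger", "instruction")


async def fake_stream():
    # Internal planning blob from the manager: should be filtered.
    yield FakeUpdate(
        '{"plan": []}',
        {"magentic_event_type": ORCHESTRATOR, "orchestrator_message_kind": "task_ledger"},
    )
    # Ordinary agent output: should reach the UI.
    yield FakeUpdate("Found 3 papers", {"magentic_event_type": "agent_delta"})
    # Completion events: tracked internally, never displayed.
    yield FakeCompleted("reporter")
    yield FakeCompleted("ManagerAgent")


async def consume(stream):
    shown, reporter_ran = [], False
    async for event in stream:
        if isinstance(event, FakeUpdate):
            props = event.additional_properties or {}
            is_internal = (
                props.get("magentic_event_type") == ORCHESTRATOR
                and props.get("orchestrator_message_kind") in INTERNAL_KINDS
            )
            if is_internal:
                continue  # skip coordination messages
            shown.append(event.text)
        elif isinstance(event, FakeCompleted):
            # State tracking only; nothing is appended to the UI list.
            if "reporter" in event.executor_id.lower():
                reporter_ran = True
    return shown, reporter_ran


shown, reporter_ran = asyncio.run(consume(fake_stream()))
```

Only the agent text ends up in `shown`, while `reporter_ran` is still set, so the fallback synthesis logic keeps the signal it depends on.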

## Changes
- src/orchestrators/advanced.py: Silence completion events, add metadata filter
- tests/unit/test_orchestrator_noise.py: New regression tests
- tests/unit/orchestrators/test_accumulator_pattern.py: Update expectations
- docs/bugs/P2_EXECUTOR_COMPLETED_EVENT_UI_NOISE.md: Full bug documentation

## Validation
- Senior review confirmed analysis (external agent audit)
- All 304 unit tests pass
- Aligns with MS Agent Framework sample patterns

Closes: P2 ExecutorCompletedEvent UI Noise bug

Files changed (4)

docs/bugs/P2_EXECUTOR_COMPLETED_EVENT_UI_NOISE.md +351 -0
src/orchestrators/advanced.py +14 -68
tests/unit/orchestrators/test_accumulator_pattern.py +49 -42
tests/unit/test_orchestrator_noise.py +113 -0

docs/bugs/P2_EXECUTOR_COMPLETED_EVENT_UI_NOISE.md ADDED

	@@ -0,0 +1,351 @@

+# P2 Bug: ExecutorCompletedEvent UI Noise
+**Status**: VALIDATED - Ready for Implementation
+**Discovered**: 2025-12-05
+**Senior Review**: 2025-12-05 (External agent audit confirmed analysis)
+**Severity**: P2 (UX noise, confusing but not blocking)
+**Component**: `src/orchestrators/advanced.py`
+---
+## Symptom
+After the report synthesis completes, extra events appear in the UI:
+```text
+📝 **SYNTHESIZING**: Synthesizing research findings...
+[...full report content...]
+🧠 **JUDGING**: ManagerAgent: Action completed (Tool Call)
+⏱️ **PROGRESS**: Step 11: ManagerAgent task completed
+```
+The "JUDGING" and "PROGRESS" events appear AFTER the report is already displayed, creating confusion.
+---
+## Root Cause Analysis
+### The Misunderstanding
+We're treating `ExecutorCompletedEvent` as a **UI event** when it's actually an **internal framework bookkeeping event**.
+### Microsoft Agent Framework Design
+Looking at `agent_framework/_workflows/_executor.py` (lines 266-281):
+```python
+# This is auto-emitted by the framework - NOT for UI consumption
+with _framework_event_origin():
+    completed_event = ExecutorCompletedEvent(self.id, sent_messages if sent_messages else None)
+await context.add_event(completed_event)
+```
+The framework emits `ExecutorCompletedEvent` automatically after every executor handler completes. This includes:
+- SearchAgent completing a search
+- JudgeAgent completing evaluation
+- ReportAgent completing synthesis
+- **ManagerAgent completing coordination** (this is the problem)
+### What the MS Framework Sample Does
+From `samples/getting_started/workflows/orchestration/magentic.py`:
+```python
+async for event in workflow.run_stream(task):
+    if isinstance(event, AgentRunUpdateEvent):
+        # Handle streaming with metadata
+        props = event.data.additional_properties if event.data else None
+        event_type = props.get("magentic_event_type") if props else None
+        # ...
+    elif isinstance(event, WorkflowOutputEvent):
+        # Handle final output
+        output = output_messages[-1].text
+```
+They only handle:
+1. `AgentRunUpdateEvent` - for streaming content (with `magentic_event_type` metadata)
+2. `WorkflowOutputEvent` - for final output
+**They do NOT emit UI events for `ExecutorCompletedEvent`.**
+### Our Problematic Code
+In `src/orchestrators/advanced.py`:
+```python
+# Line 348-368: We emit UI events for EVERY ExecutorCompletedEvent
+if isinstance(event, ExecutorCompletedEvent):
+    state.iteration += 1
+    comp_event, prog_event = self._handle_completion_event(...)
+    yield comp_event   # <-- WRONG: UI event for internal framework event
+    yield prog_event   # <-- WRONG: More noise
+```
+### Why the Manager Fires a Completion Event
+The workflow execution order:
+1. ReportAgent streams its output (`AgentRunUpdateEvent`)
+2. ReportAgent handler completes → `ExecutorCompletedEvent(reporter)` (we display this)
+3. Manager orchestrator handler completes → `ExecutorCompletedEvent(manager)` (we display this too!)
+4. `WorkflowOutputEvent` (final)
+The Manager is also an executor in the framework. When it finishes coordinating (after ReportAgent returns), it fires its own `ExecutorCompletedEvent`. We're incorrectly emitting UI events for this.
+---
+## Impact
+1. **User Confusion**: Extra "JUDGING: ManagerAgent" events after the report
+2. **UX Noise**: Progress events that don't add value
+3. **Incorrect Semantics**: Manager completions displayed as agent activity
+4. **No Functional Bug**: The workflow completes correctly, just noisy
+---
+## The Fix
+### Stop Emitting UI Events for ExecutorCompletedEvent
+Remove UI event emission for `ExecutorCompletedEvent` entirely. Keep internal state tracking only.
+**Before (buggy):**
+```python
+if isinstance(event, ExecutorCompletedEvent):
+    state.iteration += 1
+    agent_name = getattr(event, "executor_id", "") or "unknown"
+    if REPORTER_AGENT_ID in agent_name.lower():
+        state.reporter_ran = True
+    comp_event, prog_event = self._handle_completion_event(...)
+    yield comp_event   # <-- REMOVE: Emits UI noise
+    yield prog_event   # <-- REMOVE: Emits UI noise
+```
+**After (correct):**
+```python
+if isinstance(event, ExecutorCompletedEvent):
+    # Internal state tracking only - NO UI events
+    agent_name = getattr(event, "executor_id", "") or "unknown"
+    if REPORTER_AGENT_ID in agent_name.lower():
+        state.reporter_ran = True
+    state.current_message_buffer = ""
+    continue  # Skip to next event - do not yield anything
+```
+**Key changes:**
+1. Remove `yield comp_event` and `yield prog_event`
+2. Remove `state.iteration += 1` (iteration counter becomes meaningless without UI events)
+3. Keep `state.reporter_ran` tracking (needed for fallback synthesis logic)
+4. Add `continue` to skip to next event
+**Why this is correct:**
+- Aligns with MS framework design (their sample ignores `ExecutorCompletedEvent`)
+- Eliminates all completion noise including trailing "ManagerAgent" events
+- The streaming events (`AgentRunUpdateEvent`) already provide real-time feedback
+- `WorkflowOutputEvent` signals completion
+### Additional Fix: Add Metadata Filtering to AgentRunUpdateEvent
+The senior review identified a gap: we're not filtering `AgentRunUpdateEvent` by `magentic_event_type`.
+**Current (incomplete):**
+```python
+if isinstance(event, AgentRunUpdateEvent):
+    if event.data and hasattr(event.data, "text") and event.data.text:
+        yield AgentEvent(type="streaming", message=event.data.text)
+```
+**Should be:**
+```python
+if isinstance(event, AgentRunUpdateEvent):
+    if event.data and hasattr(event.data, "text") and event.data.text:
+        # Check metadata to filter internal orchestrator messages
+        props = getattr(event.data, "additional_properties", None) or {}
+        event_type = props.get("magentic_event_type")
+        msg_kind = props.get("orchestrator_message_kind")
+        # Filter out internal orchestrator messages (task_ledger, instruction)
+        if event_type == MAGENTIC_EVENT_TYPE_ORCHESTRATOR:
+            if msg_kind in ("task_ledger", "instruction"):
+                continue  # Skip internal coordination messages
+        yield AgentEvent(type="streaming", message=event.data.text)
+```
+**Why this matters:**
+- Prevents internal JSON blobs from being displayed
+- Filters out raw planning/instruction prompts not meant for users
+- Aligns with how MS sample consumes events
+---
+## Related Code Locations
+- `src/orchestrators/advanced.py` line 348-368: ExecutorCompletedEvent handling
+- `src/orchestrators/advanced.py` line 437-469: `_handle_completion_event` method
+- MS Framework: `python/packages/core/agent_framework/_workflows/_executor.py` line 277-281
+- MS Framework: `python/packages/core/agent_framework/_workflows/_magentic.py` line 1962-1976
+---
+## Related Issues
+- P2 Round Counter Semantic Mismatch (FIXED) - Changed display from "Round X/Y" to "Step N"
+- This bug explains why step count was confusing - we count internal events too
+---
+## Framework Event Architecture Deep Dive
+### Event Categories in MS Agent Framework
+The framework has distinct event categories with different purposes:
+#### 1. Workflow Lifecycle Events (Framework-emitted, internal)
+| Event | Purpose | UI Relevant? |
+|-------|---------|--------------|
+| `WorkflowStartedEvent` | Run begins | No |
+| `WorkflowStatusEvent` | State transitions (IN_PROGRESS, IDLE, FAILED) | No |
+| `WorkflowFailedEvent` | Error with structured details | Maybe (errors) |
+#### 2. Superstep Events (Framework-emitted, internal)
+| Event | Purpose | UI Relevant? |
+|-------|---------|--------------|
+| `SuperStepStartedEvent` | Pregel superstep begins | No |
+| `SuperStepCompletedEvent` | Pregel superstep ends | No |
+#### 3. Executor Events (Framework-emitted automatically, internal)
+| Event | Purpose | UI Relevant? |
+|-------|---------|--------------|
+| `ExecutorInvokedEvent` | Handler starts | No |
+| `ExecutorCompletedEvent` | Handler completes | **NO** |
+| `ExecutorFailedEvent` | Handler errors | Maybe (errors) |
+#### 4. Application Events (User-code emitted via ctx.add_event, UI-facing)
+| Event | Purpose | UI Relevant? |
+|-------|---------|--------------|
+| `AgentRunUpdateEvent` | Streaming content | **YES** |
+| `AgentRunEvent` | Complete agent response | Yes |
+| `WorkflowOutputEvent` | Final workflow output | **YES** |
+| `RequestInfoEvent` | HITL request | Yes |
+### Metadata Pattern in AgentRunUpdateEvent
+The MS framework uses `additional_properties` in `AgentRunUpdateEvent.data` for classification:
+```python
+# Orchestrator message
+additional_properties={
+    "magentic_event_type": "orchestrator_message",
+    "orchestrator_message_kind": "user_task" | "task_ledger" | "instruction" | "notice",
+    "orchestrator_id": "...",
+}
+# Agent streaming
+additional_properties={
+    "magentic_event_type": "agent_delta",
+    "agent_id": "searcher" | "judge" | ...,
+}
+```
+### What We Should Handle for UI
+1. **`AgentRunUpdateEvent`** with metadata filtering:
+   - `magentic_event_type: "agent_delta"` → Display agent streaming
+   - `magentic_event_type: "orchestrator_message"` → Filter by `orchestrator_message_kind`:
+     - `"user_task"` → Show (task assignment)
+     - `"instruction"` → Filter out (internal)
+     - `"task_ledger"` → Filter out (internal)
+     - `"notice"` → Maybe show (warnings)
+2. **`WorkflowOutputEvent`** → Final output
+### What We Should NOT Handle for UI
+- `ExecutorCompletedEvent` - Internal bookkeeping
+- `ExecutorInvokedEvent` - Internal bookkeeping
+- `SuperStepStartedEvent/CompletedEvent` - Internal iteration
+- `WorkflowStatusEvent` - Internal state machine
+---
+## Required Import Changes
+**Current imports:**
+```python
+from agent_framework import (
+    MAGENTIC_EVENT_TYPE_ORCHESTRATOR,
+    AgentRunUpdateEvent,
+    ExecutorCompletedEvent,  # Keep for internal tracking
+    MagenticBuilder,
+    WorkflowOutputEvent,
+)
+```
+**Add these imports for metadata filtering:**
+```python
+from agent_framework import (
+    MAGENTIC_EVENT_TYPE_AGENT_DELTA,  # For agent streaming detection
+    ORCH_MSG_KIND_INSTRUCTION,         # Filter internal messages
+    ORCH_MSG_KIND_TASK_LEDGER,         # Filter internal messages
+)
+```
+---
+## Test Cases
+```python
+def test_no_executor_completed_events_in_ui():
+    """UI should not emit any events from ExecutorCompletedEvent."""
+    # Run workflow to completion
+    # Collect all yielded AgentEvent objects
+    # Assert NONE have type "progress" with "task completed" message
+    # Assert NONE have type matching completion patterns
+    pass
+def test_internal_messages_filtered_from_streaming():
+    """Internal orchestrator messages should be filtered from UI stream."""
+    # Run workflow and collect all yielded events
+    # Assert no events contain "task_ledger" content
+    # Assert no events contain raw instruction prompts
+    # Assert no JSON blobs in streaming output
+    pass
+def test_reporter_ran_tracking_still_works():
+    """Internal state.reporter_ran should still be set correctly."""
+    # Run workflow to completion
+    # Verify fallback synthesis is NOT triggered (reporter did run)
+    # This ensures we didn't break internal tracking when removing UI events
+    pass
+```
+---
+## Why the Free Tier "Works"
+The user asked why the free tier seems to work despite expectations. The answer:
+1. **The framework handles orchestration** - The MS Agent Framework manages the workflow (planning, progress tracking, agent coordination)
+2. **The LLM just provides reasoning** - The model generates text, but the framework decides when to delegate, when to stop, etc.
+3. **The "bugs" are in our UI layer** - The orchestration works correctly; we're just displaying internal events
+The free tier works because:
+- `MagenticBuilder` creates the workflow graph
+- `StandardMagenticManager` handles planning and progress evaluation
+- The framework routes messages between executors
+- The LLM quality affects answer quality, not workflow execution
+Our UI noise (trailing events) is a bug in how we consume framework events, not a framework bug.

src/orchestrators/advanced.py CHANGED

@@ -18,11 +18,13 @@ Design Patterns:
 import asyncio
 from collections.abc import AsyncGenerator
 from dataclasses import dataclass
-from typing import TYPE_CHECKING, Any, Literal
 import structlog
 from agent_framework import (
     MAGENTIC_EVENT_TYPE_ORCHESTRATOR,
     AgentRunUpdateEvent,
     ChatAgent,
     ExecutorCompletedEvent,
@@ -327,6 +329,16 @@ The final output should be a structured research report."""
                 async for event in workflow.run_stream(task):
                     # 1. Handle Streaming (Source of Truth for Content)
                     if isinstance(event, AgentRunUpdateEvent) and event.data:
                         author = getattr(event.data, "author_name", None)
                         # Detect agent switch to clear buffer
                         if author != state.current_agent_id:
@@ -346,21 +358,12 @@ The final output should be a structured research report."""
                     # 2. Handle Completion Signal
                     if isinstance(event, ExecutorCompletedEvent):
-                        state.iteration += 1
                         # P1 FIX: Track if ReportAgent produced output
-                        # Note: ExecutorCompletedEvent might not have agent_id directly accessible
-                        # The executor_id usually maps to the agent name
                         agent_name = getattr(event, "executor_id", "") or "unknown"
                         if REPORTER_AGENT_ID in agent_name.lower():
                             state.reporter_ran = True
-                        comp_event, prog_event = self._handle_completion_event(
-                            event, state.current_message_buffer, state.iteration
-                        )
-                        yield comp_event
-                        yield prog_event
                         # P2 BUG FIX: Save length before clearing
                         state.last_streamed_length = len(state.current_message_buffer)
                         # Clear buffer after consuming
@@ -434,40 +437,6 @@ The final output should be a structured research report."""
                 iteration=state.iteration,
             )
-    def _handle_completion_event(
-        self,
-        event: ExecutorCompletedEvent,
-        buffer: str,
-        iteration: int,
-    ) -> tuple[AgentEvent, AgentEvent]:
-        """Handle an agent completion event using the accumulated buffer."""
-        # Use buffer if available, otherwise fall back cautiously
-        # (Only fall back if buffer empty, which implies tool-only turn)
-        text_content = buffer
-        if not text_content:
-            # ExecutorCompletedEvent doesn't carry the message directly in the same way
-            # Try extraction but ignore repr strings AND empty strings
-            # The result is often in event.result or similar, but buffering is safer
-            text_content = "Action completed (Tool Call)"
-        agent_id = getattr(event, "executor_id", "unknown") or "unknown"
-        event_type = self._get_event_type_for_agent(agent_id)
-        semantic_name = self._get_agent_semantic_name(agent_id)
-        completion_event = AgentEvent(
-            type=event_type,
-            message=f"{semantic_name}: {self._smart_truncate(text_content)}",
-            iteration=iteration,
-        )
-        progress_event = AgentEvent(
-            type="progress",
-            message=f"Step {iteration}: {semantic_name} task completed",
-            iteration=iteration,
-        )
-        return completion_event, progress_event
     def _handle_final_event(
         self,
         event: WorkflowOutputEvent,
@@ -549,29 +518,6 @@ The final output should be a structured research report."""
         # The repr is useless for display purposes
         return ""
-    def _get_event_type_for_agent(
-        self,
-        agent_name: str,
-    ) -> Literal["search_complete", "judge_complete", "hypothesizing", "synthesizing", "judging"]:
-        """Map agent name to appropriate event type.
-        Args:
-            agent_name: The agent ID from the workflow event
-        Returns:
-            Event type string matching AgentEvent.type Literal
-        """
-        agent_lower = agent_name.lower()
-        if SEARCHER_AGENT_ID in agent_lower:
-            return "search_complete"
-        if JUDGE_AGENT_ID in agent_lower:
-            return "judge_complete"
-        if HYPOTHESIZER_AGENT_ID in agent_lower:
-            return "hypothesizing"
-        if REPORTER_AGENT_ID in agent_lower:
-            return "synthesizing"
-        return "judging"  # Default for unknown agents
     def _smart_truncate(self, text: str, max_len: int = 200) -> str:
         """Truncate at sentence boundary to avoid cutting words."""
         if len(text) <= max_len:

 import asyncio
 from collections.abc import AsyncGenerator
 from dataclasses import dataclass
+from typing import TYPE_CHECKING, Any
 import structlog
 from agent_framework import (
     MAGENTIC_EVENT_TYPE_ORCHESTRATOR,
+    ORCH_MSG_KIND_INSTRUCTION,
+    ORCH_MSG_KIND_TASK_LEDGER,
     AgentRunUpdateEvent,
     ChatAgent,
     ExecutorCompletedEvent,
                 async for event in workflow.run_stream(task):
                     # 1. Handle Streaming (Source of Truth for Content)
                     if isinstance(event, AgentRunUpdateEvent) and event.data:
+                        # Check metadata to filter internal orchestrator messages
+                        props = getattr(event.data, "additional_properties", None) or {}
+                        event_type = props.get("magentic_event_type")
+                        msg_kind = props.get("orchestrator_message_kind")
+                        # Filter out internal orchestrator messages (task_ledger, instruction)
+                        if event_type == MAGENTIC_EVENT_TYPE_ORCHESTRATOR:
+                            if msg_kind in (ORCH_MSG_KIND_TASK_LEDGER, ORCH_MSG_KIND_INSTRUCTION):
+                                continue  # Skip internal coordination messages
                         author = getattr(event.data, "author_name", None)
                         # Detect agent switch to clear buffer
                         if author != state.current_agent_id:
                     # 2. Handle Completion Signal
                     if isinstance(event, ExecutorCompletedEvent):
+                        # Internal state tracking only - NO UI events
                         # P1 FIX: Track if ReportAgent produced output
                         agent_name = getattr(event, "executor_id", "") or "unknown"
                         if REPORTER_AGENT_ID in agent_name.lower():
                             state.reporter_ran = True
                         # P2 BUG FIX: Save length before clearing
                         state.last_streamed_length = len(state.current_message_buffer)
                         # Clear buffer after consuming
                 iteration=state.iteration,
             )
     def _handle_final_event(
         self,
         event: WorkflowOutputEvent,
         # The repr is useless for display purposes
         return ""
     def _smart_truncate(self, text: str, max_len: int = 200) -> str:
         """Truncate at sentence boundary to avoid cutting words."""
         if len(text) <= max_len:

tests/unit/orchestrators/test_accumulator_pattern.py CHANGED

@@ -90,6 +90,9 @@ def mock_agent_framework():
     mock_af.MagenticOrchestratorMessageEvent = MockOrchestratorMessageEvent
     mock_af.AgentRunResponse = MagicMock
     mock_af.MAGENTIC_EVENT_TYPE_ORCHESTRATOR = "orchestrator_message"
     # Mock other classes
     mock_af.MagenticBuilder = MagicMock
@@ -170,9 +173,9 @@ def mock_orchestrator(mock_agent_framework):
 @pytest.mark.asyncio
 async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
     """
-    Scenario A: Standard Text Message
     Input: Updates ("Hello", " World") -> Completed
-    Expected: AgentEvent with "Hello World"
     """
     # Use "searcher" to map to "SearchAgent"
     events = [
@@ -193,27 +196,32 @@ async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
-    # Find the completion event for SearchAgent (non-streaming)
-    chat_events = [
-        e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
-    ]
-    assert len(chat_events) >= 1, (
-        f"Expected SearchAgent events, got: {[e.message for e in generated_events]}"
     )
-    final_event = chat_events[0]
-    # Must contain accumulated text
-    assert "Hello World" in final_event.message or "Hello" in final_event.message
 @pytest.mark.unit
 @pytest.mark.asyncio
 async def test_accumulator_pattern_scenario_b_tool_call(mock_orchestrator):
     """
-    Scenario B: Tool Call (No Text Deltas)
     Input: No Deltas -> Completed
-    Expected: AgentEvent with fallback text
     """
     # Use "searcher" to map to "SearchAgent"
     events = [
@@ -232,26 +240,27 @@ async def test_accumulator_pattern_scenario_b_tool_call(mock_orchestrator):
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
-    # Find completion events for SearchAgent
     search_events = [
-        e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
     ]
-    assert len(search_events) >= 1, (
-        f"Expected SearchAgent events, got: {[e.message for e in generated_events]}"
     )
-    final_event = search_events[0]
-    # Should contain fallback or tool indicator
-    assert "Action completed" in final_event.message or "Tool" in final_event.message
 @pytest.mark.unit
 @pytest.mark.asyncio
 async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
     """
-    Verify buffer clears between agents.
-    Agent B should NOT inherit Agent A's accumulated text.
     """
     # Use "searcher" (SearchAgent) and "judge" (JudgeAgent)
     events = [
@@ -273,24 +282,22 @@ async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
-    # Find non-streaming events for each agent
-    agent_a_events = [
-        e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
-    ]
-    agent_b_events = [
-        e for e in generated_events if "JudgeAgent" in str(e.message) and e.type != "streaming"
-    ]
-    # Both should have completion events
-    assert len(agent_a_events) >= 1, (
-        f"No SearchAgent events: {[e.message for e in generated_events]}"
-    )
-    assert len(agent_b_events) >= 1, (
-        f"No JudgeAgent events: {[e.message for e in generated_events]}"
     )
-    # Agent A should have its own text
-    assert "Searcher" in agent_a_events[0].message
-    # Agent B should have its own text, NOT Agent A's
-    assert "Judge" in agent_b_events[0].message
-    assert "Searcher" not in agent_b_events[0].message, "Buffer not cleared between agents!"

     mock_af.MagenticOrchestratorMessageEvent = MockOrchestratorMessageEvent
     mock_af.AgentRunResponse = MagicMock
     mock_af.MAGENTIC_EVENT_TYPE_ORCHESTRATOR = "orchestrator_message"
+    # P2 Fix: Add constants for metadata filtering
+    mock_af.ORCH_MSG_KIND_INSTRUCTION = "instruction"
+    mock_af.ORCH_MSG_KIND_TASK_LEDGER = "task_ledger"
     # Mock other classes
     mock_af.MagenticBuilder = MagicMock
 @pytest.mark.asyncio
 async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
     """
+    Scenario A: Standard Text Message (P2 Fix)
     Input: Updates ("Hello", " World") -> Completed
+    Expected: Streaming events for text, NO completion events (P2 fix silences them)
     """
     # Use "searcher" to map to "SearchAgent"
     events = [
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
+    # P2 FIX: ExecutorCompletedEvent is SILENCED - no non-streaming agent events
+    # We should have STREAMING events from AgentRunUpdateEvent
+    streaming_events = [e for e in generated_events if e.type == "streaming"]
+    assert len(streaming_events) >= 1, (
+        f"Expected streaming events, got: {[e.type for e in generated_events]}"
     )
+    # P2 FIX: No "SearchAgent" completion events should exist (silenced)
+    completion_events = [
+        e
+        for e in generated_events
+        if "SearchAgent" in str(e.message)
+        and e.type not in ("streaming", "started", "progress", "thinking")
+    ]
+    assert len(completion_events) == 0, (
+        f"P2 Fix: Should NOT emit completion events, got: {[e.message for e in completion_events]}"
+    )
 @pytest.mark.unit
 @pytest.mark.asyncio
 async def test_accumulator_pattern_scenario_b_tool_call(mock_orchestrator):
     """
+    Scenario B: Tool Call (No Text Deltas) - P2 Fix
     Input: No Deltas -> Completed
+    Expected: NO completion events (P2 fix silences ExecutorCompletedEvent)
     """
     # Use "searcher" to map to "SearchAgent"
     events = [
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
+    # P2 FIX: ExecutorCompletedEvent is SILENCED - no agent completion events
     search_events = [
+        e
+        for e in generated_events
+        if "SearchAgent" in str(e.message)
+        and e.type not in ("streaming", "started", "progress", "thinking")
     ]
+    # P2 Fix: Should have NO completion events (they are silenced)
+    assert len(search_events) == 0, (
+        f"P2 Fix: Should NOT emit completion events, got: {[e.message for e in search_events]}"
     )
 @pytest.mark.unit
 @pytest.mark.asyncio
 async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
     """
+    Verify buffer clears between agents (P2 Fix).
+    P2 Fix: ExecutorCompletedEvent is silenced, so we verify via streaming events.
+    Agent B's streaming should NOT contain Agent A's text.
     """
     # Use "searcher" (SearchAgent) and "judge" (JudgeAgent)
     events = [
         async for event in mock_orchestrator.run("test query"):
             generated_events.append(event)
+    # P2 FIX: ExecutorCompletedEvent is SILENCED
+    # Verify via STREAMING events - each agent's stream is separate
+    streaming_events = [e for e in generated_events if e.type == "streaming"]
+    # Should have streaming events from both agents
+    assert len(streaming_events) >= 2, (
+        f"Expected streaming events, got: {[e.type for e in generated_events]}"
     )
+    # Verify content separation - each streaming event has its own content
+    searcher_streams = [e for e in streaming_events if "Searcher" in e.message]
+    judge_streams = [e for e in streaming_events if "Judge" in e.message]
+    assert len(searcher_streams) >= 1, "Missing searcher streaming events"
+    assert len(judge_streams) >= 1, "Missing judge streaming events"
+    # Buffer isolation: Judge stream should NOT contain Searcher text
+    for judge_event in judge_streams:
+        assert "Searcher" not in judge_event.message, "Buffer not cleared between agents!"

tests/unit/test_orchestrator_noise.py ADDED

	@@ -0,0 +1,113 @@

+from unittest.mock import MagicMock
+import pytest
+from agent_framework import (
+    MAGENTIC_EVENT_TYPE_ORCHESTRATOR,
+    ORCH_MSG_KIND_INSTRUCTION,
+    ORCH_MSG_KIND_TASK_LEDGER,
+    AgentRunUpdateEvent,
+    ExecutorCompletedEvent,
+)
+from src.orchestrators.advanced import REPORTER_AGENT_ID, AdvancedOrchestrator
+@pytest.mark.asyncio
+async def test_executor_completed_event_is_silenced():
+    """Verify ExecutorCompletedEvent produces NO UI events."""
+    orchestrator = AdvancedOrchestrator()
+    # Mock the workflow build to return our custom event stream
+    mock_workflow = MagicMock()
+    # Create a stream of events: Start -> ExecutorCompleted -> End
+    async def event_stream(task):
+        # 1. Completion event (should be ignored)
+        yield ExecutorCompletedEvent(executor_id="ManagerAgent", data=None)
+        # 2. Reporter completion (should set flag but yield nothing)
+        yield ExecutorCompletedEvent(executor_id=REPORTER_AGENT_ID, data=None)
+    mock_workflow.run_stream = event_stream
+    orchestrator._build_workflow = MagicMock(return_value=mock_workflow)
+    # Mock init services to avoid side effects
+    async def mock_init_events(query):
+        if False:
+            yield
+    orchestrator._init_workflow_events = mock_init_events
+    orchestrator._init_embedding_service = MagicMock(return_value=None)
+    orchestrator._create_task_prompt = MagicMock(return_value="task")
+    events = []
+    async for event in orchestrator.run("query"):
+        events.append(event)
+    # Assertions
+    # We should have NO "progress" events with "task completed" message
+    for event in events:
+        if event.type == "progress":
+            assert "task completed" not in event.message
+        # We should have NO "judging" events from the manager completion
+        if event.type == "judging":
+            assert "ManagerAgent" not in event.message
+@pytest.mark.asyncio
+async def test_internal_messages_are_filtered():
+    """Verify internal task_ledger/instruction messages are filtered."""
+    orchestrator = AdvancedOrchestrator()
+    mock_workflow = MagicMock()
+    async def event_stream(task):
+        # 1. Task Ledger (Should be skipped)
+        ledger_update = AgentRunUpdateEvent(executor_id="Manager", data=MagicMock())
+        ledger_update.data.text = '{"some": "json"}'
+        ledger_update.data.additional_properties = {
+            "magentic_event_type": MAGENTIC_EVENT_TYPE_ORCHESTRATOR,
+            "orchestrator_message_kind": ORCH_MSG_KIND_TASK_LEDGER,
+        }
+        yield ledger_update
+        # 2. Instruction (Should be skipped)
+        instruction = AgentRunUpdateEvent(executor_id="Manager", data=MagicMock())
+        instruction.data.text = "Internal instruction to agent"
+        instruction.data.additional_properties = {
+            "magentic_event_type": MAGENTIC_EVENT_TYPE_ORCHESTRATOR,
+            "orchestrator_message_kind": ORCH_MSG_KIND_INSTRUCTION,
+        }
+        yield instruction
+        # 3. Normal agent message (SHOULD pass through)
+        # The streaming block filters task_ledger/instruction but passes agent content.
+        normal_msg = AgentRunUpdateEvent(executor_id="Searcher", data=MagicMock())
+        normal_msg.data.text = "I found something"
+        normal_msg.data.author_name = "Searcher"
+        normal_msg.data.additional_properties = {}
+        yield normal_msg
+    mock_workflow.run_stream = event_stream
+    orchestrator._build_workflow = MagicMock(return_value=mock_workflow)
+    async def mock_init_events(query):
+        if False:
+            yield
+    orchestrator._init_workflow_events = mock_init_events
+    orchestrator._init_embedding_service = MagicMock(return_value=None)
+    events = []
+    async for event in orchestrator.run("query"):
+        events.append(event)
+    # Assertions
+    # 1. Verify we got the normal message
+    streaming_messages = [e.message for e in events if e.type == "streaming"]
+    assert "I found something" in streaming_messages
+    # 2. Verify we did NOT get the internal messages
+    all_messages = [e.message for e in events]
+    # The JSON from task_ledger should be filtered
+    assert not any('{"some": "json"}' in msg for msg in all_messages)
+    # The instruction text should be filtered
+    assert not any("Internal instruction to agent" in msg for msg in all_messages)