VibecoderMcSwaggins commited on
Commit
efd0997
·
unverified ·
1 Parent(s): 02a3c53

fix(orchestrator): P2 Round Counter Semantic Mismatch - Semantic Progress Tracking (#132)

Browse files

* docs: Add P2 bug doc for round counter semantic mismatch

Discovered during SPEC-18 testing that progress shows "Round 11/5"
which is confusing. Root cause: iteration counts agent completions
(ExecutorCompletedEvent) but display treats it as workflow rounds.

Additional issues found:
- Dead code: _get_progress_message method is defined but never used
- Hardcoded 45 instead of self._EST_SECONDS_PER_ROUND constant
- Time estimate becomes useless ("~0s") once iteration exceeds max_rounds

* docs: Add senior review findings to P2 round counter bug

External review confirmed our analysis and added nuances:
- Manager agent also fires ExecutorCompletedEvent (explains 11 events)
- Time estimation is doubly flawed (wrong unit + wrong calibration)
- API discovery: ORCH_MSG_KIND_USER_TASK could track actual rounds

Review status: CONFIRMED - Ready for implementation

* refactor(orchestrator): implement semantic progress tracking

- Remove misleading 'Round X/Y' counter and time estimates
- Remove dead code (_get_progress_message, _EST_SECONDS_PER_ROUND)
- Implement semantic agent naming (e.g., 'reporter' -> 'ReportAgent')
- Update progress events to show 'Step N: AgentName task completed'
- Update tests to use valid domain agent IDs
- Fix P2_ROUND_COUNTER_SEMANTIC_MISMATCH

* docs: Mark P2 round counter bug as FIXED

Add Resolution section documenting the semantic progress tracking
implementation that replaced the broken "Round X/Y" display with
honest "Step N: AgentName task completed" format.

* style: Address CodeRabbit nitpicks

- Add language specifier to markdown code blocks (MD040)
- Remove duplicate horizontal rule separator
- Use structured logging in fallback synthesis error handler
- Use _smart_truncate for completion messages (avoids unnecessary ellipsis)

* style: Fix markdown lint (MD031/MD032 blank lines)

* fix(deps): Update urllib3 to 2.6.0 for security fixes

- GHSA-gm62-xv2j-4w53 and GHSA-2xpw-w6gg-jr37
- Fix CompiledStateGraph type annotation (langgraph API change)

* fix(types): Ignore CompiledStateGraph type-arg for cross-version compat

docs/bugs/P2_ROUND_COUNTER_SEMANTIC_MISMATCH.md ADDED
@@ -0,0 +1,321 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # P2 Bug: Round Counter Semantic Mismatch
2
+
3
+ **Status**: ✅ FIXED
4
+ **Discovered**: 2025-12-05
5
+ **Fixed**: 2025-12-05
6
+ **Severity**: P2 (Display bug, confusing UX but not blocking)
7
+ **Component**: `src/orchestrators/advanced.py`
8
+ **Commit**: `40ca236c refactor(orchestrator): implement semantic progress tracking`
9
+
10
+ ---
11
+
12
+ ## Symptom
13
+
14
+ Progress display shows impossible values like "Round 11/5":
15
+
16
+ ```text
17
+ ⏱️ **PROGRESS**: Round 11/5 (~0s remaining)
18
+ ```
19
+
20
+ This is confusing to users - how can we be on round 11 when max is 5?
21
+
22
+ ---
23
+
24
+ ## Root Cause Analysis
25
+
26
+ ### The Semantic Mismatch
27
+
28
+ Two different concepts are being conflated:
29
+
30
+ | Concept | What It Means | Variable |
31
+ |---------|---------------|----------|
32
+ | **Workflow Round** | One orchestration cycle where manager delegates to agents | `self._max_rounds` (5) |
33
+ | **Agent Completion** | One agent finishes its task | `state.iteration` (incremented on each `ExecutorCompletedEvent`) |
34
+
35
+ ### The Bug
36
+
37
+ ```python
38
+ # Line 348: Increments on EVERY agent completion
39
+ if isinstance(event, ExecutorCompletedEvent):
40
+ state.iteration += 1
41
+
42
+ # Line 467: Displays as if it's a workflow round
43
+ message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
44
+ ```
45
+
46
+ ### Why It Happens
47
+
48
+ In a multi-agent workflow with 4 agents (searcher, hypothesizer, judge, reporter):
49
+
50
+ - Each "round" involves the manager delegating to multiple agents
51
+ - Each agent completion fires an `ExecutorCompletedEvent`
52
+ - With 4+ agents, we see 4+ events per workflow round
53
+
54
+ **Math**: 5 workflow rounds × 4 agents = 20+ agent completions, displayed as "Round 20/5"
55
+
56
+ ---
57
+
58
+ ## Evidence From Logs
59
+
60
+ The session showed this progression:
61
+
62
+ ```text
63
+ Round 1/5 - First agent completed
64
+ Round 2/5 - Second agent completed
65
+ Round 3/5 - Third agent completed
66
+ Round 4/5 - Fourth agent completed
67
+ Round 5/5 - Fifth agent completed (still in workflow round 1!)
68
+ Round 6/5 - Now exceeds max (workflow round 2 starting)
69
+ ...
70
+ Round 11/5 - Multiple workflow rounds have passed
71
+ ```
72
+
73
+ ---
74
+
75
+ ## Impact
76
+
77
+ 1. **User Confusion**: "Round 11/5" makes no sense
78
+ 2. **Time Estimation Wrong**: `rounds_remaining = max(5 - 11, 0) = 0` → always shows "~0s remaining"
79
+ 3. **No Actual Bug in Logic**: The workflow still runs correctly, just the display is wrong
80
+
81
+ ---
82
+
83
+ ## Proposed Fixes
84
+
85
+ ### Option A: Rename to "Agent Step" (Quick Fix)
86
+
87
+ Change the display to reflect what we're actually counting:
88
+
89
+ ```python
90
+ # Before
91
+ message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
92
+
93
+ # After
94
+ message=f"Agent step {iteration} (Round limit: {self._max_rounds})"
95
+ ```
96
+
97
+ **Pros**: Accurate, minimal code change
98
+ **Cons**: Still doesn't track actual workflow rounds
99
+
100
+ ### Option B: Track Actual Workflow Rounds (Proper Fix)
101
+
102
+ Track workflow rounds separately from agent completions:
103
+
104
+ ```python
105
+ @dataclass
106
+ class WorkflowState:
107
+ iteration: int = 0 # Agent completions (for internal tracking)
108
+ workflow_round: int = 0 # Actual orchestration rounds
109
+ current_message_buffer: str = ""
110
+ # ...
111
+
112
+ # Increment workflow_round when manager delegates (different event type)
113
+ # Display workflow_round in progress messages
114
+ ```
115
+
116
+ **Pros**: Semantically correct, accurate time estimates
117
+ **Cons**: Requires understanding which event signals a new round
118
+
119
+ ### Option C: Use Estimated Agent Count (Compromise)
120
+
121
+ Estimate agents per round and display accordingly:
122
+
123
+ ```python
124
+ AGENTS_PER_ROUND = 4 # searcher, hypothesizer, judge, reporter
125
+ estimated_round = (iteration // AGENTS_PER_ROUND) + 1
126
+ message=f"Round ~{estimated_round}/{self._max_rounds}"
127
+ ```
128
+
129
+ **Pros**: Roughly accurate, no API research needed
130
+ **Cons**: Estimation may be off if some agents are skipped
131
+
132
+ ---
133
+
134
+ ## Recommendation
135
+
136
+ **Short-term**: Apply Option A (rename to "Agent step") - fixes the confusion immediately
137
+
138
+ **Long-term**: Investigate Option B - determine which event signals a new workflow round in Microsoft Agent Framework
139
+
140
+ ---
141
+
142
+ ## Related Code
143
+
144
+ ```python
145
+ # src/orchestrators/advanced.py
146
+
147
+ # Line 348: Where iteration is incremented
148
+ if isinstance(event, ExecutorCompletedEvent):
149
+ state.iteration += 1
150
+
151
+ # Line 459-467: Where progress message is generated
152
+ rounds_remaining = max(self._max_rounds - iteration, 0)
153
+ est_seconds = rounds_remaining * 45
154
+ progress_event = AgentEvent(
155
+ type="progress",
156
+ message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
157
+ iteration=iteration,
158
+ )
159
+ ```
160
+
161
+ ---
162
+
163
+ ## Test Case
164
+
165
+ ```python
166
+ def test_progress_display_never_exceeds_max_rounds():
167
+ """Progress should show Round X/Y where X <= Y."""
168
+ # Simulate 20 agent completions across 5 workflow rounds
169
+ # Assert displayed round never exceeds max_rounds
170
+ pass
171
+ ```
172
+
173
+ ---
174
+
175
+ ## Additional Issues Found During Analysis
176
+
177
+ ### Issue 2: Dead Code - Unused `_get_progress_message` Method
178
+
179
+ ```python
180
+ # Line 196-205: Method is defined but NEVER called
181
+ def _get_progress_message(self, iteration: int) -> str:
182
+ """Generate progress message with time estimation."""
183
+ # ... logic duplicated in _handle_completion_event
184
+ ```
185
+
186
+ The same logic is duplicated inline in `_handle_completion_event` (lines 458-469).
187
+
188
+ **Fix**: Either use the method or delete it.
189
+
190
+ ### Issue 3: Hardcoded Constant
191
+
192
+ ```python
193
+ # Line 87: Class constant defined
194
+ _EST_SECONDS_PER_ROUND: int = 45
195
+
196
+ # Line 199: Uses constant (correct)
197
+ est_seconds = rounds_remaining * self._EST_SECONDS_PER_ROUND
198
+
199
+ # Line 460: Uses hardcoded 45 (inconsistent)
200
+ est_seconds = rounds_remaining * 45
201
+ ```
202
+
203
+ **Fix**: Use `self._EST_SECONDS_PER_ROUND` consistently.
204
+
205
+ ### Issue 4: Time Estimate Always Shows "~0s remaining"
206
+
207
+ Since `iteration` quickly exceeds `max_rounds`:
208
+
209
+ ```python
210
+ rounds_remaining = max(self._max_rounds - iteration, 0)
211
+ # When iteration=11, max_rounds=5: rounds_remaining = max(5-11, 0) = 0
212
+ # est_seconds = 0 * 45 = 0
213
+ # Display: "~0s remaining"
214
+ ```
215
+
216
+ The time estimate becomes useless after the first few agent completions.
217
+
218
+ ---
219
+
220
+ ## Complete Fix Recommendation
221
+
222
+ 1. **Rename display** from "Round X/5" to "Agent step X"
223
+ 2. **Delete dead code** - remove unused `_get_progress_message` method
224
+ 3. **Use constant** - replace hardcoded `45` with `self._EST_SECONDS_PER_ROUND`
225
+ 4. **Fix time estimate** - base it on agent steps, not workflow rounds
226
+
227
+ ---
228
+
229
+ ## Senior Review Findings (2025-12-05)
230
+
231
+ **Reviewer**: External Gemini CLI Agent
232
+ **Status**: CONFIRMED - Analysis accurate and sufficient
233
+
234
+ ### Additional Nuances Identified
235
+
236
+ 1. **Manager Agent Also Fires Events**: The Manager itself is an agent. If `ExecutorCompletedEvent` fires for Manager's turn completion PLUS sub-agents' completions, the count accelerates 2-3x faster per logical round. This explains why we saw 11 events for ~2-3 workflow rounds.
237
+
238
+ 2. **Time Estimation Doubly Flawed**:
239
+ - Not just bottoming out at 0
240
+ - `_EST_SECONDS_PER_ROUND` (45s) is calibrated for a FULL workflow round, not a single agent step
241
+ - If we counted agent steps correctly: 10 steps × 45s = 450s (way overestimated)
242
+ - A full round of 4 agents might only take 60s total
243
+
244
+ 3. **API Discovery - Can Track Actual Rounds**:
245
+
246
+ ```python
247
+ # These constants exist in agent_framework:
248
+ ORCH_MSG_KIND_INSTRUCTION = 'instruction'
249
+ ORCH_MSG_KIND_USER_TASK = 'user_task'
250
+ ORCH_MSG_KIND_TASK_LEDGER = 'task_ledger'
251
+ ORCH_MSG_KIND_NOTICE = 'notice'
252
+ ```
253
+
254
+ Counting `user_task` events from `MagenticOrchestratorMessageEvent` would align iteration with `max_rounds` 1:1, since this signals "Manager is beginning a new evaluation cycle."
255
+
256
+ ### Reviewer Recommendations
257
+
258
+ 1. **Option A (Rename)**: APPROVED - Safest, most honest fix
259
+ 2. **Option B (Track Workflow Rounds)**: DEFER - Requires verifying framework behavior across versions, risks brittleness
260
+ 3. **Remove Denominator**: Display `Agent Step {iteration}` without `/5` to avoid confusion
261
+ 4. **Delete Dead Code**: Confirmed `_get_progress_message` is never called
262
+ 5. **Fix Constants**: Use `self._EST_SECONDS_PER_ROUND` consistently
263
+
264
+ ### Review Status: ✅ PASSED - Ready for Implementation
265
+
266
+ ---
267
+
268
+ ## Resolution (2025-12-05)
269
+
270
+ **Implemented**: Domain-driven semantic progress tracking
271
+
272
+ ### What Was Done
273
+
274
+ 1. **Deleted Dead Code**:
275
+ - Removed unused `_get_progress_message` method
276
+ - Removed unused `_EST_SECONDS_PER_ROUND` constant
277
+
278
+ 2. **Added Semantic Agent Mapping** (`_get_agent_semantic_name`):
279
+
280
+ ```python
281
+ def _get_agent_semantic_name(self, agent_id: str) -> str:
282
+ """Map internal agent ID to user-facing semantic name."""
283
+ name = agent_id.lower()
284
+ if SEARCHER_AGENT_ID in name:
285
+ return "SearchAgent"
286
+ if JUDGE_AGENT_ID in name:
287
+ return "JudgeAgent"
288
+ if HYPOTHESIZER_AGENT_ID in name:
289
+ return "HypothesisAgent"
290
+ if REPORTER_AGENT_ID in name:
291
+ return "ReportAgent"
292
+ return "ManagerAgent"
293
+ ```
294
+
295
+ 3. **Changed Progress Display**:
296
+ - Before: `"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"`
297
+ - After: `"Step {iteration}: {semantic_name} task completed"`
298
+
299
+ 4. **Changed Initial Thinking Message**:
300
+ - Before: `"Multi-agent reasoning in progress (5 rounds max)... Estimated time: 3-5 minutes."`
301
+ - After: `"Multi-agent reasoning in progress (Limit: 5 Manager rounds)... Allocating time for deep research..."`
302
+
303
+ 5. **Updated Tests**: Changed test mocks to use domain-specific agent IDs (`searcher`, `judge`) instead of arbitrary strings.
304
+
305
+ ### Result
306
+
307
+ - Before: `⏱️ **PROGRESS**: Round 11/5 (~0s remaining)` (confusing, broken math)
308
+ - After: `⏱️ **PROGRESS**: Step 11: ReportAgent task completed` (accurate, professional)
309
+
310
+ ### Design Decision
311
+
312
+ Rather than patching the counter display or trying to track "actual workflow rounds" (which requires deep framework integration), we chose **honest reporting**: Show exactly what happened (which agent completed) without making false promises about progress percentages or time estimates.
313
+
314
+ This follows the Clean Code principle: "Don't lie to the user."
315
+
316
+ ---
317
+
318
+ ## References
319
+
320
+ - SPEC-18: Agent Framework Core Upgrade (where ExecutorCompletedEvent was introduced)
321
+ - Microsoft Agent Framework documentation on workflow rounds vs agent executions
pyproject.toml CHANGED
@@ -36,8 +36,8 @@ dependencies = [
36
  "langchain-core>=0.3.21,<1.0",
37
  "langchain-huggingface>=0.1.2,<1.0",
38
  "langgraph-checkpoint-sqlite>=3.0.0,<4.0", # 3.0.0 required for GHSA-wwqv-p2pp-99h5 fix
39
- # Security: Pin urllib3 to fix GHSA-48p4-8xcf-vxj5 and GHSA-pq67-6m6q-mj2v
40
- "urllib3>=2.5.0",
41
  ]
42
 
43
  [project.optional-dependencies]
 
36
  "langchain-core>=0.3.21,<1.0",
37
  "langchain-huggingface>=0.1.2,<1.0",
38
  "langgraph-checkpoint-sqlite>=3.0.0,<4.0", # 3.0.0 required for GHSA-wwqv-p2pp-99h5 fix
39
+ # Security: Pin urllib3 to fix GHSA-gm62-xv2j-4w53 and GHSA-2xpw-w6gg-jr37
40
+ "urllib3>=2.6.0",
41
  ]
42
 
43
  [project.optional-dependencies]
requirements.txt CHANGED
@@ -42,8 +42,8 @@ langchain-core>=0.3.21,<1.0
42
  langchain-huggingface>=0.1.2,<1.0
43
  langgraph-checkpoint-sqlite>=3.0.0,<4.0
44
 
45
- # Security: Pin urllib3 to fix GHSA-48p4-8xcf-vxj5 and GHSA-pq67-6m6q-mj2v
46
- urllib3>=2.5.0
47
 
48
  # Multi-agent orchestration (Advanced mode) - from [magentic] optional
49
  agent-framework-core==1.0.0b251204
 
42
  langchain-huggingface>=0.1.2,<1.0
43
  langgraph-checkpoint-sqlite>=3.0.0,<4.0
44
 
45
+ # Security: Pin urllib3 to fix GHSA-gm62-xv2j-4w53 and GHSA-2xpw-w6gg-jr37
46
+ urllib3>=2.6.0
47
 
48
  # Multi-agent orchestration (Advanced mode) - from [magentic] optional
49
  agent-framework-core==1.0.0b251204
src/agents/graph/workflow.py CHANGED
@@ -25,7 +25,7 @@ def create_research_graph(
25
  llm: BaseChatModel | None = None,
26
  checkpointer: BaseCheckpointSaver[Any] | None = None,
27
  embedding_service: EmbeddingServiceProtocol | None = None,
28
- ) -> CompiledStateGraph[Any, Any, Any, Any]:
29
  """Build the research state graph.
30
 
31
  Args:
 
25
  llm: BaseChatModel | None = None,
26
  checkpointer: BaseCheckpointSaver[Any] | None = None,
27
  embedding_service: EmbeddingServiceProtocol | None = None,
28
+ ) -> CompiledStateGraph[Any]: # type: ignore[type-arg]
29
  """Build the research state graph.
30
 
31
  Args:
src/orchestrators/advanced.py CHANGED
@@ -83,9 +83,6 @@ class AdvancedOrchestrator(OrchestratorProtocol):
83
  - Configurable timeouts and round limits
84
  """
85
 
86
- # Estimated seconds per coordination round (for progress UI)
87
- _EST_SECONDS_PER_ROUND: int = 45
88
-
89
  def __init__(
90
  self,
91
  max_rounds: int = 5,
@@ -193,16 +190,18 @@ Focus on:
193
 
194
  The final output should be a structured research report."""
195
 
196
- def _get_progress_message(self, iteration: int) -> str:
197
- """Generate progress message with time estimation."""
198
- rounds_remaining = max(self._max_rounds - iteration, 0)
199
- est_seconds = rounds_remaining * self._EST_SECONDS_PER_ROUND
200
- if est_seconds >= 60:
201
- est_display = f"{est_seconds // 60}m {est_seconds % 60}s"
202
- else:
203
- est_display = f"{est_seconds}s"
204
-
205
- return f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
 
 
206
 
207
  async def _init_workflow_events(self, query: str) -> AsyncGenerator[AgentEvent, None]:
208
  """Yield initialization events."""
@@ -219,7 +218,9 @@ The final output should be a structured research report."""
219
  )
220
 
221
  async def _synthesize_fallback(
222
- self, iteration: int, reason: str
 
 
223
  ) -> AsyncGenerator[AgentEvent, None]:
224
  """
225
  Unified fallback synthesis for all termination scenarios.
@@ -263,7 +264,7 @@ The final output should be a structured research report."""
263
  iteration=iteration,
264
  )
265
  except Exception as synth_error:
266
- logger.error(f"{reason} synthesis failed", error=str(synth_error))
267
  yield AgentEvent(
268
  type="complete",
269
  message=f"Research completed. Synthesis failed: {synth_error}",
@@ -272,7 +273,8 @@ The final output should be a structured research report."""
272
  )
273
 
274
  async def run( # noqa: PLR0915 - Complex but necessary for event stream handling
275
- self, query: str
 
276
  ) -> AsyncGenerator[AgentEvent, None]:
277
  """
278
  Run the workflow.
@@ -312,9 +314,8 @@ The final output should be a structured research report."""
312
  yield AgentEvent(
313
  type="thinking",
314
  message=(
315
- f"Multi-agent reasoning in progress ({self._max_rounds} rounds max)... "
316
- f"Estimated time: {self._max_rounds * 45 // 60}-"
317
- f"{self._max_rounds * 60 // 60} minutes."
318
  ),
319
  iteration=0,
320
  )
@@ -434,7 +435,10 @@ The final output should be a structured research report."""
434
  )
435
 
436
  def _handle_completion_event(
437
- self, event: ExecutorCompletedEvent, buffer: str, iteration: int
 
 
 
438
  ) -> tuple[AgentEvent, AgentEvent]:
439
  """Handle an agent completion event using the accumulated buffer."""
440
  # Use buffer if available, otherwise fall back cautiously
@@ -446,25 +450,19 @@ The final output should be a structured research report."""
446
  # The result is often in event.result or similar, but buffering is safer
447
  text_content = "Action completed (Tool Call)"
448
 
449
- agent_name = getattr(event, "executor_id", "unknown") or "unknown"
450
- event_type = self._get_event_type_for_agent(agent_name)
 
451
 
452
  completion_event = AgentEvent(
453
  type=event_type,
454
- message=f"{agent_name}: {text_content[:200]}...",
455
  iteration=iteration,
456
  )
457
 
458
- # Progress update
459
- rounds_remaining = max(self._max_rounds - iteration, 0)
460
- est_seconds = rounds_remaining * 45
461
- est_display = (
462
- f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
463
- )
464
-
465
  progress_event = AgentEvent(
466
  type="progress",
467
- message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
468
  iteration=iteration,
469
  )
470
 
@@ -552,7 +550,8 @@ The final output should be a structured research report."""
552
  return ""
553
 
554
  def _get_event_type_for_agent(
555
- self, agent_name: str
 
556
  ) -> Literal["search_complete", "judge_complete", "hypothesizing", "synthesizing", "judging"]:
557
  """Map agent name to appropriate event type.
558
 
 
83
  - Configurable timeouts and round limits
84
  """
85
 
 
 
 
86
  def __init__(
87
  self,
88
  max_rounds: int = 5,
 
190
 
191
  The final output should be a structured research report."""
192
 
193
+ def _get_agent_semantic_name(self, agent_id: str) -> str:
194
+ """Map internal agent ID to user-facing semantic name."""
195
+ name = agent_id.lower()
196
+ if SEARCHER_AGENT_ID in name:
197
+ return "SearchAgent"
198
+ if JUDGE_AGENT_ID in name:
199
+ return "JudgeAgent"
200
+ if HYPOTHESIZER_AGENT_ID in name:
201
+ return "HypothesisAgent"
202
+ if REPORTER_AGENT_ID in name:
203
+ return "ReportAgent"
204
+ return "ManagerAgent"
205
 
206
  async def _init_workflow_events(self, query: str) -> AsyncGenerator[AgentEvent, None]:
207
  """Yield initialization events."""
 
218
  )
219
 
220
  async def _synthesize_fallback(
221
+ self,
222
+ iteration: int,
223
+ reason: str,
224
  ) -> AsyncGenerator[AgentEvent, None]:
225
  """
226
  Unified fallback synthesis for all termination scenarios.
 
264
  iteration=iteration,
265
  )
266
  except Exception as synth_error:
267
+ logger.error("Fallback synthesis failed", reason=reason, error=str(synth_error))
268
  yield AgentEvent(
269
  type="complete",
270
  message=f"Research completed. Synthesis failed: {synth_error}",
 
273
  )
274
 
275
  async def run( # noqa: PLR0915 - Complex but necessary for event stream handling
276
+ self,
277
+ query: str,
278
  ) -> AsyncGenerator[AgentEvent, None]:
279
  """
280
  Run the workflow.
 
314
  yield AgentEvent(
315
  type="thinking",
316
  message=(
317
+ f"Multi-agent reasoning in progress (Limit: {self._max_rounds} Manager rounds)... "
318
+ "Allocating time for deep research..."
 
319
  ),
320
  iteration=0,
321
  )
 
435
  )
436
 
437
  def _handle_completion_event(
438
+ self,
439
+ event: ExecutorCompletedEvent,
440
+ buffer: str,
441
+ iteration: int,
442
  ) -> tuple[AgentEvent, AgentEvent]:
443
  """Handle an agent completion event using the accumulated buffer."""
444
  # Use buffer if available, otherwise fall back cautiously
 
450
  # The result is often in event.result or similar, but buffering is safer
451
  text_content = "Action completed (Tool Call)"
452
 
453
+ agent_id = getattr(event, "executor_id", "unknown") or "unknown"
454
+ event_type = self._get_event_type_for_agent(agent_id)
455
+ semantic_name = self._get_agent_semantic_name(agent_id)
456
 
457
  completion_event = AgentEvent(
458
  type=event_type,
459
+ message=f"{semantic_name}: {self._smart_truncate(text_content)}",
460
  iteration=iteration,
461
  )
462
 
 
 
 
 
 
 
 
463
  progress_event = AgentEvent(
464
  type="progress",
465
+ message=f"Step {iteration}: {semantic_name} task completed",
466
  iteration=iteration,
467
  )
468
 
 
550
  return ""
551
 
552
  def _get_event_type_for_agent(
553
+ self,
554
+ agent_name: str,
555
  ) -> Literal["search_complete", "judge_complete", "hypothesizing", "synthesizing", "judging"]:
556
  """Map agent name to appropriate event type.
557
 
tests/unit/orchestrators/test_accumulator_pattern.py CHANGED
@@ -174,10 +174,11 @@ async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
174
  Input: Updates ("Hello", " World") -> Completed
175
  Expected: AgentEvent with "Hello World"
176
  """
 
177
  events = [
178
- MockAgentRunUpdateEvent("Hello", author_name="ChatBot"),
179
- MockAgentRunUpdateEvent(" World", author_name="ChatBot"),
180
- MockExecutorCompletedEvent(executor_id="ChatBot"),
181
  ]
182
 
183
  async def mock_stream(*args, **kwargs):
@@ -192,13 +193,13 @@ async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
192
  async for event in mock_orchestrator.run("test query"):
193
  generated_events.append(event)
194
 
195
- # Find the completion event for ChatBot (non-streaming)
196
  chat_events = [
197
- e for e in generated_events if "ChatBot" in str(e.message) and e.type != "streaming"
198
  ]
199
 
200
  assert len(chat_events) >= 1, (
201
- f"Expected ChatBot events, got: {[e.message for e in generated_events]}"
202
  )
203
  final_event = chat_events[0]
204
 
@@ -214,8 +215,9 @@ async def test_accumulator_pattern_scenario_b_tool_call(mock_orchestrator):
214
  Input: No Deltas -> Completed
215
  Expected: AgentEvent with fallback text
216
  """
 
217
  events = [
218
- MockExecutorCompletedEvent(executor_id="SearchAgent"),
219
  ]
220
 
221
  async def mock_stream(*args, **kwargs):
@@ -251,11 +253,12 @@ async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
251
  Verify buffer clears between agents.
252
  Agent B should NOT inherit Agent A's accumulated text.
253
  """
 
254
  events = [
255
- MockAgentRunUpdateEvent("Agent A says hi", author_name="AgentA"),
256
- MockExecutorCompletedEvent(executor_id="AgentA"),
257
- MockAgentRunUpdateEvent("Agent B responds", author_name="AgentB"),
258
- MockExecutorCompletedEvent(executor_id="AgentB"),
259
  ]
260
 
261
  async def mock_stream(*args, **kwargs):
@@ -272,18 +275,22 @@ async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
272
 
273
  # Find non-streaming events for each agent
274
  agent_a_events = [
275
- e for e in generated_events if "AgentA" in str(e.message) and e.type != "streaming"
276
  ]
277
  agent_b_events = [
278
- e for e in generated_events if "AgentB" in str(e.message) and e.type != "streaming"
279
  ]
280
 
281
  # Both should have completion events
282
- assert len(agent_a_events) >= 1, f"No AgentA events: {[e.message for e in generated_events]}"
283
- assert len(agent_b_events) >= 1, f"No AgentB events: {[e.message for e in generated_events]}"
 
 
 
 
284
 
285
  # Agent A should have its own text
286
- assert "Agent A" in agent_a_events[0].message
287
  # Agent B should have its own text, NOT Agent A's
288
- assert "Agent B" in agent_b_events[0].message
289
- assert "Agent A" not in agent_b_events[0].message, "Buffer not cleared between agents!"
 
174
  Input: Updates ("Hello", " World") -> Completed
175
  Expected: AgentEvent with "Hello World"
176
  """
177
+ # Use "searcher" to map to "SearchAgent"
178
  events = [
179
+ MockAgentRunUpdateEvent("Hello", author_name="searcher"),
180
+ MockAgentRunUpdateEvent(" World", author_name="searcher"),
181
+ MockExecutorCompletedEvent(executor_id="searcher"),
182
  ]
183
 
184
  async def mock_stream(*args, **kwargs):
 
193
  async for event in mock_orchestrator.run("test query"):
194
  generated_events.append(event)
195
 
196
+ # Find the completion event for SearchAgent (non-streaming)
197
  chat_events = [
198
+ e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
199
  ]
200
 
201
  assert len(chat_events) >= 1, (
202
+ f"Expected SearchAgent events, got: {[e.message for e in generated_events]}"
203
  )
204
  final_event = chat_events[0]
205
 
 
215
  Input: No Deltas -> Completed
216
  Expected: AgentEvent with fallback text
217
  """
218
+ # Use "searcher" to map to "SearchAgent"
219
  events = [
220
+ MockExecutorCompletedEvent(executor_id="searcher"),
221
  ]
222
 
223
  async def mock_stream(*args, **kwargs):
 
253
  Verify buffer clears between agents.
254
  Agent B should NOT inherit Agent A's accumulated text.
255
  """
256
+ # Use "searcher" (SearchAgent) and "judge" (JudgeAgent)
257
  events = [
258
+ MockAgentRunUpdateEvent("Searcher says hi", author_name="searcher"),
259
+ MockExecutorCompletedEvent(executor_id="searcher"),
260
+ MockAgentRunUpdateEvent("Judge responds", author_name="judge"),
261
+ MockExecutorCompletedEvent(executor_id="judge"),
262
  ]
263
 
264
  async def mock_stream(*args, **kwargs):
 
275
 
276
  # Find non-streaming events for each agent
277
  agent_a_events = [
278
+ e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
279
  ]
280
  agent_b_events = [
281
+ e for e in generated_events if "JudgeAgent" in str(e.message) and e.type != "streaming"
282
  ]
283
 
284
  # Both should have completion events
285
+ assert len(agent_a_events) >= 1, (
286
+ f"No SearchAgent events: {[e.message for e in generated_events]}"
287
+ )
288
+ assert len(agent_b_events) >= 1, (
289
+ f"No JudgeAgent events: {[e.message for e in generated_events]}"
290
+ )
291
 
292
  # Agent A should have its own text
293
+ assert "Searcher" in agent_a_events[0].message
294
  # Agent B should have its own text, NOT Agent A's
295
+ assert "Judge" in agent_b_events[0].message
296
+ assert "Searcher" not in agent_b_events[0].message, "Buffer not cleared between agents!"
uv.lock CHANGED
@@ -1169,7 +1169,7 @@ requires-dist = [
1169
  { name = "structlog", specifier = ">=24.1" },
1170
  { name = "tenacity", specifier = ">=8.2" },
1171
  { name = "typer", marker = "extra == 'dev'", specifier = ">=0.9.0" },
1172
- { name = "urllib3", specifier = ">=2.5.0" },
1173
  { name = "xmltodict", specifier = ">=0.13" },
1174
  ]
1175
  provides-extras = ["dev", "magentic", "rag"]
@@ -6175,11 +6175,11 @@ wheels = [
6175
 
6176
  [[package]]
6177
  name = "urllib3"
6178
- version = "2.5.0"
6179
  source = { registry = "https://pypi.org/simple" }
6180
- sdist = { url = "https://files.pythonhosted.org/packages/15/22/9ee70a2574a4f4599c47dd506532914ce044817c7752a79b6a51286319bc/urllib3-2.5.0.tar.gz", hash = "sha256:3fc47733c7e419d4bc3f6b3dc2b4f890bb743906a30d56ba4a5bfa4bbff92760", size = 393185 }
6181
  wheels = [
6182
- { url = "https://files.pythonhosted.org/packages/a7/c2/fe1e52489ae3122415c51f387e221dd0773709bad6c6cdaa599e8a2c5185/urllib3-2.5.0-py3-none-any.whl", hash = "sha256:e6b01673c0fa6a13e374b50871808eb3bf7046c4b125b216f6bf1cc604cff0dc", size = 129795 },
6183
  ]
6184
 
6185
  [[package]]
 
1169
  { name = "structlog", specifier = ">=24.1" },
1170
  { name = "tenacity", specifier = ">=8.2" },
1171
  { name = "typer", marker = "extra == 'dev'", specifier = ">=0.9.0" },
1172
+ { name = "urllib3", specifier = ">=2.6.0" },
1173
  { name = "xmltodict", specifier = ">=0.13" },
1174
  ]
1175
  provides-extras = ["dev", "magentic", "rag"]
 
6175
 
6176
  [[package]]
6177
  name = "urllib3"
6178
+ version = "2.6.0"
6179
  source = { registry = "https://pypi.org/simple" }
6180
+ sdist = { url = "https://files.pythonhosted.org/packages/1c/43/554c2569b62f49350597348fc3ac70f786e3c32e7f19d266e19817812dd3/urllib3-2.6.0.tar.gz", hash = "sha256:cb9bcef5a4b345d5da5d145dc3e30834f58e8018828cbc724d30b4cb7d4d49f1", size = 432585 }
6181
  wheels = [
6182
+ { url = "https://files.pythonhosted.org/packages/56/1a/9ffe814d317c5224166b23e7c47f606d6e473712a2fad0f704ea9b99f246/urllib3-2.6.0-py3-none-any.whl", hash = "sha256:c90f7a39f716c572c4e3e58509581ebd83f9b59cced005b7db7ad2d22b0db99f", size = 131083 },
6183
  ]
6184
 
6185
  [[package]]