SPEC_15: Advanced Mode Performance Optimization
Status: ✅ IMPLEMENTED
Priority: P1
GitHub Issue: #65
Estimated Effort: Medium (config changes + early termination logic)
Last Updated: 2025-12-01
Implementation Commits:
- dbf888c: P2 dead zones fix (granular init events + progress estimation)
- a31cea6: JudgeAgent termination test
- Config: settings.advanced_max_rounds=5, settings.advanced_timeout=300
Senior Review Verdict: ✅ APPROVED
Recommendation: Implement Solutions A + B + C together. Solution B (Early Termination) is NOT "post-hackathon"; it is the core fix that addresses the root cause. The patterns used are consistent with Microsoft Agent Framework best practices.
Problem Statement
Advanced (Multi-Agent) mode runs up to 10 rounds (the default max_rounds) of multi-agent coordination, which takes 10-15+ minutes.
For hackathon demos, no judge will wait this long; they will close the tab before seeing results.
Observed Behavior
- System works correctly (no crashes)
- Produces detailed, high-quality research output
- Takes too long for practical demo use
- User had to manually terminate after ~10 minutes
Current Configuration
# src/orchestrators/advanced.py:133
.with_standard_manager(
    chat_client=manager_client,
    max_round_count=self._max_rounds,  # Default: 10
    max_stall_count=3,
    max_reset_count=2,
)
Time Breakdown (Estimated)
| Component | Time per Round | Notes |
|---|---|---|
| Manager LLM call | 2-5s | Decides next agent |
| Search Agent | 10-20s | 4 API calls (PubMed, CT, EPMC, OA) |
| Hypothesis Agent | 5-10s | LLM reasoning |
| Judge Agent | 5-10s | LLM evaluation |
| Report Agent | 10-20s | LLM synthesis (when called) |
Total per round: ~30-60 seconds. At 10 rounds, that is 5-10 minutes minimum.
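The per-round figure is the sum of the component ranges in the Time Breakdown table. A small sketch of that arithmetic, with the numbers copied from the table and the Report Agent counted on every round as a worst case (the dictionary keys are illustrative names, not identifiers from the codebase):

```python
# Per-component latency ranges in seconds, copied from the Time Breakdown table.
COMPONENT_RANGES_S = {
    "manager": (2, 5),
    "search": (10, 20),
    "hypothesis": (5, 10),
    "judge": (5, 10),
    "report": (10, 20),  # only when called; counted every round as a worst case
}


def per_round_range(ranges: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the lower and upper bounds of every component."""
    low = sum(lo for lo, _ in ranges.values())
    high = sum(hi for _, hi in ranges.values())
    return low, high


def total_minutes(rounds: int, per_round_s: tuple[int, int]) -> tuple[float, float]:
    """Scale a per-round range (seconds) to a whole run, in minutes, rounded to 0.1."""
    low, high = per_round_s
    return round(rounds * low / 60, 1), round(rounds * high / 60, 1)


per_round = per_round_range(COMPONENT_RANGES_S)
print(per_round)                     # (32, 65)
print(total_minutes(10, per_round))  # (5.3, 10.8)
print(total_minutes(5, per_round))   # (2.7, 5.4)
```

The 10-round estimate (about 5.3-10.8 minutes) sits below the observed 10-15+ minutes, which suggests real rounds run slower than these per-component figures.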
Root Cause Analysis
Issue 1: Default max_rounds=10 is Too High
The Microsoft Agent Framework keeps iterating until:
- max_rounds is reached, OR
- the Manager decides the workflow is complete
For research tasks, the manager often wants "more evidence" and keeps searching.
Issue 2: No Early Termination Heuristic
Even when the Judge says sufficient=True with high confidence, the workflow continues because the manager wants to be thorough.
Issue 3: No User Expectation Setting
Users don't know how long a run will take, and progress indication is minimal.
Proposed Solutions
Solution A: Reduce Default max_rounds (QUICK FIX)
Change: Reduce the default max_rounds from 10 to 5 and make it configurable via the ADVANCED_MAX_ROUNDS environment variable.
# src/orchestrators/advanced.py
import os


def __init__(
    self,
    max_rounds: int | None = None,  # Changed from 10
    ...
) -> None:
    # Read from environment, default to 5 for faster demos
    default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
    self._max_rounds = max_rounds if max_rounds is not None else default_rounds
Pros:
- Simple, 2-line change
- Immediately halves the worst-case demo time (the round cap drops from 10 to 5)
Cons:
- Less thorough research
- Trade-off: speed vs. quality
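int(os.getenv(...)) raises a bare ValueError on a non-numeric value and silently accepts 0 or negative numbers. A minimal sketch of a stricter resolver with the same precedence (explicit argument, then ADVANCED_MAX_ROUNDS, then 5); resolve_max_rounds is a hypothetical helper name, not something in the codebase:

```python
from __future__ import annotations

import os

DEFAULT_MAX_ROUNDS = 5


def resolve_max_rounds(explicit: int | None = None, env_var: str = "ADVANCED_MAX_ROUNDS") -> int:
    """Hypothetical helper: explicit argument > environment variable > default."""
    if explicit is not None:
        value = explicit
    else:
        raw = os.getenv(env_var, str(DEFAULT_MAX_ROUNDS)).strip()
        try:
            value = int(raw)
        except ValueError:
            # Fail loudly with the variable name instead of a bare int() error
            raise ValueError(f"{env_var} must be an integer, got {raw!r}") from None
    if value < 1:
        raise ValueError(f"max_rounds must be >= 1, got {value}")
    return value
```

Failing at construction time keeps a typo in the environment from turning into a zero-round or crashing demo later.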
Solution B: Early Termination on High-Confidence Judge (RECOMMENDED)
Change: Add a workflow termination signal when the Judge returns sufficient=True with confidence ≥ 70%.
This requires modifying the JudgeAgent to signal completion:
# src/agents/magentic_agents.py - create_judge_agent()
@chat_agent.on_message
async def handle_judge_message(message: str, context: Context) -> ChatMessage:
    """Process judge request and potentially signal completion."""
    # ... existing judge logic ...
    assessment = await judge_handler.evaluate(evidence, query)

    if assessment.sufficient and assessment.confidence >= 0.70:
        # Signal to manager that we have enough evidence
        # The manager prompt should respect this signal
        return ChatMessage(
            content=(
                f"SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
                f"Recommend immediate synthesis. {assessment.reasoning}"
            ),
            metadata={"sufficient": True, "confidence": assessment.confidence},
        )

    return ChatMessage(content=f"INSUFFICIENT: {assessment.reasoning}")
And update the manager's system prompt to respect this:
# src/orchestrators/advanced.py - _build_workflow()
manager_system_prompt = """You are a research workflow manager.
IMPORTANT: When JudgeAgent returns "SUFFICIENT EVIDENCE", immediately
delegate to ReportAgent for final synthesis. Do NOT continue searching.
Workflow:
1. SearchAgent finds evidence
2. HypothesisAgent generates hypotheses
3. JudgeAgent evaluates sufficiency
4. IF sufficient → ReportAgent synthesizes (END)
5. IF insufficient → SearchAgent refines search (CONTINUE)
"""
Pros:
- Respects actual evidence quality
- Can terminate early (around rounds 3-4) when evidence is strong
- Maintains quality for complex queries
Cons:
- Requires testing to ensure manager respects signal
- More complex change
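One of the Cons above is verifying that the signal actually reaches the manager, which is easier with a deterministic detector for tests or logging. A naive "SUFFICIENT" in text check would also match "INSUFFICIENT", so this sketch anchors on the full phrase with a word boundary and parses the percentage. parse_judge_signal is a hypothetical helper, and it assumes the "SUFFICIENT EVIDENCE (confidence: NN%)" format used in this spec:

```python
from __future__ import annotations

import re

# Exact phrase the JudgeAgent emits, e.g. "SUFFICIENT EVIDENCE (confidence: 85%)".
# The leading \b prevents a match inside "INSUFFICIENT".
_SIGNAL = re.compile(r"\bSUFFICIENT EVIDENCE \(confidence: (\d{1,3})%\)")


def parse_judge_signal(text: str) -> float | None:
    """Return the confidence as a 0-1 fraction if the sufficiency signal is present, else None."""
    match = _SIGNAL.search(text)
    if match is None:
        return None
    return int(match.group(1)) / 100


print(parse_judge_signal("✅ SUFFICIENT EVIDENCE (confidence: 85%). STOP SEARCHING."))  # 0.85
print(parse_judge_signal("❌ INSUFFICIENT: only case reports."))  # None
```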
Solution C: Better Progress Indication
Add an estimated time remaining to the UI:
# src/orchestrators/advanced.py - run()
yield AgentEvent(
    type="progress",
    message=(
        f"Round {iteration}/{self._max_rounds} "
        f"(~{(self._max_rounds - iteration) * 45}s remaining)"
    ),
    iteration=iteration,
)
Pros:
- Sets user expectations
- Doesn't change workflow behavior
Cons:
- Doesn't actually speed up the workflow
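The remaining-time text is easier to unit-test if it lives in a pure function. A sketch under the spec's ~45 s per round assumption; format_remaining and progress_message are hypothetical names (the acceptance criteria mention a _get_progress_message() helper whose signature is not shown). The estimate is clamped at zero so it never displays negative time if more rounds run than expected:

```python
SECONDS_PER_ROUND = 45  # the spec's rough per-round estimate, not a measurement


def format_remaining(iteration: int, max_rounds: int, seconds_per_round: int = SECONDS_PER_ROUND) -> str:
    """Hypothetical helper: human-readable time left after `iteration` completed rounds."""
    rounds_remaining = max(0, max_rounds - iteration)  # clamp: never show negative time
    minutes, seconds = divmod(rounds_remaining * seconds_per_round, 60)
    return f"{minutes}m {seconds}s" if minutes else f"{seconds}s"


def progress_message(iteration: int, max_rounds: int) -> str:
    """Hypothetical helper: the full progress line shown to the user."""
    return f"Round {iteration}/{max_rounds} (~{format_remaining(iteration, max_rounds)} remaining)"


print(progress_message(1, 5))  # Round 1/5 (~3m 0s remaining)
print(progress_message(4, 5))  # Round 4/5 (~45s remaining)
```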
Recommended Implementation
IMPLEMENT ALL THREE SOLUTIONS NOW:
- Solution A: Reduce max_rounds to 5 via environment variable
- Solution B: Early termination when Judge returns sufficient=True with confidence ≥70%
- Solution C: Better progress indication with time estimates
Why Solution B NOW? The Manager acting as a "termination condition" based on Judge feedback is a standard multi-agent pattern (Critique/Refine loop with exit). This aligns with Microsoft Agent Framework best practices and solves the ROOT CAUSE, not just a symptom.
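The Critique/Refine-with-exit pattern can be illustrated with a framework-free toy loop. This is not how MagenticBuilder works internally; the judge here is a hypothetical callable returning (sufficient, confidence), and the loop only counts how many rounds run with and without the early exit:

```python
from __future__ import annotations

from collections.abc import Callable

# Hypothetical judge: takes the round number, returns (sufficient, confidence).
Judge = Callable[[int], tuple[bool, float]]


def run_loop(judge: Judge, max_rounds: int, threshold: float | None = 0.70) -> int:
    """Run search -> judge rounds and return how many rounds were used.

    threshold=None disables the early exit, so only max_rounds stops the loop.
    """
    for round_no in range(1, max_rounds + 1):
        # (search and hypothesis steps would happen here)
        sufficient, confidence = judge(round_no)
        if threshold is not None and sufficient and confidence >= threshold:
            return round_no  # exit: hand off to synthesis
    return max_rounds  # cap reached


def judge_after_three(round_no: int) -> tuple[bool, float]:
    """Toy judge that becomes confident from round 3 onward."""
    return (round_no >= 3, 0.8 if round_no >= 3 else 0.4)


print(run_loop(judge_after_three, max_rounds=5))                  # 3
print(run_loop(judge_after_three, max_rounds=5, threshold=None))  # 5
```

Without the exit, the loop always pays for max_rounds; with it, the cost tracks how quickly the evidence becomes sufficient.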
Implementation Details
Phase 1: All Solutions Together (A + B + C)
1. Update Advanced Orchestrator Constructor
# src/orchestrators/advanced.py
import os


class AdvancedOrchestrator(OrchestratorProtocol):
    def __init__(
        self,
        max_rounds: int | None = None,
        chat_client: OpenAIChatClient | None = None,
        api_key: str | None = None,
        timeout_seconds: float = 300.0,  # Reduced from 600s to 300s (5 min)
        domain: ResearchDomain | str | None = None,
    ) -> None:
        # Environment-configurable rounds (default 5 for demos)
        default_rounds = int(os.getenv("ADVANCED_MAX_ROUNDS", "5"))
        self._max_rounds = max_rounds if max_rounds is not None else default_rounds
        self._timeout_seconds = timeout_seconds
        # ... rest unchanged ...
2. Add Progress Estimation
# src/orchestrators/advanced.py - run()
# After processing MagenticAgentMessageEvent:
if isinstance(event, MagenticAgentMessageEvent):
    iteration += 1
    # Clamp at zero so the estimate never goes negative if more agent messages
    # arrive than max_rounds
    rounds_remaining = max(0, self._max_rounds - iteration)
    # Estimate ~45s per round based on observed timing
    est_seconds = rounds_remaining * 45
    est_display = (
        f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
    )
    yield AgentEvent(
        type="progress",
        message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
        iteration=iteration,
    )
3. Update UI Message (Solution C)
# src/orchestrators/advanced.py - run()
# UX FIX: More accurate timing message
# Assumes 45-60s per round, e.g. 5 rounds -> "3-5 minutes"
yield AgentEvent(
    type="thinking",
    message=(
        f"Multi-agent reasoning in progress ({self._max_rounds} rounds max)... "
        f"Estimated time: {self._max_rounds * 45 // 60}-{self._max_rounds * 60 // 60} minutes."
    ),
    iteration=0,
)
4. Add Early Termination Signal (Solution B)
# src/agents/magentic_agents.py - Update create_judge_agent()
@chat_agent.on_message
async def handle_judge_message(message: str, context: Context) -> ChatMessage:
    """Process judge request and signal completion when evidence is sufficient."""
    # ... existing parsing logic to extract evidence and query ...
    assessment = await judge_handler.evaluate(evidence, query)

    # NEW: Strong termination signal for high-confidence assessments
    if assessment.sufficient and assessment.confidence >= 0.70:
        # Clear, unambiguous signal that Manager should respect
        return ChatMessage(
            content=(
                f"✅ SUFFICIENT EVIDENCE (confidence: {assessment.confidence:.0%}). "
                "STOP SEARCHING. Delegate to ReportAgent NOW for final synthesis. "
                f"Reasoning: {assessment.reasoning}"
            ),
            metadata={"sufficient": True, "confidence": assessment.confidence},
        )

    # Insufficient - continue the loop
    return ChatMessage(
        content=(
            f"❌ INSUFFICIENT: {assessment.reasoning}. "
            f"Confidence: {assessment.confidence:.0%}. "
            f"Suggested refinements: {', '.join(assessment.next_search_queries[:2])}"
        )
    )
5. Update Manager System Prompt (Solution B)
# src/orchestrators/advanced.py - _build_workflow()
MANAGER_SYSTEM_PROMPT = """You are a medical research workflow manager.
## CRITICAL RULE
When JudgeAgent says "SUFFICIENT EVIDENCE" or "STOP SEARCHING":
→ IMMEDIATELY delegate to ReportAgent for synthesis
→ Do NOT continue searching or gathering more evidence
→ The Judge has determined evidence quality is adequate
## Standard Workflow
1. SearchAgent → finds evidence from PubMed, ClinicalTrials, etc.
2. HypothesisAgent → generates testable hypotheses
3. JudgeAgent → evaluates evidence sufficiency
4. IF sufficient → ReportAgent (DONE)
5. IF insufficient → SearchAgent with refined queries (CONTINUE)
## Your Role
- Coordinate agents efficiently
- Respect the Judge's termination signal
- Prioritize completing the task over perfection
"""
Test Plan
Unit Tests
# tests/unit/orchestrators/test_advanced_orchestrator.py
import os
from unittest.mock import patch

from src.orchestrators.advanced import AdvancedOrchestrator


class TestAdvancedOrchestratorConfig:
    """Tests for configuration options."""

    def test_default_max_rounds_is_five(self) -> None:
        """Default max_rounds should be 5 for faster demos."""
        # patch.dict restores os.environ on exit; remove only the variable under test
        # so other settings (e.g. API keys) stay available to the constructor
        with patch.dict(os.environ):
            os.environ.pop("ADVANCED_MAX_ROUNDS", None)
            orch = AdvancedOrchestrator()
        assert orch._max_rounds == 5

    def test_max_rounds_from_env(self) -> None:
        """max_rounds should be configurable via environment."""
        with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
            orch = AdvancedOrchestrator()
        assert orch._max_rounds == 3

    def test_explicit_max_rounds_overrides_env(self) -> None:
        """Explicit parameter should override environment."""
        with patch.dict(os.environ, {"ADVANCED_MAX_ROUNDS": "3"}):
            orch = AdvancedOrchestrator(max_rounds=7)
        assert orch._max_rounds == 7

    def test_timeout_default_is_five_minutes(self) -> None:
        """Default timeout should be 300s (5 min) for faster failure."""
        orch = AdvancedOrchestrator()
        assert orch._timeout_seconds == 300.0
Integration Test (Manual)
# Run advanced mode with reduced rounds
ADVANCED_MAX_ROUNDS=3 uv run python -c "
import asyncio
from src.orchestrators.advanced import AdvancedOrchestrator

async def test():
    orch = AdvancedOrchestrator()
    print(f'Max rounds: {orch._max_rounds}')  # Should be 3
    async for event in orch.run('sildenafil mechanism'):
        print(f'{event.type}: {event.message[:100]}...')

asyncio.run(test())
"
Timing Benchmark
Create a benchmark script to measure actual performance:
# examples/benchmark_advanced.py
"""Benchmark Advanced mode with different max_rounds settings."""

import asyncio
import time

from src.orchestrators.advanced import AdvancedOrchestrator


async def benchmark(max_rounds: int) -> float:
    """Run benchmark with specified rounds, return elapsed time in seconds."""
    # Pass max_rounds explicitly: an env var changed after the first import may not be
    # picked up if settings are read once at import time (e.g. a pydantic-settings object)
    orch = AdvancedOrchestrator(max_rounds=max_rounds)
    start = time.perf_counter()
    async for event in orch.run("sildenafil erectile dysfunction"):
        if event.type == "complete":
            break
    return time.perf_counter() - start


async def main() -> None:
    """Run benchmarks for different configurations."""
    # Note: the 300s default timeout may cut the 7- and 10-round runs short
    for rounds in [3, 5, 7, 10]:
        elapsed = await benchmark(rounds)
        print(f"max_rounds={rounds}: {elapsed:.1f}s ({elapsed / 60:.1f}min)")


if __name__ == "__main__":
    asyncio.run(main())
Files to Modify
| File | Change |
|---|---|
| src/orchestrators/advanced.py | Add env-configurable max_rounds, reduce default to 5, add progress estimation, update Manager prompt |
| src/agents/magentic_agents.py | Add early termination signal in JudgeAgent |
| tests/unit/orchestrators/test_advanced_orchestrator.py | Add config tests |
| tests/unit/agents/test_magentic_judge_termination.py | Add termination signal tests |
| examples/benchmark_advanced.py | Add timing benchmark (optional) |
Acceptance Criteria
Solution A: Configuration
- Default max_rounds is 5 (not 10): settings.advanced_max_rounds=5
- max_rounds is configurable via the ADVANCED_MAX_ROUNDS env var: pydantic-settings reads it automatically
- Explicit max_rounds parameter overrides the env var: advanced.py:89
- Default timeout is 5 minutes (300s, not 600s): settings.advanced_timeout=300
Solution B: Early Termination
- JudgeAgent returns a "SUFFICIENT EVIDENCE" message when confidence ≥70%: magentic_agents.py:95-98
- JudgeAgent includes "STOP SEARCHING" in the termination signal: magentic_agents.py:97
- Manager system prompt includes explicit termination instructions: advanced.py:146-152
- Workflow terminates early when the Judge signals sufficiency: covered by test_magentic_judge_termination.py
Solution C: Progress Indication
- Progress events show current round / max rounds: _get_progress_message()
- Progress events show estimated time remaining: _get_progress_message()
- Initial "thinking" message shows estimated total time: advanced.py:226-228
Overall
- Demo completes in <5 minutes with useful output: 5 rounds × 45s ≈ 3-4 min
- Quality of output is maintained (no degradation from early termination)
Rollback Plan
If reduced rounds cause quality issues:
- Increase the ADVANCED_MAX_ROUNDS environment variable
- No code changes needed
References
- GitHub Issue #65
- Microsoft Agent Framework: https://github.com/microsoft/agent-framework
- MagenticBuilder docs: Round configuration