SPEC 04: Magentic Mode UX Improvements
Priority: P1 (Demo Quality)
Problem Statement
Magentic (advanced) mode has several UX issues that degrade the user experience:
- P0: Chat history cleared on timeout - When timeout occurs, all progress events are erased
- P1: Timeout too short - 300s default insufficient for complex multi-agent workflows
- P1: Timeout not configurable - Users can't adjust based on their needs
- P2: No graceful degradation - System doesn't synthesize early when timeout approaches
Related Issues
- GitHub Issue #68: Magentic mode times out at 300s without completing
- GitHub Issue #65: Demo timing (predecessor, now closed)
- SPEC_01: Demo Termination (implemented the basic timeout)
Bug Analysis
Bug 1: Chat History Cleared on Timeout (P0)
Location: src/app.py:205-206
Current Code:
if event.type == "complete":
    yield event.message  # BUG: Discards all accumulated progress!
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)
Problem: The complete event (including timeout) yields ONLY the completion message, discarding all the response_parts that show what the system actually did.
User Sees:
Research timed out. Synthesizing available evidence...
User Should See:
🚀 STARTED: Starting research (Magentic mode)...
⏳ THINKING: Multi-agent reasoning in progress...
🧠 JUDGING: Manager (user_task): Research drug repurposing...
🧠 JUDGING: Manager (task_ledger): We are working to address...
🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical...
⏱️ Research timed out. Synthesizing available evidence...
Fix:
if event.type == "complete":
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)  # Preserves all progress
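The difference between the two behaviours can be reproduced in isolation. The sketch below is a minimal, self-contained approximation and not the code in src/app.py: the Event dataclass and the stream_buggy / stream_fixed helpers are hypothetical stand-ins for AgentEvent and the research_agent loop, and the markdown format is simplified.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    """Hypothetical stand-in for AgentEvent (only the fields used here)."""

    type: str
    message: str

    def to_markdown(self) -> str:
        # Simplified rendering; the real to_markdown() output may differ.
        return f"{self.type.upper()}: {self.message}"


def stream_buggy(events: Iterable[Event]) -> Iterator[str]:
    """Mirrors the current behaviour: a 'complete' event replaces the output."""
    parts: list[str] = []
    for event in events:
        if event.type == "complete":
            yield event.message  # accumulated parts are dropped here
        else:
            parts.append(event.to_markdown())
            yield "\n\n".join(parts)


def stream_fixed(events: Iterable[Event]) -> Iterator[str]:
    """Mirrors the fix: a 'complete' event is appended to the output."""
    parts: list[str] = []
    for event in events:
        if event.type == "complete":
            parts.append(event.message)
        else:
            parts.append(event.to_markdown())
        yield "\n\n".join(parts)


events = [
    Event("started", "Starting research (Magentic mode)..."),
    Event("thinking", "Multi-agent reasoning in progress..."),
    Event("complete", "Research timed out. Synthesizing available evidence..."),
]

buggy_final = list(stream_buggy(events))[-1]
fixed_final = list(stream_fixed(events))[-1]
```

Here buggy_final holds only the timeout message, while fixed_final holds the two progress entries followed by the timeout message. Because a chat UI renders the most recent yield, the final yield is the one that determines what the user sees.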
Bug 2: Timeout Too Short (P1)
Location: src/orchestrator_magentic.py:48
Current: timeout_seconds: float = 300.0 (5 minutes)
Problem: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes.
Analysis of Per-Agent Latency:
| Agent | Typical Latency | Worst Case |
|---|---|---|
| SearchAgent | 30-60s | 120s (network issues) |
| HypothesisAgent | 60-90s | 180s (complex reasoning) |
| JudgeAgent | 30-60s | 120s |
| ReportAgent | 60-120s | 240s (long synthesis) |
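With max_rounds=10 and four agent calls per round at roughly 90s each: 10 × 4 × 90s = 3,600s (60 minutes). Using the worst-case column instead, one round takes 120 + 180 + 120 + 240 = 660s, so 10 rounds take 6,600s (110 minutes).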
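The latency table can be turned into a quick budget calculation. The sketch below assumes every round calls each of the four agents exactly once, in sequence, which is a simplification of how the manager actually schedules agents; the names LATENCY_S and total_minutes are illustrative only.

```python
# Per-agent latency in seconds, copied from the table above:
# (typical low, typical high, worst case)
LATENCY_S = {
    "SearchAgent": (30, 60, 120),
    "HypothesisAgent": (60, 90, 180),
    "JudgeAgent": (30, 60, 120),
    "ReportAgent": (60, 120, 240),
}
MAX_ROUNDS = 10


def total_minutes(column: int, rounds: int = MAX_ROUNDS) -> float:
    """Total run time in minutes if every agent runs once per round."""
    per_round_s = sum(row[column] for row in LATENCY_S.values())
    return per_round_s * rounds / 60


typical_low_min = total_minutes(0)   # 180s per round
typical_high_min = total_minutes(1)  # 330s per round
worst_case_min = total_minutes(2)    # 660s per round
```

Under this assumption a full 10-round run spans 30 to 55 minutes at typical latencies and 110 minutes in the worst case, so any fixed default only covers runs that converge in a few rounds.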
Bug 3: Timeout Not Configurable (P1)
Problem: The factory doesn't pass timeout config to MagenticOrchestrator.
Location: src/orchestrator_factory.py:52-55
return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    # Missing: timeout_seconds
)
Proposed Solutions
Fix 1: Preserve Chat History (P0)
# src/app.py - Replace lines 205-212
if event.type == "complete":
    # Preserve accumulated progress + add completion message
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)
Test:
@pytest.mark.asyncio
async def test_timeout_preserves_chat_history(mock_magentic_workflow):
    """Verify timeout doesn't erase progress events."""
    # Mock workflow that yields events then times out
    events = []
    async for event in research_agent("test", [], "advanced", "sk-test"):
        events.append(event)
    # Should contain both progress AND timeout message
    output = events[-1]  # Final yield
    assert "STARTED" in output
    assert "timed out" in output.lower()
Fix 2: Increase Default Timeout (P1)
# src/orchestrator_magentic.py
def __init__(
    self,
    max_rounds: int = 10,
    chat_client: OpenAIChatClient | None = None,
    api_key: str | None = None,
    timeout_seconds: float = 600.0,  # Changed: 10 minutes (was 5)
) -> None:
Fix 3: Make Timeout Configurable via Environment (P1)
# src/utils/config.py
class Settings(BaseSettings):
    # ... existing fields ...
    magentic_timeout: int = Field(
        default=600,
        description="Timeout for Magentic mode in seconds",
    )
# src/orchestrator_factory.py
return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    timeout_seconds=settings.magentic_timeout,  # NEW
)
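Since the risk table flags the new setting as needing validation, it may help to see what that validation could look like. The following is a standard-library-only sketch and not the Settings class itself: read_magentic_timeout and DEFAULT_TIMEOUT_S are hypothetical names, and the rejection rules (non-numeric, non-positive, non-finite) are assumptions about what should count as invalid.

```python
import math
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_S = 600.0  # default proposed in Fix 2


def read_magentic_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Return the timeout in seconds from MAGENTIC_TIMEOUT, or the default."""
    env = os.environ if env is None else env
    raw = env.get("MAGENTIC_TIMEOUT")
    if raw is None or raw.strip() == "":
        return DEFAULT_TIMEOUT_S
    try:
        value = float(raw)
    except ValueError as exc:
        raise ValueError(f"MAGENTIC_TIMEOUT must be a number, got {raw!r}") from exc
    # Reject zero, negatives, NaN, and infinity.
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"MAGENTIC_TIMEOUT must be a positive number, got {raw!r}")
    return value
```

For example, read_magentic_timeout({"MAGENTIC_TIMEOUT": "120"}) returns 120.0, while an empty mapping falls back to the default. Failing loudly on a bad value is a design choice; silently falling back to the default would hide configuration mistakes.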
Fix 4: Graceful Degradation (P2 - Future)
# src/orchestrator_magentic.py - Inside run() loop
elapsed = time.time() - start_time
time_remaining = self._timeout_seconds - elapsed
# If 80% of time elapsed, force synthesis
if time_remaining < self._timeout_seconds * 0.2:
    yield AgentEvent(
        type="synthesizing",
        message="Time limit approaching, synthesizing available evidence...",
        iteration=iteration,
    )
    # TODO: Inject signal to trigger ReportAgent
    break
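The 80% check above can be isolated into a small helper that is easy to unit test. This is a sketch under assumptions, not part of the orchestrator: TimeBudget and its method names are hypothetical, and it swaps time.time() for time.monotonic(), which is not affected by system clock adjustments. It does not address the open TODO of signalling the ReportAgent.

```python
import time
from typing import Callable


class TimeBudget:
    """Tracks elapsed time against a timeout using an injectable clock."""

    def __init__(
        self,
        timeout_seconds: float,
        synth_fraction: float = 0.8,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout_seconds
        self._synth_fraction = synth_fraction
        self._clock = clock
        self._start = clock()

    def elapsed(self) -> float:
        """Seconds since the budget was created."""
        return self._clock() - self._start

    def should_synthesize(self) -> bool:
        """True once more than synth_fraction of the budget has been used."""
        return self.elapsed() > self._timeout * self._synth_fraction

    def expired(self) -> bool:
        """True once the full timeout has been reached."""
        return self.elapsed() >= self._timeout
```

Inside the run() loop, a check such as `if budget.should_synthesize(): ...` would replace the inline arithmetic. Passing a fake clock in tests avoids real sleeps, for example `TimeBudget(600.0, clock=lambda: now[0])` with `now` being a one-element list the test mutates.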
Implementation Order
- Fix 1 (P0): Chat history preservation - 5 minutes, 1 line change
- Fix 2 (P1): Increase default timeout - 5 minutes, 1 line change
- Fix 3 (P1): Environment config - 15 minutes, 3 files
- Fix 4 (P2): Graceful degradation - 1 hour, research agent-framework signals
Acceptance Criteria
- Timeout shows ALL progress events, not just timeout message
- Default timeout increased to 600s (10 minutes)
- Timeout configurable via MAGENTIC_TIMEOUT env var
- Tests verify chat history preserved on timeout
- (P2) System synthesizes early when timeout approaches (Future)
Status: IMPLEMENTED (commit cb46aac)
Files to Modify
- src/app.py - Fix chat history clearing (lines 205-212)
- src/orchestrator_magentic.py - Increase default timeout
- src/utils/config.py - Add magentic_timeout setting
- src/orchestrator_factory.py - Pass timeout to MagenticOrchestrator
- tests/unit/test_app_timeout.py - NEW: Test chat history preservation
Test Plan
# tests/unit/test_app_timeout.py
import pytest


@pytest.mark.asyncio
async def test_complete_event_preserves_history():
    """Complete events should append to history, not replace it."""
    from src.app import research_agent

    # This requires mocking the orchestrator to emit events then complete
    # Verify final output contains ALL events, not just completion message
    pass


def test_timeout_configurable(monkeypatch):
    """Verify MAGENTIC_TIMEOUT env var is respected."""
    # monkeypatch restores the environment afterwards, so the override
    # cannot leak into other tests (a bare os.environ assignment would).
    monkeypatch.setenv("MAGENTIC_TIMEOUT", "120")
    from src.utils.config import Settings

    settings = Settings()
    assert settings.magentic_timeout == 120
Risk Assessment
| Fix | Risk | Mitigation |
|---|---|---|
| Fix 1 | Low | Simple change, well-understood |
| Fix 2 | Low | Just a default value change |
| Fix 3 | Medium | New config, needs validation |
| Fix 4 | High | Requires understanding agent-framework internals |
Dependencies
- Fix 4 requires investigation of agent-framework-core to understand how to signal early termination to the workflow manager.