
SPEC 04: Magentic Mode UX Improvements

Priority: P1 (Demo Quality)

Problem Statement

Magentic (advanced) mode has several UX issues that degrade the user experience:

  1. P0: Chat history cleared on timeout - When timeout occurs, all progress events are erased
  2. P1: Timeout too short - 300s default insufficient for complex multi-agent workflows
  3. P1: Timeout not configurable - Users can't adjust based on their needs
  4. P2: No graceful degradation - System doesn't synthesize early when timeout approaches

Related Issues

  • GitHub Issue #68: Magentic mode times out at 300s without completing
  • GitHub Issue #65: Demo timing (predecessor, now closed)
  • SPEC_01: Demo Termination (implemented the basic timeout)

Bug Analysis

Bug 1: Chat History Cleared on Timeout (P0)

Location: src/app.py:205-206

Current Code:

if event.type == "complete":
    yield event.message  # BUG: Discards all accumulated progress!
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)

Problem: The complete event (including timeout) yields ONLY the completion message, discarding all the response_parts that show what the system actually did.

User Sees:

Research timed out. Synthesizing available evidence...

User Should See:

🚀 STARTED: Starting research (Magentic mode)...
⏳ THINKING: Multi-agent reasoning in progress...
🧠 JUDGING: Manager (user_task): Research drug repurposing...
🧠 JUDGING: Manager (task_ledger): We are working to address...
🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical...
⏱️ Research timed out. Synthesizing available evidence...

Fix:

if event.type == "complete":
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)  # Preserves all progress
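
The difference between the two behaviours is easy to reproduce outside the app. The following is a minimal sketch, not the project's code: FakeEvent, fake_orchestrator, and render are hypothetical stand-ins for the real event type, orchestrator, and the loop in src/app.py. It shows that appending the completion message keeps every earlier progress line in the final yield.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeEvent:
    """Hypothetical stand-in for the app's event type."""

    type: str
    message: str

    def to_markdown(self) -> str:
        return f"{self.type.upper()}: {self.message}"


async def fake_orchestrator():
    """Emit two progress events, then a timeout-style completion."""
    yield FakeEvent("started", "Starting research (Magentic mode)...")
    yield FakeEvent("thinking", "Multi-agent reasoning in progress...")
    yield FakeEvent("complete", "Research timed out. Synthesizing available evidence...")


async def render(events):
    """Yield the full transcript after every event, including the last one."""
    response_parts = []
    async for event in events:
        if event.type == "complete":
            # Append instead of replacing, so earlier progress survives.
            response_parts.append(event.message)
        else:
            response_parts.append(event.to_markdown())
        yield "\n\n".join(response_parts)


async def collect():
    return [chunk async for chunk in render(fake_orchestrator())]


outputs = asyncio.run(collect())
final_output = outputs[-1]
```

Here final_output contains the STARTED and THINKING lines followed by the timeout message, which matches the "User Should See" transcript in shape.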

Bug 2: Timeout Too Short (P1)

Location: src/orchestrator_magentic.py:48

Current: timeout_seconds: float = 300.0 (5 minutes)

Problem: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes.

Analysis of Per-Agent Latency:

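| Agent           | Typical Latency | Worst Case               |
|-----------------|-----------------|--------------------------|
| SearchAgent     | 30-60s          | 120s (network issues)    |
| HypothesisAgent | 60-90s          | 180s (complex reasoning) |
| JudgeAgent      | 30-60s          | 120s                     |
| ReportAgent     | 60-120s         | 240s (long synthesis)    |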

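With max_rounds=10 and an average of about 90s per agent call: 10 × 4 × 90s = 3,600s = 60 minutes. Summing the worst-case column instead gives 10 × (120 + 180 + 120 + 240)s = 6,600s = 110 minutes.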
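
As a cross-check on the latency table, the totals can be recomputed directly. This sketch assumes that every round calls all four agents exactly once, in sequence; that is a simplifying assumption, not something the table states. The latency figures are copied from the table, and the variable names are illustrative.

```python
# Per-agent latency in seconds: (typical low, typical high, worst case).
LATENCY_SECONDS = {
    "SearchAgent": (30, 60, 120),
    "HypothesisAgent": (60, 90, 180),
    "JudgeAgent": (30, 60, 120),
    "ReportAgent": (60, 120, 240),
}
MAX_ROUNDS = 10


def seconds_per_round(column: int) -> int:
    """Sum one column of the table across all four agents."""
    return sum(row[column] for row in LATENCY_SECONDS.values())


typical_low = MAX_ROUNDS * seconds_per_round(0)   # 10 * 180s
typical_high = MAX_ROUNDS * seconds_per_round(1)  # 10 * 330s
worst_case = MAX_ROUNDS * seconds_per_round(2)    # 10 * 660s

for label, total in [
    ("typical (low)", typical_low),
    ("typical (high)", typical_high),
    ("worst case", worst_case),
]:
    print(f"{label}: {total}s = {total / 60:.0f} min")
```

Under that assumption a full ten-round run lands between 30 and 55 minutes at typical latencies and at 110 minutes in the worst case, so neither a 300s nor a 600s limit covers a run that uses every round.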

Bug 3: Timeout Not Configurable (P1)

Problem: The factory doesn't pass timeout config to MagenticOrchestrator.

Location: src/orchestrator_factory.py:52-55

return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    # Missing: timeout_seconds
)

Proposed Solutions

Fix 1: Preserve Chat History (P0)

# src/app.py - Replace lines 205-212
if event.type == "complete":
    # Preserve accumulated progress + add completion message
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)

Test:

import pytest

from src.app import research_agent


@pytest.mark.asyncio
async def test_timeout_preserves_chat_history(mock_magentic_workflow):
    """Verify timeout doesn't erase progress events."""
    # mock_magentic_workflow: fixture that yields progress events, then times out
    events = []
    async for event in research_agent("test", [], "advanced", "sk-test"):
        events.append(event)

    # The final yield should contain both the progress AND the timeout message
    output = events[-1]
    assert "STARTED" in output
    assert "timed out" in output.lower()

Fix 2: Increase Default Timeout (P1)

# src/orchestrator_magentic.py
def __init__(
    self,
    max_rounds: int = 10,
    chat_client: OpenAIChatClient | None = None,
    api_key: str | None = None,
    timeout_seconds: float = 600.0,  # Changed: 10 minutes (was 5)
) -> None:

Fix 3: Make Timeout Configurable via Environment (P1)

# src/utils/config.py
class Settings(BaseSettings):
    # ... existing fields ...
    magentic_timeout: int = Field(
        default=600,
        description="Timeout for Magentic mode in seconds",
    )

# src/orchestrator_factory.py
return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    timeout_seconds=settings.magentic_timeout,  # NEW
)
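
Because the risk table flags this fix as needing validation, it may help to see what validating the value could look like. The sketch below uses only the standard library as a stand-in for the Settings field; resolve_magentic_timeout and the two bounds are hypothetical and not part of the codebase. Only the MAGENTIC_TIMEOUT name and the 600s default come from this spec.

```python
import os
from collections.abc import Mapping

DEFAULT_TIMEOUT_SECONDS = 600.0  # default proposed in this spec
MIN_TIMEOUT_SECONDS = 30.0       # hypothetical lower bound
MAX_TIMEOUT_SECONDS = 3600.0     # hypothetical upper bound


def resolve_magentic_timeout(env: Mapping[str, str] = os.environ) -> float:
    """Return the timeout from MAGENTIC_TIMEOUT, or the default if unset."""
    raw = env.get("MAGENTIC_TIMEOUT")
    if raw is None or not raw.strip():
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError as exc:
        raise ValueError(
            f"MAGENTIC_TIMEOUT must be a number of seconds, got {raw!r}"
        ) from exc
    # The range check also rejects "nan" and "inf", which float() accepts.
    if not (MIN_TIMEOUT_SECONDS <= value <= MAX_TIMEOUT_SECONDS):
        raise ValueError(
            f"MAGENTIC_TIMEOUT must be between {MIN_TIMEOUT_SECONDS:.0f} "
            f"and {MAX_TIMEOUT_SECONDS:.0f} seconds, got {value}"
        )
    return value
```

Failing fast on a bad value keeps a typo such as MAGENTIC_TIMEOUT=10m from silently falling back to the default. The same bounds could presumably be expressed as constraints on the Settings field itself.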

Fix 4: Graceful Degradation (P2 - Future)

# src/orchestrator_magentic.py - Inside run() loop
elapsed = time.time() - start_time
time_remaining = self._timeout_seconds - elapsed

# If 80% of time elapsed, force synthesis
if time_remaining < self._timeout_seconds * 0.2:
    yield AgentEvent(
        type="synthesizing",
        message="Time limit approaching, synthesizing available evidence...",
        iteration=iteration,
    )
    # TODO: Inject signal to trigger ReportAgent
    break
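
The 80% check can be exercised in isolation before touching the orchestrator. This is a sketch under stated assumptions: run_with_soft_deadline, fake_rounds, and FakeClock are hypothetical names, the rounds are plain strings rather than AgentEvent objects, and the clock is injected so the behaviour can be tested without waiting. It does not address the open question of how to signal the ReportAgent.

```python
import asyncio
import time
from collections.abc import AsyncIterator, Callable


async def run_with_soft_deadline(
    rounds: AsyncIterator[str],
    timeout_seconds: float,
    soft_fraction: float = 0.8,
    clock: Callable[[], float] = time.monotonic,
) -> AsyncIterator[str]:
    """Relay round messages and stop early once the soft deadline passes."""
    start = clock()
    soft_deadline = timeout_seconds * soft_fraction
    async for message in rounds:
        yield message
        if clock() - start >= soft_deadline:
            yield "Time limit approaching, synthesizing available evidence..."
            break


async def fake_rounds() -> AsyncIterator[str]:
    """Hypothetical workflow that would run ten rounds if left alone."""
    for i in range(1, 11):
        yield f"round {i}"


class FakeClock:
    """Deterministic clock that advances by a fixed step on every call."""

    def __init__(self, step: float) -> None:
        self.now = 0.0
        self.step = step

    def __call__(self) -> float:
        value = self.now
        self.now += self.step
        return value


async def demo() -> list[str]:
    clock = FakeClock(step=100.0)  # pretend each round takes 100s
    stream = run_with_soft_deadline(fake_rounds(), 600.0, clock=clock)
    return [message async for message in stream]


messages = asyncio.run(demo())
```

With a 600s limit and 100s rounds, the soft deadline of 480s is crossed after the fifth round, so the stream ends with the synthesis notice instead of running all ten rounds. The sketch defaults to time.monotonic because it is not affected by system clock adjustments.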

Implementation Order

  1. Fix 1 (P0): Chat history preservation - 5 minutes, 1 line change
  2. Fix 2 (P1): Increase default timeout - 5 minutes, 1 line change
  3. Fix 3 (P1): Environment config - 15 minutes, 3 files
  4. Fix 4 (P2): Graceful degradation - 1 hour, research agent-framework signals

Acceptance Criteria

  • Timeout shows ALL progress events, not just timeout message
  • Default timeout increased to 600s (10 minutes)
  • Timeout configurable via MAGENTIC_TIMEOUT env var
  • Tests verify chat history preserved on timeout
  • (P2) System synthesizes early when timeout approaches (Future)

Status: IMPLEMENTED (commit cb46aac)

Files to Modify

  1. src/app.py - Fix chat history clearing (lines 205-212)
  2. src/orchestrator_magentic.py - Increase default timeout
  3. src/utils/config.py - Add magentic_timeout setting
  4. src/orchestrator_factory.py - Pass timeout to MagenticOrchestrator
  5. tests/unit/test_app_timeout.py - NEW: Test chat history preservation

Test Plan

# tests/unit/test_app_timeout.py

@pytest.mark.asyncio
async def test_complete_event_preserves_history():
    """Complete events should append to history, not replace it."""
    from src.app import research_agent

    # This requires mocking the orchestrator to emit events then complete
    # Verify final output contains ALL events, not just completion message
    pass


def test_timeout_configurable(monkeypatch):
    """Verify MAGENTIC_TIMEOUT env var is respected."""
    # monkeypatch restores the environment after the test, so the override
    # cannot leak into other tests (a bare os.environ assignment would).
    monkeypatch.setenv("MAGENTIC_TIMEOUT", "120")

    from src.utils.config import Settings
    settings = Settings()
    assert settings.magentic_timeout == 120

Risk Assessment

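| Fix   | Risk   | Mitigation                                       |
|-------|--------|--------------------------------------------------|
| Fix 1 | Low    | Simple change, well-understood                   |
| Fix 2 | Low    | Just a default value change                      |
| Fix 3 | Medium | New config, needs validation                     |
| Fix 4 | High   | Requires understanding agent-framework internals |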

Dependencies

  • Fix 4 requires investigation of agent-framework-core to understand how to signal early termination to the workflow manager.