| # SPEC 04: Magentic Mode UX Improvements |
|
|
| ## Priority: P1 (Demo Quality) |
|
|
| ## Problem Statement |
|
|
| Magentic (advanced) mode has four UX issues, listed here by priority: |
|
|
| 1. **P0: Chat history cleared on timeout** - When timeout occurs, all progress events are erased |
| 2. **P1: Timeout too short** - 300s default insufficient for complex multi-agent workflows |
| 3. **P1: Timeout not configurable** - Users can't adjust based on their needs |
| 4. **P2: No graceful degradation** - System doesn't synthesize early when timeout approaches |
|
|
| ## Related Issues |
|
|
| - GitHub Issue #68: Magentic mode times out at 300s without completing |
| - GitHub Issue #65: Demo timing (predecessor, now closed) |
| - SPEC_01: Demo Termination (implemented the basic timeout) |
| |
| ## Bug Analysis |
| |
| ### Bug 1: Chat History Cleared on Timeout (P0) |
| |
| **Location**: `src/app.py:205-206` |
| |
| **Current Code**: |
| ```python |
| if event.type == "complete": |
|     yield event.message  # BUG: Discards all accumulated progress! |
| else: |
|     event_md = event.to_markdown() |
|     response_parts.append(event_md) |
|     yield "\n\n".join(response_parts) |
| ``` |
| |
| **Problem**: The `complete` event (including timeout) yields ONLY the completion message, discarding all the `response_parts` that show what the system actually did. |
| |
| **User Sees**: |
| ``` |
| Research timed out. Synthesizing available evidence... |
| ``` |
| |
| **User Should See**: |
| ``` |
| 🚀 STARTED: Starting research (Magentic mode)... |
| ⏳ THINKING: Multi-agent reasoning in progress... |
| 🧠 JUDGING: Manager (user_task): Research drug repurposing... |
| 🧠 JUDGING: Manager (task_ledger): We are working to address... |
| 🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical... |
| ⏱️ Research timed out. Synthesizing available evidence... |
| ``` |
| |
| **Fix**: |
| ```python |
| if event.type == "complete": |
|     response_parts.append(event.message) |
|     yield "\n\n".join(response_parts)  # Preserves all progress |
| ``` |
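The difference between the two behaviors can be reproduced without the real app. The following is a minimal sketch, assuming a hypothetical `Event` dataclass and a hypothetical `render` handler that mirror the shape of the snippet above; neither name comes from `src/app.py`.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for the orchestrator's event type."""

    type: str
    message: str

    def to_markdown(self) -> str:
        return f"**{self.type.upper()}**: {self.message}"


async def fake_events():
    """Emit two progress events, then a timeout-style completion."""
    yield Event("started", "Starting research (Magentic mode)...")
    yield Event("thinking", "Multi-agent reasoning in progress...")
    yield Event("complete", "Research timed out. Synthesizing available evidence...")


async def render(events, preserve_history: bool):
    """Yield the chat text after each event, as a streaming handler would."""
    response_parts = []
    async for event in events:
        if event.type == "complete":
            if not preserve_history:
                yield event.message  # old behavior: progress is dropped
                continue
            response_parts.append(event.message)  # fixed behavior
        else:
            response_parts.append(event.to_markdown())
        yield "\n\n".join(response_parts)


async def final_output(preserve_history: bool) -> str:
    """Return the last text the user would see in the chat window."""
    last = ""
    async for text in render(fake_events(), preserve_history):
        last = text
    return last


buggy = asyncio.run(final_output(preserve_history=False))
fixed = asyncio.run(final_output(preserve_history=True))
```

Here `buggy` holds only the timeout message, while `fixed` holds both progress lines followed by the timeout message.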
| |
| ### Bug 2: Timeout Too Short (P1) |
|
|
| **Location**: `src/orchestrator_magentic.py:48` |
|
|
| **Current**: `timeout_seconds: float = 300.0` (5 minutes) |
|
|
| **Problem**: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes. |
|
|
| **Analysis of Per-Agent Latency**: |
| | Agent | Typical Latency | Worst Case | |
| |-------|-----------------|------------| |
| | SearchAgent | 30-60s | 120s (network issues) | |
| | HypothesisAgent | 60-90s | 180s (complex reasoning) | |
| | JudgeAgent | 30-60s | 120s | |
| | ReportAgent | 60-120s | 240s (long synthesis) | |
|
|
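| With `max_rounds=10` and roughly 90s per agent call: 10 × 4 × 90s = 3,600s (60 minutes). Summing the worst-case column instead gives 660s per round, or 110 minutes over 10 rounds. |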
|
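The estimate can be cross-checked against the latency table. The sketch below assumes each of the four agents runs once per round, sequentially, for all 10 rounds; that scheduling assumption is not confirmed by the orchestrator code shown in this spec.

```python
# (typical_low, typical_high, worst_case) latency in seconds, from the table
LATENCY = {
    "SearchAgent": (30, 60, 120),
    "HypothesisAgent": (60, 90, 180),
    "JudgeAgent": (30, 60, 120),
    "ReportAgent": (60, 120, 240),
}
MAX_ROUNDS = 10


def total_seconds(column: int, rounds: int = MAX_ROUNDS) -> int:
    """Sum one latency column across all agents, then scale by round count."""
    return rounds * sum(row[column] for row in LATENCY.values())


typical_low = total_seconds(0)   # 10 rounds x 180s
typical_high = total_seconds(1)  # 10 rounds x 330s
worst_case = total_seconds(2)    # 10 rounds x 660s

for label, seconds in [
    ("typical (low)", typical_low),
    ("typical (high)", typical_high),
    ("worst case", worst_case),
]:
    print(f"{label}: {seconds}s = {seconds / 60:.0f} min")
```

Under these assumptions, a run that uses all 10 rounds at typical latencies already exceeds the 300s default.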
|
| ### Bug 3: Timeout Not Configurable (P1) |
|
|
| **Problem**: The factory doesn't pass timeout config to MagenticOrchestrator. |
|
|
| **Location**: `src/orchestrator_factory.py:52-55` |
| ```python |
| return orchestrator_cls( |
|     max_rounds=config.max_iterations if config else 10, |
|     api_key=api_key, |
|     # Missing: timeout_seconds |
| ) |
| ``` |
|
|
| ## Proposed Solutions |
|
|
| ### Fix 1: Preserve Chat History (P0) |
|
|
| ```python |
| # src/app.py - Replace lines 205-212 |
| if event.type == "complete": |
|     # Preserve accumulated progress + add completion message |
|     response_parts.append(event.message) |
|     yield "\n\n".join(response_parts) |
| else: |
|     event_md = event.to_markdown() |
|     response_parts.append(event_md) |
|     yield "\n\n".join(response_parts) |
| ``` |
|
|
| **Test**: |
| ```python |
| @pytest.mark.asyncio |
| async def test_timeout_preserves_chat_history(mock_magentic_workflow): |
|     """Verify timeout doesn't erase progress events.""" |
|     # Mock workflow that yields events then times out |
|     events = [] |
|     async for event in research_agent("test", [], "advanced", "sk-test"): |
|         events.append(event) |
| |
|     # Should contain both progress AND timeout message |
|     output = events[-1]  # Final yield |
|     assert "STARTED" in output |
|     assert "timed out" in output.lower() |
| ``` |
|
|
| ### Fix 2: Increase Default Timeout (P1) |
|
|
| ```python |
| # src/orchestrator_magentic.py |
| def __init__( |
|     self, |
|     max_rounds: int = 10, |
|     chat_client: OpenAIChatClient | None = None, |
|     api_key: str | None = None, |
|     timeout_seconds: float = 600.0,  # Changed: 10 minutes (was 5) |
| ) -> None: |
|     ...  # method body unchanged |
| ``` |
|
|
| ### Fix 3: Make Timeout Configurable via Environment (P1) |
|
|
| ```python |
| # src/utils/config.py |
| class Settings(BaseSettings): |
|     # ... existing fields ... |
|     magentic_timeout: int = Field( |
|         default=600, |
|         description="Timeout for Magentic mode in seconds", |
|     ) |
| ``` |
|
|
| ```python |
| # src/orchestrator_factory.py |
| return orchestrator_cls( |
|     max_rounds=config.max_iterations if config else 10, |
|     api_key=api_key, |
|     timeout_seconds=settings.magentic_timeout,  # NEW |
| ) |
| ``` |
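Because the risk table flags the new setting as needing validation, it may help to see what that validation could look like. This is a standard-library sketch only, not the pydantic `Settings` class: the helper name `read_magentic_timeout` and its error messages are hypothetical, and the 600s default is taken from Fix 2.

```python
import math
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_SECONDS = 600.0  # default proposed in Fix 2


def read_magentic_timeout(env: Optional[Mapping[str, str]] = None) -> float:
    """Read MAGENTIC_TIMEOUT, falling back to the default when unset or blank.

    Raises ValueError for non-numeric, non-finite, or non-positive values.
    """
    env = os.environ if env is None else env
    raw = env.get("MAGENTIC_TIMEOUT")
    if raw is None or raw.strip() == "":
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError as exc:
        raise ValueError(f"MAGENTIC_TIMEOUT must be a number, got {raw!r}") from exc
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"MAGENTIC_TIMEOUT must be a positive number, got {raw!r}")
    return value
```

Passing the environment as a mapping keeps the helper easy to test without mutating `os.environ`. In the pydantic-based `Settings`, a comparable guard could presumably be expressed as a field constraint such as `gt=0`.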
|
|
| ### Fix 4: Graceful Degradation (P2 - Future) |
|
|
| ```python |
| # src/orchestrator_magentic.py - Inside run() loop |
| elapsed = time.time() - start_time |
| time_remaining = self._timeout_seconds - elapsed |
| |
| # If 80% of time elapsed, force synthesis |
| if time_remaining < self._timeout_seconds * 0.2: |
|     yield AgentEvent( |
|         type="synthesizing", |
|         message="Time limit approaching, synthesizing available evidence...", |
|         iteration=iteration, |
|     ) |
|     # TODO: Inject signal to trigger ReportAgent |
|     break |
| ``` |
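The 80% threshold check can be exercised in isolation with an injectable clock. The sketch below is a simplified, synchronous stand-in for the `run()` loop: `run_rounds`, `FakeClock`, and the string events are all hypothetical, and it does not address the open question of how to signal the ReportAgent.

```python
import time
from typing import Callable, Iterator


def run_rounds(
    max_rounds: int,
    timeout_seconds: float,
    do_round: Callable[[int], str],
    clock: Callable[[], float] = time.monotonic,
    synthesis_fraction: float = 0.2,
) -> Iterator[str]:
    """Run rounds until done, or stop early once less than
    `synthesis_fraction` of the time budget remains."""
    start = clock()
    for iteration in range(1, max_rounds + 1):
        remaining = timeout_seconds - (clock() - start)
        if remaining < timeout_seconds * synthesis_fraction:
            yield f"synthesizing: time limit approaching (round {iteration})"
            break
        yield do_round(iteration)
    else:
        yield "complete: all rounds finished"


class FakeClock:
    """Manually advanced clock so the example needs no real waiting."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


fake_clock = FakeClock()


def slow_round(iteration: int) -> str:
    fake_clock.advance(30.0)  # pretend each round takes 30 seconds
    return f"round {iteration}"


# With a 100s budget, the check trips before round 4 (only 10s remain).
events = list(run_rounds(10, 100.0, slow_round, clock=fake_clock))
```

A monotonic clock is used as the default here because it is unaffected by system clock adjustments; the snippet above uses `time.time()`, and whichever is chosen must match how `start_time` is recorded.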
|
|
| ## Implementation Order |
|
|
| 1. **Fix 1 (P0)**: Chat history preservation - 5 minutes, 1 line change |
| 2. **Fix 2 (P1)**: Increase default timeout - 5 minutes, 1 line change |
| 3. **Fix 3 (P1)**: Environment config - 15 minutes, 3 files |
| 4. **Fix 4 (P2)**: Graceful degradation - 1 hour, research agent-framework signals |
|
|
| ## Acceptance Criteria |
|
|
| - [x] Timeout shows ALL progress events, not just timeout message |
| - [x] Default timeout increased to 600s (10 minutes) |
| - [x] Timeout configurable via `MAGENTIC_TIMEOUT` env var |
| - [x] Tests verify chat history preserved on timeout |
| - [ ] (P2) System synthesizes early when timeout approaches (Future) |
|
|
| **Status: IMPLEMENTED** (commit cb46aac) |
|
|
| ## Files to Modify |
|
|
| 1. `src/app.py` - Fix chat history clearing (lines 205-212) |
| 2. `src/orchestrator_magentic.py` - Increase default timeout |
| 3. `src/utils/config.py` - Add `magentic_timeout` setting |
| 4. `src/orchestrator_factory.py` - Pass timeout to MagenticOrchestrator |
| 5. `tests/unit/test_app_timeout.py` - NEW: Test chat history preservation |
|
|
| ## Test Plan |
|
|
| ```python |
| # tests/unit/test_app_timeout.py |
| import pytest |
| |
| |
| @pytest.mark.asyncio |
| async def test_complete_event_preserves_history(): |
|     """Complete events should append to history, not replace it.""" |
|     from src.app import research_agent |
| |
|     # This requires mocking the orchestrator to emit events then complete |
|     # Verify final output contains ALL events, not just completion message |
|     pass |
| |
| |
| def test_timeout_configurable(monkeypatch): |
|     """Verify MAGENTIC_TIMEOUT env var is respected.""" |
|     # monkeypatch restores the environment after the test, so the |
|     # override cannot leak into other tests |
|     monkeypatch.setenv("MAGENTIC_TIMEOUT", "120") |
| |
|     from src.utils.config import Settings |
| |
|     settings = Settings() |
|     assert settings.magentic_timeout == 120 |
| ``` |
|
|
| ## Risk Assessment |
|
|
| | Fix | Risk | Mitigation | |
| |-----|------|------------| |
| | Fix 1 | Low | Simple change, well-understood | |
| | Fix 2 | Low | Just a default value change | |
| | Fix 3 | Medium | New config, needs validation | |
| | Fix 4 | High | Requires understanding agent-framework internals | |
|
|
| ## Dependencies |
|
|
| - Fix 4 requires investigation of `agent-framework-core` to understand how to signal early termination to the workflow manager. |
|
|