| # SPEC 04: Magentic Mode UX Improvements | |
| ## Priority: P1 (Demo Quality) | |
| ## Problem Statement | |
| Magentic (advanced) mode has several UX issues that degrade the user experience: | |
| 1. **P0: Chat history cleared on timeout** - When timeout occurs, all progress events are erased | |
| 2. **P1: Timeout too short** - 300s default insufficient for complex multi-agent workflows | |
| 3. **P1: Timeout not configurable** - Users can't adjust based on their needs | |
| 4. **P2: No graceful degradation** - System doesn't synthesize early when timeout approaches | |
| ## Related Issues | |
| - GitHub Issue #68: Magentic mode times out at 300s without completing | |
| - GitHub Issue #65: Demo timing (predecessor, now closed) | |
| - SPEC_01: Demo Termination (implemented the basic timeout) | |
| ## Bug Analysis | |
| ### Bug 1: Chat History Cleared on Timeout (P0) | |
| **Location**: `src/app.py:205-206` | |
| **Current Code**: | |
| ```python | |
| if event.type == "complete": | |
| yield event.message # BUG: Discards all accumulated progress! | |
| else: | |
| event_md = event.to_markdown() | |
| response_parts.append(event_md) | |
| yield "\n\n".join(response_parts) | |
| ``` | |
| **Problem**: The `complete` event (including timeout) yields ONLY the completion message, discarding all the `response_parts` that show what the system actually did. | |
| **User Sees**: | |
| ``` | |
| Research timed out. Synthesizing available evidence... | |
| ``` | |
| **User Should See**: | |
| ``` | |
| π STARTED: Starting research (Magentic mode)... | |
| β³ THINKING: Multi-agent reasoning in progress... | |
| π§ JUDGING: Manager (user_task): Research drug repurposing... | |
| π§ JUDGING: Manager (task_ledger): We are working to address... | |
| π§ JUDGING: Manager (instruction): Task: Retrieve human clinical... | |
| β±οΈ Research timed out. Synthesizing available evidence... | |
| ``` | |
| **Fix**: | |
| ```python | |
| if event.type == "complete": | |
| response_parts.append(event.message) | |
| yield "\n\n".join(response_parts) # Preserves all progress | |
| ``` | |
| ### Bug 2: Timeout Too Short (P1) | |
| **Location**: `src/orchestrator_magentic.py:48` | |
| **Current**: `timeout_seconds: float = 300.0` (5 minutes) | |
| **Problem**: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes. | |
| **Analysis of Per-Agent Latency**: | |
| | Agent | Typical Latency | Worst Case | | |
| |-------|-----------------|------------| | |
| | SearchAgent | 30-60s | 120s (network issues) | | |
| | HypothesisAgent | 60-90s | 180s (complex reasoning) | | |
| | JudgeAgent | 30-60s | 120s | | |
| | ReportAgent | 60-120s | 240s (long synthesis) | | |
| With `max_rounds=10`: 10 Γ 4 Γ 90s = 60 minutes worst case. | |
| ### Bug 3: Timeout Not Configurable (P1) | |
| **Problem**: The factory doesn't pass timeout config to MagenticOrchestrator. | |
| **Location**: `src/orchestrator_factory.py:52-55` | |
| ```python | |
| return orchestrator_cls( | |
| max_rounds=config.max_iterations if config else 10, | |
| api_key=api_key, | |
| # Missing: timeout_seconds | |
| ) | |
| ``` | |
| ## Proposed Solutions | |
| ### Fix 1: Preserve Chat History (P0) | |
| ```python | |
| # src/app.py - Replace lines 205-212 | |
| if event.type == "complete": | |
| # Preserve accumulated progress + add completion message | |
| response_parts.append(event.message) | |
| yield "\n\n".join(response_parts) | |
| else: | |
| event_md = event.to_markdown() | |
| response_parts.append(event_md) | |
| yield "\n\n".join(response_parts) | |
| ``` | |
| **Test**: | |
| ```python | |
| @pytest.mark.asyncio | |
| async def test_timeout_preserves_chat_history(mock_magentic_workflow): | |
| """Verify timeout doesn't erase progress events.""" | |
| # Mock workflow that yields events then times out | |
| events = [] | |
| async for event in research_agent("test", [], "advanced", "sk-test"): | |
| events.append(event) | |
| # Should contain both progress AND timeout message | |
| output = events[-1] # Final yield | |
| assert "STARTED" in output | |
| assert "timed out" in output.lower() | |
| ``` | |
| ### Fix 2: Increase Default Timeout (P1) | |
| ```python | |
| # src/orchestrator_magentic.py | |
| def __init__( | |
| self, | |
| max_rounds: int = 10, | |
| chat_client: OpenAIChatClient | None = None, | |
| api_key: str | None = None, | |
| timeout_seconds: float = 600.0, # Changed: 10 minutes (was 5) | |
| ) -> None: | |
| ``` | |
| ### Fix 3: Make Timeout Configurable via Environment (P1) | |
| ```python | |
| # src/utils/config.py | |
| class Settings(BaseSettings): | |
| # ... existing fields ... | |
| magentic_timeout: int = Field( | |
| default=600, | |
| description="Timeout for Magentic mode in seconds", | |
| ) | |
| ``` | |
| ```python | |
| # src/orchestrator_factory.py | |
| return orchestrator_cls( | |
| max_rounds=config.max_iterations if config else 10, | |
| api_key=api_key, | |
| timeout_seconds=settings.magentic_timeout, # NEW | |
| ) | |
| ``` | |
| ### Fix 4: Graceful Degradation (P2 - Future) | |
| ```python | |
| # src/orchestrator_magentic.py - Inside run() loop | |
| elapsed = time.time() - start_time | |
| time_remaining = self._timeout_seconds - elapsed | |
| # If 80% of time elapsed, force synthesis | |
| if time_remaining < self._timeout_seconds * 0.2: | |
| yield AgentEvent( | |
| type="synthesizing", | |
| message="Time limit approaching, synthesizing available evidence...", | |
| iteration=iteration, | |
| ) | |
| # TODO: Inject signal to trigger ReportAgent | |
| break | |
| ``` | |
| ## Implementation Order | |
| 1. **Fix 1 (P0)**: Chat history preservation - 5 minutes, 1 line change | |
| 2. **Fix 2 (P1)**: Increase default timeout - 5 minutes, 1 line change | |
| 3. **Fix 3 (P1)**: Environment config - 15 minutes, 3 files | |
| 4. **Fix 4 (P2)**: Graceful degradation - 1 hour, research agent-framework signals | |
| ## Acceptance Criteria | |
| - [x] Timeout shows ALL progress events, not just timeout message | |
| - [x] Default timeout increased to 600s (10 minutes) | |
| - [x] Timeout configurable via `MAGENTIC_TIMEOUT` env var | |
| - [x] Tests verify chat history preserved on timeout | |
| - [ ] (P2) System synthesizes early when timeout approaches (Future) | |
| **Status: IMPLEMENTED** (commit cb46aac) | |
| ## Files to Modify | |
| 1. `src/app.py` - Fix chat history clearing (lines 205-212) | |
| 2. `src/orchestrator_magentic.py` - Increase default timeout | |
| 3. `src/utils/config.py` - Add `magentic_timeout` setting | |
| 4. `src/orchestrator_factory.py` - Pass timeout to MagenticOrchestrator | |
| 5. `tests/unit/test_app_timeout.py` - NEW: Test chat history preservation | |
| ## Test Plan | |
| ```python | |
| # tests/unit/test_app_timeout.py | |
| @pytest.mark.asyncio | |
| async def test_complete_event_preserves_history(): | |
| """Complete events should append to history, not replace it.""" | |
| from src.app import research_agent | |
| # This requires mocking the orchestrator to emit events then complete | |
| # Verify final output contains ALL events, not just completion message | |
| pass | |
| @pytest.mark.asyncio | |
| async def test_timeout_configurable(): | |
| """Verify MAGENTIC_TIMEOUT env var is respected.""" | |
| import os | |
| os.environ["MAGENTIC_TIMEOUT"] = "120" | |
| from src.utils.config import Settings | |
| settings = Settings() | |
| assert settings.magentic_timeout == 120 | |
| ``` | |
| ## Risk Assessment | |
| | Fix | Risk | Mitigation | | |
| |-----|------|------------| | |
| | Fix 1 | Low | Simple change, well-understood | | |
| | Fix 2 | Low | Just a default value change | | |
| | Fix 3 | Medium | New config, needs validation | | |
| | Fix 4 | High | Requires understanding agent-framework internals | | |
| ## Dependencies | |
| - Fix 4 requires investigation of `agent-framework-core` to understand how to signal early termination to the workflow manager. | |