# SPEC 04: Magentic Mode UX Improvements
## Priority: P1 (Demo Quality)
## Problem Statement
Magentic (advanced) mode has several UX issues:
1. **P0: Chat history cleared on timeout** - When a timeout occurs, all accumulated progress events are erased
2. **P1: Timeout too short** - The 300s default is insufficient for complex multi-agent workflows
3. **P1: Timeout not configurable** - Users can't adjust it to their needs
4. **P2: No graceful degradation** - The system doesn't synthesize early when the timeout approaches
## Related Issues
- GitHub Issue #68: Magentic mode times out at 300s without completing
- GitHub Issue #65: Demo timing (predecessor, now closed)
- SPEC_01: Demo Termination (implemented the basic timeout)
## Bug Analysis
### Bug 1: Chat History Cleared on Timeout (P0)
**Location**: `src/app.py:205-206`
**Current Code**:
```python
if event.type == "complete":
    yield event.message  # BUG: Discards all accumulated progress!
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)
```
**Problem**: The `complete` branch (which also handles timeouts) yields ONLY the completion message, discarding the accumulated `response_parts` that show what the system actually did.
**User Sees**:
```
Research timed out. Synthesizing available evidence...
```
**User Should See**:
```
🚀 STARTED: Starting research (Magentic mode)...
⏳ THINKING: Multi-agent reasoning in progress...
🧠 JUDGING: Manager (user_task): Research drug repurposing...
🧠 JUDGING: Manager (task_ledger): We are working to address...
🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical...
⏱️ Research timed out. Synthesizing available evidence...
```
**Fix**:
```python
if event.type == "complete":
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)  # Preserves all progress
```
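To make the difference concrete, here is a minimal, self-contained sketch that replays one event sequence through the buggy branch and the fixed branch. `Event` is a hypothetical stand-in for the project's `AgentEvent` (only `type`, `message`, and a `to_markdown()` method are assumed), and it assumes the chat UI replaces the displayed message with each yielded string, which is why the final yield is all the user ends up seeing.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for AgentEvent (only the fields used here)."""
    type: str
    message: str

    def to_markdown(self) -> str:
        return f"**{self.type.upper()}**: {self.message}"


def render_buggy(events: Iterable[Event]) -> Iterator[str]:
    response_parts: list[str] = []
    for event in events:
        if event.type == "complete":
            yield event.message  # drops everything accumulated so far
        else:
            response_parts.append(event.to_markdown())
            yield "\n\n".join(response_parts)


def render_fixed(events: Iterable[Event]) -> Iterator[str]:
    response_parts: list[str] = []
    for event in events:
        if event.type == "complete":
            response_parts.append(event.message)  # keep history, add final message
        else:
            response_parts.append(event.to_markdown())
        yield "\n\n".join(response_parts)  # every yield carries the full history


events = [
    Event("started", "Starting research (Magentic mode)..."),
    Event("thinking", "Multi-agent reasoning in progress..."),
    Event("complete", "Research timed out. Synthesizing available evidence..."),
]

print(list(render_buggy(events))[-1])
print(list(render_fixed(events))[-1])
```

Only the buggy version's final output collapses to the timeout message; the fixed version's final output still lists the STARTED and THINKING steps.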
### Bug 2: Timeout Too Short (P1)
**Location**: `src/orchestrator_magentic.py:48`
**Current**: `timeout_seconds: float = 300.0` (5 minutes)
**Problem**: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes.
**Analysis of Per-Agent Latency**:
| Agent | Typical Latency | Worst Case |
|-------|-----------------|------------|
| SearchAgent | 30-60s | 120s (network issues) |
| HypothesisAgent | 60-90s | 180s (complex reasoning) |
| JudgeAgent | 30-60s | 120s |
| ReportAgent | 60-120s | 240s (long synthesis) |
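With `max_rounds=10` and roughly 90s per agent call: 10 × 4 × 90s = 3,600s (60 minutes). The table's worst-case column is higher still: 120 + 180 + 120 + 240 = 660s per round, or 110 minutes over 10 rounds.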
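The budget arithmetic can be reproduced directly from the latency table above. This sketch makes the same assumption as the 10 × 4 × 90s estimate (each of the four agents runs once per round, sequentially); the dictionary layout and helper name are illustrative only.

```python
# Per-agent latency estimates in seconds, copied from the table above:
# (typical_low, typical_high, worst_case)
LATENCY = {
    "SearchAgent": (30, 60, 120),
    "HypothesisAgent": (60, 90, 180),
    "JudgeAgent": (30, 60, 120),
    "ReportAgent": (60, 120, 240),
}
MAX_ROUNDS = 10


def round_seconds(column: int) -> int:
    """Sum one latency column across all four agents (one full round)."""
    return sum(values[column] for values in LATENCY.values())


typical_low = round_seconds(0)   # 180 s per round
typical_high = round_seconds(1)  # 330 s per round
worst = round_seconds(2)         # 660 s per round

for label, per_round in [
    ("typical (low)", typical_low),
    ("typical (high)", typical_high),
    ("worst case", worst),
]:
    total = per_round * MAX_ROUNDS
    print(f"{label:15s} {per_round:4d} s/round -> {total / 60:.0f} min for {MAX_ROUNDS} rounds")
```

Under these assumptions a single round at the upper typical latency (330s) already exceeds the old 300s default, so a timeout can fire before the first full round completes.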
### Bug 3: Timeout Not Configurable (P1)
**Problem**: The factory doesn't pass any timeout setting to `MagenticOrchestrator`, so the hard-coded default always applies.
**Location**: `src/orchestrator_factory.py:52-55`
```python
return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    # Missing: timeout_seconds
)
```
## Proposed Solutions
### Fix 1: Preserve Chat History (P0)
```python
# src/app.py - Replace lines 205-212
if event.type == "complete":
    # Preserve accumulated progress + add completion message
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)
```
**Test**:
```python
import pytest

from src.app import research_agent


@pytest.mark.asyncio
async def test_timeout_preserves_chat_history(mock_magentic_workflow):
    """Verify timeout doesn't erase progress events."""
    # `mock_magentic_workflow` is a fixture (e.g. in conftest.py) that makes the
    # workflow yield progress events and then time out
    outputs = []
    async for chunk in research_agent("test", [], "advanced", "sk-test"):
        outputs.append(chunk)

    # The final yield should contain both the progress AND the timeout message
    output = outputs[-1]
    assert "STARTED" in output
    assert "timed out" in output.lower()
```
### Fix 2: Increase Default Timeout (P1)
```python
# src/orchestrator_magentic.py
class MagenticOrchestrator:
    def __init__(
        self,
        max_rounds: int = 10,
        chat_client: OpenAIChatClient | None = None,  # imported in the real module
        api_key: str | None = None,
        timeout_seconds: float = 600.0,  # Changed: 10 minutes (was 5)
    ) -> None:
        ...
```
### Fix 3: Make Timeout Configurable via Environment (P1)
```python
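# src/utils/config.py
from pydantic import Field
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    # ... existing fields ...
    # Populated from the MAGENTIC_TIMEOUT environment variable when set
    magentic_timeout: int = Field(
        default=600,
        description="Timeout for Magentic mode in seconds",
    )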
```
```python
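# src/orchestrator_factory.py
from src.utils.config import settings  # assumes a module-level Settings() instance

# ... inside the factory function ...
return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    timeout_seconds=settings.magentic_timeout,  # NEW
)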
```
### Fix 4: Graceful Degradation (P2 - Future)
```python
# src/orchestrator_magentic.py - Inside run() loop
elapsed = time.time() - start_time
time_remaining = self._timeout_seconds - elapsed
# Force synthesis once 80% of the budget has elapsed (less than 20% remaining)
if time_remaining < self._timeout_seconds * 0.2:
    yield AgentEvent(
        type="synthesizing",
        message="Time limit approaching, synthesizing available evidence...",
        iteration=iteration,
    )
    # TODO: Inject signal to trigger ReportAgent
    break
```
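The timing check itself can be exercised in isolation. This is a minimal sketch, assuming the workflow exposes an async stream of events; `run_with_soft_deadline` and `fake_workflow` are hypothetical names, and the ReportAgent signal from the TODO is still out of scope. It uses `time.monotonic()`, which, unlike `time.time()`, is unaffected by wall-clock adjustments.

```python
import asyncio
import time
from collections.abc import AsyncIterator


async def run_with_soft_deadline(
    steps: AsyncIterator[str],
    timeout_seconds: float,
    synthesis_fraction: float = 0.2,
) -> AsyncIterator[str]:
    """Pass workflow steps through until less than `synthesis_fraction` of the
    time budget remains, then emit one synthesizing notice and stop."""
    start = time.monotonic()
    async for step in steps:
        yield step  # completed work is always forwarded before the check
        remaining = timeout_seconds - (time.monotonic() - start)
        if remaining < timeout_seconds * synthesis_fraction:
            yield "synthesizing: time limit approaching, synthesizing available evidence..."
            return  # the spec's TODO would signal the ReportAgent at this point


async def fake_workflow(delay: float, n_steps: int) -> AsyncIterator[str]:
    """Hypothetical stand-in for the Magentic event stream."""
    for i in range(n_steps):
        await asyncio.sleep(delay)
        yield f"step {i}"


async def main() -> list[str]:
    stream = run_with_soft_deadline(fake_workflow(0.05, 20), timeout_seconds=0.5)
    return [event async for event in stream]


print(asyncio.run(main()))
```

Because the check only runs between events, a single slow agent call can still overrun the budget, so a hard limit (for example `asyncio.wait_for`) remains necessary as a backstop.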
## Implementation Order
1. **Fix 1 (P0)**: Chat history preservation - 5 minutes, 1 line change
2. **Fix 2 (P1)**: Increase default timeout - 5 minutes, 1 line change
3. **Fix 3 (P1)**: Environment config - 15 minutes, 3 files
4. **Fix 4 (P2)**: Graceful degradation - 1 hour, plus research into agent-framework termination signals
## Acceptance Criteria
- [x] Timeout shows ALL progress events, not just timeout message
- [x] Default timeout increased to 600s (10 minutes)
- [x] Timeout configurable via `MAGENTIC_TIMEOUT` env var
- [x] Tests verify chat history preserved on timeout
- [ ] (P2) System synthesizes early when timeout approaches (Future)
**Status: IMPLEMENTED** (commit cb46aac)
## Files to Modify
1. `src/app.py` - Fix chat history clearing (lines 205-212)
2. `src/orchestrator_magentic.py` - Increase default timeout
3. `src/utils/config.py` - Add `magentic_timeout` setting
4. `src/orchestrator_factory.py` - Pass timeout to MagenticOrchestrator
5. `tests/unit/test_app_timeout.py` - NEW: Test chat history preservation
## Test Plan
```python
# tests/unit/test_app_timeout.py
import pytest


@pytest.mark.asyncio
async def test_complete_event_preserves_history():
    """Complete events should append to history, not replace it."""
    from src.app import research_agent  # noqa: F401 (used once the mock exists)

    # This requires mocking the orchestrator to emit events then complete.
    # Verify final output contains ALL events, not just the completion message.
    pytest.skip("TODO: mock the orchestrator")  # avoid a silently passing stub


def test_timeout_configurable(monkeypatch):
    """Verify MAGENTIC_TIMEOUT env var is respected."""
    # monkeypatch restores the environment afterwards, unlike os.environ[...] = ...
    monkeypatch.setenv("MAGENTIC_TIMEOUT", "120")
    from src.utils.config import Settings

    settings = Settings()
    assert settings.magentic_timeout == 120
```
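The placeholder test needs an orchestrator double that emits progress events and then a `complete` event. A minimal sketch of that pattern follows; `Event`, `FakeOrchestrator`, and `stream_chat` are hypothetical stand-ins (wiring the fake into `research_agent` would require patching whichever factory `src/app.py` calls, which this spec does not name), and `stream_chat` mirrors the fixed loop from Fix 1.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for AgentEvent."""
    type: str
    message: str

    def to_markdown(self) -> str:
        return f"**{self.type.upper()}**: {self.message}"


class FakeOrchestrator:
    """Emits scripted events, ending with a timeout-style `complete` event."""

    def __init__(self, events: list[Event]) -> None:
        self._events = events

    async def run(self, query: str) -> AsyncIterator[Event]:
        for event in self._events:
            await asyncio.sleep(0)  # hand control back like a real async stream
            yield event


async def stream_chat(orchestrator: FakeOrchestrator, query: str) -> AsyncIterator[str]:
    """Mirrors the fixed app loop: every yield carries the full history."""
    response_parts: list[str] = []
    async for event in orchestrator.run(query):
        if event.type == "complete":
            response_parts.append(event.message)
        else:
            response_parts.append(event.to_markdown())
        yield "\n\n".join(response_parts)


async def collect() -> list[str]:
    orchestrator = FakeOrchestrator([
        Event("started", "Starting research (Magentic mode)..."),
        Event("judging", "Manager (user_task): Research drug repurposing..."),
        Event("complete", "Research timed out. Synthesizing available evidence..."),
    ])
    return [chunk async for chunk in stream_chat(orchestrator, "test")]


outputs = asyncio.run(collect())
final = outputs[-1]
assert "STARTED" in final and "JUDGING" in final
assert "timed out" in final.lower()
```

Under pytest-asyncio, the body of `collect()` plus the assertions would become the async test itself.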
## Risk Assessment
| Fix | Risk | Mitigation |
|-----|------|------------|
| Fix 1 | Low | Simple change, well-understood |
| Fix 2 | Low | Just a default value change |
| Fix 3 | Medium | New config, needs validation |
| Fix 4 | High | Requires understanding agent-framework internals |
## Dependencies
- Fix 4 requires investigation of `agent-framework-core` to understand how to signal early termination to the workflow manager.