# SPEC 04: Magentic Mode UX Improvements

## Priority: P1 (Demo Quality)

## Problem Statement

Magentic (advanced) mode has four UX issues, listed here by priority:

1. **P0: Chat history cleared on timeout** - When a timeout occurs, all accumulated progress events are erased from the chat
2. **P1: Timeout too short** - The 300 s default is insufficient for complex multi-agent workflows
3. **P1: Timeout not configurable** - Users cannot adjust the timeout to fit their workload
4. **P2: No graceful degradation** - The system does not synthesize early as the timeout approaches

## Related Issues

- GitHub Issue #68: Magentic mode times out at 300s without completing
- GitHub Issue #65: Demo timing (predecessor, now closed)
- SPEC_01: Demo Termination (implemented the basic timeout)

## Bug Analysis

### Bug 1: Chat History Cleared on Timeout (P0)

**Location**: `src/app.py:205-206`

**Current Code**:
```python
if event.type == "complete":
    yield event.message  # BUG: Discards all accumulated progress!
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)
```

**Problem**: When a `complete` event arrives (including the one emitted on timeout), the handler yields ONLY the completion message and discards the accumulated `response_parts` that show what the system actually did.

**User Sees**:
```
Research timed out. Synthesizing available evidence...
```

**User Should See**:
```
🚀 STARTED: Starting research (Magentic mode)...
⏳ THINKING: Multi-agent reasoning in progress...
🧠 JUDGING: Manager (user_task): Research drug repurposing...
🧠 JUDGING: Manager (task_ledger): We are working to address...
🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical...
⏱️ Research timed out. Synthesizing available evidence...
```

**Fix**:
```python
if event.type == "complete":
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)  # Preserves all progress
```

### Bug 2: Timeout Too Short (P1)

**Location**: `src/orchestrator_magentic.py:48`

**Current**: `timeout_seconds: float = 300.0` (5 minutes)

**Problem**: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes.

**Analysis of Per-Agent Latency**:
| Agent | Typical Latency | Worst Case |
|-------|-----------------|------------|
| SearchAgent | 30-60s | 120s (network issues) |
| HypothesisAgent | 60-90s | 180s (complex reasoning) |
| JudgeAgent | 30-60s | 120s |
| ReportAgent | 60-120s | 240s (long synthesis) |

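With `max_rounds=10` and roughly 90 s per agent call: 10 × 4 × 90 s = 3,600 s, or about 60 minutes, and longer if agents approach the worst-case latencies above.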
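The per-round totals implied by the table can be checked with a few lines of arithmetic. This sketch assumes, like the estimate above, that every round calls each of the four agents exactly once; in practice the Magentic manager chooses which agent to call, so real rounds may be shorter.

```python
# Per-agent latency bounds in seconds, copied from the table above:
# (typical_low, typical_high, worst_case)
LATENCY = {
    "SearchAgent": (30, 60, 120),
    "HypothesisAgent": (60, 90, 180),
    "JudgeAgent": (30, 60, 120),
    "ReportAgent": (60, 120, 240),
}
MAX_ROUNDS = 10

# Simplifying assumption: every round calls each agent exactly once.
typical_round = (
    sum(low for low, _, _ in LATENCY.values()),
    sum(high for _, high, _ in LATENCY.values()),
)
worst_round = sum(worst for _, _, worst in LATENCY.values())

print(f"One typical round: {typical_round[0]}-{typical_round[1]} s")
print(f"{MAX_ROUNDS} worst-case rounds: {MAX_ROUNDS * worst_round / 60:.0f} min")
```

Under this assumption a single typical round already takes 3 to 5.5 minutes, which is why a 300 s budget leaves little room for more than one round.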

### Bug 3: Timeout Not Configurable (P1)

**Problem**: The factory does not pass a timeout to `MagenticOrchestrator`, so the constructor default always applies.

**Location**: `src/orchestrator_factory.py:52-55`
```python
return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    # Missing: timeout_seconds
)
```

## Proposed Solutions

### Fix 1: Preserve Chat History (P0)

```python
# src/app.py - Replace lines 205-212
if event.type == "complete":
    # Preserve accumulated progress + add completion message
    response_parts.append(event.message)
    yield "\n\n".join(response_parts)
else:
    event_md = event.to_markdown()
    response_parts.append(event_md)
    yield "\n\n".join(response_parts)
```
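The fix can be checked in isolation with a plain generator. This is a minimal sketch, not the code in `src/app.py`: `Event` is a hypothetical stand-in for the app's `AgentEvent`, and its `to_markdown()` format is assumed. Because both branches now end by yielding the same joined string, the `yield` can be hoisted out of the `if`/`else`.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for the app's AgentEvent (type, message, to_markdown)."""

    type: str
    message: str

    def to_markdown(self) -> str:
        # Assumed rendering; the real AgentEvent.to_markdown() may differ.
        return f"**{self.type.upper()}**: {self.message}"


def stream_chat(events: Iterable[Event]) -> Iterator[str]:
    """Yield the cumulative chat text after each event (Fix 1 behaviour)."""
    response_parts: list[str] = []
    for event in events:
        if event.type == "complete":
            # Append the completion message instead of replacing the history.
            response_parts.append(event.message)
        else:
            response_parts.append(event.to_markdown())
        yield "\n\n".join(response_parts)


outputs = list(
    stream_chat(
        [
            Event("started", "Starting research (Magentic mode)..."),
            Event("thinking", "Multi-agent reasoning in progress..."),
            Event("complete", "Research timed out. Synthesizing available evidence..."),
        ]
    )
)
print(outputs[-1])  # final chat text: both progress lines plus the timeout message
```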

**Test**:
```python
import pytest

from src.app import research_agent


@pytest.mark.asyncio
async def test_timeout_preserves_chat_history(mock_magentic_workflow):
    """Verify timeout doesn't erase progress events."""
    # Mock workflow that yields events then times out
    events = []
    async for event in research_agent("test", [], "advanced", "sk-test"):
        events.append(event)

    # Should contain both progress AND timeout message
    output = events[-1]  # Final yield
    assert "STARTED" in output
    assert "timed out" in output.lower()
```

### Fix 2: Increase Default Timeout (P1)

```python
# src/orchestrator_magentic.py
def __init__(
    self,
    max_rounds: int = 10,
    chat_client: OpenAIChatClient | None = None,
    api_key: str | None = None,
    timeout_seconds: float = 600.0,  # Changed: 10 minutes (was 5)
) -> None:
```

### Fix 3: Make Timeout Configurable via Environment (P1)

```python
# src/utils/config.py
class Settings(BaseSettings):
    # ... existing fields ...
    magentic_timeout: int = Field(
        default=600,
        description="Timeout for Magentic mode in seconds",
    )
```

```python
# src/orchestrator_factory.py
return orchestrator_cls(
    max_rounds=config.max_iterations if config else 10,
    api_key=api_key,
    timeout_seconds=settings.magentic_timeout,  # NEW
)
```

### Fix 4: Graceful Degradation (P2 - Future)

```python
# src/orchestrator_magentic.py - Inside run() loop
elapsed = time.time() - start_time
time_remaining = self._timeout_seconds - elapsed

# If 80% of time elapsed, force synthesis
if time_remaining < self._timeout_seconds * 0.2:
    yield AgentEvent(
        type="synthesizing",
        message="Time limit approaching, synthesizing available evidence...",
        iteration=iteration,
    )
    # TODO: Inject signal to trigger ReportAgent
    break
```
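The early-synthesis check can be prototyped independently of agent-framework. In this sketch `run_with_soft_deadline` is hypothetical, the workflow is modeled as a plain iterable of step names, and the clock is injectable so the behavior can be tested without waiting. It uses `time.monotonic()` rather than `time.time()`, since a monotonic clock is not affected by system clock adjustments. How to actually hand off to the ReportAgent remains the open question in the TODO above.

```python
import time
from collections.abc import Callable, Iterable, Iterator

SYNTH_MESSAGE = "Time limit approaching, synthesizing available evidence..."


def run_with_soft_deadline(
    steps: Iterable[str],
    timeout_seconds: float,
    synth_fraction: float = 0.8,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[tuple[str, str]]:
    """Yield (event_type, message) pairs and stop early to synthesize.

    Hypothetical sketch of Fix 4: once `synth_fraction` of the time budget
    has elapsed, emit a "synthesizing" event and stop consuming steps.
    """
    start = clock()
    for step in steps:
        # Emit the step first so no progress is lost (same spirit as Fix 1).
        yield ("progress", step)
        elapsed = clock() - start
        if elapsed >= timeout_seconds * synth_fraction:
            yield ("synthesizing", SYNTH_MESSAGE)
            return


# Usage with a fake clock: readings after each step are 10 s, 50 s and 85 s.
fake_times = iter([0.0, 10.0, 50.0, 85.0])
events = list(
    run_with_soft_deadline(
        ["search", "hypothesis", "judge", "report"],
        timeout_seconds=100.0,
        clock=lambda: next(fake_times),
    )
)
for event_type, message in events:
    print(event_type, message)
```

With a 100 s budget the 80 s threshold is crossed after the third step, so the sketch emits three progress events, then the synthesizing event, and never starts the `"report"` step.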

## Implementation Order

1. **Fix 1 (P0)**: Chat history preservation - 5 minutes, 1 line change
2. **Fix 2 (P1)**: Increase default timeout - 5 minutes, 1 line change
3. **Fix 3 (P1)**: Environment config - 15 minutes, 3 files
4. **Fix 4 (P2)**: Graceful degradation - ~1 hour, after researching how agent-framework signals early termination

## Acceptance Criteria

- [x] Timeout shows ALL progress events, not just timeout message
- [x] Default timeout increased to 600s (10 minutes)
- [x] Timeout configurable via `MAGENTIC_TIMEOUT` env var
- [x] Tests verify chat history preserved on timeout
- [ ] (P2) System synthesizes early when timeout approaches (Future)

**Status: IMPLEMENTED** (commit cb46aac)

## Files to Modify

1. `src/app.py` - Fix chat history clearing (lines 205-212)
2. `src/orchestrator_magentic.py` - Increase default timeout
3. `src/utils/config.py` - Add `magentic_timeout` setting
4. `src/orchestrator_factory.py` - Pass timeout to MagenticOrchestrator
5. `tests/unit/test_app_timeout.py` - NEW: Test chat history preservation

## Test Plan

```python
# tests/unit/test_app_timeout.py
import pytest


@pytest.mark.asyncio
async def test_complete_event_preserves_history():
    """Complete events should append to history, not replace it."""
    from src.app import research_agent

    # This requires mocking the orchestrator to emit events then complete
    # Verify final output contains ALL events, not just completion message
    pass


def test_timeout_configurable(monkeypatch):
    """Verify the MAGENTIC_TIMEOUT env var is respected."""
    # monkeypatch restores the environment after the test; a direct
    # os.environ assignment would leak into every later test.
    monkeypatch.setenv("MAGENTIC_TIMEOUT", "120")

    from src.utils.config import Settings

    settings = Settings()
    assert settings.magentic_timeout == 120
```

## Risk Assessment

| Fix | Risk | Mitigation |
|-----|------|------------|
| Fix 1 | Low | Simple change, well-understood |
| Fix 2 | Low | Just a default value change |
| Fix 3 | Medium | New config, needs validation |
| Fix 4 | High | Requires understanding agent-framework internals |

## Dependencies

- Fix 4 requires investigation of `agent-framework-core` to understand how to signal early termination to the workflow manager.