# SPEC 01: Demo Termination & Timing Fix
## Priority: P0 (Hackathon Blocker)
## Problem Statement
Advanced (Magentic) mode appears to run indefinitely from the user's perspective. The demo was manually terminated after ~10 minutes without reaching synthesis.
**Root Cause Hypothesis**: We're trusting `agent_framework.MagenticBuilder.max_round_count` to enforce termination, but:
1. We don't know how the framework counts "rounds"
2. Our `iteration` counter only tracks `MagenticAgentMessageEvent`, not all framework rounds
3. Manager coordination messages (JUDGING) happen between rounds and don't count
## Investigation Required
### Question 1: Does max_round_count actually work?
```python
# Current code (src/orchestrator_magentic.py:111)
.with_standard_manager(
    chat_client=manager_client,
    max_round_count=self._max_rounds,  # Default: 10
    max_stall_count=3,
    max_reset_count=2,
)
```
**Test**: Set `max_round_count=2` and verify termination.
### Question 2: What counts as a "round"?
From demo output:
- `JUDGING` (Manager) - many of these
- `SEARCH_COMPLETE` (Agent)
- `HYPOTHESIZING` (Agent)
- `JUDGE_COMPLETE` (Agent)
- `STREAMING` (Delta events)
Is one "round" = one full cycle of all agents? Or one agent message?
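One way to answer this empirically is to log the type of every event in a run and compare the tallies against `max_round_count`. The following is a minimal sketch, assuming event types are available as plain strings like the labels in the demo output. `tally_event_types` and `agent_message_count` are hypothetical helpers, not part of `agent_framework`, and the `observed` sequence is made up for illustration.

```python
from collections import Counter
from collections.abc import Iterable

# Labels copied from the demo output; treating these three as
# "agent messages" is an assumption made for illustration.
AGENT_EVENT_TYPES = {"SEARCH_COMPLETE", "HYPOTHESIZING", "JUDGE_COMPLETE"}


def tally_event_types(event_types: Iterable[str]) -> Counter:
    """Count how often each event type appears in a run."""
    return Counter(event_types)


def agent_message_count(tally: Counter) -> int:
    """Sum the events that an agent-message-only counter would see."""
    return sum(n for name, n in tally.items() if name in AGENT_EVENT_TYPES)


# Made-up sequence, not real demo data.
observed = [
    "JUDGING", "SEARCH_COMPLETE", "JUDGING", "STREAMING",
    "HYPOTHESIZING", "JUDGING", "JUDGE_COMPLETE",
]
tally = tally_event_types(observed)
```

Running this against a real event log with `max_round_count=2` should indicate whether the framework's limit tracks agent messages, manager messages, or full cycles.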
### Question 3: Why no final synthesis?
The demo showed lots of evidence gathering but never reached `ReportAgent`. Either:
1. JudgeAgent never said "sufficient=True"
2. Framework terminated before synthesis (unlikely, given the ~10-minute runtime)
3. Something else broke the flow
## Proposed Solutions
### Option A: Add Hard Timeout (Recommended for Hackathon)
```python
# src/orchestrator_magentic.py
import asyncio
from collections.abc import AsyncGenerator

async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
    # ...existing setup...
    DEMO_TIMEOUT_SECONDS = 300  # 5 minutes max
    try:
        # asyncio.timeout() requires Python 3.11+
        async with asyncio.timeout(DEMO_TIMEOUT_SECONDS):
            async for event in workflow.run_stream(task):
                ...  # ...existing processing...
    except TimeoutError:
        yield AgentEvent(
            type="complete",
            message="Research timed out. Synthesizing available evidence...",
            data={"reason": "timeout", "iterations": iteration},
            iteration=iteration,
        )
        # Attempt to synthesize whatever we have
```
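The timeout pattern can be tried in isolation before touching the orchestrator. The sketch below is self-contained and makes several assumptions: `slow_events` is a hypothetical stand-in for `workflow.run_stream(task)`, events are plain strings rather than `AgentEvent` objects, and the deadline is enforced with `asyncio.wait_for` on each event (instead of `asyncio.timeout`) so that it also runs on Python versions before 3.11.

```python
import asyncio
from collections.abc import AsyncIterator


async def slow_events() -> AsyncIterator[str]:
    """Hypothetical stand-in for workflow.run_stream(task); never ends on its own."""
    n = 0
    while True:
        await asyncio.sleep(0.01)
        n += 1
        yield f"event-{n}"


async def run_with_deadline(
    source: AsyncIterator[str], timeout_s: float
) -> AsyncIterator[str]:
    """Relay events until the source ends or the overall deadline passes."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    iterator = source.__aiter__()
    reason = "finished"
    while True:
        remaining = max(deadline - loop.time(), 0)
        try:
            event = await asyncio.wait_for(iterator.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            break
        except asyncio.TimeoutError:
            reason = "timeout"
            break
        yield event
    # Always emit a final marker so the caller gets SOME output.
    yield f"complete:{reason}"


async def collect(timeout_s: float) -> list:
    """Gather everything the wrapped stream yields."""
    return [e async for e in run_with_deadline(slow_events(), timeout_s)]


events = asyncio.run(collect(0.1))
```

The final `complete:timeout` marker plays the role of the `AgentEvent(type="complete", ...)` above: the consumer always sees a terminal event, whether or not the framework's own round limit fires.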
### Option B: Reduce max_rounds AND Add Progress
```python
# Lower the round count AND show which round we're on
max_round_count=5, # Was 10
```
Also yield the round number as a progress event:
```python
yield AgentEvent(
    type="progress",
    message=f"Round {round_num}/{max_rounds}...",
    iteration=round_num,
)
```
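Because it is still unclear how the framework counts rounds, a locally tracked counter could drift past `max_rounds`. A minimal sketch of a progress helper that clamps the displayed value, assuming a simplified `ProgressEvent` dataclass as a hypothetical stand-in for `AgentEvent`:

```python
from dataclasses import dataclass


@dataclass
class ProgressEvent:
    """Hypothetical stand-in for AgentEvent; only the fields used here."""
    type: str
    message: str
    iteration: int


def progress_event(round_num: int, max_rounds: int) -> ProgressEvent:
    """Build a 'Round N/M' event, clamping N into the range [1, max_rounds]."""
    shown = min(max(round_num, 1), max_rounds)
    return ProgressEvent(
        type="progress",
        message=f"Round {shown}/{max_rounds}...",
        iteration=shown,
    )
```

Clamping is a display choice only: it keeps the UI from showing something like "Round 7/5" if the local counter and the framework disagree, and it should not be used to decide termination.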
### Option C: Force Synthesis After N Evidence Items
```python
# In judge logic
if len(evidence) >= 20:
    return "synthesize"  # We have enough, stop searching
```
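The threshold check can be combined with the judge's own verdict so that either condition triggers synthesis. This is a sketch under assumptions: `next_action`, the `judge_sufficient` flag, and the `"search"` label are hypothetical names used for illustration, and only `"synthesize"` and the threshold of 20 come from the snippet above.

```python
from collections.abc import Sequence

MAX_EVIDENCE_ITEMS = 20  # threshold taken from the snippet above


def next_action(evidence: Sequence[object], judge_sufficient: bool) -> str:
    """Return 'synthesize' once the judge is satisfied or the evidence cap is hit."""
    if judge_sufficient or len(evidence) >= MAX_EVIDENCE_ITEMS:
        return "synthesize"
    return "search"  # hypothetical label for "keep gathering evidence"
```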
## Acceptance Criteria
- [x] Demo completes in <5 minutes with visible progress
- [x] User sees round count (e.g., "Round 3/5")
- [x] Always produces SOME output (even if partial)
- [x] Timeout prevents infinite running
**Status: IMPLEMENTED** (commit b1d094d)
## Test Plan
```python
import time

import pytest

from src.orchestrator_magentic import MagenticOrchestrator  # import path assumed


@pytest.mark.asyncio
async def test_magentic_terminates_within_timeout():
    """Verify demo completes in reasonable time."""
    orchestrator = MagenticOrchestrator(max_rounds=3)
    events = []
    start = time.monotonic()
    async for event in orchestrator.run("simple test query"):
        events.append(event)
        if time.monotonic() - start > 120:  # 2 min max for test
            pytest.fail("Orchestrator did not terminate")
    # Must have a completion event
    assert any(e.type == "complete" for e in events)
```
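Note that the elapsed-time check in the test above only runs when an event arrives, so a stream that stalls without emitting anything would hang the test rather than fail it. A minimal sketch of a stricter guard, assuming a hypothetical `stalled_stream` stand-in for `orchestrator.run(...)` and a hypothetical `collect_within` helper that wraps the whole collection in `asyncio.wait_for`:

```python
import asyncio
from collections.abc import AsyncIterator


async def stalled_stream() -> AsyncIterator[str]:
    """Hypothetical stand-in for orchestrator.run(...): emits once, then stalls."""
    yield "started"
    await asyncio.sleep(3600)


async def collect_within(stream: AsyncIterator[str], limit_s: float) -> list:
    """Collect all events; raise asyncio.TimeoutError if the stream outlives limit_s."""
    events = []

    async def drain() -> None:
        async for event in stream:
            events.append(event)

    await asyncio.wait_for(drain(), timeout=limit_s)
    return events


def stalls(limit_s: float = 0.05) -> bool:
    """Return True if the stalled stream is cut off by the time limit."""
    try:
        asyncio.run(collect_within(stalled_stream(), limit_s))
    except asyncio.TimeoutError:
        return True
    return False
```

In the real test, `collect_within(orchestrator.run("simple test query"), 120)` would take the place of the manual `time` check, with the `complete` assertion applied to the returned list.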
## Related Issues
- #65: P1: Advanced Mode takes too long for hackathon demo
- #47: E2E Testing
## Files to Modify
1. `src/orchestrator_magentic.py` - Add timeout and progress
2. `src/app.py` - Display round progress in UI
3. `tests/unit/test_magentic_termination.py` - Add timeout test