	# SPEC 04: Magentic Mode UX Improvements

	## Priority: P1 (Demo Quality)

	## Problem Statement

	Magentic (advanced) mode has several UX issues that degrade the user experience:

	1. P0: Chat history cleared on timeout - When a timeout occurs, all accumulated progress events are erased from the chat
	2. P1: Timeout too short - The 300s default is insufficient for complex multi-agent workflows
	3. P1: Timeout not configurable - Users can't adjust it to their needs
	4. P2: No graceful degradation - The system doesn't synthesize early as the timeout approaches

	## Related Issues

	- GitHub Issue #68: Magentic mode times out at 300s without completing
	- GitHub Issue #65: Demo timing (predecessor, now closed)
	- SPEC_01: Demo Termination (implemented the basic timeout)

	## Bug Analysis

	### Bug 1: Chat History Cleared on Timeout (P0)

	Location: `src/app.py:205-206`

	Current Code:
	```python
	if event.type == "complete":
	    yield event.message  # BUG: Discards all accumulated progress!
	else:
	    event_md = event.to_markdown()
	    response_parts.append(event_md)
	    yield "\n\n".join(response_parts)
	```

	Problem: The `complete` event (including timeout) yields ONLY the completion message, discarding all the `response_parts` that show what the system actually did.

	User Sees:
	```
	Research timed out. Synthesizing available evidence...
	```

	User Should See:
	```
	🚀 STARTED: Starting research (Magentic mode)...
	⏳ THINKING: Multi-agent reasoning in progress...
	🧠 JUDGING: Manager (user_task): Research drug repurposing...
	🧠 JUDGING: Manager (task_ledger): We are working to address...
	🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical...
	⏱️ Research timed out. Synthesizing available evidence...
	```

	Fix:
	```python
	if event.type == "complete":
	    response_parts.append(event.message)
	    yield "\n\n".join(response_parts)  # Preserves all progress
	```
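The difference between the two branches can be reproduced outside the app. The following is a minimal sketch, not the project's code: `Event`, `render_buggy`, and `render_fixed` are hypothetical stand-ins for `AgentEvent` and the event loop in `src/app.py`, and the `to_markdown()` format shown here is assumed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Event:
    """Hypothetical stand-in for the app's event type."""

    type: str
    message: str

    def to_markdown(self) -> str:
        # Assumed format; the real to_markdown() may differ.
        return f"{self.type.upper()}: {self.message}"


def render_buggy(events: Iterable[Event]) -> Iterator[str]:
    """Mirrors the current behavior: 'complete' replaces the transcript."""
    parts: list[str] = []
    for event in events:
        if event.type == "complete":
            yield event.message
        else:
            parts.append(event.to_markdown())
            yield "\n\n".join(parts)


def render_fixed(events: Iterable[Event]) -> Iterator[str]:
    """Mirrors the fix: 'complete' is appended to the transcript."""
    parts: list[str] = []
    for event in events:
        if event.type == "complete":
            parts.append(event.message)
        else:
            parts.append(event.to_markdown())
        yield "\n\n".join(parts)


events = [
    Event("started", "Starting research (Magentic mode)..."),
    Event("thinking", "Multi-agent reasoning in progress..."),
    Event("complete", "Research timed out. Synthesizing available evidence..."),
]

# The chat UI shows the last yielded string, so compare the final yields.
buggy_final = list(render_buggy(events))[-1]
fixed_final = list(render_fixed(events))[-1]
```

Here `buggy_final` contains only the timeout message, while `fixed_final` contains both progress lines followed by the timeout message.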

	### Bug 2: Timeout Too Short (P1)

	Location: `src/orchestrator_magentic.py:48`

	Current: `timeout_seconds: float = 300.0` (5 minutes)

	Problem: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes.

	Analysis of Per-Agent Latency:
	| Agent | Typical Latency | Worst Case |
	|-------|-----------------|------------|
	| SearchAgent | 30-60s | 120s (network issues) |
	| HypothesisAgent | 60-90s | 180s (complex reasoning) |
	| JudgeAgent | 30-60s | 120s |
	| ReportAgent | 60-120s | 240s (long synthesis) |

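	With `max_rounds=10` and assuming roughly 90s per agent call: 10 × 4 × 90s = 3,600s = 60 minutes. Using the worst-case column instead, one round takes 120 + 180 + 120 + 240 = 660s, so 10 rounds could reach 110 minutes.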
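The estimate can be recomputed directly from the latency table. This is a back-of-the-envelope sketch that assumes every agent runs exactly once in each of the 10 rounds, which is an upper bound rather than a measurement; the names `LATENCY_S` and `total_minutes` are illustrative only.

```python
# Per-agent latency in seconds, copied from the table:
# (typical low, typical high, worst case)
LATENCY_S = {
    "SearchAgent": (30, 60, 120),
    "HypothesisAgent": (60, 90, 180),
    "JudgeAgent": (30, 60, 120),
    "ReportAgent": (60, 120, 240),
}
MAX_ROUNDS = 10


def total_minutes(per_round_seconds: float, rounds: int = MAX_ROUNDS) -> float:
    """Total wall-clock minutes if every round takes per_round_seconds."""
    return per_round_seconds * rounds / 60


typical_low = sum(low for low, _, _ in LATENCY_S.values())     # 180 s per round
typical_high = sum(high for _, high, _ in LATENCY_S.values())  # 330 s per round
worst = sum(w for _, _, w in LATENCY_S.values())               # 660 s per round

print(f"typical: {total_minutes(typical_low):.0f}-{total_minutes(typical_high):.0f} min")
print(f"worst:   {total_minutes(worst):.0f} min")
# typical: 30-55 min
# worst:   110 min
```

Under these assumptions a single typical round already takes 180-330s, so the 300s default can expire during the first round.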

	### Bug 3: Timeout Not Configurable (P1)

	Problem: The factory doesn't pass timeout config to MagenticOrchestrator.

	Location: `src/orchestrator_factory.py:52-55`
	```python
	return orchestrator_cls(
	    max_rounds=config.max_iterations if config else 10,
	    api_key=api_key,
	    # Missing: timeout_seconds
	)
	```

	## Proposed Solutions

	### Fix 1: Preserve Chat History (P0)

	```python
	# src/app.py - Replace lines 205-212
	if event.type == "complete":
	    # Preserve accumulated progress + add completion message
	    response_parts.append(event.message)
	    yield "\n\n".join(response_parts)
	else:
	    event_md = event.to_markdown()
	    response_parts.append(event_md)
	    yield "\n\n".join(response_parts)
	```

	Test:
	```python
	import pytest

	from src.app import research_agent


	@pytest.mark.asyncio
	async def test_timeout_preserves_chat_history(mock_magentic_workflow):
	    """Verify timeout doesn't erase progress events."""
	    # Mock workflow that yields events then times out
	    events = []
	    async for event in research_agent("test", [], "advanced", "sk-test"):
	        events.append(event)

	    # Should contain both progress AND timeout message
	    output = events[-1]  # Final yield
	    assert "STARTED" in output
	    assert "timed out" in output.lower()
	```

	### Fix 2: Increase Default Timeout (P1)

	```python
	# src/orchestrator_magentic.py
	def __init__(
	    self,
	    max_rounds: int = 10,
	    chat_client: OpenAIChatClient | None = None,
	    api_key: str | None = None,
	    timeout_seconds: float = 600.0,  # Changed: 10 minutes (was 5)
	) -> None:
	    ...  # body unchanged
	```

	### Fix 3: Make Timeout Configurable via Environment (P1)

	```python
	# src/utils/config.py
	class Settings(BaseSettings):
	    # ... existing fields ...
	    magentic_timeout: int = Field(
	        default=600,
	        description="Timeout for Magentic mode in seconds",
	    )
	```

	```python
	# src/orchestrator_factory.py
	return orchestrator_cls(
	    max_rounds=config.max_iterations if config else 10,
	    api_key=api_key,
	    timeout_seconds=settings.magentic_timeout,  # NEW
	)
	```
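Because the timeout now comes from the environment, a malformed or extreme value can reach the orchestrator. The following standard-library sketch shows one way such a value could be validated. The helper `read_timeout_seconds` and the bounds `MIN_TIMEOUT_S` and `MAX_TIMEOUT_S` are assumptions made for illustration and are not part of this spec or the codebase.

```python
import os
from typing import Mapping, Optional

DEFAULT_TIMEOUT_S = 600.0  # default proposed in this spec
MIN_TIMEOUT_S = 30.0       # assumed lower bound, not from the spec
MAX_TIMEOUT_S = 7200.0     # assumed upper bound, not from the spec


def read_timeout_seconds(env: Optional[Mapping[str, str]] = None) -> float:
    """Return MAGENTIC_TIMEOUT in seconds, or the default when it is unset."""
    source = os.environ if env is None else env
    raw = source.get("MAGENTIC_TIMEOUT")
    if raw is None or not raw.strip():
        return DEFAULT_TIMEOUT_S
    try:
        value = float(raw)
    except ValueError:
        raise ValueError(f"MAGENTIC_TIMEOUT must be a number, got {raw!r}") from None
    # The chained comparison is False for NaN, so NaN is rejected here too.
    if not MIN_TIMEOUT_S <= value <= MAX_TIMEOUT_S:
        raise ValueError(
            f"MAGENTIC_TIMEOUT must be between {MIN_TIMEOUT_S:.0f} and "
            f"{MAX_TIMEOUT_S:.0f} seconds, got {raw!r}"
        )
    return value
```

Passing a plain mapping (for example `read_timeout_seconds({"MAGENTIC_TIMEOUT": "120"})`) keeps the helper testable without touching the real environment. If the project stays with the pydantic `Settings` class, comparable bounds can likely be declared on the field itself through the `ge` and `le` arguments of `Field`.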

	### Fix 4: Graceful Degradation (P2 - Future)

	```python
	# src/orchestrator_magentic.py - Inside run() loop
	elapsed = time.time() - start_time
	time_remaining = self._timeout_seconds - elapsed

	# If 80% of time elapsed, force synthesis
	if time_remaining < self._timeout_seconds * 0.2:
	    yield AgentEvent(
	        type="synthesizing",
	        message="Time limit approaching, synthesizing available evidence...",
	        iteration=iteration,
	    )
	    # TODO: Inject signal to trigger ReportAgent
	    break
	```
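The soft-deadline idea can be exercised in isolation before it is wired into the workflow. The sketch below is a self-contained illustration under several assumptions: `run_with_soft_deadline`, `fake_rounds`, and `FakeClock` are hypothetical names, plain strings stand in for `AgentEvent`, and it does not show how to signal the ReportAgent, which remains the open question for this fix. It uses `time.monotonic()` as the default clock instead of `time.time()`, because a monotonic clock is unaffected by system clock adjustments.

```python
import asyncio
import time
from typing import AsyncIterator, Callable


async def run_with_soft_deadline(
    rounds: AsyncIterator[str],
    timeout_seconds: float,
    soft_fraction: float = 0.8,
    clock: Callable[[], float] = time.monotonic,
) -> AsyncIterator[str]:
    """Relay round messages until soft_fraction of the timeout has elapsed."""
    start = clock()
    async for message in rounds:
        yield message
        if clock() - start >= timeout_seconds * soft_fraction:
            yield "Time limit approaching, synthesizing available evidence..."
            break


async def fake_rounds() -> AsyncIterator[str]:
    """Stand-in for the workflow: ten rounds, one message each."""
    for i in range(1, 11):
        yield f"round {i}"


class FakeClock:
    """Advances 100 simulated seconds per reading, for a deterministic demo."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        current = self.now
        self.now += 100.0
        return current


async def collect() -> list[str]:
    stream = run_with_soft_deadline(fake_rounds(), 600.0, clock=FakeClock())
    return [message async for message in stream]


messages = asyncio.run(collect())
```

With a 600s timeout and 100 simulated seconds per round, the loop relays rounds 1 through 5, crosses the 480s soft deadline, emits the synthesis notice, and stops without running the remaining rounds.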

	## Implementation Order

	1. Fix 1 (P0): Chat history preservation - 5 minutes, 1 line change
	2. Fix 2 (P1): Increase default timeout - 5 minutes, 1 line change
	3. Fix 3 (P1): Environment config - 15 minutes, 3 files
	4. Fix 4 (P2): Graceful degradation - 1 hour, research agent-framework signals

	## Acceptance Criteria

	- [x] Timeout shows ALL progress events, not just timeout message
	- [x] Default timeout increased to 600s (10 minutes)
	- [x] Timeout configurable via `MAGENTIC_TIMEOUT` env var
	- [x] Tests verify chat history preserved on timeout
	- [ ] (P2) System synthesizes early when timeout approaches (Future)

	Status: IMPLEMENTED (commit cb46aac)

	## Files to Modify

	1. `src/app.py` - Fix chat history clearing (lines 205-212)
	2. `src/orchestrator_magentic.py` - Increase default timeout
	3. `src/utils/config.py` - Add `magentic_timeout` setting
	4. `src/orchestrator_factory.py` - Pass timeout to MagenticOrchestrator
	5. `tests/unit/test_app_timeout.py` - NEW: Test chat history preservation

	## Test Plan

	```python
	# tests/unit/test_app_timeout.py
	import pytest


	@pytest.mark.asyncio
	async def test_complete_event_preserves_history():
	    """Complete events should append to history, not replace it."""
	    from src.app import research_agent

	    # This requires mocking the orchestrator to emit events then complete
	    # Verify final output contains ALL events, not just completion message
	    pass


	def test_timeout_configurable(monkeypatch):
	    """Verify MAGENTIC_TIMEOUT env var is respected."""
	    # monkeypatch restores the environment after the test,
	    # so the override cannot leak into other tests
	    monkeypatch.setenv("MAGENTIC_TIMEOUT", "120")

	    from src.utils.config import Settings
	    settings = Settings()
	    assert settings.magentic_timeout == 120
	```

	## Risk Assessment

	| Fix | Risk | Mitigation |
	|-----|------|------------|
	| Fix 1 | Low | Simple change, well-understood |
	| Fix 2 | Low | Just a default value change |
	| Fix 3 | Medium | New config, needs validation |
	| Fix 4 | High | Requires understanding agent-framework internals |

	## Dependencies

	- Fix 4 requires investigation of `agent-framework-core` to understand how to signal early termination to the workflow manager.