P2: Advanced Mode Cold Start Has No User Feedback
Priority: P2 (UX Friction)
Component: src/orchestrators/advanced.py
Status: ✅ FIXED (All Phases Complete)
Issue: #108
Created: 2025-12-01
Summary
When Advanced Mode starts, users experience three significant "dead zones" with no visual feedback:
- Initialization delay (5-15 seconds): Between "STARTED" and "THINKING" events
- First LLM call delay (10-30+ seconds): Between "THINKING" and first "PROGRESS" event
- Agent execution delay (30-90+ seconds): After "PROGRESS" while SearchAgent executes
Users see the UI freeze with no indication of what's happening, leading to confusion about whether the system is working.
Visual Timeline
```
🔍 STARTED: Starting research (Advanced mode)...
│
│  ⚠️ DEAD ZONE #1: 5-15 seconds of nothing
│     - Loading LlamaIndex/ChromaDB
│     - Initializing embedding service
│     - Building 4 agents + manager
│
⏳ THINKING: Multi-agent reasoning in progress...
│
│  ⚠️ DEAD ZONE #2: 10-30+ seconds of nothing
│     - Manager agent's first OpenAI API call
│     - Cold connection to OpenAI
│
⏱️ PROGRESS: Manager assigning research task...
│
│  ⚠️ DEAD ZONE #3: 30-90+ seconds of nothing
│     - SearchAgent executing PubMed/ClinicalTrials/EuropePMC queries
│     - Embedding and storing results in ChromaDB
│     - No streaming events during search execution
│
🔍 SEARCH_COMPLETE / PROGRESS: Round 1/5...
```
Root Cause Analysis
Dead Zone #1: Initialization (Lines 162-165)
```python
yield AgentEvent(type="started", ...)  # User sees this

# === BLOCKING OPERATIONS (no events yielded) ===
embedding_service = self._init_embedding_service()  # ChromaDB, embeddings
init_magentic_state(query, embedding_service)       # Shared state
workflow = self._build_workflow()                   # 4 agents + manager

yield AgentEvent(type="thinking", ...)  # User finally sees this
```
What's happening:
- `_init_embedding_service()` → Loads LlamaIndex, connects to ChromaDB, initializes OpenAI embeddings
- `init_magentic_state()` → Creates ResearchMemory, sets up context
- `_build_workflow()` → Instantiates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent, Manager
Dead Zone #2: First LLM Call (Line 206)
```python
yield AgentEvent(type="thinking", ...)  # User sees this

async for event in workflow.run_stream(task):  # BLOCKING until first event
    # Manager makes first OpenAI call here
    # No events until manager responds and starts delegating
    ...
```
What's happening:
- Microsoft Agent Framework's manager agent receives the task
- Makes a synchronous(ish) call to OpenAI for orchestration planning
- Only after the response arrives does it emit `MagenticOrchestratorMessageEvent`
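One generic way to paper over this kind of blocking wait is to wrap the framework's stream in a helper that emits heartbeat items whenever the next real event is slow to arrive. The sketch below is hypothetical (not part of the codebase); in `run()` it could wrap `workflow.run_stream(task)`, with heartbeats mapped to progress-type `AgentEvent`s:

```python
import asyncio

async def with_heartbeat(stream, interval: float = 2.0, heartbeat="Still working..."):
    """Yield items from `stream`; whenever the next item takes longer than
    `interval` seconds to arrive, yield `heartbeat` in the meantime."""
    it = stream.__aiter__()
    while True:
        nxt = asyncio.ensure_future(it.__anext__())
        while not nxt.done():
            done, _ = await asyncio.wait({nxt}, timeout=interval)
            if not done:
                yield heartbeat  # nothing arrived within `interval`
        try:
            yield nxt.result()  # the real event
        except StopAsyncIteration:
            return  # underlying stream finished
```

This keeps the UI alive during Dead Zone #2 without touching the framework's internals, at the cost of emitting events that carry no new information.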
Dead Zone #3: Agent Execution (After PROGRESS event)
After "Manager assigning research task...", the SearchAgent executes but emits no events until complete:
What's happening:
- SearchAgent receives task from manager
- Executes parallel queries to PubMed, ClinicalTrials.gov, Europe PMC
- Each result is embedded and stored in ChromaDB
- Only after ALL searches complete does it emit `MagenticAgentMessageEvent`
Why no streaming:
- The agent's internal tool calls (search APIs, embeddings) don't emit framework events
- Microsoft Agent Framework only emits events at agent message boundaries
- 3 databases × multiple queries × embedding each result = long silent period
Potential fix: Add progress callbacks to SearchAgent tools:
```python
# In search_agent.py - hypothetical
from typing import Callable, Optional

async def search_pubmed(query: str, on_progress: Optional[Callable[[str], None]] = None):
    results = await pubmed_client.search(query)
    if on_progress:
        on_progress(f"Found {len(results)} PubMed results")
    # ... embed and store
```
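To surface those callback messages in the orchestrator's event stream, one option (a sketch with hypothetical names; the real tools are framework-managed) is to push them onto an `asyncio.Queue` that is drained while the tool runs:

```python
import asyncio
from typing import Awaitable, Callable

async def stream_tool_progress(
    run_tool: Callable[[Callable[[str], None]], Awaitable[None]],
):
    """Run a tool coroutine while yielding its progress messages live.

    `run_tool` is assumed to accept an `on_progress` callback, mirroring the
    hypothetical `search_pubmed` sketch above. Callbacks firing from worker
    threads would need `loop.call_soon_threadsafe` instead of a direct put.
    """
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(run_tool(queue.put_nowait))
    while not task.done() or not queue.empty():
        try:
            yield await asyncio.wait_for(queue.get(), timeout=0.1)
        except asyncio.TimeoutError:
            continue  # no message yet; re-check whether the tool finished
    await task  # surface any exception raised by the tool
```

The orchestrator could translate each yielded string into an `AgentEvent(type="progress", ...)`, filling Dead Zone #3 with real search status.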
Impact
- User Confusion: "Is it frozen? Should I refresh?"
- Perceived Slowness: Dead time feels longer than active progress
- No Cancel Option: Users can't abort during these zones
- Support Burden: Users report "it's not working" when it's actually initializing
Proposed Solutions
Option A: Granular Initialization Events (Quick Win)
Add progress events during initialization:
```python
yield AgentEvent(type="started", ...)

yield AgentEvent(
    type="progress",
    message="Loading embedding service...",
    iteration=0,
)
embedding_service = self._init_embedding_service()

yield AgentEvent(
    type="progress",
    message="Initializing research memory...",
    iteration=0,
)
init_magentic_state(query, embedding_service)

yield AgentEvent(
    type="progress",
    message="Building agent team (Search, Judge, Hypothesis, Report)...",
    iteration=0,
)
workflow = self._build_workflow()

yield AgentEvent(type="thinking", ...)
```
Pros: Simple, immediate feedback
Cons: Still sequential; doesn't reduce the actual wait time
Option B: Parallel Initialization (Performance + UX)
Use asyncio.gather() for independent operations:
```python
yield AgentEvent(type="progress", message="Initializing agents...", iteration=0)

# These could potentially run in parallel
embedding_task = asyncio.create_task(self._init_embedding_service_async())
workflow_task = asyncio.create_task(self._build_workflow_async())
embedding_service, workflow = await asyncio.gather(embedding_task, workflow_task)

init_magentic_state(query, embedding_service)
```
Pros: Faster initialization, better UX
Cons: Need to verify thread safety; more complex
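If the current initializers are synchronous, one low-risk way to get the async variants Option B assumes is `asyncio.to_thread`, which runs each blocking call on a worker thread so the two inits overlap. The initializer bodies below are stand-ins for the real loaders, which live in `advanced.py` and `service_loader.py`:

```python
import asyncio
import time

# Stand-ins for the real blocking initializers
def init_embedding_service():
    time.sleep(0.3)  # simulate ChromaDB + embedding model load
    return "embedding-service"

def build_workflow():
    time.sleep(0.3)  # simulate building 4 agents + manager
    return "workflow"

async def init_parallel():
    # Each blocking call runs on its own worker thread, so the two
    # inits overlap instead of running back-to-back
    return await asyncio.gather(
        asyncio.to_thread(init_embedding_service),
        asyncio.to_thread(build_workflow),
    )

embedding_service, workflow = asyncio.run(init_parallel())
```

This only pays off if the two initializers share no mutable global state, which is exactly the thread-safety question flagged in the cons above.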
Option C: Pre-warming / Singleton Services
Initialize expensive services once at app startup, not per-request:
```python
# In app.py startup
global_embedding_service = init_embedding_service()
global_workflow_template = build_workflow_template()

# In orchestrator
workflow = global_workflow_template.clone()  # Fast
```
Pros: Near-instant start after the first request
Cons: Memory overhead; the first request still pays the cold-start cost
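A common shape for this pre-warming is a thread-safe lazy singleton, so the expensive load happens exactly once per process no matter which request (or startup hook) touches it first. This is a minimal sketch; `_expensive_init` stands in for the real loader in `src/utils/service_loader.py`:

```python
import threading

_service = None
_lock = threading.Lock()

def _expensive_init():
    # Pretend this loads ChromaDB + embedding models
    return object()

def get_embedding_service():
    """Thread-safe lazy singleton: pay the init cost once per process."""
    global _service
    if _service is None:          # fast path, no lock
        with _lock:
            if _service is None:  # double-check under the lock
                _service = _expensive_init()
    return _service
```

Calling `get_embedding_service()` from an app-startup hook hides the cold start from users entirely; calling it lazily still guarantees only the first request pays.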
Option D: Animated Progress Indicator (UI-Only)
Add a Gradio progress bar or spinner that animates during the dead zones:
```python
# In app.py - Gradio injects the tracker when gr.Progress() is a
# default-valued parameter of the event handler
import gradio as gr

async def research(query, progress=gr.Progress()):
    progress(0.1, desc="Initializing...")
    # ...
    progress(0.2, desc="Building agents...")
```
Pros: User sees activity even when there is nothing to report
Cons: Doesn't solve the actual blocking; Gradio-specific
Recommended Approach
- Phase 1 (Quick Win): Option A - Add granular events ✅ COMPLETE
- Phase 2 (Performance): Option C - Pre-warm services at startup ✅ COMPLETE
- Phase 3 (Polish): Option D - Gradio progress bar ✅ COMPLETE
Related Considerations
Parallel Agent Orchestration
The current Microsoft Agent Framework runs agents sequentially through the manager. True parallel execution would require:
- Breaking out of the framework's `run_stream()` pattern
- Implementing our own parallel task dispatch
- Managing agent coordination manually
This is a larger architectural change (P1 scope) and should be tracked separately if desired.
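As a rough illustration of what manual fan-out outside `run_stream()` could look like: the agent functions and dispatcher below are purely hypothetical (nothing like them exists in the framework or codebase), but they show the coordination pattern such a change would introduce:

```python
import asyncio

# Hypothetical agent call: stands in for an LLM/tool invocation
async def run_agent(name: str, task: str):
    await asyncio.sleep(0.05)
    return name, f"{name} result for {task!r}"

async def dispatch_parallel(task: str) -> dict:
    # Fan the same task out to independent agents concurrently; a judge
    # or manager step would then consume the combined results
    results = await asyncio.gather(
        run_agent("search", task),
        run_agent("hypothesis", task),
    )
    return dict(results)

results = asyncio.run(dispatch_parallel("test query"))
```

The hard part the sketch omits is exactly what the list above names: deciding which agents are truly independent, and re-implementing the manager's sequencing and error handling by hand.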
Files to Modify
- `src/orchestrators/advanced.py:155-210` - Add initialization events in the `run()` method
- `src/utils/service_loader.py` - Pre-warming logic
- `src/app.py` - Gradio progress integration
Testing the Issue
```python
import asyncio
import time

from src.orchestrators.advanced import AdvancedOrchestrator

async def test():
    orch = AdvancedOrchestrator(max_rounds=3)
    start = time.time()
    async for event in orch.run("test query"):
        elapsed = time.time() - start
        print(f"[{elapsed:.1f}s] {event.type}: {event.message[:50]}...")
        if event.type == "complete":
            break

asyncio.run(test())
```
Expected output showing the gaps:
```
[0.0s] started: Starting research (Advanced mode)...
[8.2s] thinking: Multi-agent reasoning in progress...   ← 8 second gap!
[22.5s] progress: Manager assigning research task...    ← 14 second gap!
```
References
- Advanced orchestrator: `src/orchestrators/advanced.py`
- Embedding service loader: `src/utils/service_loader.py`
- LlamaIndex RAG: `src/services/llamaindex_rag.py`
- Microsoft Agent Framework: `agent-framework-core`