DeepBoner / docs /bugs /P2_ADVANCED_MODE_COLD_START_NO_FEEDBACK.md
VibecoderMcSwaggins's picture
fix(perf): Implement P2 Phases 2 & 3 (Pre-warming + Gradio Progress)
cc5dfc8
|
raw
history blame
8.34 kB

P2: Advanced Mode Cold Start Has No User Feedback

Priority: P2 (UX Friction) Component: src/orchestrators/advanced.py Status: βœ… FIXED (All Phases Complete) Issue: #108 Created: 2025-12-01

Summary

When Advanced Mode starts, users experience three significant "dead zones" with no visual feedback:

  1. Initialization delay (5-15 seconds): Between "STARTED" and "THINKING" events
  2. First LLM call delay (10-30+ seconds): Between "THINKING" and first "PROGRESS" event
  3. Agent execution delay (30-90+ seconds): After "PROGRESS" while SearchAgent executes

Users see the UI freeze with no indication of what's happening, leading to confusion about whether the system is working.

Visual Timeline

πŸš€ STARTED: Starting research (Advanced mode)...
   β”‚
   β”‚  ← DEAD ZONE #1: 5-15 seconds of nothing
   β”‚     - Loading LlamaIndex/ChromaDB
   β”‚     - Initializing embedding service
   β”‚     - Building 4 agents + manager
   β”‚
⏳ THINKING: Multi-agent reasoning in progress...
   β”‚
   β”‚  ← DEAD ZONE #2: 10-30+ seconds of nothing
   β”‚     - Manager agent's first OpenAI API call
   β”‚     - Cold connection to OpenAI
   β”‚
⏱️ PROGRESS: Manager assigning research task...
   β”‚
   β”‚  ← DEAD ZONE #3: 30-90+ seconds of nothing
   β”‚     - SearchAgent executing PubMed/ClinicalTrials/EuropePMC queries
   β”‚     - Embedding and storing results in ChromaDB
   β”‚     - No streaming events during search execution
   β”‚
πŸ“Š SEARCH_COMPLETE / PROGRESS: Round 1/5...

Root Cause Analysis

Dead Zone #1: Initialization (Lines 162-165)

yield AgentEvent(type="started", ...)  # User sees this

# === BLOCKING OPERATIONS (no events yielded) ===
embedding_service = self._init_embedding_service()  # ChromaDB, embeddings
init_magentic_state(query, embedding_service)       # Shared state
workflow = self._build_workflow()                   # 4 agents + manager

yield AgentEvent(type="thinking", ...)  # User finally sees this

What's happening:

  1. _init_embedding_service() β†’ Loads LlamaIndex, connects to ChromaDB, initializes OpenAI embeddings
  2. init_magentic_state() β†’ Creates ResearchMemory, sets up context
  3. _build_workflow() β†’ Instantiates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent, Manager

Dead Zone #2: First LLM Call (Line 206)

yield AgentEvent(type="thinking", ...)  # User sees this

async for event in workflow.run_stream(task):  # BLOCKING until first event
    # Manager makes first OpenAI call here
    # No events until manager responds and starts delegating

What's happening:

  • Microsoft Agent Framework's manager agent receives the task
  • Makes synchronous(ish) call to OpenAI for orchestration planning
  • Only after response does it emit MagenticOrchestratorMessageEvent

Dead Zone #3: Agent Execution (After PROGRESS event)

After "Manager assigning research task...", the SearchAgent executes but emits no events until complete:

What's happening:

  • SearchAgent receives task from manager
  • Executes parallel queries to PubMed, ClinicalTrials.gov, Europe PMC
  • Each result is embedded and stored in ChromaDB
  • Only after ALL searches complete does it emit MagenticAgentMessageEvent

Why no streaming:

  • The agent's internal tool calls (search APIs, embeddings) don't emit framework events
  • Microsoft Agent Framework only emits events at agent message boundaries
  • 3 databases Γ— multiple queries Γ— embedding each result = long silent period

Potential fix: Add progress callbacks to SearchAgent tools:

# In search_agent.py - hypothetical
async def search_pubmed(query: str, on_progress: Callable = None):
    results = await pubmed_client.search(query)
    if on_progress:
        on_progress(f"Found {len(results)} PubMed results")
    # ... embed and store

Impact

  1. User Confusion: "Is it frozen? Should I refresh?"
  2. Perceived Slowness: Dead time feels longer than active progress
  3. No Cancel Option: Users can't abort during these zones
  4. Support Burden: Users report "it's not working" when it's actually initializing

Proposed Solutions

Option A: Granular Initialization Events (Quick Win)

Add progress events during initialization:

yield AgentEvent(type="started", ...)

yield AgentEvent(
    type="progress",
    message="Loading embedding service...",
    iteration=0,
)
embedding_service = self._init_embedding_service()

yield AgentEvent(
    type="progress",
    message="Initializing research memory...",
    iteration=0,
)
init_magentic_state(query, embedding_service)

yield AgentEvent(
    type="progress",
    message="Building agent team (Search, Judge, Hypothesis, Report)...",
    iteration=0,
)
workflow = self._build_workflow()

yield AgentEvent(type="thinking", ...)

Pros: Simple, immediate feedback Cons: Still sequential, doesn't speed up actual time

Option B: Parallel Initialization (Performance + UX)

Use asyncio.gather() for independent operations:

yield AgentEvent(type="progress", message="Initializing agents...", iteration=0)

# These could potentially run in parallel
embedding_task = asyncio.create_task(self._init_embedding_service_async())
workflow_task = asyncio.create_task(self._build_workflow_async())

embedding_service, workflow = await asyncio.gather(embedding_task, workflow_task)
init_magentic_state(query, embedding_service)

Pros: Faster initialization, better UX Cons: Need to verify thread safety, more complex

Option C: Pre-warming / Singleton Services

Initialize expensive services once at app startup, not per-request:

# In app.py startup
global_embedding_service = init_embedding_service()
global_workflow_template = build_workflow_template()

# In orchestrator
workflow = global_workflow_template.clone()  # Fast

Pros: Near-instant start after first request Cons: Memory overhead, cold start on first request still slow

Option D: Animated Progress Indicator (UI-Only)

Add a Gradio progress bar or spinner that animates during the dead zones:

# In app.py
with gr.Blocks() as demo:
    progress = gr.Progress()

    async def research(query):
        progress(0.1, desc="Initializing...")
        # ...
        progress(0.2, desc="Building agents...")

Pros: User sees activity even if nothing to report Cons: Doesn't solve the actual blocking, Gradio-specific

Recommended Approach

Phase 1 (Quick Win): Option A - Add granular events βœ… COMPLETE Phase 2 (Performance): Option C - Pre-warm services at startup βœ… COMPLETE Phase 3 (Polish): Option D - Gradio progress bar βœ… COMPLETE

Related Considerations

Parallel Agent Orchestration

The current Microsoft Agent Framework runs agents sequentially through the manager. True parallel execution would require:

  1. Breaking out of the framework's run_stream() pattern
  2. Implementing our own parallel task dispatch
  3. Managing agent coordination manually

This is a larger architectural change (P1 scope) and should be tracked separately if desired.

Files to Modify

  1. src/orchestrators/advanced.py:155-210 - Add initialization events in run() method
  2. src/utils/service_loader.py - Pre-warming logic
  3. src/app.py - Gradio progress integration

Testing the Issue

import asyncio
import time
from src.orchestrators.advanced import AdvancedOrchestrator

async def test():
    orch = AdvancedOrchestrator(max_rounds=3)
    start = time.time()
    async for event in orch.run("test query"):
        elapsed = time.time() - start
        print(f"[{elapsed:.1f}s] {event.type}: {event.message[:50]}...")
        if event.type == "complete":
            break

asyncio.run(test())

Expected output showing the gaps:

[0.0s] started: Starting research (Advanced mode)...
[8.2s] thinking: Multi-agent reasoning in progress...  ← 8 second gap!
[22.5s] progress: Manager assigning research task...   ← 14 second gap!

References

  • Advanced orchestrator: src/orchestrators/advanced.py
  • Embedding service loader: src/utils/service_loader.py
  • LlamaIndex RAG: src/services/llamaindex_rag.py
  • Microsoft Agent Framework: agent-framework-core