Spaces:

VibecoderMcSwaggins
/

DeepBoner

Paused

App Files Files Community

DeepBoner / docs /bugs /P2_ADVANCED_MODE_COLD_START_NO_FEEDBACK.md

VibecoderMcSwaggins

fix(perf): Implement P2 Phases 2 & 3 (Pre-warming + Gradio Progress)

cc5dfc8 14 days ago

preview code

raw

history blame

8.34 kB

P2: Advanced Mode Cold Start Has No User Feedback

Priority: P2 (UX Friction) Component: src/orchestrators/advanced.py Status: ✅ FIXED (All Phases Complete) Issue: #108 Created: 2025-12-01

Summary

When Advanced Mode starts, users experience three significant "dead zones" with no visual feedback:

Initialization delay (5-15 seconds): Between "STARTED" and "THINKING" events
First LLM call delay (10-30+ seconds): Between "THINKING" and first "PROGRESS" event
Agent execution delay (30-90+ seconds): After "PROGRESS" while SearchAgent executes

Users see the UI freeze with no indication of what's happening, leading to confusion about whether the system is working.

Visual Timeline

🚀 STARTED: Starting research (Advanced mode)...
   │
   │  ← DEAD ZONE #1: 5-15 seconds of nothing
   │     - Loading LlamaIndex/ChromaDB
   │     - Initializing embedding service
   │     - Building 4 agents + manager
   │
⏳ THINKING: Multi-agent reasoning in progress...
   │
   │  ← DEAD ZONE #2: 10-30+ seconds of nothing
   │     - Manager agent's first OpenAI API call
   │     - Cold connection to OpenAI
   │
⏱️ PROGRESS: Manager assigning research task...
   │
   │  ← DEAD ZONE #3: 30-90+ seconds of nothing
   │     - SearchAgent executing PubMed/ClinicalTrials/EuropePMC queries
   │     - Embedding and storing results in ChromaDB
   │     - No streaming events during search execution
   │
📊 SEARCH_COMPLETE / PROGRESS: Round 1/5...

Root Cause Analysis

Dead Zone #1: Initialization (Lines 162-165)

yield AgentEvent(type="started", ...)  # User sees this

# === BLOCKING OPERATIONS (no events yielded) ===
embedding_service = self._init_embedding_service()  # ChromaDB, embeddings
init_magentic_state(query, embedding_service)       # Shared state
workflow = self._build_workflow()                   # 4 agents + manager

yield AgentEvent(type="thinking", ...)  # User finally sees this

What's happening:

_init_embedding_service() → Loads LlamaIndex, connects to ChromaDB, initializes OpenAI embeddings
init_magentic_state() → Creates ResearchMemory, sets up context
_build_workflow() → Instantiates SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent, Manager

Dead Zone #2: First LLM Call (Line 206)

yield AgentEvent(type="thinking", ...)  # User sees this

async for event in workflow.run_stream(task):  # BLOCKING until first event
    # Manager makes first OpenAI call here
    # No events until manager responds and starts delegating

What's happening:

Microsoft Agent Framework's manager agent receives the task
Makes synchronous(ish) call to OpenAI for orchestration planning
Only after response does it emit MagenticOrchestratorMessageEvent

Dead Zone #3: Agent Execution (After PROGRESS event)

After "Manager assigning research task...", the SearchAgent executes but emits no events until complete:

What's happening:

SearchAgent receives task from manager
Executes parallel queries to PubMed, ClinicalTrials.gov, Europe PMC
Each result is embedded and stored in ChromaDB
Only after ALL searches complete does it emit MagenticAgentMessageEvent

Why no streaming:

The agent's internal tool calls (search APIs, embeddings) don't emit framework events
Microsoft Agent Framework only emits events at agent message boundaries
3 databases × multiple queries × embedding each result = long silent period

Potential fix: Add progress callbacks to SearchAgent tools:

# In search_agent.py - hypothetical
async def search_pubmed(query: str, on_progress: Callable = None):
    results = await pubmed_client.search(query)
    if on_progress:
        on_progress(f"Found {len(results)} PubMed results")
    # ... embed and store

Impact

User Confusion: "Is it frozen? Should I refresh?"
Perceived Slowness: Dead time feels longer than active progress
No Cancel Option: Users can't abort during these zones
Support Burden: Users report "it's not working" when it's actually initializing

Proposed Solutions

Option A: Granular Initialization Events (Quick Win)

Add progress events during initialization:

yield AgentEvent(type="started", ...)

yield AgentEvent(
    type="progress",
    message="Loading embedding service...",
    iteration=0,
)
embedding_service = self._init_embedding_service()

yield AgentEvent(
    type="progress",
    message="Initializing research memory...",
    iteration=0,
)
init_magentic_state(query, embedding_service)

yield AgentEvent(
    type="progress",
    message="Building agent team (Search, Judge, Hypothesis, Report)...",
    iteration=0,
)
workflow = self._build_workflow()

yield AgentEvent(type="thinking", ...)

Pros: Simple, immediate feedback Cons: Still sequential, doesn't speed up actual time

Option B: Parallel Initialization (Performance + UX)

Use asyncio.gather() for independent operations:

yield AgentEvent(type="progress", message="Initializing agents...", iteration=0)

# These could potentially run in parallel
embedding_task = asyncio.create_task(self._init_embedding_service_async())
workflow_task = asyncio.create_task(self._build_workflow_async())

embedding_service, workflow = await asyncio.gather(embedding_task, workflow_task)
init_magentic_state(query, embedding_service)

Pros: Faster initialization, better UX Cons: Need to verify thread safety, more complex

Option C: Pre-warming / Singleton Services

Initialize expensive services once at app startup, not per-request:

# In app.py startup
global_embedding_service = init_embedding_service()
global_workflow_template = build_workflow_template()

# In orchestrator
workflow = global_workflow_template.clone()  # Fast

Pros: Near-instant start after first request Cons: Memory overhead, cold start on first request still slow

Option D: Animated Progress Indicator (UI-Only)

Add a Gradio progress bar or spinner that animates during the dead zones:

# In app.py
with gr.Blocks() as demo:
    progress = gr.Progress()

    async def research(query):
        progress(0.1, desc="Initializing...")
        # ...
        progress(0.2, desc="Building agents...")

Pros: User sees activity even if nothing to report Cons: Doesn't solve the actual blocking, Gradio-specific

Recommended Approach

Phase 1 (Quick Win): Option A - Add granular events ✅ COMPLETE Phase 2 (Performance): Option C - Pre-warm services at startup ✅ COMPLETE Phase 3 (Polish): Option D - Gradio progress bar ✅ COMPLETE

Related Considerations

Parallel Agent Orchestration

The current Microsoft Agent Framework runs agents sequentially through the manager. True parallel execution would require:

Breaking out of the framework's run_stream() pattern
Implementing our own parallel task dispatch
Managing agent coordination manually

This is a larger architectural change (P1 scope) and should be tracked separately if desired.

Files to Modify

src/orchestrators/advanced.py:155-210 - Add initialization events in run() method
src/utils/service_loader.py - Pre-warming logic
src/app.py - Gradio progress integration

Testing the Issue

import asyncio
import time
from src.orchestrators.advanced import AdvancedOrchestrator

async def test():
    orch = AdvancedOrchestrator(max_rounds=3)
    start = time.time()
    async for event in orch.run("test query"):
        elapsed = time.time() - start
        print(f"[{elapsed:.1f}s] {event.type}: {event.message[:50]}...")
        if event.type == "complete":
            break

asyncio.run(test())

Expected output showing the gaps:

[0.0s] started: Starting research (Advanced mode)...
[8.2s] thinking: Multi-agent reasoning in progress...  ← 8 second gap!
[22.5s] progress: Manager assigning research task...   ← 14 second gap!

References

Advanced orchestrator: src/orchestrators/advanced.py
Embedding service loader: src/utils/service_loader.py
LlamaIndex RAG: src/services/llamaindex_rag.py
Microsoft Agent Framework: agent-framework-core