Commit
·
cc5dfc8
1
Parent(s):
e6163d4
fix(perf): Implement P2 Phases 2 & 3 (Pre-warming + Gradio Progress)
Browse files- docs/bugs/ACTIVE_BUGS.md +7 -4
- docs/bugs/P2_ADVANCED_MODE_COLD_START_NO_FEEDBACK.md +4 -4
- src/app.py +63 -25
- src/utils/service_loader.py +23 -0
docs/bugs/ACTIVE_BUGS.md
CHANGED
|
@@ -13,22 +13,25 @@ _No active P0 bugs._
|
|
| 13 |
|
| 14 |
## P2 - UX Friction
|
| 15 |
|
| 16 |
-
### P2 - Advanced Mode Cold Start Has No User Feedback (
|
| 17 |
**File:** `docs/bugs/P2_ADVANCED_MODE_COLD_START_NO_FEEDBACK.md`
|
| 18 |
**Issue:** [#108](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/108)
|
| 19 |
**Found:** 2025-12-01 (Gradio Testing)
|
| 20 |
|
| 21 |
**Problem:** Three "dead zones" with no visual feedback during Advanced Mode startup:
|
| 22 |
1. **Dead Zone #1** (5-15s): Between STARTED → THINKING ✅ FIXED (granular events)
|
| 23 |
-
2. **Dead Zone #2** (10-30s): Between THINKING → PROGRESS (first LLM call)
|
| 24 |
-
3. **Dead Zone #3** (30-90s): After PROGRESS (SearchAgent executing)
|
| 25 |
|
| 26 |
**Phase 1 Fix (commit dbf888c):**
|
| 27 |
- Added granular progress events during initialization
|
| 28 |
- Users now see "Loading embedding service...", "Initializing research memory...", "Building agent team..."
|
| 29 |
- Significantly improves perceived responsiveness
|
| 30 |
|
| 31 |
-
**
|
|
|
|
|
|
|
|
|
|
| 32 |
|
| 33 |
---
|
| 34 |
|
|
|
|
| 13 |
|
| 14 |
## P2 - UX Friction
|
| 15 |
|
| 16 |
+
### P2 - Advanced Mode Cold Start Has No User Feedback (✅ FIXED)
|
| 17 |
**File:** `docs/bugs/P2_ADVANCED_MODE_COLD_START_NO_FEEDBACK.md`
|
| 18 |
**Issue:** [#108](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/108)
|
| 19 |
**Found:** 2025-12-01 (Gradio Testing)
|
| 20 |
|
| 21 |
**Problem:** Three "dead zones" with no visual feedback during Advanced Mode startup:
|
| 22 |
1. **Dead Zone #1** (5-15s): Between STARTED → THINKING ✅ FIXED (granular events)
|
| 23 |
+
2. **Dead Zone #2** (10-30s): Between THINKING → PROGRESS (first LLM call) ✅ FIXED (Progress Bar)
|
| 24 |
+
3. **Dead Zone #3** (30-90s): After PROGRESS (SearchAgent executing) ✅ FIXED (Pre-warming + Progress Bar)
|
| 25 |
|
| 26 |
**Phase 1 Fix (commit dbf888c):**
|
| 27 |
- Added granular progress events during initialization
|
| 28 |
- Users now see "Loading embedding service...", "Initializing research memory...", "Building agent team..."
|
| 29 |
- Significantly improves perceived responsiveness
|
| 30 |
|
| 31 |
+
**Phase 2/3 Fix (Latest):**
|
| 32 |
+
- Implemented service pre-warming (`service_loader.warmup_services`)
|
| 33 |
+
- Added native Gradio progress bar (`gr.Progress`) to `research_agent`
|
| 34 |
+
- Visual feedback is now continuous throughout the entire lifecycle
|
| 35 |
|
| 36 |
---
|
| 37 |
|
docs/bugs/P2_ADVANCED_MODE_COLD_START_NO_FEEDBACK.md
CHANGED
|
@@ -2,7 +2,7 @@
|
|
| 2 |
|
| 3 |
**Priority**: P2 (UX Friction)
|
| 4 |
**Component**: `src/orchestrators/advanced.py`
|
| 5 |
-
**Status**:
|
| 6 |
**Issue**: [#108](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/108)
|
| 7 |
**Created**: 2025-12-01
|
| 8 |
|
|
@@ -199,9 +199,9 @@ with gr.Blocks() as demo:
|
|
| 199 |
|
| 200 |
## Recommended Approach
|
| 201 |
|
| 202 |
-
**Phase 1 (Quick Win)**: Option A - Add granular events ✅ COMPLETE
|
| 203 |
-
**Phase 2 (Performance)**: Option C - Pre-warm services at startup
|
| 204 |
-
**Phase 3 (Polish)**: Option D - Gradio progress bar
|
| 205 |
|
| 206 |
## Related Considerations
|
| 207 |
|
|
|
|
| 2 |
|
| 3 |
**Priority**: P2 (UX Friction)
|
| 4 |
**Component**: `src/orchestrators/advanced.py`
|
| 5 |
+
**Status**: ✅ FIXED (All Phases Complete)
|
| 6 |
**Issue**: [#108](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/108)
|
| 7 |
**Created**: 2025-12-01
|
| 8 |
|
|
|
|
| 199 |
|
| 200 |
## Recommended Approach
|
| 201 |
|
| 202 |
+
**Phase 1 (Quick Win)**: Option A - Add granular events ✅ COMPLETE
|
| 203 |
+
**Phase 2 (Performance)**: Option C - Pre-warm services at startup ✅ COMPLETE
|
| 204 |
+
**Phase 3 (Polish)**: Option D - Gradio progress bar ✅ COMPLETE
|
| 205 |
|
| 206 |
## Related Considerations
|
| 207 |
|
src/app.py
CHANGED
|
@@ -21,6 +21,7 @@ from src.tools.search_handler import SearchHandler
|
|
| 21 |
from src.utils.config import settings
|
| 22 |
from src.utils.exceptions import ConfigurationError
|
| 23 |
from src.utils.models import OrchestratorConfig
|
|
|
|
| 24 |
|
| 25 |
OrchestratorMode = Literal["simple", "magentic", "advanced", "hierarchical"]
|
| 26 |
|
|
@@ -137,6 +138,38 @@ def configure_orchestrator(
|
|
| 137 |
return orchestrator, backend_info
|
| 138 |
|
| 139 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 140 |
async def research_agent(
|
| 141 |
message: str,
|
| 142 |
history: list[dict[str, Any]],
|
|
@@ -144,6 +177,7 @@ async def research_agent(
|
|
| 144 |
domain: str = "sexual_health",
|
| 145 |
api_key: str = "",
|
| 146 |
api_key_state: str = "",
|
|
|
|
| 147 |
) -> AsyncGenerator[str, None]:
|
| 148 |
"""
|
| 149 |
Gradio chat function that runs the research agent.
|
|
@@ -155,6 +189,7 @@ async def research_agent(
|
|
| 155 |
domain: Research domain
|
| 156 |
api_key: Optional user-provided API key (BYOK - auto-detects provider)
|
| 157 |
api_key_state: Persistent API key state (survives example clicks)
|
|
|
|
| 158 |
|
| 159 |
Yields:
|
| 160 |
Markdown-formatted responses for streaming
|
|
@@ -164,38 +199,19 @@ async def research_agent(
|
|
| 164 |
return
|
| 165 |
|
| 166 |
# BUG FIX: Handle None values from Gradio example caching
|
| 167 |
-
# Gradio passes None for missing example columns, overriding defaults
|
| 168 |
-
api_key_str = api_key or ""
|
| 169 |
-
api_key_state_str = api_key_state or ""
|
| 170 |
domain_str = domain or "sexual_health"
|
| 171 |
|
| 172 |
-
# Validate
|
| 173 |
-
|
| 174 |
-
mode_validated: OrchestratorMode = mode if mode in valid_modes else "simple" # type: ignore[assignment]
|
| 175 |
|
| 176 |
-
#
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
# Check available keys
|
| 180 |
-
has_openai = settings.has_openai_key
|
| 181 |
-
has_anthropic = settings.has_anthropic_key
|
| 182 |
-
# Check for OpenAI user key
|
| 183 |
-
is_openai_user_key = (
|
| 184 |
-
user_api_key and user_api_key.startswith("sk-") and not user_api_key.startswith("sk-ant-")
|
| 185 |
-
)
|
| 186 |
-
has_paid_key = has_openai or has_anthropic or bool(user_api_key)
|
| 187 |
-
|
| 188 |
-
# Advanced mode requires OpenAI specifically (due to agent-framework binding)
|
| 189 |
-
if mode_validated == "advanced" and not (has_openai or is_openai_user_key):
|
| 190 |
yield (
|
| 191 |
"⚠️ **Warning**: Advanced mode currently requires OpenAI API key. "
|
| 192 |
"Anthropic keys only work in Simple mode. Falling back to Simple.\n\n"
|
| 193 |
)
|
| 194 |
-
mode_validated = "simple"
|
| 195 |
|
| 196 |
-
# Inform user about fallback if no keys
|
| 197 |
if not has_paid_key:
|
| 198 |
-
# No paid keys - will use FREE HuggingFace Inference
|
| 199 |
yield (
|
| 200 |
"🤖 **Free Tier**: Using HuggingFace Inference (Llama 3.1 / Mistral) for AI analysis.\n"
|
| 201 |
"For premium models, enter an OpenAI or Anthropic API key below.\n\n"
|
|
@@ -207,9 +223,8 @@ async def research_agent(
|
|
| 207 |
|
| 208 |
try:
|
| 209 |
# use_mock=False - let configure_orchestrator decide based on available keys
|
| 210 |
-
# It will use: Paid API > HF Inference (free tier)
|
| 211 |
orchestrator, backend_name = configure_orchestrator(
|
| 212 |
-
use_mock=False,
|
| 213 |
mode=mode_validated,
|
| 214 |
user_api_key=user_api_key,
|
| 215 |
domain=domain_str,
|
|
@@ -224,6 +239,28 @@ async def research_agent(
|
|
| 224 |
)
|
| 225 |
|
| 226 |
async for event in orchestrator.run(message):
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 227 |
# BUG FIX: Handle streaming events separately to avoid token-by-token spam
|
| 228 |
if event.type == "streaming":
|
| 229 |
# Accumulate streaming tokens without emitting individual events
|
|
@@ -349,6 +386,7 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
|
|
| 349 |
|
| 350 |
def main() -> None:
|
| 351 |
"""Run the Gradio app with MCP server enabled."""
|
|
|
|
| 352 |
demo, _ = create_demo()
|
| 353 |
demo.launch(
|
| 354 |
server_name=os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"), # nosec B104
|
|
|
|
| 21 |
from src.utils.config import settings
|
| 22 |
from src.utils.exceptions import ConfigurationError
|
| 23 |
from src.utils.models import OrchestratorConfig
|
| 24 |
+
from src.utils.service_loader import warmup_services
|
| 25 |
|
| 26 |
OrchestratorMode = Literal["simple", "magentic", "advanced", "hierarchical"]
|
| 27 |
|
|
|
|
| 138 |
return orchestrator, backend_info
|
| 139 |
|
| 140 |
|
| 141 |
+
def _validate_inputs(
|
| 142 |
+
mode: str,
|
| 143 |
+
api_key: str | None,
|
| 144 |
+
api_key_state: str | None,
|
| 145 |
+
) -> tuple[OrchestratorMode, str | None, bool]:
|
| 146 |
+
"""Validate inputs and determine mode/key status.
|
| 147 |
+
|
| 148 |
+
Returns:
|
| 149 |
+
Tuple of (validated_mode, effective_user_key, has_paid_key)
|
| 150 |
+
"""
|
| 151 |
+
# Validate mode
|
| 152 |
+
valid_modes: set[str] = {"simple", "magentic", "advanced", "hierarchical"}
|
| 153 |
+
mode_validated: OrchestratorMode = mode if mode in valid_modes else "simple" # type: ignore[assignment]
|
| 154 |
+
|
| 155 |
+
# Determine effective key
|
| 156 |
+
user_api_key = (api_key or api_key_state or "").strip() or None
|
| 157 |
+
|
| 158 |
+
# Check available keys
|
| 159 |
+
has_openai = settings.has_openai_key
|
| 160 |
+
has_anthropic = settings.has_anthropic_key
|
| 161 |
+
is_openai_user_key = (
|
| 162 |
+
user_api_key and user_api_key.startswith("sk-") and not user_api_key.startswith("sk-ant-")
|
| 163 |
+
)
|
| 164 |
+
has_paid_key = has_openai or has_anthropic or bool(user_api_key)
|
| 165 |
+
|
| 166 |
+
# Fallback logic for Advanced mode
|
| 167 |
+
if mode_validated == "advanced" and not (has_openai or is_openai_user_key):
|
| 168 |
+
mode_validated = "simple"
|
| 169 |
+
|
| 170 |
+
return mode_validated, user_api_key, has_paid_key
|
| 171 |
+
|
| 172 |
+
|
| 173 |
async def research_agent(
|
| 174 |
message: str,
|
| 175 |
history: list[dict[str, Any]],
|
|
|
|
| 177 |
domain: str = "sexual_health",
|
| 178 |
api_key: str = "",
|
| 179 |
api_key_state: str = "",
|
| 180 |
+
progress: gr.Progress = gr.Progress(), # noqa: B008
|
| 181 |
) -> AsyncGenerator[str, None]:
|
| 182 |
"""
|
| 183 |
Gradio chat function that runs the research agent.
|
|
|
|
| 189 |
domain: Research domain
|
| 190 |
api_key: Optional user-provided API key (BYOK - auto-detects provider)
|
| 191 |
api_key_state: Persistent API key state (survives example clicks)
|
| 192 |
+
progress: Gradio progress tracker
|
| 193 |
|
| 194 |
Yields:
|
| 195 |
Markdown-formatted responses for streaming
|
|
|
|
| 199 |
return
|
| 200 |
|
| 201 |
# BUG FIX: Handle None values from Gradio example caching
|
|
|
|
|
|
|
|
|
|
| 202 |
domain_str = domain or "sexual_health"
|
| 203 |
|
| 204 |
+
# Validate inputs using helper to reduce complexity
|
| 205 |
+
mode_validated, user_api_key, has_paid_key = _validate_inputs(mode, api_key, api_key_state)
|
|
|
|
| 206 |
|
| 207 |
+
# Inform user about fallback/tier status
|
| 208 |
+
if mode == "advanced" and mode_validated == "simple":
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 209 |
yield (
|
| 210 |
"⚠️ **Warning**: Advanced mode currently requires OpenAI API key. "
|
| 211 |
"Anthropic keys only work in Simple mode. Falling back to Simple.\n\n"
|
| 212 |
)
|
|
|
|
| 213 |
|
|
|
|
| 214 |
if not has_paid_key:
|
|
|
|
| 215 |
yield (
|
| 216 |
"🤖 **Free Tier**: Using HuggingFace Inference (Llama 3.1 / Mistral) for AI analysis.\n"
|
| 217 |
"For premium models, enter an OpenAI or Anthropic API key below.\n\n"
|
|
|
|
| 223 |
|
| 224 |
try:
|
| 225 |
# use_mock=False - let configure_orchestrator decide based on available keys
|
|
|
|
| 226 |
orchestrator, backend_name = configure_orchestrator(
|
| 227 |
+
use_mock=False,
|
| 228 |
mode=mode_validated,
|
| 229 |
user_api_key=user_api_key,
|
| 230 |
domain=domain_str,
|
|
|
|
| 239 |
)
|
| 240 |
|
| 241 |
async for event in orchestrator.run(message):
|
| 242 |
+
# Update progress bar
|
| 243 |
+
if event.type == "started":
|
| 244 |
+
progress(0, desc="Starting research...")
|
| 245 |
+
elif event.type == "thinking":
|
| 246 |
+
progress(0.1, desc="Multi-agent reasoning...")
|
| 247 |
+
elif event.type == "progress":
|
| 248 |
+
# Try to calculate percentage based on max rounds/iterations
|
| 249 |
+
p = None
|
| 250 |
+
max_iters = 10 # default
|
| 251 |
+
if hasattr(orchestrator, "_max_rounds"):
|
| 252 |
+
max_iters = orchestrator._max_rounds
|
| 253 |
+
elif hasattr(orchestrator, "config") and hasattr(
|
| 254 |
+
orchestrator.config, "max_iterations"
|
| 255 |
+
):
|
| 256 |
+
max_iters = orchestrator.config.max_iterations
|
| 257 |
+
|
| 258 |
+
if event.iteration:
|
| 259 |
+
# Map 0..max to 0.2..0.9
|
| 260 |
+
p = 0.2 + (0.7 * (min(event.iteration, max_iters) / max_iters))
|
| 261 |
+
|
| 262 |
+
progress(p, desc=event.message)
|
| 263 |
+
|
| 264 |
# BUG FIX: Handle streaming events separately to avoid token-by-token spam
|
| 265 |
if event.type == "streaming":
|
| 266 |
# Accumulate streaming tokens without emitting individual events
|
|
|
|
| 386 |
|
| 387 |
def main() -> None:
|
| 388 |
"""Run the Gradio app with MCP server enabled."""
|
| 389 |
+
warmup_services() # Phase 2: Pre-warm services
|
| 390 |
demo, _ = create_demo()
|
| 391 |
demo.launch(
|
| 392 |
server_name=os.getenv("GRADIO_SERVER_NAME", "0.0.0.0"), # nosec B104
|
src/utils/service_loader.py
CHANGED
|
@@ -9,6 +9,7 @@ Design Patterns:
|
|
| 9 |
- Strategy Pattern: Selects between EmbeddingService and LlamaIndexRAGService
|
| 10 |
"""
|
| 11 |
|
|
|
|
| 12 |
from typing import TYPE_CHECKING
|
| 13 |
|
| 14 |
import structlog
|
|
@@ -22,6 +23,28 @@ if TYPE_CHECKING:
|
|
| 22 |
logger = structlog.get_logger()
|
| 23 |
|
| 24 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 25 |
def get_embedding_service() -> "EmbeddingServiceProtocol":
|
| 26 |
"""Get the best available embedding service.
|
| 27 |
|
|
|
|
| 9 |
- Strategy Pattern: Selects between EmbeddingService and LlamaIndexRAGService
|
| 10 |
"""
|
| 11 |
|
| 12 |
+
import threading
|
| 13 |
from typing import TYPE_CHECKING
|
| 14 |
|
| 15 |
import structlog
|
|
|
|
| 23 |
logger = structlog.get_logger()
|
| 24 |
|
| 25 |
|
| 26 |
+
def warmup_services() -> None:
|
| 27 |
+
"""Pre-warm expensive services in a background thread.
|
| 28 |
+
|
| 29 |
+
This reduces the "cold start" latency for the first user request by
|
| 30 |
+
loading heavy models (like SentenceTransformer or LlamaIndex) into memory
|
| 31 |
+
during application startup.
|
| 32 |
+
"""
|
| 33 |
+
|
| 34 |
+
def _warmup() -> None:
|
| 35 |
+
logger.info("🔥 Warmup: Starting background service initialization...")
|
| 36 |
+
try:
|
| 37 |
+
# Trigger model loading (cached globally)
|
| 38 |
+
get_embedding_service_if_available()
|
| 39 |
+
logger.info("🔥 Warmup: Embedding service ready")
|
| 40 |
+
except Exception as e:
|
| 41 |
+
logger.warning("🔥 Warmup: Failed to warm up services", error=str(e))
|
| 42 |
+
|
| 43 |
+
# Run in daemon thread so it doesn't block shutdown
|
| 44 |
+
thread = threading.Thread(target=_warmup, daemon=True)
|
| 45 |
+
thread.start()
|
| 46 |
+
|
| 47 |
+
|
| 48 |
def get_embedding_service() -> "EmbeddingServiceProtocol":
|
| 49 |
"""Get the best available embedding service.
|
| 50 |
|