VibecoderMcSwaggins committed on
Commit
6b5e05b
·
unverified · 1 Parent(s): 8f45b69

fix: P0/P1/P2 - Gradio crash, UX improvements, thinking state (#61)


* fix: handle None parameters from Gradio example caching (P0)

Root cause: Gradio passes None for missing example columns during
startup caching, overriding Python default values. Line 131 called
.strip() on None, crashing the HuggingFace Space.

Fix: Add defensive None handling before .strip():
api_key_str = api_key or ""
api_key_state_str = api_key_state or ""

Added tests to prevent regression.

* fix: multiple UX improvements (P1 bugs)

1. HSDD acronym spelled out (Hypoactive Sexual Desire Disorder)
2. Added loading indicator ("Processing...") for immediate feedback
3. Removed temperature settings from magentic agents for reasoning
model compatibility (o3, o1 only support temperature=1)
4. Bug report documenting remaining issues (API key persistence)

140 tests passing.

* fix: enhance UX with "thinking" state and API key persistence

1. Added a "thinking" state yield before blocking calls in Magentic orchestrator to improve user feedback during long processing times.
2. Updated Gradio examples to include explicit None values for API key inputs, ensuring persistence across example clicks.
3. Set temperature explicitly to 1.0 for compatibility with reasoning models in Magentic agents.

All tests passing.

docs/bugs/P0_GRADIO_EXAMPLE_CACHING_CRASH.md ADDED
@@ -0,0 +1,134 @@
+ # P0 Bug Report: Gradio Example Caching Crash
+
+ ## Status
+ - **Date:** 2025-11-29
+ - **Priority:** P0 CRITICAL (Production Down)
+ - **Component:** `src/app.py:131`
+ - **Environment:** HuggingFace Spaces (Python 3.11, Gradio)
+
+ ## Error Message
+
+ ```text
+ AttributeError: 'NoneType' object has no attribute 'strip'
+ ```
+
+ ## Full Stack Trace
+
+ ```text
+ File "/app/src/app.py", line 131, in research_agent
+     user_api_key = (api_key.strip() or api_key_state.strip()) or None
+                     ^^^^^^^^^^^^^
+ AttributeError: 'NoneType' object has no attribute 'strip'
+ ```
+
+ ## Root Cause Analysis
+
+ ### The Trigger
+ Gradio's example caching mechanism runs the `research_agent` function during startup to pre-cache example outputs. This happens at:
+
+ ```text
+ File "/usr/local/lib/python3.11/site-packages/gradio/helpers.py", line 509, in _start_caching
+     await self.cache()
+ ```
+
+ ### The Problem
+ Our examples only provide values for 2 of the 4 function parameters:
+
+ ```python
+ examples=[
+     ["What is the evidence for testosterone therapy in women with HSDD?", "simple"],
+     ["Promising drug candidates for endometriosis pain management", "simple"],
+ ]
+ ```
+
+ These map to `[message, mode]` but **NOT** to `api_key` or `api_key_state`.
+
+ When Gradio runs the function for caching, it passes `None` for the unprovided parameters:
+
+ ```python
+ async def research_agent(
+     message: str,            # ✅ Provided by example
+     history: list[...],      # ✅ Empty list default
+     mode: str = "simple",    # ✅ Provided by example
+     api_key: str = "",       # ❌ Becomes None during caching!
+     api_key_state: str = ""  # ❌ Becomes None during caching!
+ ) -> AsyncGenerator[...]:
+ ```
+
+ ### The Crash
+ Line 131 attempts to call `.strip()` on `None`:
+
+ ```python
+ user_api_key = (api_key.strip() or api_key_state.strip()) or None
+ #               ^^^^^^^^^^^^^
+ #               NoneType has no attribute 'strip'
+ ```
+
+ ## Gradio Warning (Ignored)
+
+ Gradio actually warned us about this:
+
+ ```text
+ UserWarning: Examples will be cached but not all input components have
+ example values. This may result in an exception being thrown by your function.
+ ```
+
+ ## Solution
+
+ ### Option A: Defensive None Handling (Recommended)
+ Add None guards before calling `.strip()`:
+
+ ```python
+ # Handle None values from Gradio example caching
+ api_key_str = api_key or ""
+ api_key_state_str = api_key_state or ""
+ user_api_key = (api_key_str.strip() or api_key_state_str.strip()) or None
+ ```
+
+ ### Option B: Disable Example Caching
+ Set `cache_examples=False` in ChatInterface:
+
+ ```python
+ gr.ChatInterface(
+     fn=research_agent,
+     examples=[...],
+     cache_examples=False,  # Disable caching
+ )
+ ```
+
+ This avoids the crash but loses the UX benefit of pre-cached examples.
+
+ ### Option C: Provide Full Example Values
+ Include all 4 columns in examples:
+
+ ```python
+ examples=[
+     ["What is the evidence...", "simple", "", ""],  # [msg, mode, api_key, state]
+ ]
+ ```
+
+ This is verbose and exposes internal state to users.
+
+ ## Recommendation
+
+ **Option A** is the cleanest fix. It:
+ 1. Maintains cached examples for fast UX
+ 2. Handles edge cases defensively
+ 3. Doesn't expose internal state in examples
+
+ ## Pre-Merge Checklist
+
+ - [ ] Fix applied to `src/app.py`
+ - [ ] Unit test added for None parameter handling
+ - [ ] `make check` passes
+ - [ ] Test locally with `uv run python -m src.app`
+ - [ ] Verify example caching works without crash
+ - [ ] Deploy to HuggingFace Spaces
+ - [ ] Verify Space starts without error
+
+ ## Lessons Learned
+
+ 1. Always test Gradio apps with example caching enabled locally before deploying
+ 2. Gradio's "partial examples" feature passes `None` for missing columns
+ 3. Default parameter values (`str = ""`) are ignored when Gradio explicitly passes `None`
+ 4. The Gradio warning about missing example values should be treated as an error
docs/bugs/P1_MULTIPLE_UX_BUGS.md ADDED
@@ -0,0 +1,49 @@
+ # P1 Bug Report: Multiple UX and Configuration Issues
+
+ ## Status
+ - **Date:** 2025-11-29
+ - **Priority:** P1 (Multiple user-facing issues)
+ - **Components:** `src/app.py`, `src/orchestrator_magentic.py`
+
+ ## Resolved Issues (Fixed 2025-11-29)
+
+ ### Bug 1: API Key Cleared When Clicking Examples
+ **Fixed.** Updated `examples` in `app.py` to include explicit `None` values for additional inputs. Gradio preserves a component's current value when the example value is `None`.
+
+ ### Bug 2: No Loading/Processing Indicator
+ **Fixed.** `research_agent` yields an immediate "⏳ Processing..." message before starting the orchestrator.
+
+ ### Bug 3: Advanced Mode Temperature Error
+ **Fixed.** Explicitly set `temperature=1.0` for all Magentic agents in `src/agents/magentic_agents.py`. This is compatible with OpenAI reasoning models (o1/o3), which require `temperature=1` and were rejecting the previous values (0.2-0.5).
+
+ ### Bug 4: HSDD Acronym Not Spelled Out
+ **Fixed.** Updated example text in `app.py` to "HSDD (Hypoactive Sexual Desire Disorder)".
+
+ ---
+
+ ## Open / Deferred Issues
+
+ ### Bug 5: Free Tier Quota Exhausted (UX Improvement)
+ **Deferred.** Currently shows the standard error message. Improve if users report confusion.
+
+ ### Bug 6: Asyncio File Descriptor Warnings
+ **Won't Fix.** Cosmetic issue only.
+
+ ---
+
+ ## Priority Order (Completed)
+
+ 1. **Bug 4 (HSDD)** - Fixed
+ 2. **Bug 2 (Loading indicator)** - Fixed
+ 3. **Bug 3 (Temperature)** - Fixed
+ 4. **Bug 1 (API key)** - Fixed
+
+ ---
+
+ ## Test Plan
+ - [x] Fix HSDD acronym
+ - [x] Add loading indicator yield
+ - [x] Test advanced mode with temperature fix (static analysis/code change)
+ - [x] Research Gradio example behavior for API key (implemented None fix)
+ - [ ] Run `make check`
+ - [ ] Deploy and test on HuggingFace Spaces
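The constraint behind Bug 3 can be captured in a small guard rather than hard-coding `1.0` everywhere. A hedged sketch (`pick_temperature` is a hypothetical helper, not project code; it assumes OpenAI reasoning models are identifiable by an `o1`/`o3` name prefix):

```python
def pick_temperature(model: str, preferred: float) -> float:
    """Reasoning models (o1/o3 families) only accept temperature=1."""
    family = model.split("-")[0]
    if family in {"o1", "o3"}:
        return 1.0
    return preferred


assert pick_temperature("o3-mini", 0.3) == 1.0   # reasoning model: forced to 1.0
assert pick_temperature("o1", 0.2) == 1.0        # bare family name also matches
assert pick_temperature("gpt-4o", 0.3) == 0.3    # non-reasoning model keeps preference
```

Centralizing this would let non-reasoning models keep lower temperatures for deterministic tool use while remaining safe for o1/o3.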
docs/bugs/P2_MAGENTIC_THINKING_STATE.md ADDED
@@ -0,0 +1,232 @@
+ # P2 Bug Report: Advanced Mode Missing "Thinking" State
+
+ ## Status
+ - **Date:** 2025-11-29
+ - **Priority:** P2 (UX polish, not blocking functionality)
+ - **Component:** `src/orchestrator_magentic.py`, `src/app.py`
+
+ ---
+
+ ## Symptoms
+
+ User experience in **Advanced (Magentic) mode**:
+ 1. Click example or submit query
+ 2. See: `🚀 **STARTED**: Starting research (Magentic mode)...`
+ 3. **2+ minutes of nothing** (no spinner, no progress, no indication work is happening)
+ 4. Eventually see: `🧠 **JUDGING**: Manager (user_task)...`
+
+ **User perception:** "Is it frozen? Did it crash?"
+
+ ### Container Logs Confirm Work IS Happening
+ ```
+ 14:54:22 [info] Starting Magentic orchestrator query='...'
+ 14:54:22 [info] Embedding service enabled
+ ... 2+ MINUTES OF SILENCE (agent-framework doing internal LLM calls) ...
+ 14:56:38 [info] Creating orchestrator mode=advanced
+ ```
+
+ The silence is because `workflow.run_stream()` doesn't yield events during its setup phase.
+
+ ---
+
+ ## Root Cause Analysis
+
+ ### Current Flow (`src/orchestrator_magentic.py`)
+ ```python
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
+     # 1. Immediately yields "started"
+     yield AgentEvent(type="started", message=f"Starting research (Magentic mode): {query}")
+
+     # 2. Setup (fast, no yield needed)
+     embedding_service = self._init_embedding_service()
+     init_magentic_state(embedding_service)
+     workflow = self._build_workflow()
+
+     # 3. GAP: workflow.run_stream() blocks for 2+ minutes before first event
+     async for event in workflow.run_stream(task):  # <-- THE BOTTLENECK
+         yield self._process_event(event)
+ ```
+
+ The `agent-framework`'s `workflow.run_stream()` is calling OpenAI's API, building the manager prompt, coordinating agents, etc. **It doesn't yield events during this setup phase**.
+
+ ---
53
+
54
+ ## Gold Standard UX (What We'd Want)
55
+
56
+ ### Gradio's Native Thinking Support
57
+
58
+ Per [Gradio Chatbot Docs](https://www.gradio.app/docs/gradio/chatbot):
59
+
60
+ > "The Gradio Chatbot can natively display intermediate thoughts and tool usage in a collapsible accordion next to a chat message. This makes it perfect for creating UIs for LLM agents and chain-of-thought (CoT) or reasoning demos."
61
+
62
+ **Features available:**
63
+ - `gr.ChatMessage` with `metadata={"status": "pending"}` shows spinner
64
+ - `metadata={"title": "Thinking...", "status": "pending"}` creates collapsible accordion
65
+ - Nested thoughts via `id` and `parent_id`
66
+ - `duration` metadata shows time spent
67
+
68
+ **Example from Gradio docs:**
69
+ ```python
70
+ import gradio as gr
71
+
72
+ def chat_fn(message, history):
73
+ # Yield thinking state with spinner
74
+ yield gr.ChatMessage(
75
+ role="assistant",
76
+ metadata={"title": "🧠 Thinking...", "status": "pending"}
77
+ )
78
+
79
+ # Do work...
80
+
81
+ # Update with completed thought
82
+ yield gr.ChatMessage(
83
+ role="assistant",
84
+ content="Analysis complete",
85
+ metadata={"title": "🧠 Thinking...", "status": "done", "duration": 5.2}
86
+ )
87
+
88
+ yield "Here's the final answer..."
89
+ ```
90
+
91
+ ---
92
+
93
+ ## Why This is Complex for DeepBoner
94
+
95
+ ### Constraint 1: ChatInterface Returns Strings
96
+ Our `research_agent()` yields plain strings:
97
+ ```python
98
+ yield "🧠 **Backend**: {backend_name}\n\n"
99
+ yield "⏳ **Processing...** Searching PubMed...\n"
100
+ yield "\n\n".join(response_parts)
101
+ ```
102
+
103
+ Converting to `gr.ChatMessage` objects would require refactoring the entire response pipeline.
104
+
105
+ ### Constraint 2: Agent-Framework is the Bottleneck
106
+ The 2-minute gap is inside `workflow.run_stream(task)`, which is the `agent-framework` library. We can't inject yields into a third-party library's blocking call.
107
+
108
+ ### Constraint 3: ChatInterface vs Blocks
109
+ `gr.ChatInterface` is a convenience wrapper. The full `gr.ChatMessage` metadata features work best with raw `gr.Blocks` + `gr.Chatbot` components.
110
+
111
+ ---
112
+
113
+ ## Options
114
+
115
+ ### Option A: Yield "Thinking" Before Blocking Call (Recommended)
116
+ **Effort:** 5 minutes
117
+ **Impact:** Users see *something* while waiting
118
+
119
+ ```python
120
+ # In src/orchestrator_magentic.py
121
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
122
+ yield AgentEvent(type="started", message=f"Starting research (Magentic mode): {query}")
123
+
124
+ # NEW: Yield thinking state before the blocking call
125
+ yield AgentEvent(
126
+ type="thinking", # New event type
127
+ message="🧠 Agents are reasoning... This may take 2-5 minutes for complex queries.",
128
+ iteration=0,
129
+ )
130
+
131
+ # ... rest of setup ...
132
+
133
+ async for event in workflow.run_stream(task):
134
+ yield self._process_event(event)
135
+ ```
136
+
137
+ **Pros:**
138
+ - Simple, doesn't require Gradio changes
139
+ - Works with current string-based approach
140
+ - Sets user expectations ("2-5 minutes")
141
+
142
+ **Cons:**
143
+ - No spinner/animation (static text)
144
+ - Doesn't show real-time progress during the gap
145
+
146
+ ### Option B: Use `gr.ChatMessage` with Metadata (Major Refactor)
147
+ **Effort:** 2-4 hours
148
+ **Impact:** Full gold-standard UX
149
+
150
+ Would require:
151
+ 1. Changing `research_agent()` to yield `gr.ChatMessage` objects
152
+ 2. Adding thinking states with `metadata={"status": "pending"}`
153
+ 3. Updating all event handlers to produce proper ChatMessage objects
154
+
155
+ ### Option C: Heartbeat/Polling (Over-Engineering)
156
+ **Effort:** 4+ hours
157
+ **Impact:** Spinner during blocking call
158
+
159
+ Create a background task that yields "still working..." every 10 seconds while waiting for the agent-framework. Requires:
160
+ - `asyncio.create_task()` for heartbeat
161
+ - Task cancellation when real events arrive
162
+ - Proper cleanup
163
+
164
+ **Verdict:** Over-engineering for a demo.
165
+
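For reference, the heartbeat pattern described in Option C would look roughly like this (an illustrative sketch only: `slow_workflow` stands in for `workflow.run_stream`, and the timings are shortened so it runs in milliseconds):

```python
import asyncio


async def slow_workflow():
    """Stand-in for the agent-framework call that is silent for a long time."""
    await asyncio.sleep(0.3)
    yield "real event"


async def run_with_heartbeat(interval: float = 0.1):
    """Emit 'still working...' until real events start arriving."""
    queue: asyncio.Queue = asyncio.Queue()

    async def pump() -> None:
        # Forward real events into the queue, then signal completion
        async for event in slow_workflow():
            await queue.put(event)
        await queue.put("__done__")

    task = asyncio.create_task(pump())
    try:
        while True:
            try:
                event = await asyncio.wait_for(queue.get(), timeout=interval)
            except asyncio.TimeoutError:
                # No real event yet: emit a heartbeat and keep waiting
                yield "still working..."
                continue
            if event == "__done__":
                break
            yield event
    finally:
        task.cancel()  # cleanup if the consumer stops early


async def main() -> list:
    return [e async for e in run_with_heartbeat()]


events = asyncio.run(main())
assert events[-1] == "real event"          # the real event still arrives
assert "still working..." in events[:-1]   # heartbeats filled the silent gap
```

Even this minimal version needs a queue, a pump task, timeout handling, and cancellation cleanup, which is why the report calls it over-engineering for a demo.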
+ ### Option D: Accept the Limitation (Document It)
+ **Effort:** 0
+ **Impact:** None (users still confused)
+
+ Just document that Advanced mode takes 2-5 minutes and users should wait.
+
+ ---
+
+ ## Recommendation
+
+ **Implement Option A** - Add a "thinking" yield before the blocking call.
+
+ It's:
+ 1. A minimal code change (5 minutes)
+ 2. A clear way to set user expectations
+ 3. Free of Gradio refactoring
+ 4. Better than silence
+
+ ---
+
+ ## Implementation Plan
+
+ ### Step 1: Add "thinking" Event Type
+ ```python
+ # In src/utils/models.py
+ class AgentEvent(BaseModel):
+     type: Literal[
+         "started", "thinking", "searching", ...  # Add "thinking"
+     ]
+ ```
+
+ ### Step 2: Yield Thinking Event in Magentic Orchestrator
+ ```python
+ # In src/orchestrator_magentic.py, run() method
+ yield AgentEvent(
+     type="thinking",
+     message="🧠 Multi-agent reasoning in progress... This may take 2-5 minutes.",
+     iteration=0,
+ )
+ ```
+
+ ### Step 3: Handle in App
+ ```python
+ # In src/app.py, research_agent()
+ if event.type == "thinking":
+     yield f"⏳ {event.message}"
+ ```
+
+ ---
+
+ ## Test Plan
+
+ - [ ] Add `"thinking"` to AgentEvent type literals
+ - [ ] Add yield before `workflow.run_stream()`
+ - [ ] Handle in app.py
+ - [ ] `make check` passes
+ - [ ] Manual test: Advanced mode shows "reasoning in progress" message
+ - [ ] Deploy to HuggingFace, verify UX improvement
+
+ ---
+
+ ## References
+
+ - [Gradio ChatInterface Docs](https://www.gradio.app/docs/gradio/chatinterface)
+ - [Gradio Chatbot Metadata](https://www.gradio.app/docs/gradio/chatbot)
+ - [Agents and Tool Usage Guide](https://www.gradio.app/guides/agents-and-tool-usage)
+ - [GitHub Issue: Streaming text not working](https://github.com/gradio-app/gradio/issues/11443)
src/agents/magentic_agents.py CHANGED
@@ -46,7 +46,7 @@ Be thorough - search multiple databases when appropriate.
 Focus on finding: mechanisms of action, clinical evidence, and specific drug candidates.""",
     chat_client=client,
     tools=[search_pubmed, search_clinical_trials, search_preprints],
-    temperature=0.3,  # More deterministic for tool use
+    temperature=1.0,  # Explicitly set for reasoning model compatibility (o1/o3)
 )
@@ -85,7 +85,7 @@ Be rigorous but fair. Look for:
 - Safety data
 - Drug-drug interactions""",
     chat_client=client,
-    temperature=0.2,  # Consistent judgments
+    temperature=1.0,  # Explicitly set for reasoning model compatibility
 )
@@ -122,7 +122,7 @@ def create_hypothesis_agent(chat_client: OpenAIChatClient | None = None) -> Chat

 Focus on mechanistic plausibility and existing evidence.""",
     chat_client=client,
-    temperature=0.5,  # Some creativity for hypothesis generation
+    temperature=1.0,  # Explicitly set for reasoning model compatibility
 )
@@ -180,5 +180,5 @@ Format them as a numbered list.
 Be comprehensive but concise. Cite evidence for all claims.""",
     chat_client=client,
     tools=[get_bibliography],
-    temperature=0.3,
+    temperature=1.0,  # Explicitly set for reasoning model compatibility
 )
src/app.py CHANGED
@@ -127,8 +127,13 @@ async def research_agent(
     yield "Please enter a research question."
     return

+    # BUG FIX: Handle None values from Gradio example caching
+    # Gradio passes None for missing example columns, overriding defaults
+    api_key_str = api_key or ""
+    api_key_state_str = api_key_state or ""
+
     # BUG FIX: Prefer freshly-entered key, then persisted state
-    user_api_key = (api_key.strip() or api_key_state.strip()) or None
+    user_api_key = (api_key_str.strip() or api_key_state_str.strip()) or None

     # Check available keys
     has_openai = bool(os.getenv("OPENAI_API_KEY"))
@@ -170,6 +175,12 @@

     yield f"🧠 **Backend**: {backend_name}\n\n"

+    # Immediate loading feedback so user knows something is happening
+    yield (
+        f"🧠 **Backend**: {backend_name}\n\n"
+        "⏳ **Processing...** Searching PubMed, ClinicalTrials.gov, Europe PMC...\n"
+    )
+
     async for event in orchestrator.run(message):
         # BUG FIX: Handle streaming events separately to avoid token-by-token spam
         if event.type == "streaming":
@@ -236,15 +247,20 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
     [
         "What drugs improve female libido post-menopause?",
         "simple",
-        # Removed empty strings for api_key and api_key_state to prevent overwriting
+        None,
+        None,
     ],
     [
         "Clinical trials for erectile dysfunction alternatives to PDE5 inhibitors?",
         "advanced",
+        None,
+        None,
     ],
     [
-        "Evidence for testosterone therapy in women with HSDD?",
+        "Testosterone therapy for HSDD (Hypoactive Sexual Desire Disorder)?",
         "simple",
+        None,
+        None,
     ],
 ],
 additional_inputs_accordion=additional_inputs_accordion,
@@ -265,8 +281,8 @@
     ],
 )

-# API key persists because examples only include [message, mode] columns,
-# so Gradio doesn't overwrite the api_key textbox when examples are clicked.
+# API key persists because examples include [message, mode, None, None].
+# The explicit None values tell Gradio to NOT overwrite those inputs.

 return demo, additional_inputs_accordion
src/orchestrator_magentic.py CHANGED
@@ -156,6 +156,17 @@ Focus on:

 The final output should be a structured research report."""

+    # UX FIX: Yield thinking state before blocking workflow call
+    # The workflow.run_stream() blocks for 2+ minutes on first LLM call
+    yield AgentEvent(
+        type="thinking",
+        message=(
+            "Multi-agent reasoning in progress... "
+            "This may take 2-5 minutes for complex queries."
+        ),
+        iteration=0,
+    )
+
     iteration = 0
     try:
         async for event in workflow.run_stream(task):
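The shape of the fix above is simply "yield a status event, then enter the blocking stream". A stripped-down, dependency-free sketch of that control flow (plain dicts stand in for `AgentEvent`, and `fake_run_stream` stands in for the agent-framework call):

```python
import asyncio


async def fake_run_stream(task: str):
    """Stand-in for agent_framework's workflow.run_stream()."""
    await asyncio.sleep(0)  # the real call is silent for minutes here
    yield {"type": "complete", "message": f"report for {task}"}


async def run(query: str):
    yield {"type": "started", "message": f"Starting research: {query}"}
    # Yield a status event BEFORE entering the blocking stream so the
    # UI has something to render during the silent setup phase.
    yield {"type": "thinking", "message": "Multi-agent reasoning in progress..."}
    async for event in fake_run_stream(query):
        yield event


async def collect(query: str) -> list:
    return [event["type"] async for event in run(query)]


types = asyncio.run(collect("demo query"))
assert types == ["started", "thinking", "complete"]
```

Because `run` is an async generator, the "thinking" event reaches the consumer immediately; nothing downstream has to change except recognizing the new event type.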
src/utils/models.py CHANGED
@@ -106,6 +106,7 @@ class AgentEvent(BaseModel):

     type: Literal[
         "started",
+        "thinking",  # Multi-agent reasoning in progress (before first event)
         "searching",
         "search_complete",
         "judging",
@@ -128,6 +129,7 @@ class AgentEvent(BaseModel):
     """Format event as markdown for chat display."""
     icons = {
         "started": "🚀",
+        "thinking": "⏳",  # Hourglass for thinking/waiting
         "searching": "🔍",
         "search_complete": "📚",
         "judging": "🧠",
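A dependency-free approximation of this change may help when reviewing it: the real `AgentEvent` is a Pydantic model, but a plain dataclass shows the same shape of the type union plus icon lookup (names here are simplified for illustration):

```python
from dataclasses import dataclass
from typing import Literal

EventType = Literal["started", "thinking", "searching", "search_complete", "judging"]

ICONS: dict = {
    "started": "🚀",
    "thinking": "⏳",  # hourglass while multi-agent reasoning runs
    "searching": "🔍",
    "search_complete": "📚",
    "judging": "🧠",
}


@dataclass
class AgentEvent:
    type: EventType
    message: str

    def to_markdown(self) -> str:
        """Format event as markdown for chat display."""
        return f"{ICONS[self.type]} **{self.type.upper()}**: {self.message}"


event = AgentEvent("thinking", "Multi-agent reasoning in progress...")
assert event.to_markdown().startswith("⏳ **THINKING**")
```

Adding the literal and the icon entry together matters: a `"thinking"` event without an `ICONS` entry would raise `KeyError` in the display path.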
tests/unit/test_gradio_crash.py ADDED
@@ -0,0 +1,86 @@
+ """Test that Gradio example caching doesn't crash with None parameters."""
+
+ from unittest.mock import MagicMock
+
+ import pytest
+
+ from src.utils.models import AgentEvent
+
+
+ @pytest.mark.unit
+ @pytest.mark.asyncio
+ async def test_research_agent_handles_none_parameters():
+     """
+     Test that research_agent handles None parameters gracefully.
+
+     This simulates Gradio's example caching behavior where missing
+     example columns are passed as None instead of using default values.
+
+     Bug: https://huggingface.co/spaces/MCP-1st-Birthday/DeepBoner crashed
+     because api_key=None and api_key_state=None caused .strip() to fail.
+     """
+     # Mock the orchestrator to avoid real API calls
+     import src.app as app_module
+     from src.app import research_agent
+
+     mock_orchestrator = MagicMock()
+
+     async def mock_run(query):
+         yield AgentEvent(type="complete", message="Test complete", iteration=1)
+
+     mock_orchestrator.run = mock_run
+
+     original_configure = app_module.configure_orchestrator
+     app_module.configure_orchestrator = MagicMock(return_value=(mock_orchestrator, "Mock"))
+
+     try:
+         # This should NOT raise AttributeError: 'NoneType' object has no attribute 'strip'
+         results = []
+         async for result in research_agent(
+             message="test query",
+             history=[],
+             mode="simple",
+             api_key=None,        # Simulating Gradio passing None
+             api_key_state=None,  # Simulating Gradio passing None
+         ):
+             results.append(result)
+
+         # If we get here without AttributeError, the fix works
+         assert len(results) > 0, "Should have yielded at least one result"
+
+     finally:
+         app_module.configure_orchestrator = original_configure
+
+
+ @pytest.mark.unit
+ @pytest.mark.asyncio
+ async def test_research_agent_handles_empty_string_parameters():
+     """Test that empty strings (the expected default) also work."""
+     import src.app as app_module
+     from src.app import research_agent
+
+     mock_orchestrator = MagicMock()
+
+     async def mock_run(query):
+         yield AgentEvent(type="complete", message="Test complete", iteration=1)
+
+     mock_orchestrator.run = mock_run
+
+     original_configure = app_module.configure_orchestrator
+     app_module.configure_orchestrator = MagicMock(return_value=(mock_orchestrator, "Mock"))
+
+     try:
+         results = []
+         async for result in research_agent(
+             message="test query",
+             history=[],
+             mode="simple",
+             api_key="",        # Normal empty string
+             api_key_state="",  # Normal empty string
+         ):
+             results.append(result)
+
+         assert len(results) > 0
+
+     finally:
+         app_module.configure_orchestrator = original_configure