VibecoderMcSwaggins committed on
Commit
6b5e05b
·
unverified · 1 Parent(s): 8f45b69

fix: P0/P1/P2 - Gradio crash, UX improvements, thinking state (#61)


* fix: handle None parameters from Gradio example caching (P0)

Root cause: Gradio passes None for missing example columns during
startup caching, overriding Python default values. Line 131 called
.strip() on None, crashing the HuggingFace Space.

Fix: Add defensive None handling before .strip():
api_key_str = api_key or ""
api_key_state_str = api_key_state or ""

Added tests to prevent regression.

* fix: multiple UX improvements (P1 bugs)

1. HSDD acronym spelled out (Hypoactive Sexual Desire Disorder)
2. Added loading indicator ("Processing...") for immediate feedback
3. Removed temperature settings from magentic agents for reasoning
model compatibility (o3, o1 only support temperature=1)
4. Bug report documenting remaining issues (API key persistence)

140 tests passing.

* fix: enhance UX with "thinking" state and API key persistence

1. Added a "thinking" state yield before blocking calls in Magentic orchestrator to improve user feedback during long processing times.
2. Updated Gradio examples to include explicit None values for API key inputs, ensuring persistence across example clicks.
3. Set temperature explicitly to 1.0 for compatibility with reasoning models in Magentic agents.

All tests passing.

docs/bugs/P0_GRADIO_EXAMPLE_CACHING_CRASH.md ADDED
@@ -0,0 +1,134 @@
+ # P0 Bug Report: Gradio Example Caching Crash
+
+ ## Status
+ - **Date:** 2025-11-29
+ - **Priority:** P0 CRITICAL (Production Down)
+ - **Component:** `src/app.py:131`
+ - **Environment:** HuggingFace Spaces (Python 3.11, Gradio)
+
+ ## Error Message
+
+ ```text
+ AttributeError: 'NoneType' object has no attribute 'strip'
+ ```
+
+ ## Full Stack Trace
+
+ ```text
+ File "/app/src/app.py", line 131, in research_agent
+     user_api_key = (api_key.strip() or api_key_state.strip()) or None
+                     ^^^^^^^^^^^^^
+ AttributeError: 'NoneType' object has no attribute 'strip'
+ ```
+
+ ## Root Cause Analysis
+
+ ### The Trigger
+ Gradio's example caching mechanism runs the `research_agent` function during startup to pre-cache example outputs. This happens at:
+
+ ```text
+ File "/usr/local/lib/python3.11/site-packages/gradio/helpers.py", line 509, in _start_caching
+     await self.cache()
+ ```
+
+ ### The Problem
+ Our examples only provide values for 2 of the 4 function parameters:
+
+ ```python
+ examples=[
+     ["What is the evidence for testosterone therapy in women with HSDD?", "simple"],
+     ["Promising drug candidates for endometriosis pain management", "simple"],
+ ]
+ ```
+
+ These map to `[message, mode]` but **NOT** to `api_key` or `api_key_state`.
+
+ When Gradio runs the function for caching, it passes `None` for the unprovided parameters:
+
+ ```python
+ async def research_agent(
+     message: str,            # ✅ Provided by example
+     history: list[...],      # ✅ Empty list default
+     mode: str = "simple",    # ✅ Provided by example
+     api_key: str = "",       # ❌ Becomes None during caching!
+     api_key_state: str = ""  # ❌ Becomes None during caching!
+ ) -> AsyncGenerator[...]:
+ ```
+
+ ### The Crash
+ Line 131 attempts to call `.strip()` on `None`:
+
+ ```python
+ user_api_key = (api_key.strip() or api_key_state.strip()) or None
+ #               ^^^^^^^^^^^^^
+ #               NoneType has no attribute 'strip'
+ ```
+
+ ## Gradio Warning (Ignored)
+
+ Gradio actually warned us about this:
+
+ ```text
+ UserWarning: Examples will be cached but not all input components have
+ example values. This may result in an exception being thrown by your function.
+ ```
+
+ ## Solution
+
+ ### Option A: Defensive None Handling (Recommended)
+ Add None guards before calling `.strip()`:
+
+ ```python
+ # Handle None values from Gradio example caching
+ api_key_str = api_key or ""
+ api_key_state_str = api_key_state or ""
+ user_api_key = (api_key_str.strip() or api_key_state_str.strip()) or None
+ ```
+
+ ### Option B: Disable Example Caching
+ Set `cache_examples=False` in ChatInterface:
+
+ ```python
+ gr.ChatInterface(
+     fn=research_agent,
+     examples=[...],
+     cache_examples=False,  # Disable caching
+ )
+ ```
+
+ This avoids the crash but loses the UX benefit of pre-cached examples.
+
+ ### Option C: Provide Full Example Values
+ Include all 4 columns in examples:
+
+ ```python
+ examples=[
+     ["What is the evidence...", "simple", "", ""],  # [msg, mode, api_key, state]
+ ]
+ ```
+
+ This is verbose and exposes internal state to users.
+
+ ## Recommendation
+
+ **Option A** is the cleanest fix. It:
+ 1. Maintains cached examples for fast UX
+ 2. Handles edge cases defensively
+ 3. Doesn't expose internal state in examples
+
+ ## Pre-Merge Checklist
+
+ - [ ] Fix applied to `src/app.py`
+ - [ ] Unit test added for None parameter handling
+ - [ ] `make check` passes
+ - [ ] Test locally with `uv run python -m src.app`
+ - [ ] Verify example caching works without crash
+ - [ ] Deploy to HuggingFace Spaces
+ - [ ] Verify Space starts without error
+
+ ## Lessons Learned
+
+ 1. Always test Gradio apps with example caching enabled locally before deploying
+ 2. Gradio's "partial examples" feature passes `None` for missing columns
+ 3. Default parameter values (`str = ""`) are ignored when Gradio explicitly passes `None`
+ 4. The Gradio warning about missing example values should be treated as an error
docs/bugs/P1_MULTIPLE_UX_BUGS.md ADDED
@@ -0,0 +1,49 @@
+ # P1 Bug Report: Multiple UX and Configuration Issues
+
+ ## Status
+ - **Date:** 2025-11-29
+ - **Priority:** P1 (Multiple user-facing issues)
+ - **Components:** `src/app.py`, `src/orchestrator_magentic.py`
+
+ ## Resolved Issues (Fixed 2025-11-29)
+
+ ### Bug 1: API Key Cleared When Clicking Examples
+ **Fixed.** Updated `examples` in `app.py` to include explicit `None` values for additional inputs. Gradio preserves a component's current value when the example value is `None`.
+
+ ### Bug 2: No Loading/Processing Indicator
+ **Fixed.** `research_agent` yields an immediate "⏳ Processing..." message before starting the orchestrator.
+
+ ### Bug 3: Advanced Mode Temperature Error
+ **Fixed.** Explicitly set `temperature=1.0` for all Magentic agents in `src/agents/magentic_agents.py`. This is compatible with OpenAI reasoning models (o1/o3), which require `temperature=1` and were rejecting the previous values (0.2-0.5).
+
+ ### Bug 4: HSDD Acronym Not Spelled Out
+ **Fixed.** Updated example text in `app.py` to "HSDD (Hypoactive Sexual Desire Disorder)".
+
+ ---
+
+ ## Open / Deferred Issues
+
+ ### Bug 5: Free Tier Quota Exhausted (UX Improvement)
+ **Deferred.** Currently shows the standard error message. Improve if users report confusion.
+
+ ### Bug 6: Asyncio File Descriptor Warnings
+ **Won't Fix.** Cosmetic issue only.
+
+ ---
+
+ ## Priority Order (Completed)
+
+ 1. **Bug 4 (HSDD)** - Fixed
+ 2. **Bug 2 (Loading indicator)** - Fixed
+ 3. **Bug 3 (Temperature)** - Fixed
+ 4. **Bug 1 (API key)** - Fixed
+
+ ---
+
+ ## Test Plan
+ - [x] Fix HSDD acronym
+ - [x] Add loading indicator yield
+ - [x] Test advanced mode with temperature fix (static analysis/code change)
+ - [x] Research Gradio example behavior for API key (implemented None fix)
+ - [ ] Run `make check`
+ - [ ] Deploy and test on HuggingFace Spaces
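The constraint behind Bug 3 can be captured in a small guard rather than hard-coding `1.0` everywhere. A hedged sketch (`pick_temperature` is a hypothetical helper, not project code; it assumes OpenAI reasoning models are identifiable by an `o1`/`o3` name prefix):

```python
def pick_temperature(model: str, preferred: float) -> float:
    """Reasoning models (o1/o3 families) only accept temperature=1."""
    family = model.split("-")[0]
    if family in {"o1", "o3"}:
        return 1.0
    return preferred


assert pick_temperature("o3-mini", 0.3) == 1.0   # reasoning model: forced to 1.0
assert pick_temperature("o1", 0.2) == 1.0        # bare family name also matches
assert pick_temperature("gpt-4o", 0.3) == 0.3    # non-reasoning model keeps preference
```

Centralizing this would let non-reasoning models keep lower temperatures for deterministic tool use while remaining safe for o1/o3.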
docs/bugs/P2_MAGENTIC_THINKING_STATE.md ADDED
@@ -0,0 +1,232 @@
+ # P2 Bug Report: Advanced Mode Missing "Thinking" State
+
+ ## Status
+ - **Date:** 2025-11-29
+ - **Priority:** P2 (UX polish, not blocking functionality)
+ - **Component:** `src/orchestrator_magentic.py`, `src/app.py`
+
+ ---
+
+ ## Symptoms
+
+ User experience in **Advanced (Magentic) mode**:
+ 1. Click example or submit query
+ 2. See: `🚀 **STARTED**: Starting research (Magentic mode)...`
+ 3. **2+ minutes of nothing** (no spinner, no progress, no indication work is happening)
+ 4. Eventually see: `🧠 **JUDGING**: Manager (user_task)...`
+
+ **User perception:** "Is it frozen? Did it crash?"
+
+ ### Container Logs Confirm Work IS Happening
+ ```
+ 14:54:22 [info] Starting Magentic orchestrator query='...'
+ 14:54:22 [info] Embedding service enabled
+ ... 2+ MINUTES OF SILENCE (agent-framework doing internal LLM calls) ...
+ 14:56:38 [info] Creating orchestrator mode=advanced
+ ```
+
+ The silence is because `workflow.run_stream()` doesn't yield events during its setup phase.
+
+ ---
+
+ ## Root Cause Analysis
+
+ ### Current Flow (`src/orchestrator_magentic.py`)
+ ```python
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
+     # 1. Immediately yields "started"
+     yield AgentEvent(type="started", message=f"Starting research (Magentic mode): {query}")
+
+     # 2. Setup (fast, no yield needed)
+     embedding_service = self._init_embedding_service()
+     init_magentic_state(embedding_service)
+     workflow = self._build_workflow()
+
+     # 3. GAP: workflow.run_stream() blocks for 2+ minutes before first event
+     async for event in workflow.run_stream(task):  # <-- THE BOTTLENECK
+         yield self._process_event(event)
+ ```
+
+ The `agent-framework`'s `workflow.run_stream()` is calling OpenAI's API, building the manager prompt, coordinating agents, etc. **It doesn't yield events during this setup phase**.
+
+ ---
53
+
54
+ ## Gold Standard UX (What We'd Want)
55
+
56
+ ### Gradio's Native Thinking Support
57
+
58
+ Per [Gradio Chatbot Docs](https://www.gradio.app/docs/gradio/chatbot):
59
+
60
+ > "The Gradio Chatbot can natively display intermediate thoughts and tool usage in a collapsible accordion next to a chat message. This makes it perfect for creating UIs for LLM agents and chain-of-thought (CoT) or reasoning demos."
61
+
62
+ **Features available:**
63
+ - `gr.ChatMessage` with `metadata={"status": "pending"}` shows spinner
64
+ - `metadata={"title": "Thinking...", "status": "pending"}` creates collapsible accordion
65
+ - Nested thoughts via `id` and `parent_id`
66
+ - `duration` metadata shows time spent
67
+
68
+ **Example from Gradio docs:**
69
+ ```python
70
+ import gradio as gr
71
+
72
+ def chat_fn(message, history):
73
+ # Yield thinking state with spinner
74
+ yield gr.ChatMessage(
75
+ role="assistant",
76
+ metadata={"title": "🧠 Thinking...", "status": "pending"}
77
+ )
78
+
79
+ # Do work...
80
+
81
+ # Update with completed thought
82
+ yield gr.ChatMessage(
83
+ role="assistant",
84
+ content="Analysis complete",
85
+ metadata={"title": "🧠 Thinking...", "status": "done", "duration": 5.2}
86
+ )
87
+
88
+ yield "Here's the final answer..."
89
+ ```
90
+
91
+ ---
92
+
93
+ ## Why This is Complex for DeepBoner
94
+
95
+ ### Constraint 1: ChatInterface Returns Strings
96
+ Our `research_agent()` yields plain strings:
97
+ ```python
98
+ yield "🧠 **Backend**: {backend_name}\n\n"
99
+ yield "⏳ **Processing...** Searching PubMed...\n"
100
+ yield "\n\n".join(response_parts)
101
+ ```
102
+
103
+ Converting to `gr.ChatMessage` objects would require refactoring the entire response pipeline.
104
+
105
+ ### Constraint 2: Agent-Framework is the Bottleneck
106
+ The 2-minute gap is inside `workflow.run_stream(task)`, which is the `agent-framework` library. We can't inject yields into a third-party library's blocking call.
107
+
108
+ ### Constraint 3: ChatInterface vs Blocks
109
+ `gr.ChatInterface` is a convenience wrapper. The full `gr.ChatMessage` metadata features work best with raw `gr.Blocks` + `gr.Chatbot` components.
110
+
111
+ ---
112
+
113
+ ## Options
114
+
115
+ ### Option A: Yield "Thinking" Before Blocking Call (Recommended)
116
+ **Effort:** 5 minutes
117
+ **Impact:** Users see *something* while waiting
118
+
119
+ ```python
120
+ # In src/orchestrator_magentic.py
121
+ async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
122
+ yield AgentEvent(type="started", message=f"Starting research (Magentic mode): {query}")
123
+
124
+ # NEW: Yield thinking state before the blocking call
125
+ yield AgentEvent(
126
+ type="thinking", # New event type
127
+ message="🧠 Agents are reasoning... This may take 2-5 minutes for complex queries.",
128
+ iteration=0,
129
+ )
130
+
131
+ # ... rest of setup ...
132
+
133
+ async for event in workflow.run_stream(task):
134
+ yield self._process_event(event)
135
+ ```
136
+
137
+ **Pros:**
138
+ - Simple, doesn't require Gradio changes
139
+ - Works with current string-based approach
140
+ - Sets user expectations ("2-5 minutes")
141
+
142
+ **Cons:**
143
+ - No spinner/animation (static text)
144
+ - Doesn't show real-time progress during the gap
145
+
146
+ ### Option B: Use `gr.ChatMessage` with Metadata (Major Refactor)
147
+ **Effort:** 2-4 hours
148
+ **Impact:** Full gold-standard UX
149
+
150
+ Would require:
151
+ 1. Changing `research_agent()` to yield `gr.ChatMessage` objects
152
+ 2. Adding thinking states with `metadata={"status": "pending"}`
153
+ 3. Updating all event handlers to produce proper ChatMessage objects
154
+
155
+ ### Option C: Heartbeat/Polling (Over-Engineering)
156
+ **Effort:** 4+ hours
157
+ **Impact:** Spinner during blocking call
158
+
159
+ Create a background task that yields "still working..." every 10 seconds while waiting for the agent-framework. Requires:
160
+ - `asyncio.create_task()` for heartbeat
161
+ - Task cancellation when real events arrive
162
+ - Proper cleanup
163
+
164
+ **Verdict:** Over-engineering for a demo.
165
+
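For reference, the heartbeat pattern described in Option C would look roughly like this (an illustrative sketch only: `slow_workflow` stands in for `workflow.run_stream`, and the timings are shortened so it runs in milliseconds):

```python
import asyncio


async def slow_workflow():
    """Stand-in for the agent-framework call that is silent for a long time."""
    await asyncio.sleep(0.3)
    yield "real event"


async def run_with_heartbeat(interval: float = 0.1):
    """Emit 'still working...' until real events start arriving."""
    queue: asyncio.Queue = asyncio.Queue()

    async def pump() -> None:
        # Forward real events into the queue, then signal completion
        async for event in slow_workflow():
            await queue.put(event)
        await queue.put("__done__")

    task = asyncio.create_task(pump())
    try:
        while True:
            try:
                event = await asyncio.wait_for(queue.get(), timeout=interval)
            except asyncio.TimeoutError:
                # No real event yet: emit a heartbeat and keep waiting
                yield "still working..."
                continue
            if event == "__done__":
                break
            yield event
    finally:
        task.cancel()  # cleanup if the consumer stops early


async def main() -> list:
    return [e async for e in run_with_heartbeat()]


events = asyncio.run(main())
assert events[-1] == "real event"          # the real event still arrives
assert "still working..." in events[:-1]   # heartbeats filled the silent gap
```

Even this minimal version needs a queue, a pump task, timeout handling, and cancellation cleanup, which is why the report calls it over-engineering for a demo.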
+ ### Option D: Accept the Limitation (Document It)
+ **Effort:** 0
+ **Impact:** None (users still confused)
+
+ Just document that Advanced mode takes 2-5 minutes and users should wait.
+
+ ---
+
+ ## Recommendation
+
+ **Implement Option A** - Add a "thinking" yield before the blocking call.
+
+ It's:
+ 1. A minimal code change (5 minutes)
+ 2. A clear way to set user expectations
+ 3. Free of Gradio refactoring
+ 4. Better than silence
+
+ ---
+
+ ## Implementation Plan
+
+ ### Step 1: Add "thinking" Event Type
+ ```python
+ # In src/utils/models.py
+ class AgentEvent(BaseModel):
+     type: Literal[
+         "started", "thinking", "searching", ...  # Add "thinking"
+     ]
+ ```
+
+ ### Step 2: Yield Thinking Event in Magentic Orchestrator
+ ```python
+ # In src/orchestrator_magentic.py, run() method
+ yield AgentEvent(
+     type="thinking",
+     message="🧠 Multi-agent reasoning in progress... This may take 2-5 minutes.",
+     iteration=0,
+ )
+ ```
+
+ ### Step 3: Handle in App
+ ```python
+ # In src/app.py, research_agent()
+ if event.type == "thinking":
+     yield f"⏳ {event.message}"
+ ```
+
+ ---
+
+ ## Test Plan
+
+ - [ ] Add `"thinking"` to AgentEvent type literals
+ - [ ] Add yield before `workflow.run_stream()`
+ - [ ] Handle in app.py
+ - [ ] `make check` passes
+ - [ ] Manual test: Advanced mode shows "reasoning in progress" message
+ - [ ] Deploy to HuggingFace, verify UX improvement
+
+ ---
+
+ ## References
+
+ - [Gradio ChatInterface Docs](https://www.gradio.app/docs/gradio/chatinterface)
+ - [Gradio Chatbot Metadata](https://www.gradio.app/docs/gradio/chatbot)
+ - [Agents and Tool Usage Guide](https://www.gradio.app/guides/agents-and-tool-usage)
+ - [GitHub Issue: Streaming text not working](https://github.com/gradio-app/gradio/issues/11443)
src/agents/magentic_agents.py CHANGED
@@ -46,7 +46,7 @@ Be thorough - search multiple databases when appropriate.
 Focus on finding: mechanisms of action, clinical evidence, and specific drug candidates.""",
     chat_client=client,
     tools=[search_pubmed, search_clinical_trials, search_preprints],
-    temperature=0.3,  # More deterministic for tool use
+    temperature=1.0,  # Explicitly set for reasoning model compatibility (o1/o3)
 )
@@ -85,7 +85,7 @@ Be rigorous but fair. Look for:
 - Safety data
 - Drug-drug interactions""",
     chat_client=client,
-    temperature=0.2,  # Consistent judgments
+    temperature=1.0,  # Explicitly set for reasoning model compatibility
 )
@@ -122,7 +122,7 @@ def create_hypothesis_agent(chat_client: OpenAIChatClient | None = None) -> Chat

 Focus on mechanistic plausibility and existing evidence.""",
     chat_client=client,
-    temperature=0.5,  # Some creativity for hypothesis generation
+    temperature=1.0,  # Explicitly set for reasoning model compatibility
 )
@@ -180,5 +180,5 @@ Format them as a numbered list.
 Be comprehensive but concise. Cite evidence for all claims.""",
     chat_client=client,
     tools=[get_bibliography],
-    temperature=0.3,
+    temperature=1.0,  # Explicitly set for reasoning model compatibility
 )
src/app.py CHANGED
@@ -127,8 +127,13 @@ async def research_agent(
     yield "Please enter a research question."
     return

+    # BUG FIX: Handle None values from Gradio example caching
+    # Gradio passes None for missing example columns, overriding defaults
+    api_key_str = api_key or ""
+    api_key_state_str = api_key_state or ""
+
     # BUG FIX: Prefer freshly-entered key, then persisted state
-    user_api_key = (api_key.strip() or api_key_state.strip()) or None
+    user_api_key = (api_key_str.strip() or api_key_state_str.strip()) or None

     # Check available keys
     has_openai = bool(os.getenv("OPENAI_API_KEY"))
@@ -170,6 +175,12 @@

     yield f"🧠 **Backend**: {backend_name}\n\n"

+    # Immediate loading feedback so user knows something is happening
+    yield (
+        f"🧠 **Backend**: {backend_name}\n\n"
+        "⏳ **Processing...** Searching PubMed, ClinicalTrials.gov, Europe PMC...\n"
+    )
+
     async for event in orchestrator.run(message):
         # BUG FIX: Handle streaming events separately to avoid token-by-token spam
         if event.type == "streaming":
@@ -236,15 +247,20 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
     [
         "What drugs improve female libido post-menopause?",
         "simple",
-        # Removed empty strings for api_key and api_key_state to prevent overwriting
+        None,
+        None,
     ],
     [
         "Clinical trials for erectile dysfunction alternatives to PDE5 inhibitors?",
         "advanced",
+        None,
+        None,
     ],
     [
-        "Evidence for testosterone therapy in women with HSDD?",
+        "Testosterone therapy for HSDD (Hypoactive Sexual Desire Disorder)?",
         "simple",
+        None,
+        None,
     ],
 ],
 additional_inputs_accordion=additional_inputs_accordion,
@@ -265,8 +281,8 @@
     ],
 )

-# API key persists because examples only include [message, mode] columns,
-# so Gradio doesn't overwrite the api_key textbox when examples are clicked.
+# API key persists because examples include [message, mode, None, None].
+# The explicit None values tell Gradio to NOT overwrite those inputs.

 return demo, additional_inputs_accordion
src/orchestrator_magentic.py CHANGED
@@ -156,6 +156,17 @@ Focus on:

 The final output should be a structured research report."""

+    # UX FIX: Yield thinking state before blocking workflow call
+    # The workflow.run_stream() blocks for 2+ minutes on first LLM call
+    yield AgentEvent(
+        type="thinking",
+        message=(
+            "Multi-agent reasoning in progress... "
+            "This may take 2-5 minutes for complex queries."
+        ),
+        iteration=0,
+    )
+
     iteration = 0
     try:
         async for event in workflow.run_stream(task):
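The shape of the fix above is simply "yield a status event, then enter the blocking stream". A stripped-down, dependency-free sketch of that control flow (plain dicts stand in for `AgentEvent`, and `fake_run_stream` stands in for the agent-framework call):

```python
import asyncio


async def fake_run_stream(task: str):
    """Stand-in for agent_framework's workflow.run_stream()."""
    await asyncio.sleep(0)  # the real call is silent for minutes here
    yield {"type": "complete", "message": f"report for {task}"}


async def run(query: str):
    yield {"type": "started", "message": f"Starting research: {query}"}
    # Yield a status event BEFORE entering the blocking stream so the
    # UI has something to render during the silent setup phase.
    yield {"type": "thinking", "message": "Multi-agent reasoning in progress..."}
    async for event in fake_run_stream(query):
        yield event


async def collect(query: str) -> list:
    return [event["type"] async for event in run(query)]


types = asyncio.run(collect("demo query"))
assert types == ["started", "thinking", "complete"]
```

Because `run` is an async generator, the "thinking" event reaches the consumer immediately; nothing downstream has to change except recognizing the new event type.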
src/utils/models.py CHANGED
@@ -106,6 +106,7 @@ class AgentEvent(BaseModel):

     type: Literal[
         "started",
+        "thinking",  # Multi-agent reasoning in progress (before first event)
         "searching",
         "search_complete",
         "judging",
@@ -128,6 +129,7 @@ class AgentEvent(BaseModel):
     """Format event as markdown for chat display."""
     icons = {
         "started": "🚀",
+        "thinking": "⏳",  # Hourglass for thinking/waiting
         "searching": "🔍",
         "search_complete": "📚",
         "judging": "🧠",
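A dependency-free approximation of this change may help when reviewing it: the real `AgentEvent` is a Pydantic model, but a plain dataclass shows the same shape of the type union plus icon lookup (names here are simplified for illustration):

```python
from dataclasses import dataclass
from typing import Literal

EventType = Literal["started", "thinking", "searching", "search_complete", "judging"]

ICONS: dict = {
    "started": "🚀",
    "thinking": "⏳",  # hourglass while multi-agent reasoning runs
    "searching": "🔍",
    "search_complete": "📚",
    "judging": "🧠",
}


@dataclass
class AgentEvent:
    type: EventType
    message: str

    def to_markdown(self) -> str:
        """Format event as markdown for chat display."""
        return f"{ICONS[self.type]} **{self.type.upper()}**: {self.message}"


event = AgentEvent("thinking", "Multi-agent reasoning in progress...")
assert event.to_markdown().startswith("⏳ **THINKING**")
```

Adding the literal and the icon entry together matters: a `"thinking"` event without an `ICONS` entry would raise `KeyError` in the display path.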
tests/unit/test_gradio_crash.py ADDED
@@ -0,0 +1,86 @@
+ """Test that Gradio example caching doesn't crash with None parameters."""
+
+ from unittest.mock import MagicMock
+
+ import pytest
+
+ from src.utils.models import AgentEvent
+
+
+ @pytest.mark.unit
+ @pytest.mark.asyncio
+ async def test_research_agent_handles_none_parameters():
+     """
+     Test that research_agent handles None parameters gracefully.
+
+     This simulates Gradio's example caching behavior where missing
+     example columns are passed as None instead of using default values.
+
+     Bug: https://huggingface.co/spaces/MCP-1st-Birthday/DeepBoner crashed
+     because api_key=None and api_key_state=None caused .strip() to fail.
+     """
+     # Mock the orchestrator to avoid real API calls
+     import src.app as app_module
+     from src.app import research_agent
+
+     mock_orchestrator = MagicMock()
+
+     async def mock_run(query):
+         yield AgentEvent(type="complete", message="Test complete", iteration=1)
+
+     mock_orchestrator.run = mock_run
+
+     original_configure = app_module.configure_orchestrator
+     app_module.configure_orchestrator = MagicMock(return_value=(mock_orchestrator, "Mock"))
+
+     try:
+         # This should NOT raise AttributeError: 'NoneType' object has no attribute 'strip'
+         results = []
+         async for result in research_agent(
+             message="test query",
+             history=[],
+             mode="simple",
+             api_key=None,        # Simulating Gradio passing None
+             api_key_state=None,  # Simulating Gradio passing None
+         ):
+             results.append(result)
+
+         # If we get here without AttributeError, the fix works
+         assert len(results) > 0, "Should have yielded at least one result"
+
+     finally:
+         app_module.configure_orchestrator = original_configure
+
+
+ @pytest.mark.unit
+ @pytest.mark.asyncio
+ async def test_research_agent_handles_empty_string_parameters():
+     """Test that empty strings (the expected default) also work."""
+     import src.app as app_module
+     from src.app import research_agent
+
+     mock_orchestrator = MagicMock()
+
+     async def mock_run(query):
+         yield AgentEvent(type="complete", message="Test complete", iteration=1)
+
+     mock_orchestrator.run = mock_run
+
+     original_configure = app_module.configure_orchestrator
+     app_module.configure_orchestrator = MagicMock(return_value=(mock_orchestrator, "Mock"))
+
+     try:
+         results = []
+         async for result in research_agent(
+             message="test query",
+             history=[],
+             mode="simple",
+             api_key="",        # Normal empty string
+             api_key_state="",  # Normal empty string
+         ):
+             results.append(result)
+
+         assert len(results) > 0
+
+     finally:
+         app_module.configure_orchestrator = original_configure