VibecoderMcSwaggins committed 5441526 (1 parent: 29a6844)

fix: multiple UX improvements (P1 bugs)

1. HSDD acronym spelled out (Hypoactive Sexual Desire Disorder)
2. Added loading indicator ("Processing...") for immediate feedback
3. Removed temperature settings from magentic agents for reasoning
model compatibility (o3, o1 only support temperature=1)
4. Bug report documenting remaining issues (API key persistence)

140 tests passing.

docs/bugs/P1_MULTIPLE_UX_BUGS.md ADDED
@@ -0,0 +1,174 @@
# P1 Bug Report: Multiple UX and Configuration Issues

## Status
- **Date:** 2025-11-29
- **Priority:** P1 (multiple user-facing issues)
- **Components:** `src/app.py`, `src/orchestrator_magentic.py`


---

## Bug 1: API Key Cleared When Clicking Examples

### Symptoms
- User enters an API key in the textbox
- User clicks an example prompt
- The API key textbox is cleared/reset

### Root Cause
Although the examples have only two columns, `[message, mode]`, Gradio's ChatInterface still resets the `additional_inputs` that have no column in the examples. The comment on lines 273-274 is therefore incorrect:

```python
# API key persists because examples only include [message, mode] columns,
# so Gradio doesn't overwrite the api_key textbox when examples are clicked.
```

This assumption is **wrong**: Gradio resets ALL `additional_inputs`, not just those with example values.

### Potential Fix
Option A: Include an API key column in the examples, explicitly set to an empty string. Note that an explicit `""` would presumably still overwrite a key the user has already typed, so this option needs testing.
```python
examples=[
    ["What drugs improve female libido?", "simple", ""],
    ...
]
```

Option B: Use JavaScript to preserve the value (hacky)

Option C: Move the API key outside ChatInterface into a separate Blocks layout

### Research Needed
- How current (2025) Gradio versions of ChatInterface handle partial examples
- Whether `cache_examples=False` affects this


---

## Bug 2: No Loading/Processing Indicator

### Symptoms
- User submits a query
- UI shows the "🚀 STARTED:" message but nothing else
- No spinner, no "thinking...", no indication that work is happening
- User assumes the app is frozen

### Container Logs Show
Work IS happening:
```
[info] Creating orchestrator mode=advanced
[info] Starting Magentic orchestrator query='...'
[info] Embedding service enabled
```

But the user sees nothing for 30+ seconds.

### Root Cause
The Gradio ChatInterface doesn't render intermediate yields quickly enough, and we don't yield a "⏳ Processing..." message immediately.

### Proposed Fix
Add immediate feedback in `research_agent()`:
```python
yield "⏳ **Processing...** Searching PubMed, ClinicalTrials.gov, Europe PMC..."
```

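Gradio's ChatInterface treats each string yielded by a generator as the complete reply so far and re-renders the message with it, which is why the committed fix repeats the backend line inside its loading message. The sketch below shows the pattern in isolation; `slow_search()` and the `backend_name` default are hypothetical stand-ins, not code from `src/app.py`. Driving it with `asyncio.run` produces three successive states: the header alone, the header plus the loading line, then the header plus the results.

```python
import asyncio
from collections.abc import AsyncIterator


async def slow_search(query: str) -> str:
    """Hypothetical stand-in for the orchestrator's slow search phase."""
    await asyncio.sleep(0.05)
    return f"Results for {query!r}"


async def research_agent(message: str, backend_name: str = "demo") -> AsyncIterator[str]:
    # Every yield is the complete reply so far, so earlier text is repeated.
    header = f"🧠 **Backend**: {backend_name}\n\n"
    yield header
    # Emit the loading line before awaiting anything slow, so the UI updates at once.
    yield header + "⏳ **Processing...** Searching PubMed, ClinicalTrials.gov, Europe PMC...\n"
    result = await slow_search(message)
    # The final yield replaces the loading line with the actual answer.
    yield header + result
```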

---

## Bug 3: Advanced Mode Temperature Error

### Error
```
Unsupported value: 'temperature' does not support 0.3 with this model.
Only the default (1) value is supported.
```

### Root Cause
The `agent_framework` (Magentic) agents pass `temperature=0.3`, but OpenAI reasoning models such as `o1` and `o3` only support the default `temperature=1`.

### Location
Likely in `src/orchestrator_magentic.py` or the agent-framework configuration.

### Proposed Fix
1. Detect the model type and skip `temperature` for reasoning models
2. Or: remove the explicit temperature settings and use model defaults
3. Or: catch this error and fall back to the default temperature

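Option 1 could look like the helper below, which only emits a `temperature` keyword for models that accept one. The prefix list and the names `is_reasoning_model` and `sampling_kwargs` are assumptions for illustration, not part of `agent_framework`, and the list would need to track whichever models are actually deployed. The result would be unpacked where the agents previously passed a literal, e.g. `**sampling_kwargs(model_id, 0.3)` in place of `temperature=0.3` (with `model_id` also hypothetical). The committed change takes option 2 instead and drops the argument everywhere.

```python
# Assumption: model IDs starting with these prefixes are reasoning models
# that reject any temperature other than the default (1).
REASONING_MODEL_PREFIXES: tuple[str, ...] = ("o1", "o3")


def is_reasoning_model(model_id: str) -> bool:
    # Drop an optional provider prefix such as "openai/" before matching.
    name = model_id.lower().rsplit("/", 1)[-1]
    return name.startswith(REASONING_MODEL_PREFIXES)


def sampling_kwargs(model_id: str, temperature: float) -> dict[str, float]:
    """Return {"temperature": ...} only for models that accept a custom value."""
    if is_reasoning_model(model_id):
        return {}  # omit the argument so the model's default is used
    return {"temperature": temperature}
```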

---

## Bug 4: HSDD Acronym Not Spelled Out

### Issue
The example prompt reads:
```
"Evidence for testosterone therapy in women with HSDD?"
```

**HSDD = Hypoactive Sexual Desire Disorder** (a low-libido condition)

Most users (including doctors!) won't know this acronym.

### Fix
Change to:
```
"Evidence for testosterone therapy in women with HSDD (Hypoactive Sexual Desire Disorder)?"
```

Also update the README if it uses this acronym.


---

## Bug 5: Free Tier Quota Exhausted (Expected Behavior)

### Logs
```
[error] HF Quota Exhausted error='402 Client Error: Payment Required...'
```

### This is NOT a bug
The HuggingFace free tier has limited credits. When they are exhausted:
- The user should enter their own API key
- The app correctly falls back to showing evidence without LLM analysis

### UX Improvement
Show a clearer message to the user when the quota is exhausted:
```
⚠️ Free tier quota exceeded. Enter your OpenAI/Anthropic API key above for full analysis.
```

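One way to produce that message is to recognise the quota error where it is caught and map it to user-facing text. This sketch assumes the exception text follows the `402 Client Error: Payment Required` format seen in the logs; the helper name `quota_warning` is hypothetical. When the exception carries an HTTP response (for example a `requests.HTTPError`, via `error.response.status_code`), checking for status 402 directly would be sturdier than string matching.

```python
from __future__ import annotations

QUOTA_MESSAGE = (
    "⚠️ Free tier quota exceeded. "
    "Enter your OpenAI/Anthropic API key above for full analysis."
)


def quota_warning(error: Exception) -> str | None:
    """Return the user-facing quota message for HTTP 402 errors, else None."""
    text = str(error)
    # Assumption: the text matches the "402 Client Error: Payment Required" log format.
    if "402" in text and "Payment Required" in text:
        return QUOTA_MESSAGE
    return None
```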

---

## Bug 6: Asyncio File Descriptor Warnings (Low Priority)

### Error
```
ValueError: Invalid file descriptor: -1
Exception ignored in: <function BaseEventLoop.__del__>
```

### Root Cause
An event loop cleanup issue in async code. This is common when mixing sync and async code, or when event loops are garbage-collected instead of being closed explicitly.

### Impact
**Cosmetic only**: it doesn't affect functionality, it just pollutes the logs.

### Fix (if desired)
Close event loops explicitly, or let `asyncio.run()` (or `asyncio.Runner`, a context manager on Python 3.11+) create and close the loop.

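A minimal sketch of both options, assuming a synchronous caller that needs the result of a coroutine (`fetch_evidence` is a hypothetical placeholder, not code from this repository). The manual version closes the loop in a `finally` block so it is never left for `BaseEventLoop.__del__` to clean up; `asyncio.run()` does the same bookkeeping automatically, but it cannot be called while another event loop is already running in the same thread.

```python
import asyncio


async def fetch_evidence() -> str:
    """Hypothetical placeholder for real async work."""
    await asyncio.sleep(0)
    return "done"


def run_with_manual_loop() -> str:
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(fetch_evidence())
    finally:
        # Finalize async generators, then close the loop explicitly
        # instead of leaving it to the garbage collector.
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()


def run_with_asyncio_run() -> str:
    # asyncio.run() creates a fresh loop and always closes it on exit.
    return asyncio.run(fetch_evidence())
```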

---

## Priority Order

1. **Bug 4 (HSDD)**: 2-minute fix, improves UX immediately
2. **Bug 2 (Loading indicator)**: 5-minute fix, critical for UX
3. **Bug 3 (Temperature)**: needs investigation; breaks advanced mode
4. **Bug 1 (API key)**: needs Gradio research; a workaround exists (enter the key after clicking an example)
5. **Bug 5 (Quota message)**: nice to have
6. **Bug 6 (Asyncio)**: low priority, cosmetic


---

## Test Plan
- [ ] Fix HSDD acronym
- [ ] Add loading indicator yield
- [ ] Test advanced mode with temperature fix
- [ ] Research Gradio example behavior for API key
- [ ] Run `make check`
- [ ] Deploy and test on HuggingFace Spaces
src/agents/magentic_agents.py CHANGED
@@ -46,7 +46,8 @@ Be thorough - search multiple databases when appropriate.
 Focus on finding: mechanisms of action, clinical evidence, and specific drug candidates.""",
         chat_client=client,
         tools=[search_pubmed, search_clinical_trials, search_preprints],
-        temperature=0.3,  # More deterministic for tool use
+        # Note: temperature removed for compatibility with reasoning models (o3, o1)
+        # which only support temperature=1
     )
 
 
@@ -85,7 +86,7 @@ Be rigorous but fair. Look for:
 - Safety data
 - Drug-drug interactions""",
         chat_client=client,
-        temperature=0.2,  # Consistent judgments
+        # Note: temperature removed for reasoning model compatibility
     )
 
 
@@ -122,7 +123,7 @@ def create_hypothesis_agent(chat_client: OpenAIChatClient | None = None) -> Chat
 
 Focus on mechanistic plausibility and existing evidence.""",
         chat_client=client,
-        temperature=0.5,  # Some creativity for hypothesis generation
+        # Note: temperature removed for reasoning model compatibility
     )
 
 
@@ -180,5 +181,5 @@ Format them as a numbered list.
 Be comprehensive but concise. Cite evidence for all claims.""",
         chat_client=client,
         tools=[get_bibliography],
-        temperature=0.3,
+        # Note: temperature removed for reasoning model compatibility
     )
src/app.py CHANGED
@@ -175,6 +175,12 @@ async def research_agent(
 
     yield f"🧠 **Backend**: {backend_name}\n\n"
 
+    # Immediate loading feedback so user knows something is happening
+    yield (
+        f"🧠 **Backend**: {backend_name}\n\n"
+        "⏳ **Processing...** Searching PubMed, ClinicalTrials.gov, Europe PMC...\n"
+    )
+
     async for event in orchestrator.run(message):
         # BUG FIX: Handle streaming events separately to avoid token-by-token spam
         if event.type == "streaming":
@@ -248,7 +254,7 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
                 "advanced",
             ],
             [
-                "Evidence for testosterone therapy in women with HSDD?",
+                "Testosterone therapy for HSDD (Hypoactive Sexual Desire Disorder)?",
                 "simple",
             ],
         ],