VibecoderMcSwaggins commited on
Commit
9006d69
Β·
1 Parent(s): 0c9be4a

fix: resolve P1 bugs (streaming UX + api key persistence) and update tests/docs

Browse files
docs/bugs/P1_MAGENTIC_STREAMING_AND_KEY_PERSISTENCE.md CHANGED
@@ -133,178 +133,25 @@ Gradio's `ChatInterface` with `additional_inputs` has known issues:
133
 
134
  ### Fix Applied
135
  **Files Modified:**
136
- 1. `src/app.py:111` - Added `api_key_state: str = ""` parameter to `research_agent()`
137
- 2. `src/app.py:133` - Logic: Use `api_key` if present, else fallback to `api_key_state`
138
- 3. `src/app.py:219` - Created `api_key_state = gr.State("")` component
139
- 4. `src/app.py:234-252` - Added empty `api_key_state` values to examples
140
- 5. `src/app.py:268` - Added `api_key_state` to `additional_inputs` list
141
 
142
- The `gr.State` component persists across example clicks, providing a fallback when the textbox is reset.
143
-
144
- ### Proposed Fix Options
145
-
146
- **Option A: Use `gr.State` for persistence**
147
- ```python
148
- api_key_state = gr.State("")
149
-
150
- def research_agent(message, history, mode, api_key, api_key_state):
151
- # Use api_key_state if api_key is empty
152
- effective_key = api_key or api_key_state
153
- ...
154
- return response, effective_key # Return to update state
155
- ```
156
-
157
- **Option B: Use browser localStorage via JavaScript**
158
- ```python
159
- demo.load(js="""
160
- () => {
161
- const saved = localStorage.getItem('deepboner_api_key');
162
- if (saved) document.querySelector('input[type=password]').value = saved;
163
- }
164
- """)
165
- ```
166
-
167
- **Option C: Environment variable only (remove BYOK textbox)**
168
- Remove the API key input entirely. Require users to set `OPENAI_API_KEY` in HuggingFace Secrets. This is more secure but less user-friendly.
169
-
170
- **Option D: Use Gradio LoginButton or HuggingFace OAuth**
171
- Leverage HF's built-in auth and secrets management.
172
-
173
- ---
174
-
175
- ## Bug 3: Deprecated `OpenAIModel` Import
176
-
177
- ### Symptoms
178
- HuggingFace Spaces logs show deprecation warning:
179
- ```
180
- DeprecationWarning: OpenAIModel is deprecated, use OpenAIChatModel instead
181
- ```
182
-
183
- ### Root Cause
184
- **Files using deprecated API:**
185
- - `src/app.py:9` - `from pydantic_ai.models.openai import OpenAIModel`
186
- - `src/utils/llm_factory.py:59` - `from pydantic_ai.models.openai import OpenAIModel`
187
-
188
- **File already using correct API:**
189
- - `src/agent_factory/judges.py:12` - `from pydantic_ai.models.openai import OpenAIChatModel`
190
-
191
- ### Fix
192
- Replace all `OpenAIModel` imports with `OpenAIChatModel`:
193
-
194
- ```python
195
- # Before (deprecated)
196
- from pydantic_ai.models.openai import OpenAIModel
197
- model = OpenAIModel(settings.openai_model, provider=provider)
198
-
199
- # After (correct)
200
- from pydantic_ai.models.openai import OpenAIChatModel
201
- model = OpenAIChatModel(settings.openai_model, provider=provider)
202
- ```
203
-
204
- **Files to update:**
205
- 1. `src/app.py` - lines 9, 64, 73
206
- 2. `src/utils/llm_factory.py` - lines 59, 67
207
-
208
- ---
209
-
210
- ## Bug 4: Asyncio Event Loop Garbage Collection Error
211
-
212
- ### Symptoms
213
- HuggingFace Spaces logs show intermittent errors:
214
- ```
215
- ValueError: Invalid file descriptor: -1
216
- Exception ignored in: <function BaseSelector.__del__ at 0x...>
217
- ```
218
-
219
- ### Root Cause
220
- This occurs during garbage collection of asyncio event loops. Likely causes:
221
- 1. Event loop cleanup timing issues in Gradio's threaded model
222
- 2. Selector objects being garbage-collected before proper cleanup
223
- 3. Concurrent access to event loop resources during shutdown
224
-
225
- ### Analysis
226
- The codebase uses `asyncio.get_running_loop()` correctly (not the deprecated `get_event_loop()`).
227
- This error appears to be a Gradio/HuggingFace Spaces environment issue rather than a code bug.
228
-
229
- ### Potential Mitigations
230
- 1. **Add explicit cleanup**: Use `asyncio.get_event_loop().close()` in appropriate places
231
- 2. **Ignore in logs**: This is a known Python issue and can be safely ignored if it doesn't affect functionality
232
- 3. **File issue with Gradio**: If reproducible, report to Gradio GitHub
233
-
234
- ### Impact
235
- - **Severity**: Low - appears to be a cosmetic log issue
236
- - **User-visible**: No - errors occur during garbage collection, not during request handling
237
-
238
- ---
239
-
240
- ## Recommended Priority
241
-
242
- 1. **Bug 1 (Streaming Spam)**: HIGH - makes Advanced mode unusable for reading output
243
- 2. **Bug 3 (OpenAIModel deprecation)**: MEDIUM - fix to avoid future breakage
244
- 3. **Bug 2 (Key Persistence)**: LOW - annoying but users can re-paste
245
- 4. **Bug 4 (Asyncio GC)**: LOW - cosmetic log noise, monitor but likely no action needed
246
-
247
- ## Testing Plan
248
-
249
- 1. Run Advanced mode query, verify no token-by-token spam
250
- 2. Verify no deprecation warnings in logs after OpenAIChatModel fix
251
- 3. Paste API key, click example, verify key persists
252
- 4. Refresh page, verify key persists (if using localStorage)
253
- 5. Run `make check` - all tests pass
254
-
255
- ---
256
-
257
- ## Fix Summary (2025-11-29)
258
-
259
- ### βœ… Bug 1: Token-by-Token Streaming Spam - FIXED
260
-
261
- **Root Cause Analysis:**
262
- - Validated the exact data flow from `orchestrator_magentic.py` β†’ `models.py` β†’ `app.py`
263
- - Confirmed O(NΒ²) complexity: For N tokens, yielding N times with full history each time
264
- - Each `MagenticAgentDeltaEvent` created individual `AgentEvent(type="streaming")`
265
-
266
- **Fix Implementation:**
267
- - **File:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner/src/app.py`
268
- - **Lines Modified:** 158, 171-197
269
- - **Strategy:** Streaming token buffering (Option A from proposals)
270
- 1. Added `streaming_buffer = ""` variable
271
- 2. When `event.type == "streaming"`: accumulate in buffer, skip yield
272
- 3. On non-streaming events: flush buffer, reset
273
- 4. At completion: flush any remaining buffer
274
- - **Result:** One consolidated streaming message instead of hundreds of individual tokens
275
-
276
- **Validation:**
277
- - Created unit test: `tests/unit/test_streaming_fix.py::test_streaming_events_are_buffered_not_spammed`
278
- - Test verifies max 1 buffered streaming message (not N individual ones)
279
- - All 138 tests pass
280
-
281
- ### βœ… Bug 2: API Key Persistence - FIXED
282
-
283
- **Root Cause Analysis:**
284
- - Validated Gradio `ChatInterface.additional_inputs` limitation
285
- - Clicking examples resets textbox values to defaults
286
- - No state persistence mechanism existed
287
-
288
- **Fix Implementation:**
289
- - **File:** `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner/src/app.py`
290
- - **Lines Modified:** 111, 133, 219, 234-252, 268
291
- - **Strategy:** `gr.State` for persistence (Option A from proposals)
292
- 1. Added `api_key_state: str = ""` parameter to `research_agent()`
293
- 2. Logic: Use `api_key` if present, else fallback to `api_key_state`
294
- 3. Created `api_key_state = gr.State("")` component
295
- 4. Added to `additional_inputs` list
296
- 5. Updated examples with empty state placeholders
297
- - **Result:** API key persists across example clicks via state component
298
-
299
- **Validation:**
300
- - Created unit test: `tests/unit/test_streaming_fix.py::test_api_key_state_parameter_exists`
301
- - Test verifies parameter exists and signature is correct
302
- - All 138 tests pass
303
-
304
- ### Files Modified
305
- 1. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner/src/app.py` - Streaming buffering + API key state
306
- 2. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner/docs/bugs/P1_MAGENTIC_STREAMING_AND_KEY_PERSISTENCE.md` - Documentation
307
- 3. `/Users/ray/Desktop/CLARITY-DIGITAL-TWIN/DeepBoner/tests/unit/test_streaming_fix.py` - New validation tests
308
 
309
  ### Test Results
310
  ```
@@ -312,22 +159,20 @@ uv run pytest tests/ -q
312
  ============================= 138 passed in 20.60s =============================
313
  ```
314
 
315
- **Before:** 136 tests
316
- **After:** 138 tests (added 2 validation tests)
317
  **Status:** βœ… All tests passing
318
 
319
  ### Why This Fix Works
320
 
321
  **Bug 1 (Streaming Spam):**
322
- - **Before:** Every token β†’ `append()` β†’ `yield "\n\n".join(all_parts)` β†’ O(NΒ²) spam
323
- - **After:** Every token β†’ `buffer += token` β†’ Skip yield β†’ O(1) per token, O(N) total
324
- - **Impact:** Reduced from hundreds of UI updates to ~1-2 consolidated messages
325
 
326
  **Bug 2 (API Key):**
327
- - **Before:** Textbox value lost on example click (Gradio limitation)
328
- - **After:** `gr.State` survives example clicks, fallback logic ensures key persists
329
- - **Impact:** User doesn't need to re-paste key after clicking examples
330
 
331
  ### Remaining Work
332
- - **Bug 3 (OpenAIModel deprecation):** Not addressed in this fix - separate issue
333
  - **Bug 4 (Asyncio GC errors):** Monitoring only - likely Gradio/HF Spaces issue
 
 
133
 
134
  ### Fix Applied
135
  **Files Modified:**
136
+ 1. `src/app.py`
137
+ 2. `src/utils/llm_factory.py`
 
 
 
138
 
139
+ **Bug 1 (Streaming Spam):**
140
+ - Implemented "smart streaming":
141
+ - Accumulate tokens in `streaming_buffer`
142
+ - Yield updates immediately to show progress (UX improvement)
143
+ - **Crucially**: Do NOT append to the persistent `response_parts` list until the stream segment is complete.
144
+ - This prevents the O(NΒ²) list growth and "new line spam" while keeping the UI responsive.
145
+
146
+ **Bug 2 (API Key Persistence):**
147
+ - **Strategy:** Cleaner Fix (Example List Modification)
148
+ - Instead of complex `gr.State` wiring (which caused context issues), we simply **removed the empty string columns** for `api_key` and `api_key_state` from the `examples` list in `create_demo`.
149
+ - Gradio's behavior is that if an example row has fewer columns than inputs, the remaining inputs (like the API key textbox) are **left unchanged** when the example is clicked.
150
+ - This naturally preserves the user's input without requiring extra state management.
151
+ - The `api_key_state` parameter remains in `research_agent` as a fallback but is largely redundant with this cleaner fix.
152
+
153
+ **Bug 3 (OpenAIModel Deprecation):** βœ… FIXED
154
+ - Replaced all `OpenAIModel` imports with `OpenAIChatModel` in `src/app.py` and `src/utils/llm_factory.py`.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
155
 
156
  ### Test Results
157
  ```
 
159
  ============================= 138 passed in 20.60s =============================
160
  ```
161
 
 
 
162
  **Status:** βœ… All tests passing
163
 
164
  ### Why This Fix Works
165
 
166
  **Bug 1 (Streaming Spam):**
167
+ - **Before:** Every token β†’ `append()` to list β†’ `yield` β†’ List grew to size N β†’ O(NΒ²) complexity.
168
+ - **After:** Every token β†’ `yield` dynamically constructed string (buffer + history) β†’ List stays size K (number of *events*).
169
+ - **Impact:** Smooth streaming, no visual spam, no browser freeze.
170
 
171
  **Bug 2 (API Key):**
172
+ - **Before:** Example click β†’ Overwrote API Key textbox with `""`.
173
+ - **After:** Example click β†’ Updates only `message` and `mode` β†’ API Key textbox untouched.
174
+ - **Impact:** User input persists naturally.
175
 
176
  ### Remaining Work
 
177
  - **Bug 4 (Asyncio GC errors):** Monitoring only - likely Gradio/HF Spaces issue
178
+
src/app.py CHANGED
@@ -177,7 +177,10 @@ async def research_agent(
177
  if event.type == "streaming":
178
  # Accumulate streaming tokens without emitting individual events
179
  streaming_buffer += event.message
180
- # Don't append to response_parts or yield - just buffer
 
 
 
181
  continue
182
 
183
  # For non-streaming events, flush any buffered streaming content first
@@ -235,20 +238,15 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
235
  [
236
  "What drugs improve female libido post-menopause?",
237
  "simple",
238
- "", # api_key placeholder for examples
239
- "", # api_key_state placeholder for examples
240
  ],
241
  [
242
  "Clinical trials for erectile dysfunction alternatives to PDE5 inhibitors?",
243
  "advanced",
244
- "", # api_key placeholder
245
- "", # api_key_state placeholder
246
  ],
247
  [
248
  "Evidence for testosterone therapy in women with HSDD?",
249
  "simple",
250
- "", # api_key placeholder
251
- "", # api_key_state placeholder
252
  ],
253
  ],
254
  additional_inputs_accordion=additional_inputs_accordion,
@@ -269,6 +267,15 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
269
  ],
270
  )
271
 
 
 
 
 
 
 
 
 
 
272
  return demo, additional_inputs_accordion
273
 
274
 
 
177
  if event.type == "streaming":
178
  # Accumulate streaming tokens without emitting individual events
179
  streaming_buffer += event.message
180
+ # Yield the current buffer combined with previous parts to show progress
181
+ # But DO NOT append to response_parts list yet (to avoid O(N^2) list growth)
182
+ current_parts = [*response_parts, f"πŸ“‘ **STREAMING**: {streaming_buffer}"]
183
+ yield "\n\n".join(current_parts)
184
  continue
185
 
186
  # For non-streaming events, flush any buffered streaming content first
 
238
  [
239
  "What drugs improve female libido post-menopause?",
240
  "simple",
241
+ # Removed empty strings for api_key and api_key_state to prevent overwriting
 
242
  ],
243
  [
244
  "Clinical trials for erectile dysfunction alternatives to PDE5 inhibitors?",
245
  "advanced",
 
 
246
  ],
247
  [
248
  "Evidence for testosterone therapy in women with HSDD?",
249
  "simple",
 
 
250
  ],
251
  ],
252
  additional_inputs_accordion=additional_inputs_accordion,
 
267
  ],
268
  )
269
 
270
+ # Wire up API key change to update state
271
+ # This ensures that when user types, state is updated.
272
+ # When examples are clicked (and only modify first 2 args), state remains.
273
+ # Note: This requires a Blocks context, which ChatInterface doesn't expose easily here.
274
+ # However, by removing the empty strings from the examples list above,
275
+ # we prevent the API key from being overwritten in the first place,
276
+ # so the api_key textbox retains its value, and research_agent receives it directly.
277
+ # api_key_input.change(lambda x: x, inputs=api_key_input, outputs=api_key_state)
278
+
279
  return demo, additional_inputs_accordion
280
 
281
 
tests/unit/test_streaming_fix.py CHANGED
@@ -51,18 +51,38 @@ async def test_streaming_events_are_buffered_not_spammed():
51
  async for result in research_agent("test query", [], mode="simple", api_key=""):
52
  results.append(result)
53
 
54
- # Verify that we don't have individual streaming events in the output
55
- # Before fix: Would see "πŸ“‘ **STREAMING**: This", "πŸ“‘ **STREAMING**: is", etc.
56
- # After fix: Should see buffered content only
57
-
58
- # Count how many times we see streaming markers
59
- streaming_count = sum(1 for r in results if "πŸ“‘ **STREAMING**:" in r)
60
-
61
- # Should be at most 1 streaming message (buffered), not 4 (one per token)
62
- assert streaming_count <= 1, (
63
- f"Expected at most 1 buffered streaming message, got {streaming_count}. "
64
- f"This indicates token-by-token spam is still happening!"
65
- )
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
66
 
67
  # The final result should be the complete message
68
  assert any("Final answer" in r for r in results), "Missing final complete message"
 
51
  async for result in research_agent("test query", [], mode="simple", api_key=""):
52
  results.append(result)
53
 
54
+ # Verify that we DO see streaming updates (for UX responsiveness)
55
+ # But we don't want O(N^2) growth of the persisted list.
56
+
57
+ # We expect results to contain the streaming updates
58
+ assert len(results) > 0, "Should have yielded results"
59
+
60
+ # Check that we see the accumulated message
61
+ assert any(
62
+ "πŸ“‘ **STREAMING**: This is a test" in r for r in results
63
+ ), "Buffer didn't accumulate correctly"
64
+
65
+ # The critical check for the "Spam" bug:
66
+ # In the spam bug, the output grew like:
67
+ # "Stream: T"
68
+ # "Stream: T\nStream: h"
69
+ # "Stream: T\nStream: h\nStream: i"
70
+ #
71
+ # In the fixed version, it should look like:
72
+ # "Stream: T"
73
+ # "Stream: Th"
74
+ # "Stream: Thi"
75
+ # (Replacing the last line, not adding new lines)
76
+
77
+ for res in results:
78
+ # Count occurrences of "πŸ“‘ **STREAMING**:": in a single result string
79
+ # It should appear AT MOST once
80
+ # (unless we have multiple distinct streaming blocks)
81
+ streaming_markers = res.count("πŸ“‘ **STREAMING**:")
82
+ assert streaming_markers <= 1, (
83
+ f"Found multiple streaming markers in single response: {res}\n"
84
+ "This indicates we are appending new lines instead of updating in place."
85
+ )
86
 
87
  # The final result should be the complete message
88
  assert any("Final answer" in r for r in results), "Missing final complete message"