VibecoderMcSwaggins committed
Commit 91a017e · unverified
2 Parent(s): 89f1173 84a8bce

Merge pull request #93 from The-Obstacle-Is-The-Way/feat/sexual-health-spec-11

BRAINSTORM_EMBEDDINGS_META.md → docs/brainstorming/BRAINSTORM_EMBEDDINGS_META.md RENAMED
File without changes
docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -1,6 +1,6 @@
1
  # Active Bugs
2
 
3
- > Last updated: 2025-11-29
4
 
5
  ## P0 - Blocker
6
 
@@ -8,6 +8,22 @@
8
 
9
  ---
10
 
11
  ## P3 - Architecture/Enhancement
12
 
13
  ### ~~P3 - Missing Structured Cognitive Memory~~ FIXED (Phase 1)
 
1
  # Active Bugs
2
 
3
+ > Last updated: 2025-11-30
4
 
5
  ## P0 - Blocker
6
 
 
8
 
9
  ---
10
 
11
+ ## P1 - Important
12
+
13
+ ### P1 - Narrative Synthesis Falls Back to Template (NEW)
14
+ **File:** `P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
15
+ **Related:** SPEC_12 (implemented but falling back)
16
+
17
+ **Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
18
+ **Root Cause:** Any exception in LLM synthesis triggers silent fallback to template.
19
+ **Impact:** Core value proposition (synthesized reports) not delivered.
20
+ **Fix Options:**
21
+ 1. Surface errors to user instead of silent fallback
22
+ 2. Configure HuggingFace Spaces secrets with API keys
23
+ 3. Add synthesis status indicator in UI
24
+
25
+ ---
26
+
27
  ## P3 - Architecture/Enhancement
28
 
29
  ### ~~P3 - Missing Structured Cognitive Memory~~ FIXED (Phase 1)
docs/bugs/P1_NARRATIVE_SYNTHESIS_FALLBACK.md ADDED
@@ -0,0 +1,185 @@
1
+ # P1: Narrative Synthesis Falls Back to Template (SPEC_12 Not Taking Effect)
2
+
3
+ **Status**: Open
4
+ **Priority**: P1 - Major UX degradation
5
+ **Affects**: Simple mode, all deployments
6
+ **Root Cause**: LLM synthesis silently failing → template fallback
7
+ **Related**: SPEC_12 (implemented but not functioning)
8
+
9
+ ---
10
+
11
+ ## Problem Statement
12
+
13
+ SPEC_12 implemented LLM-based narrative synthesis, but users still see **template-formatted bullet points** instead of **prose paragraphs**:
14
+
15
+ ### What Users See (Template Fallback)
16
+
17
+ ```markdown
18
+ ## Sexual Health Analysis
19
+
20
+ ### Question
21
+ what medication for the best boners?
22
+
23
+ ### Drug Candidates
24
+ - **tadalafil**
25
+ - **sildenafil**
26
+
27
+ ### Key Findings
28
+ - Tadalafil improves erectile function
29
+
30
+ ### Assessment
31
+ - **Mechanism Score**: 4/10
32
+ - **Clinical Evidence Score**: 6/10
33
+ ```
34
+
35
+ ### What They Should See (LLM Synthesis)
36
+
37
+ ```markdown
38
+ ### Executive Summary
39
+
40
+ Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
41
+ with strong evidence from multiple RCTs demonstrating improved erectile function...
42
+
43
+ ### Background
44
+
45
+ Erectile dysfunction (ED) is a common male sexual health disorder...
46
+
47
+ ### Evidence Synthesis
48
+
49
+ **Mechanism of Action**
50
+ Sildenafil works by inhibiting phosphodiesterase type 5 (PDE5)...
51
+ ```
52
+
53
+ ---
54
+
55
+ ## Root Cause Analysis
56
+
57
+ ### Location: `src/orchestrators/simple.py:555-564`
58
+
59
+ ```python
60
+ try:
61
+     agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
62
+     result = await agent.run(user_prompt)
63
+     narrative = result.output
64
+ except Exception as e:  # ← SILENT FALLBACK
65
+     logger.warning("LLM synthesis failed, using template fallback", error=str(e))
66
+     return self._generate_template_synthesis(query, evidence, assessment)
67
+ ```
68
+
69
+ **The Problem**: When ANY exception occurs during LLM synthesis, the orchestrator falls back to the template and only logs a warning. Users see bullet points with no indication that the LLM call failed.
70
+
71
+ ### Why Synthesis Fails
72
+
73
+ | Cause | Typical context | Frequency |
74
+ |-------|---------|-----------|
75
+ | No API key in deployment | HuggingFace Spaces | HIGH |
76
+ | API rate limiting | Heavy usage | MEDIUM |
77
+ | Token overflow | Long evidence lists | MEDIUM |
78
+ | Model mismatch | Wrong model ID | LOW |
79
+ | Network timeout | Slow connections | LOW |
80
+
81
+ ---
82
+
83
+ ## Evidence: LLM Synthesis WORKS When Configured
84
+
85
+ Local test with API key:
86
+ ```python
87
+ # This works perfectly:
88
+ agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
89
+ result = await agent.run(user_prompt)
90
+ print(result.output) # → Beautiful narrative prose!
91
+ ```
92
+
93
+ Output:
94
+ ```
95
+ ### Executive Summary
96
+
97
+ Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
98
+ with one study (Smith, 2020; N=100) reporting improved erectile function...
99
+ ```
100
+
101
+ ---
102
+
103
+ ## Impact
104
+
105
+ | Metric | Current | Expected |
106
+ |--------|---------|----------|
107
+ | Report quality | 3/10 (metadata dump) | 9/10 (professional prose) |
108
+ | User satisfaction | Low | High |
109
+ | Clinical utility | Limited | High |
110
+
111
+ The ENTIRE VALUE PROPOSITION of the research agent is the synthesized report. Template output defeats the purpose.
112
+
113
+ ---
114
+
115
+ ## Fix Options
116
+
117
+ ### Option A: Surface Error to User (RECOMMENDED)
118
+
119
+ When LLM synthesis fails, don't silently fall back. Show the user what went wrong:
120
+
121
+ ```python
122
+ except Exception as e:
123
+     logger.error("LLM synthesis failed", error=str(e), exc_info=True)
124
+
125
+     # Show error in report instead of silent fallback
126
+     error_note = f"""
127
+ ⚠️ **Note**: AI narrative synthesis unavailable.
128
+ Showing structured summary instead.
129
+
130
+ _Technical: {type(e).__name__}: {str(e)[:100]}_
131
+ """
132
+     template = self._generate_template_synthesis(query, evidence, assessment)
133
+     return f"{error_note}\n\n{template}"
134
+ ```
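Review note: Option A's fallback path can be exercised in isolation with a small pure function. The following is a minimal sketch, not the project's implementation; `build_degraded_report` and its `max_len` parameter are hypothetical names introduced here for illustration only.

```python
def build_degraded_report(template: str, exc: Exception, max_len: int = 100) -> str:
    """Prepend a visible notice to the template fallback (hypothetical helper)."""
    # Truncate the exception message so a long provider error does not flood the report.
    detail = f"{type(exc).__name__}: {str(exc)[:max_len]}"
    note = (
        "> **Note**: AI narrative synthesis unavailable. "
        "Showing structured summary instead.\n"
        f"> _Technical: {detail}_"
    )
    return f"{note}\n\n{template}"


if __name__ == "__main__":
    print(build_degraded_report("## Sexual Health Analysis", RuntimeError("missing API key")))
```

Keeping the note construction separate from the orchestrator makes it easy to assert on the notice text in a unit test without mocking the LLM call.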
135
+
136
+ ### Option B: HuggingFace Secrets Configuration
137
+
138
+ For HuggingFace Spaces deployment, add secrets:
139
+ - `OPENAI_API_KEY` → Required for synthesis
140
+ - `ANTHROPIC_API_KEY` → Alternative provider
141
+
142
+ ### Option C: Graceful Degradation with Explanation
143
+
144
+ Add a banner explaining synthesis status:
145
+ - ✅ "AI-synthesized narrative report" (when LLM works)
146
+ - ⚠️ "Structured summary (AI synthesis unavailable)" (fallback)
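Review note: the two banner states in Option C map naturally onto a small enum. This is a hedged sketch; `SynthesisStatus` and `synthesis_banner` are hypothetical names, and only the banner wording comes from the list above.

```python
from enum import Enum


class SynthesisStatus(Enum):
    """Hypothetical status flag set by whichever synthesis path produced the report."""

    LLM = "llm"
    TEMPLATE_FALLBACK = "template_fallback"


def synthesis_banner(status: SynthesisStatus) -> str:
    """Return the banner text for the given synthesis status."""
    if status is SynthesisStatus.LLM:
        return "AI-synthesized narrative report"
    return "Structured summary (AI synthesis unavailable)"


if __name__ == "__main__":
    for status in SynthesisStatus:
        print(f"{status.value}: {synthesis_banner(status)}")
```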
147
+
148
+ ---
149
+
150
+ ## Diagnostic Steps
151
+
152
+ To determine why synthesis is failing in production:
153
+
154
+ 1. **Review logs** for warning: `"LLM synthesis failed, using template fallback"`
155
+ 2. **Verify API key**: Is `OPENAI_API_KEY` set in environment?
156
+ 3. **Confirm model access**: Is `gpt-5` accessible with current API tier?
157
+ 4. **Inspect rate limits**: Is the account quota exhausted?
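Review note: step 2 of the diagnostic list can be checked without printing any secret values. A minimal sketch, assuming the two variable names from Option B are the only ones the app reads (the app may consult others); `configured_keys` is a hypothetical helper.

```python
from __future__ import annotations

import os
from typing import Mapping

# Variable names taken from Option B; treating them as the complete set is an assumption.
KEY_NAMES = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_keys(env: Mapping[str, str] | None = None) -> dict[str, bool]:
    """Report which provider keys are set and non-empty, without exposing their values."""
    source = os.environ if env is None else env
    return {name: bool(source.get(name, "").strip()) for name in KEY_NAMES}


if __name__ == "__main__":
    for name, present in configured_keys().items():
        print(f"{name}: {'set' if present else 'missing'}")
```

Running this inside the deployed environment shows at a glance whether a missing key is the likely cause of the fallback.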
158
+
159
+ ---
160
+
161
+ ## Acceptance Criteria
162
+
163
+ - [ ] Users see narrative prose reports (not bullet points) when API key is configured
164
+ - [ ] When synthesis fails, user sees clear indication (not silent fallback)
165
+ - [ ] HuggingFace Spaces deployment has proper secrets configured
166
+ - [ ] Logging captures the specific exception for debugging
167
+
168
+ ---
169
+
170
+ ## Files to Modify
171
+
172
+ | File | Change |
173
+ |------|--------|
174
+ | `src/orchestrators/simple.py:555-580` | Add error surfacing in fallback |
175
+ | `src/app.py` | Add synthesis status indicator to UI |
176
+ | HuggingFace Spaces Settings | Add `OPENAI_API_KEY` secret |
177
+
178
+ ---
179
+
180
+ ## Test Plan
181
+
182
+ 1. Run locally with API key → Should get narrative prose
183
+ 2. Run locally WITHOUT API key → Should get template WITH error message
184
+ 3. Deploy to HuggingFace with secrets → Should get narrative prose
185
+ 4. Deploy to HuggingFace WITHOUT secrets → Should get template WITH warning
SPEC_12_NARRATIVE_SYNTHESIS.md → docs/specs/SPEC_12_NARRATIVE_SYNTHESIS.md RENAMED
File without changes
src/app.py CHANGED
@@ -25,6 +25,33 @@ from src.utils.models import OrchestratorConfig
25
  OrchestratorMode = Literal["simple", "magentic", "advanced", "hierarchical"]
26
 
27
 
28
  def configure_orchestrator(
29
  use_mock: bool = False,
30
  mode: OrchestratorMode = "simple",
@@ -247,14 +274,21 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
247
  api_key_state = gr.State("")
248
 
249
  # 1. Unwrapped ChatInterface (Fixes Accordion Bug)
 
250
  description = (
251
- "*AI-Powered Research Agent — searches PubMed, "
252
- "ClinicalTrials.gov, Europe PMC & OpenAlex*\n\n"
 
253
  "Deep research for sexual wellness, ED treatments, hormone therapy, "
254
- "libido, and reproductive health - for all genders.\n\n"
255
- "---\n"
256
- "*Research tool only — not for medical advice.* \n"
257
- "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`"
258
  )
259
 
260
  demo = gr.ChatInterface(
@@ -304,6 +338,7 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
304
  placeholder="sk-... (OpenAI) or sk-ant-... (Anthropic)",
305
  type="password",
306
  info="Leave empty for free tier. Auto-detects provider from key prefix.",
 
307
  ),
308
  api_key_state, # Hidden state component for persistence
309
  ],
@@ -321,6 +356,7 @@ def main() -> None:
321
  share=False,
322
  mcp_server=True,
323
  ssr_mode=False, # Fix for intermittent loading/hydration issues in HF Spaces
 
324
  )
325
 
326
 
 
25
  OrchestratorMode = Literal["simple", "magentic", "advanced", "hierarchical"]
26
 
27
 
28
+ # CSS to force dark mode on API key input
29
+ # NOTE: Browser autofill requires -webkit-autofill selectors to override
30
+ CUSTOM_CSS = """
31
+ .api-key-input input {
32
+ background-color: #1f2937 !important;
33
+ color: white !important;
34
+ border-color: #374151 !important;
35
+ }
36
+ .api-key-input input:focus,
37
+ .api-key-input input:focus-visible {
38
+ background-color: #1f2937 !important;
39
+ color: white !important;
40
+ border-color: #e879f9 !important;
41
+ outline: none !important;
42
+ }
43
+ /* Override aggressive browser autofill styling */
44
+ .api-key-input input:-webkit-autofill,
45
+ .api-key-input input:-webkit-autofill:hover,
46
+ .api-key-input input:-webkit-autofill:focus {
47
+ -webkit-box-shadow: 0 0 0px 1000px #1f2937 inset !important;
48
+ -webkit-text-fill-color: white !important;
49
+ caret-color: white !important;
50
+ transition: background-color 5000s ease-in-out 0s;
51
+ }
52
+ """
53
+
54
+
55
  def configure_orchestrator(
56
  use_mock: bool = False,
57
  mode: OrchestratorMode = "simple",
 
274
  api_key_state = gr.State("")
275
 
276
  # 1. Unwrapped ChatInterface (Fixes Accordion Bug)
277
+ # NOTE: Using inline styles on each element because HR breaks text-align inheritance
278
  description = (
279
+ "<div style='text-align: center;'>"
280
+ "<em>AI-Powered Research Agent — searches PubMed, "
281
+ "ClinicalTrials.gov, Europe PMC & OpenAlex</em><br><br>"
282
  "Deep research for sexual wellness, ED treatments, hormone therapy, "
283
+ "libido, and reproductive health - for all genders."
284
+ "</div>"
285
+ "<hr style='margin: 1em auto; width: 80%; border: none; "
286
+ "border-top: 1px solid #374151;'>"
287
+ "<div style='text-align: center;'>"
288
+ "<em>Research tool only — not for medical advice.</em><br>"
289
+ "<strong>MCP Server Active</strong>: Connect Claude Desktop to "
290
+ "<code>/gradio_api/mcp/</code>"
291
+ "</div>"
292
  )
293
 
294
  demo = gr.ChatInterface(
 
338
  placeholder="sk-... (OpenAI) or sk-ant-... (Anthropic)",
339
  type="password",
340
  info="Leave empty for free tier. Auto-detects provider from key prefix.",
341
+ elem_classes=["api-key-input"],
342
  ),
343
  api_key_state, # Hidden state component for persistence
344
  ],
 
356
  share=False,
357
  mcp_server=True,
358
  ssr_mode=False, # Fix for intermittent loading/hydration issues in HF Spaces
359
+ css=CUSTOM_CSS, # Moved here for Gradio 6.0 support
360
  )
361
 
362
 
src/orchestrators/simple.py CHANGED
@@ -541,11 +541,13 @@ class Orchestrator:
541
 
542
  from src.agent_factory.judges import get_model
543
 
544
- # Create synthesis agent (string output, not structured)
 
545
  agent: Agent[None, str] = Agent(
546
  model=get_model(),
547
  output_type=str,
548
  system_prompt=system_prompt,
 
549
  )
550
  result = await agent.run(user_prompt)
551
  narrative = result.output
@@ -554,14 +556,23 @@ class Orchestrator:
554
 
555
  except Exception as e:
556
  # Fallback to template synthesis if LLM fails
557
- # This is intentionally broad - LLM can fail many ways (API, parsing, etc.)
558
- logger.warning(
559
  "LLM synthesis failed, using template fallback",
560
  error=str(e),
561
  exc_type=type(e).__name__,
562
  evidence_count=len(evidence),
563
  )
564
- return self._generate_template_synthesis(query, evidence, assessment)
 
565
 
566
  # Add full citation list footer
567
  citations = "\n".join(
 
541
 
542
  from src.agent_factory.judges import get_model
543
 
544
+ # Create synthesis agent with retries (matching Judge agent pattern)
545
+ # Without retries, transient errors immediately trigger fallback
546
  agent: Agent[None, str] = Agent(
547
  model=get_model(),
548
  output_type=str,
549
  system_prompt=system_prompt,
550
+ retries=3, # Match Judge agent - retry on transient errors
551
  )
552
  result = await agent.run(user_prompt)
553
  narrative = result.output
 
556
 
557
  except Exception as e:
558
  # Fallback to template synthesis if LLM fails
559
+ # Log error details for debugging
560
+ logger.error(
561
  "LLM synthesis failed, using template fallback",
562
  error=str(e),
563
  exc_type=type(e).__name__,
564
  evidence_count=len(evidence),
565
+ exc_info=True, # Capture stack trace for debugging
566
+ )
567
+ # Surface the error to user (MS Agent Framework pattern)
568
+ # Don't silently fall back - let user know synthesis degraded
569
+ error_note = (
570
+ f"\n\n> ⚠️ **Note**: AI narrative synthesis unavailable. "
571
+ f"Showing structured summary.\n"
572
+ f"> _Error: {type(e).__name__}_\n"
573
  )
574
+ template = self._generate_template_synthesis(query, evidence, assessment)
575
+ return f"{error_note}\n{template}"
576
 
577
  # Add full citation list footer
578
  citations = "\n".join(
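Review note: the diff relies on the agent's `retries=3` argument to absorb transient errors. Whether that argument covers transport-level failures (timeouts, rate limits) depends on the agent library and version, which this diff does not show. If it does not, an explicit wrapper along these lines is one option. This is a generic sketch; `run_with_retries`, `attempts`, and `base_delay` are hypothetical names, not part of the codebase.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def run_with_retries(
    call: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Await call() up to `attempts` times, doubling the delay after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except Exception:
            # Re-raise on the final attempt so the caller's fallback still sees the error.
            if attempt == attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


async def _demo() -> None:
    state = {"calls": 0}

    async def flaky() -> str:
        # Fails twice, then succeeds, to mimic a transient outage.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("transient")
        return "narrative"

    result = await run_with_retries(flaky, attempts=3, base_delay=0.0)
    print(result, "after", state["calls"], "calls")


if __name__ == "__main__":
    asyncio.run(_demo())
```

A production version would likely retry only on exception types known to be transient instead of the broad `Exception` used here.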
tests/unit/orchestrators/test_simple_synthesis.py CHANGED
@@ -130,12 +130,12 @@ Long-term safety data is limited.
130
  assert "Evidence Synthesis" in result
131
 
132
  @pytest.mark.asyncio
133
- async def test_falls_back_on_llm_error(
134
  self,
135
  sample_evidence: list[Evidence],
136
  sample_assessment: JudgeAssessment,
137
  ) -> None:
138
- """Synthesis should fall back to template if LLM fails."""
139
  mock_search = MagicMock()
140
  mock_judge = MagicMock()
141
 
@@ -155,7 +155,11 @@ Long-term safety data is limited.
155
  assessment=sample_assessment,
156
  )
157
 
158
- # Should return template fallback (has Assessment section)
159
  assert "Assessment" in result or "Drug Candidates" in result
160
  assert "Testosterone" in result # Drug candidate should be present
161
 
 
130
  assert "Evidence Synthesis" in result
131
 
132
  @pytest.mark.asyncio
133
+ async def test_falls_back_on_llm_error_with_notice(
134
  self,
135
  sample_evidence: list[Evidence],
136
  sample_assessment: JudgeAssessment,
137
  ) -> None:
138
+ """Synthesis should fall back to template if LLM fails, WITH error notice."""
139
  mock_search = MagicMock()
140
  mock_judge = MagicMock()
141
 
 
155
  assessment=sample_assessment,
156
  )
157
 
158
+ # Should surface error to user (MS Agent Framework pattern)
159
+ assert "AI narrative synthesis unavailable" in result
160
+ assert "Error" in result
161
+
162
+ # Should still include template content
163
  assert "Assessment" in result or "Drug Candidates" in result
164
  assert "Testosterone" in result # Drug candidate should be present
165