Spaces: VibecoderMcSwaggins / DeepBoner

VibecoderMcSwaggins committed on Nov 30, 2025

Commit 91a017e (unverified) · 2 Parent(s): 89f1173 84a8bce

Merge pull request #93 from The-Obstacle-Is-The-Way/feat/sexual-health-spec-11

Files changed (7)

BRAINSTORM_EMBEDDINGS_META.md → docs/brainstorming/BRAINSTORM_EMBEDDINGS_META.md +0 -0
docs/bugs/ACTIVE_BUGS.md +17 -1
docs/bugs/P1_NARRATIVE_SYNTHESIS_FALLBACK.md +185 -0
SPEC_12_NARRATIVE_SYNTHESIS.md → docs/specs/SPEC_12_NARRATIVE_SYNTHESIS.md +0 -0
src/app.py +42 -6
src/orchestrators/simple.py +15 -4
tests/unit/orchestrators/test_simple_synthesis.py +7 -3

BRAINSTORM_EMBEDDINGS_META.md → docs/brainstorming/BRAINSTORM_EMBEDDINGS_META.md RENAMED

File without changes

docs/bugs/ACTIVE_BUGS.md CHANGED

@@ -1,6 +1,6 @@
 # Active Bugs
-> Last updated: 2025-11-29
 ## P0 - Blocker
@@ -8,6 +8,22 @@
 ---
 ## P3 - Architecture/Enhancement
 ### ~~P3 - Missing Structured Cognitive Memory~~ FIXED (Phase 1)

 # Active Bugs
+> Last updated: 2025-11-30
 ## P0 - Blocker
 ---
+## P1 - Important
+### P1 - Narrative Synthesis Falls Back to Template (NEW)
+**File:** `P1_NARRATIVE_SYNTHESIS_FALLBACK.md`
+**Related:** SPEC_12 (implemented but falling back)
+**Problem:** Users see bullet-point template output instead of LLM-generated narrative prose.
+**Root Cause:** Any exception in LLM synthesis triggers silent fallback to template.
+**Impact:** Core value proposition (synthesized reports) not delivered.
+**Fix Options:**
+1. Surface errors to user instead of silent fallback
+2. Configure HuggingFace Spaces secrets with API keys
+3. Add synthesis status indicator in UI
+---
 ## P3 - Architecture/Enhancement
 ### ~~P3 - Missing Structured Cognitive Memory~~ FIXED (Phase 1)

docs/bugs/P1_NARRATIVE_SYNTHESIS_FALLBACK.md ADDED

@@ -0,0 +1,185 @@
+# P1: Narrative Synthesis Falls Back to Template (SPEC_12 Not Taking Effect)
+**Status**: Open
+**Priority**: P1 - Major UX degradation
+**Affects**: Simple mode, all deployments
+**Root Cause**: LLM synthesis silently failing → template fallback
+**Related**: SPEC_12 (implemented but not functioning)
+---
+## Problem Statement
+SPEC_12 implemented LLM-based narrative synthesis, but users still see **template-formatted bullet points** instead of **prose paragraphs**:
+### What Users See (Template Fallback)
+```markdown
+## Sexual Health Analysis
+### Question
+what medication for the best boners?
+### Drug Candidates
+- **tadalafil**
+- **sildenafil**
+### Key Findings
+- Tadalafil improves erectile function
+### Assessment
+- **Mechanism Score**: 4/10
+- **Clinical Evidence Score**: 6/10
+```
+### What They Should See (LLM Synthesis)
+```markdown
+### Executive Summary
+Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
+with strong evidence from multiple RCTs demonstrating improved erectile function...
+### Background
+Erectile dysfunction (ED) is a common male sexual health disorder...
+### Evidence Synthesis
+**Mechanism of Action**
+Sildenafil works by inhibiting phosphodiesterase type 5 (PDE5)...
+```
+---
+## Root Cause Analysis
+### Location: `src/orchestrators/simple.py:555-564`
+```python
+try:
+    agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
+    result = await agent.run(user_prompt)
+    narrative = result.output
+except Exception as e:  # ← SILENT FALLBACK
+    logger.warning("LLM synthesis failed, using template fallback", error=str(e))
+    return self._generate_template_synthesis(query, evidence, assessment)
+```
+**The Problem**: When ANY exception occurs during LLM synthesis, it silently falls back to template. Users see janky bullet points with no indication that the LLM call failed.
+### Why Synthesis Fails
+| Cause | Typical context | Frequency |
+|-------|---------|-----------|
+| No API key in deployment | HuggingFace Spaces | HIGH |
+| API rate limiting | Heavy usage | MEDIUM |
+| Token overflow | Long evidence lists | MEDIUM |
+| Model mismatch | Wrong model ID | LOW |
+| Network timeout | Slow connections | LOW |
+---
+## Evidence: LLM Synthesis WORKS When Configured
+Local test with API key:
+```python
+# This works perfectly:
+agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
+result = await agent.run(user_prompt)
+print(result.output)  # → Beautiful narrative prose!
+```
+Output:
+```
+### Executive Summary
+Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
+with one study (Smith, 2020; N=100) reporting improved erectile function...
+```
+---
+## Impact
+| Metric | Current | Expected |
+|--------|---------|----------|
+| Report quality | 3/10 (metadata dump) | 9/10 (professional prose) |
+| User satisfaction | Low | High |
+| Clinical utility | Limited | High |
+The ENTIRE VALUE PROPOSITION of the research agent is the synthesized report. Template output defeats the purpose.
+---
+## Fix Options
+### Option A: Surface Error to User (RECOMMENDED)
+When LLM synthesis fails, don't silently fall back. Show the user what went wrong:
+```python
+except Exception as e:
+    logger.error("LLM synthesis failed", error=str(e), exc_info=True)
+    # Show error in report instead of silent fallback
+    error_note = f"""
+⚠️ **Note**: AI narrative synthesis unavailable.
+Showing structured summary instead.
+_Technical: {type(e).__name__}: {str(e)[:100]}_
+"""
+    template = self._generate_template_synthesis(query, evidence, assessment)
+    return f"{error_note}\n\n{template}"
+```
+### Option B: HuggingFace Secrets Configuration
+For HuggingFace Spaces deployment, add secrets:
+- `OPENAI_API_KEY` → Required for synthesis
+- `ANTHROPIC_API_KEY` → Alternative provider
+### Option C: Graceful Degradation with Explanation
+Add a banner explaining synthesis status:
+- ✅ "AI-synthesized narrative report" (when LLM works)
+- ⚠️ "Structured summary (AI synthesis unavailable)" (fallback)
+---
+## Diagnostic Steps
+To determine why synthesis is failing in production:
+1. **Review logs** for warning: `"LLM synthesis failed, using template fallback"`
+2. **Verify API key**: Is `OPENAI_API_KEY` set in environment?
+3. **Confirm model access**: Is `gpt-5` accessible with current API tier?
+4. **Inspect rate limits**: Is the account quota exhausted?
+---
+## Acceptance Criteria
+- [ ] Users see narrative prose reports (not bullet points) when API key is configured
+- [ ] When synthesis fails, user sees clear indication (not silent fallback)
+- [ ] HuggingFace Spaces deployment has proper secrets configured
+- [ ] Logging captures the specific exception for debugging
+---
+## Files to Modify
+| File | Change |
+|------|--------|
+| `src/orchestrators/simple.py:555-580` | Add error surfacing in fallback |
+| `src/app.py` | Add synthesis status indicator to UI |
+| HuggingFace Spaces Settings | Add `OPENAI_API_KEY` secret |
+---
+## Test Plan
+1. Run locally with API key → Should get narrative prose
+2. Run locally WITHOUT API key → Should get template WITH error message
+3. Deploy to HuggingFace with secrets → Should get narrative prose
+4. Deploy to HuggingFace WITHOUT secrets → Should get template WITH warning
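
Diagnostic step 2 in the bug report above (checking whether an API key is set) can be approximated with a small standalone check. This is only a sketch: the environment variable names come from Option B of the report, while the helper `configured_providers` is hypothetical and not part of this commit. It also assumes a key that is present but blank should count as missing.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Key names taken from Option B of the bug report.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_providers(env: Mapping[str, str] | None = None) -> list[str]:
    """Return the provider key names that are set to a non-blank value.

    Hypothetical helper for illustration; pass a dict to test without
    touching the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Synthesis keys configured:", ", ".join(found))
    else:
        print("No synthesis key configured; expect the template fallback.")
```

Running this inside the deployment environment (for example in a Space's startup log) would show whether the most frequent cause in the table above applies before digging into rate limits or model access.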

SPEC_12_NARRATIVE_SYNTHESIS.md → docs/specs/SPEC_12_NARRATIVE_SYNTHESIS.md RENAMED

File without changes

src/app.py CHANGED

@@ -25,6 +25,33 @@ from src.utils.models import OrchestratorConfig
 OrchestratorMode = Literal["simple", "magentic", "advanced", "hierarchical"]
 def configure_orchestrator(
     use_mock: bool = False,
     mode: OrchestratorMode = "simple",
@@ -247,14 +274,21 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
     api_key_state = gr.State("")
     # 1. Unwrapped ChatInterface (Fixes Accordion Bug)
     description = (
-        "*AI-Powered Research Agent — searches PubMed, "
-        "ClinicalTrials.gov, Europe PMC & OpenAlex*\n\n"
         "Deep research for sexual wellness, ED treatments, hormone therapy, "
-        "libido, and reproductive health - for all genders.\n\n"
-        "---\n"
-        "*Research tool only — not for medical advice.*  \n"
-        "**MCP Server Active**: Connect Claude Desktop to `/gradio_api/mcp/`"
     )
     demo = gr.ChatInterface(
@@ -304,6 +338,7 @@ def create_demo() -> tuple[gr.ChatInterface, gr.Accordion]:
                 placeholder="sk-... (OpenAI) or sk-ant-... (Anthropic)",
                 type="password",
                 info="Leave empty for free tier. Auto-detects provider from key prefix.",
             ),
             api_key_state,  # Hidden state component for persistence
         ],
@@ -321,6 +356,7 @@ def main() -> None:
         share=False,
         mcp_server=True,
         ssr_mode=False,  # Fix for intermittent loading/hydration issues in HF Spaces
     )

 OrchestratorMode = Literal["simple", "magentic", "advanced", "hierarchical"]
+# CSS to force dark mode on API key input
+# NOTE: Browser autofill requires -webkit-autofill selectors to override
+CUSTOM_CSS = """
+.api-key-input input {
+    background-color: #1f2937 !important;
+    color: white !important;
+    border-color: #374151 !important;
+}
+.api-key-input input:focus,
+.api-key-input input:focus-visible {
+    background-color: #1f2937 !important;
+    color: white !important;
+    border-color: #e879f9 !important;
+    outline: none !important;
+}
+/* Override aggressive browser autofill styling */
+.api-key-input input:-webkit-autofill,
+.api-key-input input:-webkit-autofill:hover,
+.api-key-input input:-webkit-autofill:focus {
+    -webkit-box-shadow: 0 0 0px 1000px #1f2937 inset !important;
+    -webkit-text-fill-color: white !important;
+    caret-color: white !important;
+    transition: background-color 5000s ease-in-out 0s;
+}
+"""
 def configure_orchestrator(
     use_mock: bool = False,
     mode: OrchestratorMode = "simple",
     api_key_state = gr.State("")
     # 1. Unwrapped ChatInterface (Fixes Accordion Bug)
+    # NOTE: Using inline styles on each element because HR breaks text-align inheritance
     description = (
+        "<div style='text-align: center;'>"
+        "<em>AI-Powered Research Agent — searches PubMed, "
+        "ClinicalTrials.gov, Europe PMC & OpenAlex</em><br><br>"
         "Deep research for sexual wellness, ED treatments, hormone therapy, "
+        "libido, and reproductive health - for all genders."
+        "</div>"
+        "<hr style='margin: 1em auto; width: 80%; border: none; "
+        "border-top: 1px solid #374151;'>"
+        "<div style='text-align: center;'>"
+        "<em>Research tool only — not for medical advice.</em><br>"
+        "<strong>MCP Server Active</strong>: Connect Claude Desktop to "
+        "<code>/gradio_api/mcp/</code>"
+        "</div>"
     )
     demo = gr.ChatInterface(
                 placeholder="sk-... (OpenAI) or sk-ant-... (Anthropic)",
                 type="password",
                 info="Leave empty for free tier. Auto-detects provider from key prefix.",
+                elem_classes=["api-key-input"],
             ),
             api_key_state,  # Hidden state component for persistence
         ],
         share=False,
         mcp_server=True,
         ssr_mode=False,  # Fix for intermittent loading/hydration issues in HF Spaces
+        css=CUSTOM_CSS,  # Moved here for Gradio 6.0 support
     )
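
The API key field's `info` text says the provider is auto-detected from the key prefix, but the detection code itself is not part of this diff. As a rough illustration only, prefix detection could look like the sketch below. The function name `detect_provider` and its return labels are assumptions; the prefixes are taken from the placeholder text (`sk-...` for OpenAI, `sk-ant-...` for Anthropic).

```python
def detect_provider(api_key: str) -> str:
    """Guess the provider from a key prefix (hypothetical helper).

    The more specific prefix must be tested first, because every
    "sk-ant-" key also starts with "sk-".
    """
    key = api_key.strip()
    if not key:
        return "free"  # empty field: the UI text says this means free tier
    if key.startswith("sk-ant-"):
        return "anthropic"
    if key.startswith("sk-"):
        return "openai"
    return "unknown"
```

The ordering of the two `startswith` checks is the one design point worth noting: reversing them would classify every Anthropic key as OpenAI.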

src/orchestrators/simple.py CHANGED

@@ -541,11 +541,13 @@ class Orchestrator:
             from src.agent_factory.judges import get_model
-            # Create synthesis agent (string output, not structured)
             agent: Agent[None, str] = Agent(
                 model=get_model(),
                 output_type=str,
                 system_prompt=system_prompt,
             )
             result = await agent.run(user_prompt)
             narrative = result.output
@@ -554,14 +556,23 @@ class Orchestrator:
         except Exception as e:
             # Fallback to template synthesis if LLM fails
-            # This is intentionally broad - LLM can fail many ways (API, parsing, etc.)
-            logger.warning(
                 "LLM synthesis failed, using template fallback",
                 error=str(e),
                 exc_type=type(e).__name__,
                 evidence_count=len(evidence),
             )
-            return self._generate_template_synthesis(query, evidence, assessment)
         # Add full citation list footer
         citations = "\n".join(

             from src.agent_factory.judges import get_model
+            # Create synthesis agent with retries (matching Judge agent pattern)
+            # Without retries, transient errors immediately trigger fallback
             agent: Agent[None, str] = Agent(
                 model=get_model(),
                 output_type=str,
                 system_prompt=system_prompt,
+                retries=3,  # Match Judge agent - retry on transient errors
             )
             result = await agent.run(user_prompt)
             narrative = result.output
         except Exception as e:
             # Fallback to template synthesis if LLM fails
+            # Log error details for debugging
+            logger.error(
                 "LLM synthesis failed, using template fallback",
                 error=str(e),
                 exc_type=type(e).__name__,
                 evidence_count=len(evidence),
+                exc_info=True,  # Capture stack trace for debugging
+            )
+            # Surface the error to user (MS Agent Framework pattern)
+            # Don't silently fall back - let user know synthesis degraded
+            error_note = (
+                f"\n\n> ⚠️ **Note**: AI narrative synthesis unavailable. "
+                f"Showing structured summary.\n"
+                f"> _Error: {type(e).__name__}_\n"
             )
+            template = self._generate_template_synthesis(query, evidence, assessment)
+            return f"{error_note}\n{template}"
         # Add full citation list footer
         citations = "\n".join(
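
The new `except` branch above keeps the template fallback but prepends a visible notice. The standalone sketch below isolates that pattern so it can be run without the orchestrator. The names `synthesize_or_fallback`, `run_llm`, and `render_template` are hypothetical stand-ins for the agent call and `_generate_template_synthesis`; logging and the citation footer are left out.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def synthesize_or_fallback(
    run_llm: Callable[[], Awaitable[str]],
    render_template: Callable[[], str],
) -> str:
    """Return the LLM narrative, or the template prefixed with an error note."""
    try:
        return await run_llm()
    except Exception as exc:  # broad on purpose, mirroring the diff
        # Only the exception type is shown to the user, as in the commit.
        note = (
            "\n\n> ⚠️ **Note**: AI narrative synthesis unavailable. "
            "Showing structured summary.\n"
            f"> _Error: {type(exc).__name__}_\n"
        )
        return f"{note}\n{render_template()}"


async def failing_llm() -> str:
    """Stand-in for an agent call that fails, e.g. on a timeout."""
    raise TimeoutError("simulated provider timeout")


async def working_llm() -> str:
    """Stand-in for a successful agent call."""
    return "### Executive Summary\nNarrative prose."


def template() -> str:
    """Stand-in for the structured template output."""
    return "### Drug Candidates\n- **tadalafil**"


degraded_report = asyncio.run(synthesize_or_fallback(failing_llm, template))
narrative_report = asyncio.run(synthesize_or_fallback(working_llm, template))
```

In the degraded case the report still contains the template content, so the assertions in the updated unit test (notice text plus template sections) hold for this sketch as well.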

tests/unit/orchestrators/test_simple_synthesis.py CHANGED

@@ -130,12 +130,12 @@ Long-term safety data is limited.
             assert "Evidence Synthesis" in result
     @pytest.mark.asyncio
-    async def test_falls_back_on_llm_error(
         self,
         sample_evidence: list[Evidence],
         sample_assessment: JudgeAssessment,
     ) -> None:
-        """Synthesis should fall back to template if LLM fails."""
         mock_search = MagicMock()
         mock_judge = MagicMock()
@@ -155,7 +155,11 @@ Long-term safety data is limited.
                 assessment=sample_assessment,
             )
-            # Should return template fallback (has Assessment section)
             assert "Assessment" in result or "Drug Candidates" in result
             assert "Testosterone" in result  # Drug candidate should be present

             assert "Evidence Synthesis" in result
     @pytest.mark.asyncio
+    async def test_falls_back_on_llm_error_with_notice(
         self,
         sample_evidence: list[Evidence],
         sample_assessment: JudgeAssessment,
     ) -> None:
+        """Synthesis should fall back to template if LLM fails, WITH error notice."""
         mock_search = MagicMock()
         mock_judge = MagicMock()
                 assessment=sample_assessment,
             )
+            # Should surface error to user (MS Agent Framework pattern)
+            assert "AI narrative synthesis unavailable" in result
+            assert "Error" in result
+            # Should still include template content
             assert "Assessment" in result or "Drug Candidates" in result
             assert "Testosterone" in result  # Drug candidate should be present