# P1: Narrative Synthesis Falls Back to Template (SPEC_12 Not Taking Effect)

**Status**: Open

**Priority**: P1 - Major UX degradation

**Affects**: Simple mode, all deployments

**Root Cause**: LLM synthesis silently failing → template fallback

**Related**: SPEC_12 (implemented but not functioning)

---

## Problem Statement

SPEC_12 implemented LLM-based narrative synthesis, but users still see **template-formatted bullet points** instead of **prose paragraphs**:

### What Users See (Template Fallback)

```markdown
## Sexual Health Analysis
### Question
what medication for the best boners?
### Drug Candidates
- **tadalafil**
- **sildenafil**
### Key Findings
- Tadalafil improves erectile function
### Assessment
- **Mechanism Score**: 4/10
- **Clinical Evidence Score**: 6/10
```

### What They Should See (LLM Synthesis)

```markdown
### Executive Summary
Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with strong evidence from multiple RCTs demonstrating improved erectile function...
### Background
Erectile dysfunction (ED) is a common male sexual health disorder...
### Evidence Synthesis
**Mechanism of Action**
Sildenafil works by inhibiting phosphodiesterase type 5 (PDE5)...
```

---

## Root Cause Analysis

### Location: `src/orchestrators/simple.py:555-564`

```python
try:
    agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
    result = await agent.run(user_prompt)
    narrative = result.output
except Exception as e:  # ← SILENT FALLBACK
    logger.warning("LLM synthesis failed, using template fallback", error=str(e))
    return self._generate_template_synthesis(query, evidence, assessment)
```

**The Problem**: When ANY exception occurs during LLM synthesis, the code falls back to the template output and records the failure only as a WARNING-level log line. Users see bare bullet points with no indication that the LLM call failed.
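
The pattern can be reproduced in isolation. The sketch below is a minimal stand-in, not the project code. `call_llm` and `template_synthesis` are hypothetical: the first simulates a deployment without an API key, and the second stands in for `_generate_template_synthesis`. It also assumes the standard `logging` module in place of the project's logger. The caller gets back an ordinary string, and the failure is visible only in the log:

```python
import asyncio
import logging

logger = logging.getLogger("synthesis_repro")


def template_synthesis(query: str) -> str:
    # Hypothetical stand-in for _generate_template_synthesis (bullet output).
    return f"### Question\n{query}\n### Key Findings\n- ..."


async def call_llm(prompt: str) -> str:
    # Hypothetical LLM call that fails the way a key-less deployment would.
    raise RuntimeError("OPENAI_API_KEY is not set")


async def synthesize(query: str) -> str:
    try:
        return await call_llm(query)
    except Exception as e:  # same broad catch as simple.py
        logger.warning("LLM synthesis failed, using template fallback: %s", e)
        return template_synthesis(query)


if __name__ == "__main__":
    report = asyncio.run(synthesize("example query"))
    # Nothing in the returned report says the LLM call failed.
    print(report)
```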

### Why Synthesis Fails

| Cause | Typical Trigger | Estimated Frequency |
|-------|-----------------|---------------------|
| No API key in deployment | HuggingFace Spaces | HIGH |
| API rate limiting | Heavy usage | MEDIUM |
| Token overflow | Long evidence lists | MEDIUM |
| Model mismatch | Wrong model ID | LOW |
| Network timeout | Slow connections | LOW |
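
To tell these causes apart in the logs, the fallback handler could attach a cause label to each exception. This is a heuristic sketch, and `classify_synthesis_error` is a hypothetical helper. The substring patterns are assumptions about typical provider error text, which varies by SDK and version, so check them against real failures. Plain substring matching can also misfire; for example, "429" can appear inside an unrelated number.

```python
import asyncio

# Heuristic substrings per cause, checked in order. These are assumptions
# about typical provider error messages, not a documented contract.
_PATTERNS = (
    ("No API key in deployment", ("api key", "api_key", "api-key", "authentication", "unauthorized")),
    ("Token overflow", ("context length", "maximum context", "too many tokens")),
    ("API rate limiting", ("rate limit", "429", "quota")),
    ("Model mismatch", ("model not found", "does not exist", "unknown model")),
)


def classify_synthesis_error(exc: BaseException) -> str:
    """Map an exception to one of the causes in the table (best effort)."""
    # asyncio.TimeoutError is an alias of TimeoutError on Python 3.11+;
    # checking both also covers older versions.
    if isinstance(exc, (TimeoutError, asyncio.TimeoutError)):
        return "Network timeout"
    message = str(exc).lower()
    for cause, needles in _PATTERNS:
        if any(needle in message for needle in needles):
            return cause
    return "Unknown"
```

In the `except` branch, the project's logger could record the label as a keyword argument, for example `cause=classify_synthesis_error(e)`. The existing code already passes `error=str(e)` the same way.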

---

## Evidence: LLM Synthesis WORKS When Configured

A local run with an API key configured produces the expected prose:

```python
# Succeeds locally when an API key is configured:
agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
result = await agent.run(user_prompt)
print(result.output)  # ← narrative prose, not bullet points
```

Output:

```markdown
### Executive Summary
Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
with one study (Smith, 2020; N=100) reporting improved erectile function...
```

---

## Impact

| Metric | Current | Expected |
|--------|---------|----------|
| Report quality | 3/10 (metadata dump) | 9/10 (professional prose) |
| User satisfaction | Low | High |
| Clinical utility | Limited | High |

The ENTIRE VALUE PROPOSITION of the research agent is the synthesized report. Template output defeats the purpose.

---

## Fix Options

### Option A: Surface Error to User (RECOMMENDED)

When LLM synthesis fails, don't fall back silently. Log the failure at ERROR level with the traceback, and show the user what went wrong:

```python
except Exception as e:
    logger.error("LLM synthesis failed", error=str(e), exc_info=True)
    # Show the error in the report instead of falling back silently.
    # The note's lines carry no leading indentation: in Markdown, 4+ leading
    # spaces would render them as a code block.
    error_note = (
        "⚠️ **Note**: AI narrative synthesis unavailable.\n"
        "Showing structured summary instead.\n\n"
        f"_Technical: {type(e).__name__}: {str(e)[:100]}_"
    )
    template = self._generate_template_synthesis(query, evidence, assessment)
    return f"{error_note}\n\n{template}"
```
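
Because the handler above is bound to the orchestrator (`self`, `query`, `evidence`, `assessment`), it is awkward to unit-test. The following minimal standalone sketch shows the same behavior. It assumes a hypothetical refactoring, not the current signature, in which the LLM call and the template renderer are passed in as callables. With that change, test 2 of the test plan can be checked without network access:

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger("synthesis")


async def synthesize_with_visible_fallback(
    llm_call: Callable[[], Awaitable[str]],
    render_template: Callable[[], str],
) -> str:
    """Return LLM prose, or the template prefixed with a visible warning."""
    try:
        return await llm_call()
    except Exception as e:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("LLM synthesis failed")
        error_note = (
            "⚠️ **Note**: AI narrative synthesis unavailable.\n"
            "Showing structured summary instead.\n\n"
            f"_Technical: {type(e).__name__}: {str(e)[:100]}_"
        )
        return f"{error_note}\n\n{render_template()}"


async def failing_llm() -> str:
    # Simulates the "No API key in deployment" case.
    raise RuntimeError("OPENAI_API_KEY is not set")


if __name__ == "__main__":
    print(asyncio.run(synthesize_with_visible_fallback(failing_llm, lambda: "- **tadalafil**")))
```

In the orchestrator, `llm_call` would be a small async wrapper that awaits `agent.run(user_prompt)` and returns `result.output`. This keeps API-key and network concerns out of the fallback logic.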

### Option B: HuggingFace Secrets Configuration

For HuggingFace Spaces deployments, add these secrets in the Space settings. Spaces exposes secrets to the running app as environment variables.

- `OPENAI_API_KEY`: required for synthesis
- `ANTHROPIC_API_KEY`: alternative provider
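
Because secrets arrive as environment variables, the app can check for them at startup and log a clear message, rather than discovering the problem at synthesis time. This is a minimal sketch. The `configured_provider` helper and the OpenAI-first order are assumptions, and this does not confirm that `get_model()` actually falls back to Anthropic:

```python
import os
from typing import Mapping, Optional

# Checked in order; OpenAI first, Anthropic as the alternative (assumed order).
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_provider(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first provider key set to a non-blank value, or None."""
    for name in PROVIDER_KEYS:
        if env.get(name, "").strip():
            return name
    return None


if __name__ == "__main__":
    provider = configured_provider()
    if provider is None:
        print("No LLM API key found: reports will use the template fallback.")
    else:
        print(f"LLM synthesis enabled via {provider}.")
```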

### Option C: Graceful Degradation with Explanation

Add a banner explaining synthesis status:

- ✅ "AI-synthesized narrative report" (when LLM works)
- ⚠️ "Structured summary (AI synthesis unavailable)" (fallback)
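
One way to drive the banner is for synthesis to report which path produced the output, so `src/app.py` does not have to infer it from the report text. The `SynthesisResult` type and the helper names below are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SynthesisResult:
    """Hypothetical synthesis output that records which path produced it."""

    markdown: str
    used_llm: bool
    error: Optional[str] = None  # e.g. "RuntimeError: ...", set only on fallback


def status_banner(result: SynthesisResult) -> str:
    """Pick the banner text from the two variants listed above."""
    if result.used_llm:
        return "✅ AI-synthesized narrative report"
    return "⚠️ Structured summary (AI synthesis unavailable)"


def render_report(result: SynthesisResult) -> str:
    """Prefix the report with the status banner for display in the UI."""
    return f"{status_banner(result)}\n\n{result.markdown}"
```

Returning a flag instead of a bare string keeps the banner decision in one place and makes it testable without parsing the report.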

---

## Diagnostic Steps

To determine why synthesis is failing in production:

1. **Review logs** for the warning `"LLM synthesis failed, using template fallback"`
2. **Verify API key**: Is `OPENAI_API_KEY` set in the environment?
3. **Confirm model access**: Is `gpt-5` accessible with the current API tier?
4. **Inspect rate limits**: Is the account quota exhausted?
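
Steps 1 and 2 can be automated. The sketch below is a hypothetical helper, not an existing tool. It accepts log lines from any source, since the log location depends on the deployment, together with an environment mapping. Steps 3 and 4 need a live API call or the provider dashboard, so they are not covered. If Option A changes the log message, the search string has to change with it.

```python
import os
from typing import Iterable, List, Mapping

# Message logged by the current fallback in simple.py.
FALLBACK_MESSAGE = "LLM synthesis failed, using template fallback"


def count_fallbacks(log_lines: Iterable[str]) -> int:
    """Count log lines that contain the template-fallback warning."""
    return sum(1 for line in log_lines if FALLBACK_MESSAGE in line)


def diagnose(log_lines: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable findings for diagnostic steps 1 and 2."""
    findings = []
    if not env.get("OPENAI_API_KEY", "").strip():
        findings.append("OPENAI_API_KEY is not set in this environment")
    fallbacks = count_fallbacks(log_lines)
    if fallbacks:
        findings.append(f"{fallbacks} template fallback(s) found in the logs")
    return findings
```

Usage with a log file would look like `with open(path) as f: print(diagnose(f))`, where `path` is wherever the deployment writes its logs.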

---

## Acceptance Criteria

- [ ] Users see narrative prose reports (not bullet points) when an API key is configured
- [ ] When synthesis fails, the user sees a clear indication (not a silent fallback)
- [ ] HuggingFace Spaces deployment has the required secrets configured
- [ ] Logging captures the specific exception for debugging

---

## Files to Modify

| File | Change |
|------|--------|
| `src/orchestrators/simple.py:555-580` | Add error surfacing in fallback |
| `src/app.py` | Add synthesis status indicator to UI |
| HuggingFace Spaces Settings | Add `OPENAI_API_KEY` secret |

---

## Test Plan

1. Run locally with API key → Should get narrative prose
2. Run locally WITHOUT API key → Should get template WITH error message
3. Deploy to HuggingFace with secrets → Should get narrative prose
4. Deploy to HuggingFace WITHOUT secrets → Should get template WITH warning
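
For local test 2, the key must be absent from the process environment even when it is set in the developer's shell. Below is a minimal sketch of a context manager for that purpose; `without_env` is a hypothetical helper. Inside pytest, the built-in `monkeypatch.delenv(name, raising=False)` does the same job. The sketch assumes the key is read when synthesis runs. If `get_model()` or a settings object caches the key at import time, removing the variable afterwards has no effect.

```python
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def without_env(*names: str) -> Iterator[None]:
    """Temporarily remove environment variables, restoring them afterwards."""
    # Remember only the variables that were actually set, then remove them.
    saved = {name: os.environ.pop(name) for name in names if name in os.environ}
    try:
        yield
    finally:
        # Put back the original values, even if the body raised.
        os.environ.update(saved)


if __name__ == "__main__":
    with without_env("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
        print("OPENAI_API_KEY" in os.environ)  # False inside the block
```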