DeepBoner / docs /bugs /archive /P1_NARRATIVE_SYNTHESIS_FALLBACK.md

VibecoderMcSwaggins

feat(search): SPEC_13 Evidence Deduplication (#98)

2c5db87 unverified 22 days ago

	# P1: Narrative Synthesis Falls Back to Template (SPEC_12 Not Taking Effect)

	Status: Open
	Priority: P1 - Major UX degradation
	Affects: Simple mode, all deployments
	Root Cause: LLM synthesis silently failing → template fallback
	Related: SPEC_12 (implemented but not functioning)

	---

	## Problem Statement

	SPEC_12 implemented LLM-based narrative synthesis, but users still see template-formatted bullet points instead of prose paragraphs:

	### What Users See (Template Fallback)

	```markdown
	## Sexual Health Analysis

	### Question
	what medication for the best boners?

	### Drug Candidates
	- tadalafil
	- sildenafil

	### Key Findings
	- Tadalafil improves erectile function

	### Assessment
	- Mechanism Score: 4/10
	- Clinical Evidence Score: 6/10
	```

	### What They Should See (LLM Synthesis)

	```markdown
	### Executive Summary

	Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
	with strong evidence from multiple RCTs demonstrating improved erectile function...

	### Background

	Erectile dysfunction (ED) is a common male sexual health disorder...

	### Evidence Synthesis

	Mechanism of Action
	Sildenafil works by inhibiting phosphodiesterase type 5 (PDE5)...
	```

	---

	## Root Cause Analysis

	### Location: `src/orchestrators/simple.py:555-564`

	```python
	try:
	    agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
	    result = await agent.run(user_prompt)
	    narrative = result.output
	except Exception as e:  # ← SILENT FALLBACK
	    logger.warning("LLM synthesis failed, using template fallback", error=str(e))
	    return self._generate_template_synthesis(query, evidence, assessment)
	```

	The Problem: ANY exception raised during LLM synthesis is caught, logged only as a warning, and replaced with the template output. Users see janky bullet points with no indication that the LLM call failed.

	### Why Synthesis Fails

	| Cause | Typical context | Frequency |
	|-------|-----------------|-----------|
	| No API key in deployment | HuggingFace Spaces | HIGH |
	| API rate limiting | Heavy usage | MEDIUM |
	| Token overflow | Long evidence lists | MEDIUM |
	| Model mismatch | Wrong model ID | LOW |
	| Network timeout | Slow connections | LOW |

	---

	## Evidence: LLM Synthesis WORKS When Configured

	Local test with API key:
	```python
	# This works perfectly:
	agent = Agent(model=get_model(), output_type=str, system_prompt=system_prompt)
	result = await agent.run(user_prompt)
	print(result.output) # → Beautiful narrative prose!
	```

	Output:
	```
	### Executive Summary

	Sildenafil demonstrates clinically meaningful efficacy for erectile dysfunction,
	with one study (Smith, 2020; N=100) reporting improved erectile function...
	```

	---

	## Impact

	| Metric | Current | Expected |
	|--------|---------|----------|
	| Report quality | 3/10 (metadata dump) | 9/10 (professional prose) |
	| User satisfaction | Low | High |
	| Clinical utility | Limited | High |

	The ENTIRE VALUE PROPOSITION of the research agent is the synthesized report. Template output defeats the purpose.

	---

	## Fix Options

	### Option A: Surface Error to User (RECOMMENDED)

	When LLM synthesis fails, don't silently fall back. Show the user what went wrong:

	```python
	except Exception as e:
	    logger.error("LLM synthesis failed", error=str(e), exc_info=True)

	    # Show error in report instead of silent fallback
	    error_note = f"""
	⚠️ Note: AI narrative synthesis unavailable.
	Showing structured summary instead.

	_Technical: {type(e).__name__}: {str(e)[:100]}_
	"""
	    template = self._generate_template_synthesis(query, evidence, assessment)
	    return f"{error_note}\n\n{template}"
	```
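
The snippet above is an excerpt from inside a method. A minimal self-contained sketch of the same idea follows so the behavior can be tried in isolation. The names `synthesize`, `format_error_note`, `generate_template_synthesis`, and `failing_llm` are hypothetical stand-ins, not the project's real functions. It also uses the standard library `logging` module rather than the keyword-argument logger shown in the excerpt.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


def generate_template_synthesis(query: str) -> str:
    """Hypothetical stand-in for self._generate_template_synthesis()."""
    return f"## Sexual Health Analysis\n\n### Question\n{query}"


def format_error_note(exc: Exception, max_len: int = 100) -> str:
    """Build the user-facing note; the exception text is cut to max_len chars."""
    detail = str(exc)[:max_len]
    return (
        "⚠️ Note: AI narrative synthesis unavailable.\n"
        "Showing structured summary instead.\n\n"
        f"_Technical: {type(exc).__name__}: {detail}_"
    )


async def synthesize(query: str, run_llm) -> str:
    """Return the LLM narrative, or the template prefixed with an error note."""
    try:
        return await run_llm(query)
    except Exception as exc:
        # Log with traceback so the specific failure is visible in the logs.
        logger.error("LLM synthesis failed: %s", exc, exc_info=True)
        return f"{format_error_note(exc)}\n\n{generate_template_synthesis(query)}"


async def failing_llm(query: str) -> str:
    """Simulates a deployment where the LLM call cannot succeed."""
    raise RuntimeError("no API key configured")


report = asyncio.run(synthesize("example query", failing_llm))
```

Passing the LLM call in as `run_llm` is a choice made for this sketch only. It lets the failure path be exercised without network access or credentials.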

	### Option B: HuggingFace Secrets Configuration

	For HuggingFace Spaces deployment, add secrets:
	- `OPENAI_API_KEY` → Required for synthesis
	- `ANTHROPIC_API_KEY` → Alternative provider

	### Option C: Graceful Degradation with Explanation

	Add a banner explaining synthesis status:
	- ✅ "AI-synthesized narrative report" (when LLM works)
	- ⚠️ "Structured summary (AI synthesis unavailable)" (fallback)
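
One possible shape for the banner logic is sketched below. It assumes the orchestrator can report which path produced the report. `SynthesisStatus`, `BANNERS`, and `with_banner` are hypothetical names introduced for illustration.

```python
from enum import Enum


class SynthesisStatus(Enum):
    """Which path produced the report (hypothetical type)."""

    LLM = "llm"
    TEMPLATE_FALLBACK = "template_fallback"


# Banner text taken from the two cases listed above.
BANNERS = {
    SynthesisStatus.LLM: "✅ AI-synthesized narrative report",
    SynthesisStatus.TEMPLATE_FALLBACK: "⚠️ Structured summary (AI synthesis unavailable)",
}


def with_banner(report: str, status: SynthesisStatus) -> str:
    """Prefix the report with a one-line banner describing how it was produced."""
    return f"{BANNERS[status]}\n\n{report}"
```

For example, `with_banner(template_report, SynthesisStatus.TEMPLATE_FALLBACK)` would put the warning line above the structured summary.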

	---

	## Diagnostic Steps

	To determine why synthesis is failing in production:

	1. Review logs for warning: `"LLM synthesis failed, using template fallback"`
	2. Verify API key: Is `OPENAI_API_KEY` set in environment?
	3. Confirm model access: Is `gpt-5` accessible with current API tier?
	4. Inspect rate limits: Is the account quota exhausted?
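
Step 2 can be checked at startup with a small preflight, sketched here. It assumes the keys named in Option B are exposed to the app as environment variables. `configured_providers` and `preflight_message` are hypothetical helpers. The check only confirms a key is present; it does not show that the key is valid, that the model is accessible, or that quota remains.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Environment variable names taken from Option B.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def configured_providers(env: Optional[Mapping] = None) -> list:
    """Return the key names that are set to a non-blank value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


def preflight_message(env: Optional[Mapping] = None) -> str:
    """One-line summary suitable for a startup log entry."""
    found = configured_providers(env)
    if found:
        return "Synthesis keys present: " + ", ".join(found)
    return "No synthesis API key set; expect template fallback"
```

Logging `preflight_message()` once at startup would make the most frequent cause in the table above visible before any user request is made.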

	---

	## Acceptance Criteria

	- [ ] Users see narrative prose reports (not bullet points) when API key is configured
	- [ ] When synthesis fails, user sees clear indication (not silent fallback)
	- [ ] HuggingFace Spaces deployment has proper secrets configured
	- [ ] Logging captures the specific exception for debugging

	---

	## Files to Modify

	\| File \| Change \|
	\|------\|--------\|
	\| `src/orchestrators/simple.py:555-580` \| Add error surfacing in fallback \|
	\| `src/app.py` \| Add synthesis status indicator to UI \|
	\| HuggingFace Spaces Settings \| Add `OPENAI_API_KEY` secret \|

	---

	## Test Plan

	1. Run locally with API key → Should get narrative prose
	2. Run locally WITHOUT API key → Should get template WITH error message
	3. Deploy to HuggingFace with secrets → Should get narrative prose
	4. Deploy to HuggingFace WITHOUT secrets → Should get template WITH warning
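
Steps 1 and 2 could also be covered by automated tests that stub the LLM call, so no real API key is needed. The sketch below is an assumption-heavy illustration. `synthesize_report` is a hypothetical stand-in for the orchestrator's synthesis step, and the two stub coroutines simulate the "key configured" and "key missing" cases. Steps 3 and 4 still need a manual check against a real deployment.

```python
import asyncio
import unittest


async def synthesize_report(query, run_llm, template):
    """Hypothetical stand-in for the orchestrator's synthesis step."""
    try:
        return await run_llm(query)
    except Exception as exc:
        note = f"⚠️ Note: AI narrative synthesis unavailable ({type(exc).__name__})."
        return f"{note}\n\n{template(query)}"


async def working_llm(query):
    """Stub for a correctly configured LLM call."""
    return "### Executive Summary\n\nNarrative prose."


async def missing_key_llm(query):
    """Stub for a call that fails because no API key is set."""
    raise RuntimeError("API key not set")


def template(query):
    """Stub for the template fallback."""
    return f"### Question\n{query}"


class SynthesisFallbackTest(unittest.TestCase):
    def test_with_key_returns_narrative(self):
        report = asyncio.run(synthesize_report("q", working_llm, template))
        self.assertTrue(report.startswith("### Executive Summary"))
        self.assertNotIn("⚠️", report)

    def test_without_key_returns_template_with_note(self):
        report = asyncio.run(synthesize_report("q", missing_key_llm, template))
        self.assertIn("AI narrative synthesis unavailable", report)
        self.assertIn("### Question\nq", report)
```

Saved as a module, these tests run with `python -m unittest`.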