SPEC_12: Narrative Report Synthesis
Status: Draft
Priority: P1 - Core deliverable
Related Issues: #85, #86
Related Spec: SPEC_11 (Sexual Health Focus)
Problem Statement
DeepBoner's report generation outputs structured metadata instead of synthesized prose. The current implementation uses string templating with NO LLM call for narrative synthesis.
Current Output (Actual)
## Sexual Health Analysis
### Question
Testosterone therapy for hypoactive sexual desire disorder?
### Drug Candidates
- **Testosterone**
- **LibiGel**
- **Androgel**
### Key Findings
- Testosterone therapy improves sexual desire and activity in postmenopausal women with HSDD.
- Transdermal testosterone is a preferred formulation.
### Assessment
- **Mechanism Score**: 8/10
- **Clinical Evidence Score**: 9/10
- **Confidence**: 90%
### Reasoning
The evidence provides a clear understanding of the mechanism of action...
### Citations (33 sources)
1. [Title](url)...
Expected Output (Professional Research Report)
## Sexual Health Research Report: Testosterone Therapy for Hypoactive Sexual Desire Disorder
### Executive Summary
Testosterone therapy represents a well-established, evidence-based treatment for
hypoactive sexual desire disorder (HSDD) in postmenopausal women. Our analysis of
33 peer-reviewed sources reveals consistent findings across multiple randomized
controlled trials, with transdermal testosterone demonstrating the strongest
efficacy-safety profile.
### Background
Hypoactive sexual desire disorder affects an estimated 12% of postmenopausal women
and is characterized by persistent lack of sexual interest causing personal distress.
The International Society for the Study of Women's Sexual Health (ISSWSH) published
clinical guidelines in 2021 establishing testosterone as a recommended intervention...
### Evidence Synthesis
**Mechanism of Action**
Testosterone exerts its effects on sexual desire through multiple pathways. At the
hypothalamic level, testosterone modulates dopaminergic signaling that underlies
libido. Evidence from Smith et al. (2021) demonstrates that androgen receptor
activation in the central nervous system correlates with subjective measures of
sexual desire (r=0.67, p<0.001)...
**Clinical Trial Evidence**
A systematic review of 8 randomized controlled trials (N=3,035) demonstrated that
transdermal testosterone significantly improved:
- Satisfying sexual events: +2.1 per month (95% CI: 1.4-2.8)
- Sexual desire scores: +0.4 on validated scales (p<0.001)
The Global Consensus Position Statement (2019) and ISSWSH Guidelines (2021) both
recommend transdermal testosterone as first-line therapy...
### Recommendations
Based on this evidence synthesis:
1. **Transdermal testosterone** (300 μg/day) is recommended for postmenopausal
women with HSDD not primarily related to modifiable factors
2. **Duration**: Continue for 6 months to assess efficacy; discontinue if no benefit
3. **Monitoring**: Lipid profile and liver function at baseline and 3-6 months
### Limitations & Future Directions
- Long-term safety data beyond 24 months remains limited
- Efficacy in premenopausal women less well-established
- Head-to-head comparisons between formulations are needed
### References
1. Parish SJ et al. (2021). International Society for the Study of Women's Sexual
Health Clinical Practice Guideline for the Use of Systemic Testosterone for
Hypoactive Sexual Desire Disorder in Women. J Sex Med. https://pubmed.ncbi.nlm.nih.gov/33814355/
...
Root Cause Analysis
Current Implementation (src/orchestrators/simple.py:448-505)
def _generate_synthesis(
    self,
    query: str,
    evidence: list[Evidence],
    assessment: JudgeAssessment,
) -> str:
    # ❌ NO LLM CALL - Just string templating!
    drug_list = "\n".join([f"- **{d}**" for d in assessment.details.drug_candidates])
    findings_list = "\n".join([f"- {f}" for f in assessment.details.key_findings])
    return f"""{self.domain_config.report_title}
### Question
{query}
### Drug Candidates
{drug_list}
...
"""
The problem: No LLM is ever called to synthesize the report. It's just formatted data from the JudgeAssessment.
Microsoft Agent Framework Pattern
From reference_repos/agent-framework/python/samples/getting_started/workflows/orchestration/concurrent_custom_aggregator.py:
# Define a custom aggregator callback that uses the chat client to SYNTHESIZE
async def summarize_results(results: list[Any]) -> str:
    # Collect expert outputs
    expert_sections: list[str] = []
    for r in results:
        messages = getattr(r.agent_run_response, "messages", [])
        final_text = messages[-1].text if messages else "(no content)"
        expert_sections.append(f"{r.executor_id}:\n{final_text}")
    # Ask the MODEL to synthesize
    system_msg = ChatMessage(
        Role.SYSTEM,
        text=(
            "You are a helpful assistant that consolidates multiple domain expert outputs "
            "into one cohesive, concise summary with clear takeaways."
        ),
    )
    user_msg = ChatMessage(Role.USER, text="\n\n".join(expert_sections))
    # ✅ LLM CALL for synthesis
    response = await chat_client.get_response([system_msg, user_msg])
    return response.messages[-1].text
The pattern: the aggregator makes an LLM call to synthesize the expert outputs rather than concatenating strings.
Solution Design
Architecture
Current:
Evidence → Judge → {structured data} → String Template → Bullet Points

Proposed:
Evidence → Judge → {structured data} → SynthesisAgent → Narrative Prose
                                             ↓
                                    LLM-based synthesis
Components
1. SynthesisAgent (src/agents/synthesis.py)
A new agent dedicated to narrative report generation:
from pydantic import BaseModel
from pydantic_ai import Agent

# Evidence, JudgeAssessment, ResearchDomain, and Reference are project types.
# Reference is not defined anywhere in this spec yet.


class NarrativeReport(BaseModel):
    """Structured output for narrative report."""

    executive_summary: str  # 2-3 sentences, key takeaways
    background: str  # What is this condition, why does it matter
    evidence_synthesis: str  # Mechanism + Clinical evidence in prose
    recommendations: list[str]  # Actionable recommendations
    limitations: str  # Honest limitations
    references: list[Reference]  # Properly formatted


class SynthesisAgent:
    """Generates narrative research reports from structured data."""

    # self.agent is assumed to be a pydantic_ai Agent configured with
    # NarrativeReport as its structured output and SYNTHESIS_SYSTEM_PROMPT.

    async def synthesize(
        self,
        query: str,
        evidence: list[Evidence],
        assessment: JudgeAssessment,
        domain: ResearchDomain,
    ) -> NarrativeReport:
        """Generate narrative prose report."""
        # Build context
        context = self._build_synthesis_context(evidence, assessment)
        # ✅ LLM CALL for synthesis
        # Agent.run() has no `context` keyword, so the context travels in the prompt.
        result = await self.agent.run(
            f"Generate a narrative research report for: {query}\n\n{context}",
        )
        return result.data  # exposed as `result.output` in newer pydantic_ai releases
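The `_build_synthesis_context` helper called in `synthesize()` is not defined in this spec. The following is a minimal sketch of what it might do, not the project's implementation. The dataclasses are simplified stand-ins for the real `Evidence` and `JudgeAssessment` models: `details.drug_candidates`, `details.key_findings`, and `citation.url` appear elsewhere in this spec, while `content`, `citation.title`, and `max_chars_per_source` are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Citation:
    title: str  # assumed field
    url: str


@dataclass
class Evidence:
    content: str  # assumed field holding the abstract or extracted text
    citation: Citation


@dataclass
class AssessmentDetails:
    drug_candidates: list[str] = field(default_factory=list)
    key_findings: list[str] = field(default_factory=list)


@dataclass
class JudgeAssessment:
    details: AssessmentDetails


def build_synthesis_context(
    evidence: list[Evidence],
    assessment: JudgeAssessment,
    max_chars_per_source: int = 500,  # hypothetical budget per source
) -> str:
    """Flatten the judge output and the evidence into one text block for the prompt."""
    lines = ["## Drug Candidates"]
    lines += [f"- {d}" for d in assessment.details.drug_candidates]
    lines.append("## Key Findings")
    lines += [f"- {f}" for f in assessment.details.key_findings]
    lines.append("## Evidence (cite ONLY these sources)")
    for i, ev in enumerate(evidence, start=1):
        # Truncate each source so a large evidence set stays within the context window.
        snippet = ev.content[:max_chars_per_source]
        lines.append(f"[{i}] {ev.citation.title} ({ev.citation.url})\n{snippet}")
    return "\n".join(lines)
```

Numbering each source and including its URL gives the model a fixed set of citable items, which makes the no-hallucinated-references rule easier to check afterwards.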
2. Updated System Prompt (src/prompts/synthesis.py)
SYNTHESIS_SYSTEM_PROMPT = """You are a scientific writer specializing in sexual health research.
Your task is to synthesize research evidence into a clear, narrative report.
## Writing Style
- Write in PROSE PARAGRAPHS, not bullet points
- Use academic but accessible language
- Be specific about evidence strength (e.g., "in a randomized controlled trial of N=200")
- Reference specific studies by author name
- Provide quantitative results where available
## Report Structure
### Executive Summary (REQUIRED - 2-3 sentences)
Summarize the key finding and clinical implication. Start with the bottom line.
Example: "Testosterone therapy demonstrates consistent efficacy for HSDD in
postmenopausal women, with transdermal formulations showing the best safety profile."
### Background (REQUIRED - 1 paragraph)
Explain the condition, its prevalence, and why this question matters clinically.
### Evidence Synthesis (REQUIRED - 2-4 paragraphs)
Weave together the evidence into a coherent narrative:
- Mechanism of Action: How does the intervention work?
- Clinical Evidence: What do the trials show? Be specific about effect sizes.
- Comparative Evidence: How does it compare to alternatives?
### Recommendations (REQUIRED - 3-5 bullet points)
Provide actionable clinical recommendations based on the evidence.
### Limitations (REQUIRED - 1 paragraph)
Acknowledge gaps, biases, and areas needing more research.
### References (REQUIRED)
List the key references in proper academic format.
## CRITICAL RULES
1. ONLY cite papers from the provided evidence - NEVER hallucinate references
2. Write in complete sentences and paragraphs
3. Avoid lists/bullets except in Recommendations section
4. Include specific statistics when available (p-values, effect sizes, CIs)
5. Acknowledge uncertainty honestly
"""
3. Updated Orchestrator Integration
# In src/orchestrators/simple.py
async def _generate_synthesis(
    self,
    query: str,
    evidence: list[Evidence],
    assessment: JudgeAssessment,
) -> str:
    """Generate narrative synthesis using LLM."""
    from src.agents.synthesis import SynthesisAgent

    synthesis_agent = SynthesisAgent(domain=self.domain)
    report = await synthesis_agent.synthesize(
        query=query,
        evidence=evidence,
        assessment=assessment,
        domain=self.domain,
    )
    # NarrativeReport (as specified in component 1) has no to_markdown() yet;
    # it must be added for this call to work.
    return report.to_markdown()
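The orchestrator calls `report.to_markdown()`, which the `NarrativeReport` model in component 1 does not define. One possible rendering is sketched below. It uses plain dataclasses as stand-ins for the Pydantic models so it runs without dependencies. The `Reference` fields (`authors`, `year`, `title`, `url`) are assumptions taken from the citation format in the acceptance criteria, and the `title` parameter is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    authors: str
    year: int
    title: str
    url: str


@dataclass
class NarrativeReport:
    executive_summary: str
    background: str
    evidence_synthesis: str
    recommendations: list[str]
    limitations: str
    references: list[Reference]

    def to_markdown(self, title: str = "Research Report") -> str:
        """Render the report sections in the order the system prompt requires."""
        recs = "\n".join(f"{i}. {r}" for i, r in enumerate(self.recommendations, 1))
        refs = "\n".join(
            f"{i}. {r.authors} ({r.year}). {r.title}. {r.url}"
            for i, r in enumerate(self.references, 1)
        )
        sections = [
            f"## {title}",
            f"### Executive Summary\n{self.executive_summary}",
            f"### Background\n{self.background}",
            f"### Evidence Synthesis\n{self.evidence_synthesis}",
            f"### Recommendations\n{recs}",
            f"### Limitations\n{self.limitations}",
            f"### References\n{refs}",
        ]
        # Blank lines between sections keep each prose block a separate paragraph.
        return "\n\n".join(sections)
```

Keeping the rendering on the model means the orchestrator only deals with a string, as its `-> str` signature already promises.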
Few-Shot Example (Required for Quality)
From issue #82, include a concrete example in the prompt:
FEW_SHOT_EXAMPLE = """
## Example: Strong Evidence Synthesis
INPUT:
- Query: "Alprostadil for erectile dysfunction"
- Evidence: 15 papers including meta-analysis of 8 RCTs (N=3,247)
- Mechanism Score: 9/10
- Clinical Score: 9/10
OUTPUT:
### Executive Summary
Alprostadil (prostaglandin E1) represents a well-established second-line treatment
for erectile dysfunction, with meta-analytic evidence demonstrating 87% efficacy
in achieving erections sufficient for intercourse. It offers a PDE5-independent
mechanism particularly valuable for patients who do not respond to oral therapies.
### Background
Erectile dysfunction affects approximately 30 million men in the United States,
with prevalence increasing with age. While PDE5 inhibitors (sildenafil, tadalafil)
remain first-line therapy, approximately 30% of patients are non-responders or
have contraindications. Alprostadil provides an alternative mechanism of action
through direct smooth muscle relaxation.
### Evidence Synthesis
**Mechanism of Action**
Alprostadil works through a distinct pathway from PDE5 inhibitors. It binds to
EP receptors on cavernosal smooth muscle, activating adenylate cyclase and
increasing intracellular cAMP. This leads to smooth muscle relaxation and
penile erection independent of nitric oxide signaling. As noted by Smith et al.
(2019), this mechanism explains its efficacy in patients with endothelial
dysfunction or nerve damage.
**Clinical Evidence**
A meta-analysis by Johnson et al. (2020) pooled data from 8 randomized controlled
trials (N=3,247) comparing intracavernosal alprostadil to placebo. The primary
endpoint of erection sufficient for intercourse was achieved in 87% of alprostadil
patients versus 12% placebo (RR 7.25, 95% CI: 5.8-9.1, p<0.001). The number
needed to treat (NNT) was 1.3, indicating robust effect size.
Subgroup analysis revealed consistent efficacy across etiologies:
- Vascular ED: 85% response rate
- Neurogenic ED: 91% response rate
- Post-prostatectomy: 82% response rate
### Recommendations
1. Consider alprostadil as second-line therapy when PDE5 inhibitors fail or are contraindicated
2. Start with 10 μg intracavernosal injection, titrate up to 40 μg based on response
3. Provide in-office training for self-injection technique
4. Monitor for penile fibrosis with long-term use (occurs in 3-5% of patients)
### Limitations
Long-term data beyond 2 years is limited. Head-to-head comparisons with
newer therapies (low-intensity shockwave) are lacking. Most trials excluded
patients with severe cardiovascular disease, limiting generalizability.
The intraurethral formulation (MUSE) has lower efficacy (43%) than injection.
### References
1. Smith AB et al. (2019). Alprostadil mechanism of action in erectile tissue.
J Urol. https://pubmed.ncbi.nlm.nih.gov/12345678/
2. Johnson CD et al. (2020). Meta-analysis of intracavernosal alprostadil.
J Sex Med. https://pubmed.ncbi.nlm.nih.gov/23456789/
"""
Implementation Plan
Phase 1: Core SynthesisAgent
- Create `src/agents/synthesis.py` with:
  - `SynthesisAgent` class
  - `NarrativeReport` Pydantic model
  - LLM-based synthesis method
- Create `src/prompts/synthesis.py` with:
  - `SYNTHESIS_SYSTEM_PROMPT`
  - `FEW_SHOT_EXAMPLE`
  - `format_synthesis_context()` helper
- Update `src/orchestrators/simple.py`:
  - Make `_generate_synthesis()` async
  - Call `SynthesisAgent.synthesize()`
  - Keep `_generate_partial_synthesis()` as fallback (free tier)
Phase 2: Advanced Mode Integration
- Update `src/orchestrators/advanced.py`:
  - Add `SynthesisAgent` to Magentic workflow
  - Ensure it receives all evidence from prior agents
Phase 3: Test Coverage
- Create `tests/unit/agents/test_synthesis.py`:
  - Test narrative output structure
  - Test reference accuracy (no hallucinated citations)
  - Test prose vs bullet point ratio
Phase 4: Domain Customization
- Update `src/config/domain.py`:
  - Add `synthesis_system_prompt` field to `DomainConfig`
  - Add `synthesis_few_shot_example` field
  - Configure for sexual health domain
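The Phase 4 change to `DomainConfig` could look roughly like the sketch below. It is an illustration only: the real `DomainConfig` may be a Pydantic model with other fields, `DEFAULT_SYNTHESIS_PROMPT` and `SEXUAL_HEALTH_CONFIG` are hypothetical names, and the prompt strings are abbreviated. Only `report_title` and the two new field names come from this spec.

```python
from dataclasses import dataclass

# Hypothetical generic default for domains without a tailored prompt.
DEFAULT_SYNTHESIS_PROMPT = (
    "You are a scientific writer. Synthesize the evidence into a narrative report."
)


@dataclass(frozen=True)
class DomainConfig:
    report_title: str
    # New in Phase 4: per-domain synthesis prompt and optional few-shot example.
    synthesis_system_prompt: str = DEFAULT_SYNTHESIS_PROMPT
    synthesis_few_shot_example: str = ""


SEXUAL_HEALTH_CONFIG = DomainConfig(
    report_title="## Sexual Health Analysis",
    synthesis_system_prompt=(
        "You are a scientific writer specializing in sexual health research."
    ),
)
```

Defaults on both new fields keep existing domain configurations valid without changes.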
File Changes
| File | Change |
|---|---|
| `src/agents/synthesis.py` | NEW - SynthesisAgent |
| `src/prompts/synthesis.py` | NEW - Synthesis prompts |
| `src/orchestrators/simple.py` | MODIFY - Call SynthesisAgent |
| `src/orchestrators/advanced.py` | MODIFY - Add to Magentic |
| `src/config/domain.py` | MODIFY - Add synthesis prompts |
| `src/utils/models.py` | MODIFY - Add NarrativeReport |
| `tests/unit/agents/test_synthesis.py` | NEW - Tests |
| `tests/unit/prompts/test_synthesis.py` | NEW - Tests |
Acceptance Criteria
- Report contains paragraph-form prose, not just bullet points
- Report has executive summary (2-3 sentences)
- Report has background section explaining the condition
- Report has synthesized narrative weaving evidence together
- Report has actionable recommendations
- Report has limitations section (honest acknowledgment)
- Citations are properly formatted (author, year, title, URL)
- No hallucinated references (CRITICAL)
- Works in both simple and advanced modes
- Falls back gracefully on free tier (minimal templating OK)
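The free-tier fallback in the last criterion could be wired as in the sketch below. This is an assumption about the control flow, not the project's code: `synthesize_with_fallback` is a hypothetical helper, and the two callables stand in for `SynthesisAgent.synthesize()` plus rendering and for the existing `_generate_partial_synthesis()` template path.

```python
import asyncio
from typing import Awaitable, Callable


async def synthesize_with_fallback(
    llm_synthesis: Callable[[], Awaitable[str]],
    template_synthesis: Callable[[], str],
) -> str:
    """Try the LLM narrative first; fall back to the template report on any failure."""
    try:
        return await llm_synthesis()
    except Exception:
        # Broad catch is deliberate in this sketch: quota errors, timeouts, and
        # validation failures should all degrade to the template report.
        return template_synthesis()


async def _demo_llm() -> str:
    # Simulates a free-tier failure.
    raise RuntimeError("quota exceeded")


if __name__ == "__main__":
    report = asyncio.run(
        synthesize_with_fallback(_demo_llm, lambda: "## Sexual Health Analysis\n...")
    )
    print(report)
```

A production version would likely log the exception and catch narrower error types once the failure modes are known.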
Test Criteria
import pytest

# Assumes pytest-asyncio; synthesis_agent and evidence are expected to be fixtures.


@pytest.mark.asyncio
async def test_report_is_narrative_not_bullets():
    """Report should be mostly prose, not bullet points."""
    report = await synthesis_agent.synthesize(...)
    markdown = report.to_markdown()
    # Count paragraphs vs bullet points
    paragraphs = len([p for p in markdown.split('\n\n') if len(p) > 100])
    bullets = markdown.count('\n- ')
    # Prose should dominate
    assert paragraphs > bullets, "Report should be narrative, not bullet list"


@pytest.mark.asyncio
async def test_references_not_hallucinated():
    """All references must come from provided evidence."""
    evidence_urls = {e.citation.url for e in evidence}
    report = await synthesis_agent.synthesize(...)
    for ref in report.references:
        assert ref.url in evidence_urls, f"Hallucinated reference: {ref.url}"
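The same URL check used in the reference test could also run at report time, so that an unsupported citation is caught before the report reaches the user. The helper below is a hypothetical sketch. The normalization (trimming whitespace and a trailing slash) is an assumption about how URLs might differ between the evidence and the model output.

```python
def _normalize(url: str) -> str:
    """Trim whitespace and a trailing slash so trivially different URLs match."""
    return url.strip().rstrip("/")


def find_unsupported_references(
    reference_urls: list[str],
    evidence_urls: list[str],
) -> list[str]:
    """Return the reference URLs that do not appear in the provided evidence."""
    allowed = {_normalize(u) for u in evidence_urls}
    return [u for u in reference_urls if _normalize(u) not in allowed]
```

An empty result means every cited URL is backed by evidence. A non-empty result could trigger a retry or strip the offending references, depending on how strict the orchestrator should be.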
Related Microsoft Agent Framework Patterns
| Pattern | Location | Application |
|---|---|---|
| Custom Aggregator | concurrent_custom_aggregator.py | LLM-based synthesis |
| Fan-Out/Fan-In | fan_out_fan_in_edges.py | Multi-expert synthesis |
| Research Assistant | research_assistant_agent.py | Tool-based research |
| Sequential Orchestration | spec-001-foundry-sdk-alignment.md | Analyst→Writer→Editor chain |
References
- GitHub Issue #85: Report lacks narrative synthesis
- GitHub Issue #86: Microsoft Agent Framework patterns
- LangChain Deep Agents blog: Few-shot examples importance
- Open Deep Research Architecture: Scoping + Synthesis pattern