# SPEC_12: Narrative Report Synthesis

**Status**: Draft
**Priority**: P1 - Core deliverable
**Related Issues**: #85, #86
**Related Spec**: SPEC_11 (Sexual Health Focus)

## Problem Statement

DeepBoner's report generation outputs **structured metadata** instead of **synthesized prose**. The current implementation uses string templating with NO LLM call for narrative synthesis.

### Current Output (Actual)

```markdown
## Sexual Health Analysis

### Question
Testosterone therapy for hypoactive sexual desire disorder?

### Drug Candidates
- **Testosterone**
- **LibiGel**
- **Androgel**

### Key Findings
- Testosterone therapy improves sexual desire and activity in postmenopausal women with HSDD.
- Transdermal testosterone is a preferred formulation.

### Assessment
- **Mechanism Score**: 8/10
- **Clinical Evidence Score**: 9/10
- **Confidence**: 90%

### Reasoning
The evidence provides a clear understanding of the mechanism of action...

### Citations (33 sources)
1. [Title](url)...
```

### Expected Output (Professional Research Report)

```markdown
## Sexual Health Research Report: Testosterone Therapy for Hypoactive Sexual Desire Disorder

### Executive Summary

Testosterone therapy represents a well-established, evidence-based treatment for
hypoactive sexual desire disorder (HSDD) in postmenopausal women. Our analysis of
33 peer-reviewed sources reveals consistent findings across multiple randomized
controlled trials, with transdermal testosterone demonstrating the strongest
efficacy-safety profile.

### Background

Hypoactive sexual desire disorder affects an estimated 12% of postmenopausal women
and is characterized by persistent lack of sexual interest causing personal distress.
The International Society for the Study of Women's Sexual Health (ISSWSH) published
clinical guidelines in 2021 establishing testosterone as a recommended intervention...

### Evidence Synthesis

**Mechanism of Action**

Testosterone exerts its effects on sexual desire through multiple pathways. At the
hypothalamic level, testosterone modulates dopaminergic signaling that underlies
libido. Evidence from Smith et al. (2021) demonstrates that androgen receptor
activation in the central nervous system correlates with subjective measures of
sexual desire (r=0.67, p<0.001)...

**Clinical Trial Evidence**

A systematic review of 8 randomized controlled trials (N=3,035) demonstrated that
transdermal testosterone significantly improved:
- Satisfying sexual events: +2.1 per month (95% CI: 1.4-2.8)
- Sexual desire scores: +0.4 on validated scales (p<0.001)

The Global Consensus Position Statement (2019) and ISSWSH Guidelines (2021) both
recommend transdermal testosterone as first-line therapy...

### Recommendations

Based on this evidence synthesis:
1. **Transdermal testosterone** (300 μg/day) is recommended for postmenopausal
   women with HSDD not primarily related to modifiable factors
2. **Duration**: Continue for 6 months to assess efficacy; discontinue if no benefit
3. **Monitoring**: Lipid profile and liver function at baseline and 3-6 months

### Limitations & Future Directions

- Long-term safety data beyond 24 months remains limited
- Efficacy in premenopausal women less well-established
- Head-to-head comparisons between formulations are needed

### References

1. Parish SJ et al. (2021). International Society for the Study of Women's Sexual
   Health Clinical Practice Guideline for the Use of Systemic Testosterone for
   Hypoactive Sexual Desire Disorder in Women. J Sex Med. https://pubmed.ncbi.nlm.nih.gov/33814355/
...
```

## Root Cause Analysis

### Current Implementation (`src/orchestrators/simple.py:448-505`)

```python
def _generate_synthesis(
    self,
    query: str,
    evidence: list[Evidence],
    assessment: JudgeAssessment,
) -> str:
    # ❌ NO LLM CALL - Just string templating!
    drug_list = "\n".join([f"- **{d}**" for d in assessment.details.drug_candidates])
    findings_list = "\n".join([f"- {f}" for f in assessment.details.key_findings])

    return f"""{self.domain_config.report_title}
### Question
{query}
### Drug Candidates
{drug_list}
...
"""
```

**The problem**: No LLM is ever called to synthesize the report. The output is just
the fields of `JudgeAssessment` dropped into a string template.

### Microsoft Agent Framework Pattern

From `reference_repos/agent-framework/python/samples/getting_started/workflows/orchestration/concurrent_custom_aggregator.py`:

```python
# Define a custom aggregator callback that uses the chat client to SYNTHESIZE
async def summarize_results(results: list[Any]) -> str:
    # Collect expert outputs
    expert_sections: list[str] = []
    for r in results:
        messages = getattr(r.agent_run_response, "messages", [])
        final_text = messages[-1].text if messages else "(no content)"
        expert_sections.append(f"{r.executor_id}:\n{final_text}")

    # Ask the MODEL to synthesize
    system_msg = ChatMessage(
        Role.SYSTEM,
        text=(
            "You are a helpful assistant that consolidates multiple domain expert outputs "
            "into one cohesive, concise summary with clear takeaways."
        ),
    )
    user_msg = ChatMessage(Role.USER, text="\n\n".join(expert_sections))

    # ✅ LLM CALL for synthesis
    response = await chat_client.get_response([system_msg, user_msg])
    return response.messages[-1].text
```

**The pattern**: The aggregator makes an **LLM call** to synthesize, not string concatenation.

## Solution Design

### Architecture

```
Current:
  Evidence → Judge → {structured data} → String Template → Bullet Points

Proposed:
  Evidence → Judge → {structured data} → SynthesisAgent → Narrative Prose
                                                ↓
                                         LLM-based synthesis
```

### Components

#### 1. `SynthesisAgent` (`src/agents/synthesis.py`)

A new agent dedicated to narrative report generation:

```python
from pydantic import BaseModel
from pydantic_ai import Agent

class NarrativeReport(BaseModel):
    """Structured output for narrative report."""
    executive_summary: str  # 2-3 sentences, key takeaways
    background: str  # What is this condition, why does it matter
    evidence_synthesis: str  # Mechanism + Clinical evidence in prose
    recommendations: list[str]  # Actionable recommendations
    limitations: str  # Honest limitations
    references: list[Reference]  # Properly formatted

class SynthesisAgent:
    """Generates narrative research reports from structured data."""

    async def synthesize(
        self,
        query: str,
        evidence: list[Evidence],
        assessment: JudgeAssessment,
        domain: ResearchDomain,
    ) -> NarrativeReport:
        """Generate narrative prose report."""
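        # Build context
        context = self._build_synthesis_context(evidence, assessment)

        # ✅ LLM CALL for synthesis
        # pydantic_ai's Agent.run() takes no `context` keyword, so the
        # evidence context is folded into the user prompt instead.
        # Assumes self.agent is an Agent configured to return NarrativeReport.
        result = await self.agent.run(
            f"Generate a narrative research report for: {query}\n\n{context}"
        )
        # `.data` on older pydantic_ai releases; newer ones expose `.output`.
        return result.data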
```

#### 2. Updated System Prompt (`src/prompts/synthesis.py`)

```python
SYNTHESIS_SYSTEM_PROMPT = """You are a scientific writer specializing in sexual health research.
Your task is to synthesize research evidence into a clear, narrative report.

## Writing Style
- Write in PROSE PARAGRAPHS, not bullet points
- Use academic but accessible language
- Be specific about evidence strength (e.g., "in a randomized controlled trial of N=200")
- Reference specific studies by author name
- Provide quantitative results where available

## Report Structure

### Executive Summary (REQUIRED - 2-3 sentences)
Summarize the key finding and clinical implication. Start with the bottom line.
Example: "Testosterone therapy demonstrates consistent efficacy for HSDD in
postmenopausal women, with transdermal formulations showing the best safety profile."

### Background (REQUIRED - 1 paragraph)
Explain the condition, its prevalence, and why this question matters clinically.

### Evidence Synthesis (REQUIRED - 2-4 paragraphs)
Weave together the evidence into a coherent narrative:
- Mechanism of Action: How does the intervention work?
- Clinical Evidence: What do the trials show? Be specific about effect sizes.
- Comparative Evidence: How does it compare to alternatives?

### Recommendations (REQUIRED - 3-5 bullet points)
Provide actionable clinical recommendations based on the evidence.

### Limitations (REQUIRED - 1 paragraph)
Acknowledge gaps, biases, and areas needing more research.

### References (REQUIRED)
List the key references in proper academic format.

## CRITICAL RULES
1. ONLY cite papers from the provided evidence - NEVER hallucinate references
2. Write in complete sentences and paragraphs
3. Avoid lists/bullets except in Recommendations section
4. Include specific statistics when available (p-values, effect sizes, CIs)
5. Acknowledge uncertainty honestly
"""
```

#### 3. Updated Orchestrator Integration

```python
# In src/orchestrators/simple.py

async def _generate_synthesis(
    self,
    query: str,
    evidence: list[Evidence],
    assessment: JudgeAssessment,
) -> str:
    """Generate narrative synthesis using LLM."""
    from src.agents.synthesis import SynthesisAgent

    synthesis_agent = SynthesisAgent(domain=self.domain)

    report = await synthesis_agent.synthesize(
        query=query,
        evidence=evidence,
        assessment=assessment,
        domain=self.domain,
    )

    return report.to_markdown()
```
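
The orchestrator snippet above calls `report.to_markdown()`, which the `NarrativeReport` model in this spec does not define. A minimal sketch of such a method follows. It uses stdlib dataclasses as stand-ins for the Pydantic models so it runs without dependencies, and the `Reference` fields (`authors`, `year`, `title`, `url`) are assumptions based on the citation format in the acceptance criteria, not a confirmed schema.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """Hypothetical reference shape; the spec does not define its fields."""
    authors: str
    year: int
    title: str
    url: str


@dataclass
class NarrativeReport:
    """Dataclass stand-in for the Pydantic model of the same name."""
    executive_summary: str
    background: str
    evidence_synthesis: str
    recommendations: list[str]
    limitations: str
    references: list[Reference] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the sections in the order the system prompt requires."""
        # Recommendations and references are the only numbered lists.
        recs = "\n".join(
            f"{i}. {rec}" for i, rec in enumerate(self.recommendations, 1)
        )
        refs = "\n".join(
            f"{i}. {r.authors} ({r.year}). {r.title}. {r.url}"
            for i, r in enumerate(self.references, 1)
        )
        sections = [
            ("Executive Summary", self.executive_summary),
            ("Background", self.background),
            ("Evidence Synthesis", self.evidence_synthesis),
            ("Recommendations", recs),
            ("Limitations", self.limitations),
            ("References", refs),
        ]
        return "\n\n".join(f"### {title}\n\n{body}" for title, body in sections)
```

Keeping the rendering on the model means the simple and advanced orchestrators can share one output format.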

### Few-Shot Example (Required for Quality)

From issue #82, include a concrete example in the prompt:

```python
FEW_SHOT_EXAMPLE = """
## Example: Strong Evidence Synthesis

INPUT:
- Query: "Alprostadil for erectile dysfunction"
- Evidence: 15 papers including meta-analysis of 8 RCTs (N=3,247)
- Mechanism Score: 9/10
- Clinical Score: 9/10

OUTPUT:

### Executive Summary

Alprostadil (prostaglandin E1) represents a well-established second-line treatment
for erectile dysfunction, with meta-analytic evidence demonstrating 87% efficacy
in achieving erections sufficient for intercourse. It offers a PDE5-independent
mechanism particularly valuable for patients who do not respond to oral therapies.

### Background

Erectile dysfunction affects approximately 30 million men in the United States,
with prevalence increasing with age. While PDE5 inhibitors (sildenafil, tadalafil)
remain first-line therapy, approximately 30% of patients are non-responders or
have contraindications. Alprostadil provides an alternative mechanism of action
through direct smooth muscle relaxation.

### Evidence Synthesis

**Mechanism of Action**

Alprostadil works through a distinct pathway from PDE5 inhibitors. It binds to
EP receptors on cavernosal smooth muscle, activating adenylate cyclase and
increasing intracellular cAMP. This leads to smooth muscle relaxation and
penile erection independent of nitric oxide signaling. As noted by Smith et al.
(2019), this mechanism explains its efficacy in patients with endothelial
dysfunction or nerve damage.

**Clinical Evidence**

A meta-analysis by Johnson et al. (2020) pooled data from 8 randomized controlled
trials (N=3,247) comparing intracavernosal alprostadil to placebo. The primary
endpoint of erection sufficient for intercourse was achieved in 87% of alprostadil
patients versus 12% placebo (RR 7.25, 95% CI: 5.8-9.1, p<0.001). The number
needed to treat (NNT) was 1.3, indicating robust effect size.

Subgroup analysis revealed consistent efficacy across etiologies, with response
rates of 85% in vascular ED, 91% in neurogenic ED, and 82% after prostatectomy.

### Recommendations

1. Consider alprostadil as second-line therapy when PDE5 inhibitors fail or are contraindicated
2. Start with 10 μg intracavernosal injection, titrate up to 40 μg based on response
3. Provide in-office training for self-injection technique
4. Monitor for penile fibrosis with long-term use (occurs in 3-5% of patients)

### Limitations

Long-term data beyond 2 years is limited. Head-to-head comparisons with
newer therapies (low-intensity shockwave) are lacking. Most trials excluded
patients with severe cardiovascular disease, limiting generalizability.
The intraurethral formulation (MUSE) has lower efficacy (43%) than injection.

### References

1. Smith AB et al. (2019). Alprostadil mechanism of action in erectile tissue.
   J Urol. https://pubmed.ncbi.nlm.nih.gov/12345678/
2. Johnson CD et al. (2020). Meta-analysis of intracavernosal alprostadil.
   J Sex Med. https://pubmed.ncbi.nlm.nih.gov/23456789/
"""
```

## Implementation Plan

### Phase 1: Core SynthesisAgent

1. Create `src/agents/synthesis.py` with:
   - `SynthesisAgent` class
   - `NarrativeReport` Pydantic model
   - LLM-based synthesis method

2. Create `src/prompts/synthesis.py` with:
   - `SYNTHESIS_SYSTEM_PROMPT`
   - `FEW_SHOT_EXAMPLE`
   - `format_synthesis_context()` helper

3. Update `src/orchestrators/simple.py`:
   - Make `_generate_synthesis()` async
   - Call `SynthesisAgent.synthesize()`
   - Keep `_generate_partial_synthesis()` as fallback (free tier)
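
One possible shape for the `format_synthesis_context()` helper listed in step 2 is sketched below. It is an illustration under stated assumptions: the dataclasses are stand-ins for the project's real types, `content` and `title` are assumed field names (only `citation.url` appears elsewhere in this spec), and the helper takes the judge's findings as plain lists rather than a `JudgeAssessment` object.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    title: str  # assumed field name
    url: str    # the spec's tests read this as e.citation.url


@dataclass
class Evidence:
    content: str  # assumed field holding the abstract or snippet
    citation: Citation


def format_synthesis_context(
    evidence: list[Evidence],
    key_findings: list[str],
    drug_candidates: list[str],
    max_chars_per_source: int = 500,
) -> str:
    """Flatten judge output and evidence into one prompt-ready text block."""
    lines = ["## Judge Findings"]
    lines += [f"- {finding}" for finding in key_findings]
    lines.append("")
    lines.append("## Drug Candidates")
    lines += [f"- {drug}" for drug in drug_candidates]
    lines.append("")
    lines.append("## Evidence (cite ONLY these sources)")
    for i, item in enumerate(evidence, 1):
        # Truncate each source so a large evidence set stays within budget.
        snippet = item.content[:max_chars_per_source]
        lines.append(f"[{i}] {item.citation.title} <{item.citation.url}>")
        lines.append(snippet)
    return "\n".join(lines)
```

Numbering each source and printing its URL next to it gives the model a closed set to cite from, which supports the "no hallucinated references" rule.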

### Phase 2: Advanced Mode Integration

4. Update `src/orchestrators/advanced.py`:
   - Add `SynthesisAgent` to Magentic workflow
   - Ensure it receives all evidence from prior agents

### Phase 3: Test Coverage

5. Create `tests/unit/agents/test_synthesis.py`:
   - Test narrative output structure
   - Test reference accuracy (no hallucinated citations)
   - Test prose vs bullet point ratio

### Phase 4: Domain Customization

6. Update `src/config/domain.py`:
   - Add `synthesis_system_prompt` field to `DomainConfig`
   - Add `synthesis_few_shot_example` field
   - Configure for sexual health domain
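
The two new `DomainConfig` fields could be combined into the final system prompt roughly as follows. This is a sketch, not the existing class: `report_title` is the only field confirmed by the current code, the dataclass is a stand-in for whatever base the real `DomainConfig` uses, and `full_synthesis_prompt()` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Stand-in for the real DomainConfig, showing only the relevant fields."""
    report_title: str
    synthesis_system_prompt: str
    synthesis_few_shot_example: str = ""

    def full_synthesis_prompt(self) -> str:
        """Append the few-shot example to the system prompt when one is set."""
        if not self.synthesis_few_shot_example:
            return self.synthesis_system_prompt
        return (
            f"{self.synthesis_system_prompt.rstrip()}\n\n"
            f"{self.synthesis_few_shot_example.strip()}\n"
        )
```

Defaulting the few-shot example to an empty string lets other domains adopt the synthesis prompt before they have a curated example.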

## File Changes

| File | Change |
|------|--------|
| `src/agents/synthesis.py` | NEW - SynthesisAgent |
| `src/prompts/synthesis.py` | NEW - Synthesis prompts |
| `src/orchestrators/simple.py` | MODIFY - Call SynthesisAgent |
| `src/orchestrators/advanced.py` | MODIFY - Add to Magentic |
| `src/config/domain.py` | MODIFY - Add synthesis prompts |
| `src/utils/models.py` | MODIFY - Add NarrativeReport |
| `tests/unit/agents/test_synthesis.py` | NEW - Tests |
| `tests/unit/prompts/test_synthesis.py` | NEW - Tests |

## Acceptance Criteria

- [ ] Report contains **paragraph-form prose**, not just bullet points
- [ ] Report has **executive summary** (2-3 sentences)
- [ ] Report has **background section** explaining the condition
- [ ] Report has **synthesized narrative** weaving evidence together
- [ ] Report has **actionable recommendations**
- [ ] Report has **limitations** section (honest acknowledgment)
- [ ] Citations are **properly formatted** (author, year, title, URL)
- [ ] No hallucinated references (CRITICAL)
- [ ] Works in both simple and advanced modes
- [ ] Falls back gracefully on free tier (minimal templating OK)
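
The graceful-fallback criterion can be sketched as a small wrapper that tries the LLM path and drops back to the template when it fails or times out. The function name, the callable-based signature, and the 60-second default are assumptions for illustration; in the orchestrator the two callables would wrap `SynthesisAgent.synthesize()` and `_generate_partial_synthesis()`.

```python
import asyncio
import logging
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


async def synthesize_with_fallback(
    llm_synthesis: Callable[[], Awaitable[str]],
    template_synthesis: Callable[[], str],
    timeout_s: float = 60.0,
) -> str:
    """Try the LLM path first; fall back to the template on any failure."""
    try:
        return await asyncio.wait_for(llm_synthesis(), timeout=timeout_s)
    except Exception as exc:  # also covers the timeout raised by wait_for
        logger.warning("Narrative synthesis failed, using template: %s", exc)
        return template_synthesis()
```

Catching `Exception` rather than `BaseException` leaves task cancellation untouched, so a cancelled request is not silently turned into a template report.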

## Test Criteria

```python
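import pytest  # the asyncio marker requires the pytest-asyncio plugin

# `synthesis_agent` and `evidence` are assumed to be provided by fixtures.


@pytest.mark.asyncio
async def test_report_is_narrative_not_bullets():
    """Report should be mostly prose, not bullet points."""
    # synthesize() is async and returns a NarrativeReport, so await it
    # and render to markdown before counting.
    report = await synthesis_agent.synthesize(...)
    markdown = report.to_markdown()

    # Count paragraphs vs bullet points
    paragraphs = len([p for p in markdown.split('\n\n') if len(p) > 100])
    bullets = markdown.count('\n- ')

    # Prose should dominate
    assert paragraphs > bullets, "Report should be narrative, not bullet list"


@pytest.mark.asyncio
async def test_references_not_hallucinated():
    """All references must come from provided evidence."""
    evidence_urls = {e.citation.url for e in evidence}
    report = await synthesis_agent.synthesize(...)

    for ref in report.references:
        assert ref.url in evidence_urls, f"Hallucinated reference: {ref.url}"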
```
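
The reference test above only inspects the structured `references` field. A complementary check, sketched here as an assumption rather than part of the planned test suite, scans the rendered markdown for any URL that is not in the evidence set, which would also catch links the model slips into prose sections. The helper name `find_unsupported_urls` and the normalization rules are hypothetical.

```python
import re

# Match http(s) URLs up to whitespace or a closing bracket/paren.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def find_unsupported_urls(report_markdown: str, evidence_urls: set[str]) -> list[str]:
    """Return URLs in the rendered report that are absent from the evidence set."""

    def normalize(url: str) -> str:
        # Ignore trailing sentence punctuation and a trailing slash.
        return url.rstrip(".,;").rstrip("/")

    allowed = {normalize(u) for u in evidence_urls}
    found = URL_PATTERN.findall(report_markdown)
    return sorted({u for u in found if normalize(u) not in allowed})
```

A test could then assert that `find_unsupported_urls(report.to_markdown(), evidence_urls)` returns an empty list.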

## Related Microsoft Agent Framework Patterns

| Pattern | Location | Application |
|---------|----------|-------------|
| Custom Aggregator | `concurrent_custom_aggregator.py` | LLM-based synthesis |
| Fan-Out/Fan-In | `fan_out_fan_in_edges.py` | Multi-expert synthesis |
| Research Assistant | `research_assistant_agent.py` | Tool-based research |
| Sequential Orchestration | `spec-001-foundry-sdk-alignment.md` | Analyst→Writer→Editor chain |

## References

- GitHub Issue #85: Report lacks narrative synthesis
- GitHub Issue #86: Microsoft Agent Framework patterns
- LangChain Deep Agents blog: Few-shot examples importance
- Open Deep Research Architecture: Scoping + Synthesis pattern