
SPEC_12: Narrative Report Synthesis

Status: Draft
Priority: P1 - Core deliverable
Related Issues: #85, #86
Related Spec: SPEC_11 (Sexual Health Focus)

Problem Statement

DeepBoner's report generation outputs structured metadata instead of synthesized prose. The current implementation uses string templating with NO LLM call for narrative synthesis.

Current Output (Actual)

## Sexual Health Analysis

### Question
Testosterone therapy for hypoactive sexual desire disorder?

### Drug Candidates
- **Testosterone**
- **LibiGel**
- **Androgel**

### Key Findings
- Testosterone therapy improves sexual desire and activity in postmenopausal women with HSDD.
- Transdermal testosterone is a preferred formulation.

### Assessment
- **Mechanism Score**: 8/10
- **Clinical Evidence Score**: 9/10
- **Confidence**: 90%

### Reasoning
The evidence provides a clear understanding of the mechanism of action...

### Citations (33 sources)
1. [Title](url)...

Expected Output (Professional Research Report)

## Sexual Health Research Report: Testosterone Therapy for Hypoactive Sexual Desire Disorder

### Executive Summary

Testosterone therapy represents a well-established, evidence-based treatment for
hypoactive sexual desire disorder (HSDD) in postmenopausal women. Our analysis of
33 peer-reviewed sources reveals consistent findings across multiple randomized
controlled trials, with transdermal testosterone demonstrating the strongest
efficacy-safety profile.

### Background

Hypoactive sexual desire disorder affects an estimated 12% of postmenopausal women
and is characterized by persistent lack of sexual interest causing personal distress.
The International Society for the Study of Women's Sexual Health (ISSWSH) published
clinical guidelines in 2021 establishing testosterone as a recommended intervention...

### Evidence Synthesis

**Mechanism of Action**

Testosterone exerts its effects on sexual desire through multiple pathways. At the
hypothalamic level, testosterone modulates dopaminergic signaling that underlies
libido. Evidence from Smith et al. (2021) demonstrates that androgen receptor
activation in the central nervous system correlates with subjective measures of
sexual desire (r=0.67, p<0.001)...

**Clinical Trial Evidence**

A systematic review of 8 randomized controlled trials (N=3,035) demonstrated that
transdermal testosterone significantly improved:
- Satisfying sexual events: +2.1 per month (95% CI: 1.4-2.8)
- Sexual desire scores: +0.4 on validated scales (p<0.001)

The Global Consensus Position Statement (2019) and ISSWSH Guidelines (2021) both
recommend transdermal testosterone as first-line therapy...

### Recommendations

Based on this evidence synthesis:
1. **Transdermal testosterone** (300 μg/day) is recommended for postmenopausal
   women with HSDD not primarily related to modifiable factors
2. **Duration**: Continue for 6 months to assess efficacy; discontinue if no benefit
3. **Monitoring**: Lipid profile and liver function at baseline and 3-6 months

### Limitations & Future Directions

- Long-term safety data beyond 24 months remains limited
- Efficacy in premenopausal women less well-established
- Head-to-head comparisons between formulations are needed

### References

1. Parish SJ et al. (2021). International Society for the Study of Women's Sexual
   Health Clinical Practice Guideline for the Use of Systemic Testosterone for
   Hypoactive Sexual Desire Disorder in Women. J Sex Med. https://pubmed.ncbi.nlm.nih.gov/33814355/
...

Root Cause Analysis

Current Implementation (src/orchestrators/simple.py:448-505)

def _generate_synthesis(
    self,
    query: str,
    evidence: list[Evidence],
    assessment: JudgeAssessment,
) -> str:
    # ❌ NO LLM CALL - Just string templating!
    drug_list = "\n".join([f"- **{d}**" for d in assessment.details.drug_candidates])
    findings_list = "\n".join([f"- {f}" for f in assessment.details.key_findings])

    return f"""{self.domain_config.report_title}
### Question
{query}
### Drug Candidates
{drug_list}
...
"""

The problem: no LLM is ever called to synthesize the report. The output is only the JudgeAssessment fields formatted into a template.

Microsoft Agent Framework Pattern

From reference_repos/agent-framework/python/samples/getting_started/workflows/orchestration/concurrent_custom_aggregator.py:

# Define a custom aggregator callback that uses the chat client to SYNTHESIZE
async def summarize_results(results: list[Any]) -> str:
    # Collect expert outputs
    expert_sections: list[str] = []
    for r in results:
        messages = getattr(r.agent_run_response, "messages", [])
        final_text = messages[-1].text if messages else "(no content)"
        expert_sections.append(f"{r.executor_id}:\n{final_text}")

    # Ask the MODEL to synthesize
    system_msg = ChatMessage(
        Role.SYSTEM,
        text=(
            "You are a helpful assistant that consolidates multiple domain expert outputs "
            "into one cohesive, concise summary with clear takeaways."
        ),
    )
    user_msg = ChatMessage(Role.USER, text="\n\n".join(expert_sections))

    # ✅ LLM CALL for synthesis
    response = await chat_client.get_response([system_msg, user_msg])
    return response.messages[-1].text

The pattern: The aggregator makes an LLM call to synthesize, not string concatenation.

Solution Design

Architecture

Current:
  Evidence → Judge → {structured data} → String Template → Bullet Points

Proposed:
  Evidence → Judge → {structured data} → SynthesisAgent → Narrative Prose
                                                ↓
                                         LLM-based synthesis

Components

1. SynthesisAgent (src/agents/synthesis.py)

A new agent dedicated to narrative report generation:

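from pydantic import BaseModel
from pydantic_ai import Agent

from src.prompts.synthesis import SYNTHESIS_SYSTEM_PROMPT

# Evidence, JudgeAssessment, ResearchDomain, and Reference are project types;
# import them from wherever the project defines them.

class NarrativeReport(BaseModel):
    """Structured output for narrative report."""
    executive_summary: str  # 2-3 sentences, key takeaways
    background: str  # What is this condition, why does it matter
    evidence_synthesis: str  # Mechanism + Clinical evidence in prose
    recommendations: list[str]  # Actionable recommendations
    limitations: str  # Honest limitations
    references: list[Reference]  # Properly formatted

class SynthesisAgent:
    """Generates narrative research reports from structured data."""

    def __init__(self, domain: ResearchDomain, model: str | None = None) -> None:
        # Assumption: the spec does not show construction. `model` is a
        # placeholder for however the project selects its LLM.
        self.domain = domain
        self.agent = Agent(
            model,
            output_type=NarrativeReport,  # `result_type` in older pydantic_ai releases
            system_prompt=SYNTHESIS_SYSTEM_PROMPT,
        )

    async def synthesize(
        self,
        query: str,
        evidence: list[Evidence],
        assessment: JudgeAssessment,
        domain: ResearchDomain,
    ) -> NarrativeReport:
        """Generate narrative prose report."""
        # Build context
        context = self._build_synthesis_context(evidence, assessment)

        # ✅ LLM CALL for synthesis. Agent.run() accepts no `context` keyword,
        # so the evidence context travels inside the user prompt.
        result = await self.agent.run(
            f"Generate a narrative research report for: {query}\n\n{context}"
        )
        return result.output  # `.data` in older pydantic_ai releases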
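The `_build_synthesis_context` helper is called above but not defined in this spec. A minimal sketch of what it might do follows, written as a free function. `EvidenceItem` and `AssessmentSummary` are hypothetical stand-ins, since the real `Evidence` and `JudgeAssessment` models may expose different fields, and `max_chars_per_source` is an illustrative parameter.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceItem:
    """Hypothetical stand-in for the project's Evidence model."""
    title: str
    url: str
    content: str


@dataclass
class AssessmentSummary:
    """Hypothetical stand-in for the JudgeAssessment fields used here."""
    drug_candidates: list[str] = field(default_factory=list)
    key_findings: list[str] = field(default_factory=list)
    mechanism_score: int = 0
    clinical_evidence_score: int = 0


def build_synthesis_context(
    evidence: list[EvidenceItem],
    assessment: AssessmentSummary,
    max_chars_per_source: int = 500,
) -> str:
    """Render the judge output and evidence as plain text for the prompt."""
    lines = [
        "## Judge Assessment",
        f"Mechanism score: {assessment.mechanism_score}/10",
        f"Clinical evidence score: {assessment.clinical_evidence_score}/10",
        "Drug candidates: " + ", ".join(assessment.drug_candidates),
        "Key findings:",
    ]
    lines.extend(f"- {finding}" for finding in assessment.key_findings)

    lines.append("")
    lines.append("## Evidence (cite ONLY these sources)")
    for index, item in enumerate(evidence, start=1):
        # Truncate long abstracts so the prompt stays a manageable size.
        snippet = item.content[:max_chars_per_source]
        lines.append(f"[{index}] {item.title} ({item.url})")
        lines.append(snippet)
    return "\n".join(lines)
```

Numbering the sources in the context gives the model a closed set to cite from, which supports the "no hallucinated references" rule in the system prompt.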

2. Updated System Prompt (src/prompts/synthesis.py)

SYNTHESIS_SYSTEM_PROMPT = """You are a scientific writer specializing in sexual health research.
Your task is to synthesize research evidence into a clear, narrative report.

## Writing Style
- Write in PROSE PARAGRAPHS, not bullet points
- Use academic but accessible language
- Be specific about evidence strength (e.g., "in a randomized controlled trial of N=200")
- Reference specific studies by author name
- Provide quantitative results where available

## Report Structure

### Executive Summary (REQUIRED - 2-3 sentences)
Summarize the key finding and clinical implication. Start with the bottom line.
Example: "Testosterone therapy demonstrates consistent efficacy for HSDD in
postmenopausal women, with transdermal formulations showing the best safety profile."

### Background (REQUIRED - 1 paragraph)
Explain the condition, its prevalence, and why this question matters clinically.

### Evidence Synthesis (REQUIRED - 2-4 paragraphs)
Weave together the evidence into a coherent narrative:
- Mechanism of Action: How does the intervention work?
- Clinical Evidence: What do the trials show? Be specific about effect sizes.
- Comparative Evidence: How does it compare to alternatives?

### Recommendations (REQUIRED - 3-5 bullet points)
Provide actionable clinical recommendations based on the evidence.

### Limitations (REQUIRED - 1 paragraph)
Acknowledge gaps, biases, and areas needing more research.

### References (REQUIRED)
List the key references in proper academic format.

## CRITICAL RULES
1. ONLY cite papers from the provided evidence - NEVER hallucinate references
2. Write in complete sentences and paragraphs
3. Avoid lists/bullets except in Recommendations section
4. Include specific statistics when available (p-values, effect sizes, CIs)
5. Acknowledge uncertainty honestly
"""

3. Updated Orchestrator Integration

# In src/orchestrators/simple.py

async def _generate_synthesis(
    self,
    query: str,
    evidence: list[Evidence],
    assessment: JudgeAssessment,
) -> str:
    """Generate narrative synthesis using LLM."""
    from src.agents.synthesis import SynthesisAgent

    synthesis_agent = SynthesisAgent(domain=self.domain)

    report = await synthesis_agent.synthesize(
        query=query,
        evidence=evidence,
        assessment=assessment,
        domain=self.domain,
    )

    return report.to_markdown()
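The orchestrator calls `report.to_markdown()`, which the `NarrativeReport` model above does not yet define. One possible rendering is sketched below, using dataclasses as stand-ins for the Pydantic models so the example runs on its own. The `Reference` fields (`authors`, `year`, `title`, `url`) are assumptions derived from the citation format in the acceptance criteria, and the `title` argument is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """Hypothetical reference shape; the real model is not shown in the spec."""
    authors: str
    year: int
    title: str
    url: str


@dataclass
class NarrativeReport:
    """Dataclass stand-in for the Pydantic NarrativeReport model."""
    executive_summary: str
    background: str
    evidence_synthesis: str
    recommendations: list[str]
    limitations: str
    references: list[Reference]

    def to_markdown(self, title: str = "Research Report") -> str:
        """Render the sections in the order used by the expected output."""
        recommendations = "\n".join(
            f"{n}. {text}" for n, text in enumerate(self.recommendations, start=1)
        )
        references = "\n".join(
            f"{n}. {ref.authors} ({ref.year}). {ref.title}. {ref.url}"
            for n, ref in enumerate(self.references, start=1)
        )
        sections = [
            f"## {title}",
            f"### Executive Summary\n\n{self.executive_summary}",
            f"### Background\n\n{self.background}",
            f"### Evidence Synthesis\n\n{self.evidence_synthesis}",
            f"### Recommendations\n\n{recommendations}",
            f"### Limitations\n\n{self.limitations}",
            f"### References\n\n{references}",
        ]
        # Blank lines between sections keep the Markdown headings separated.
        return "\n\n".join(sections)
```

Keeping the rendering on the model means the simple and advanced orchestrators can share one output format.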

Few-Shot Example (Required for Quality)

From issue #82, include a concrete example in the prompt:

FEW_SHOT_EXAMPLE = """
## Example: Strong Evidence Synthesis

INPUT:
- Query: "Alprostadil for erectile dysfunction"
- Evidence: 15 papers including meta-analysis of 8 RCTs (N=3,247)
- Mechanism Score: 9/10
- Clinical Score: 9/10

OUTPUT:

### Executive Summary

Alprostadil (prostaglandin E1) represents a well-established second-line treatment
for erectile dysfunction, with meta-analytic evidence demonstrating 87% efficacy
in achieving erections sufficient for intercourse. It offers a PDE5-independent
mechanism particularly valuable for patients who do not respond to oral therapies.

### Background

Erectile dysfunction affects approximately 30 million men in the United States,
with prevalence increasing with age. While PDE5 inhibitors (sildenafil, tadalafil)
remain first-line therapy, approximately 30% of patients are non-responders or
have contraindications. Alprostadil provides an alternative mechanism of action
through direct smooth muscle relaxation.

### Evidence Synthesis

**Mechanism of Action**

Alprostadil works through a distinct pathway from PDE5 inhibitors. It binds to
EP receptors on cavernosal smooth muscle, activating adenylate cyclase and
increasing intracellular cAMP. This leads to smooth muscle relaxation and
penile erection independent of nitric oxide signaling. As noted by Smith et al.
(2019), this mechanism explains its efficacy in patients with endothelial
dysfunction or nerve damage.

**Clinical Evidence**

A meta-analysis by Johnson et al. (2020) pooled data from 8 randomized controlled
trials (N=3,247) comparing intracavernosal alprostadil to placebo. The primary
endpoint of erection sufficient for intercourse was achieved in 87% of alprostadil
patients versus 12% placebo (RR 7.25, 95% CI: 5.8-9.1, p<0.001). The number
needed to treat (NNT) was 1.3, indicating robust effect size.

Subgroup analysis revealed consistent efficacy across etiologies:
- Vascular ED: 85% response rate
- Neurogenic ED: 91% response rate
- Post-prostatectomy: 82% response rate

### Recommendations

1. Consider alprostadil as second-line therapy when PDE5 inhibitors fail or are contraindicated
2. Start with 10 μg intracavernosal injection, titrate up to 40 μg based on response
3. Provide in-office training for self-injection technique
4. Monitor for penile fibrosis with long-term use (occurs in 3-5% of patients)

### Limitations

Long-term data beyond 2 years is limited. Head-to-head comparisons with
newer therapies (low-intensity shockwave) are lacking. Most trials excluded
patients with severe cardiovascular disease, limiting generalizability.
The intraurethral formulation (MUSE) has lower efficacy (43%) than injection.

### References

1. Smith AB et al. (2019). Alprostadil mechanism of action in erectile tissue.
   J Urol. https://pubmed.ncbi.nlm.nih.gov/12345678/
2. Johnson CD et al. (2020). Meta-analysis of intracavernosal alprostadil.
   J Sex Med. https://pubmed.ncbi.nlm.nih.gov/23456789/
"""

Implementation Plan

Phase 1: Core SynthesisAgent

  1. Create src/agents/synthesis.py with:

    • SynthesisAgent class
    • NarrativeReport Pydantic model
    • LLM-based synthesis method
  2. Create src/prompts/synthesis.py with:

    • SYNTHESIS_SYSTEM_PROMPT
    • FEW_SHOT_EXAMPLE
    • format_synthesis_context() helper
  3. Update src/orchestrators/simple.py:

    • Make _generate_synthesis() async
    • Call SynthesisAgent.synthesize()
    • Keep _generate_partial_synthesis() as fallback (free tier)
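The fallback in step 3 could be wired roughly as follows. This is a sketch under assumptions: `synthesize_with_fallback`, its two callables, and the timeout value are hypothetical names chosen for illustration, and treating every exception as a reason to fall back is a design choice the spec does not settle.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def synthesize_with_fallback(
    llm_synthesis: Callable[[], Awaitable[str]],
    template_fallback: Callable[[], str],
    timeout_seconds: float = 60.0,
) -> str:
    """Try LLM synthesis first; return the template report if it fails."""
    try:
        return await asyncio.wait_for(llm_synthesis(), timeout=timeout_seconds)
    except Exception:
        # Assumption: any failure (missing API key, quota, timeout) should
        # degrade to the template report instead of aborting the run.
        return template_fallback()
```

In the orchestrator, `llm_synthesis` would wrap the `SynthesisAgent.synthesize()` call and `template_fallback` would wrap `_generate_partial_synthesis()`. Task cancellation is not swallowed, because `asyncio.CancelledError` does not derive from `Exception`.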

Phase 2: Advanced Mode Integration

  1. Update src/orchestrators/advanced.py:
    • Add SynthesisAgent to Magentic workflow
    • Ensure it receives all evidence from prior agents

Phase 3: Test Coverage

  1. Create tests/unit/agents/test_synthesis.py:
    • Test narrative output structure
    • Test reference accuracy (no hallucinated citations)
    • Test prose vs bullet point ratio

Phase 4: Domain Customization

  1. Update src/config/domain.py:
    • Add synthesis_system_prompt field to DomainConfig
    • Add synthesis_few_shot_example field
    • Configure for sexual health domain
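A sketch of how the two new fields might combine into the prompt sent to the model. `DomainConfig` is reduced here to the fields relevant to synthesis, and `build_system_prompt` is a hypothetical helper name; the real class in src/config/domain.py has more fields than are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    """Reduced stand-in: only the fields relevant to synthesis are shown."""
    report_title: str
    synthesis_system_prompt: str
    synthesis_few_shot_example: str = ""


def build_system_prompt(config: DomainConfig) -> str:
    """Append the few-shot example to the system prompt when one is set."""
    prompt = config.synthesis_system_prompt.strip()
    example = config.synthesis_few_shot_example.strip()
    if example:
        # A blank line keeps the example's Markdown headings separated
        # from the rules that precede it.
        prompt = f"{prompt}\n\n{example}"
    return prompt
```

Making the few-shot example optional lets a domain ship a system prompt first and add an example later without code changes.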

File Changes

| File | Change |
|------|--------|
| src/agents/synthesis.py | NEW - SynthesisAgent |
| src/prompts/synthesis.py | NEW - Synthesis prompts |
| src/orchestrators/simple.py | MODIFY - Call SynthesisAgent |
| src/orchestrators/advanced.py | MODIFY - Add to Magentic |
| src/config/domain.py | MODIFY - Add synthesis prompts |
| src/utils/models.py | MODIFY - Add NarrativeReport |
| tests/unit/agents/test_synthesis.py | NEW - Tests |
| tests/unit/prompts/test_synthesis.py | NEW - Tests |

Acceptance Criteria

  • Report contains paragraph-form prose, not just bullet points
  • Report has executive summary (2-3 sentences)
  • Report has background section explaining the condition
  • Report has synthesized narrative weaving evidence together
  • Report has actionable recommendations
  • Report has limitations section (honest acknowledgment)
  • Citations are properly formatted (author, year, title, URL)
  • No hallucinated references (CRITICAL)
  • Works in both simple and advanced modes
  • Falls back gracefully on free tier (minimal templating OK)

Test Criteria

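import pytest

# Requires the pytest-asyncio plugin. `synthesis_agent` and `evidence` are
# assumed to be defined elsewhere in the test module.

@pytest.mark.asyncio
async def test_report_is_narrative_not_bullets():
    """Report should be mostly prose, not bullet points."""
    # synthesize() is async and returns a NarrativeReport, so await it
    # and render to Markdown before counting.
    report = await synthesis_agent.synthesize(...)
    markdown = report.to_markdown()

    # Count paragraphs vs bullet points
    paragraphs = len([p for p in markdown.split('\n\n') if len(p) > 100])
    bullets = markdown.count('\n- ')

    # Prose should dominate
    assert paragraphs > bullets, "Report should be narrative, not bullet list"

@pytest.mark.asyncio
async def test_references_not_hallucinated():
    """All references must come from provided evidence."""
    evidence_urls = {e.citation.url for e in evidence}
    report = await synthesis_agent.synthesize(...)

    for ref in report.references:
        assert ref.url in evidence_urls, f"Hallucinated reference: {ref.url}"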
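The two checks above can also be factored into pure functions that run without an LLM, which makes them usable both in unit tests and as a runtime guard. A minimal sketch follows; the function names, the 100-character paragraph threshold carried over from the test, and the trailing-slash normalization are illustrative assumptions.

```python
def prose_and_bullet_counts(
    markdown: str, min_paragraph_chars: int = 100
) -> tuple[int, int]:
    """Return (paragraph_count, bullet_count) for a Markdown report."""
    blocks = markdown.split("\n\n")
    paragraphs = sum(
        1
        for block in blocks
        if len(block) > min_paragraph_chars
        and not block.lstrip().startswith(("- ", "#"))
    )
    # Counting per line also catches a bullet on the very first line,
    # which a search for "\n- " would miss.
    bullets = sum(
        1 for line in markdown.splitlines() if line.lstrip().startswith("- ")
    )
    return paragraphs, bullets


def find_hallucinated_urls(
    reference_urls: list[str], evidence_urls: list[str]
) -> list[str]:
    """Return reference URLs that do not appear in the provided evidence."""

    def normalize(url: str) -> str:
        # Assumption: a trailing slash or stray whitespace is not a real difference.
        return url.strip().rstrip("/")

    allowed = {normalize(url) for url in evidence_urls}
    return [url for url in reference_urls if normalize(url) not in allowed]
```

A non-empty result from `find_hallucinated_urls` could be used to reject or regenerate a report before it reaches the user.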

Related Microsoft Agent Framework Patterns

| Pattern | Location | Application |
|---------|----------|-------------|
| Custom Aggregator | concurrent_custom_aggregator.py | LLM-based synthesis |
| Fan-Out/Fan-In | fan_out_fan_in_edges.py | Multi-expert synthesis |
| Research Assistant | research_assistant_agent.py | Tool-based research |
| Sequential Orchestration | spec-001-foundry-sdk-alignment.md | Analyst→Writer→Editor chain |

References

  • GitHub Issue #85: Report lacks narrative synthesis
  • GitHub Issue #86: Microsoft Agent Framework patterns
  • LangChain Deep Agents blog: Few-shot examples importance
  • Open Deep Research Architecture: Scoping + Synthesis pattern