Spaces:

VibecoderMcSwaggins
/

DeepBoner

Paused

VibecoderMcSwaggins commited on Nov 30, 2025

Commit

25c3ff9

1 Parent(s): 627c291

docs: add SPEC_12 for narrative report synthesis

Detailed spec for fixing issue #85 - reports output structured
metadata instead of synthesized prose narrative.

Key findings:
- Current `_generate_synthesis()` is string templating with NO LLM call
- Microsoft agent-framework shows custom aggregator pattern
- Need dedicated SynthesisAgent with LLM-based prose generation

Implementation plan:
1. Create SynthesisAgent (`src/agents/synthesis.py`)
2. Add synthesis prompts with few-shot examples
3. Update orchestrators to call SynthesisAgent
4. Add proper test coverage

References Microsoft agent-framework patterns:
- concurrent_custom_aggregator.py
- fan_out_fan_in_edges.py

Files changed (1) hide show

docs/specs/SPEC_12_NARRATIVE_SYNTHESIS.md +469 -0

docs/specs/SPEC_12_NARRATIVE_SYNTHESIS.md ADDED Viewed

	@@ -0,0 +1,469 @@

+# SPEC_12: Narrative Report Synthesis
+**Status**: Draft
+**Priority**: P1 - Core deliverable
+**Related Issues**: #85, #86
+**Related Spec**: SPEC_11 (Sexual Health Focus)
+## Problem Statement
+DeepBoner's report generation outputs **structured metadata** instead of **synthesized prose**. The current implementation uses string templating with NO LLM call for narrative synthesis.
+### Current Output (Actual)
+```markdown
+## Sexual Health Analysis
+### Question
+Testosterone therapy for hypoactive sexual desire disorder?
+### Drug Candidates
+- **Testosterone**
+- **LibiGel**
+- **Androgel**
+### Key Findings
+- Testosterone therapy improves sexual desire and activity in postmenopausal women with HSDD.
+- Transdermal testosterone is a preferred formulation.
+### Assessment
+- **Mechanism Score**: 8/10
+- **Clinical Evidence Score**: 9/10
+- **Confidence**: 90%
+### Reasoning
+The evidence provides a clear understanding of the mechanism of action...
+### Citations (33 sources)
+1. [Title](url)...
+```
+### Expected Output (Professional Research Report)
+```markdown
+## Sexual Health Research Report: Testosterone Therapy for Hypoactive Sexual Desire Disorder
+### Executive Summary
+Testosterone therapy represents a well-established, evidence-based treatment for
+hypoactive sexual desire disorder (HSDD) in postmenopausal women. Our analysis of
+33 peer-reviewed sources reveals consistent findings across multiple randomized
+controlled trials, with transdermal testosterone demonstrating the strongest
+efficacy-safety profile.
+### Background
+Hypoactive sexual desire disorder affects an estimated 12% of postmenopausal women
+and is characterized by persistent lack of sexual interest causing personal distress.
+The International Society for the Study of Women's Sexual Health (ISSWSH) published
+clinical guidelines in 2021 establishing testosterone as a recommended intervention...
+### Evidence Synthesis
+**Mechanism of Action**
+Testosterone exerts its effects on sexual desire through multiple pathways. At the
+hypothalamic level, testosterone modulates dopaminergic signaling that underlies
+libido. Evidence from Smith et al. (2021) demonstrates that androgen receptor
+activation in the central nervous system correlates with subjective measures of
+sexual desire (r=0.67, p<0.001)...
+**Clinical Trial Evidence**
+A systematic review of 8 randomized controlled trials (N=3,035) demonstrated that
+transdermal testosterone significantly improved:
+- Satisfying sexual events: +2.1 per month (95% CI: 1.4-2.8)
+- Sexual desire scores: +0.4 on validated scales (p<0.001)
+The Global Consensus Position Statement (2019) and ISSWSH Guidelines (2021) both
+recommend transdermal testosterone as first-line therapy...
+### Recommendations
+Based on this evidence synthesis:
+1. **Transdermal testosterone** (300 μg/day) is recommended for postmenopausal
+   women with HSDD not primarily related to modifiable factors
+2. **Duration**: Continue for 6 months to assess efficacy; discontinue if no benefit
+3. **Monitoring**: Lipid profile and liver function at baseline and 3-6 months
+### Limitations & Future Directions
+- Long-term safety data beyond 24 months remains limited
+- Efficacy in premenopausal women less well-established
+- Head-to-head comparisons between formulations are needed
+### References
+1. Parish SJ et al. (2021). International Society for the Study of Women's Sexual
+   Health Clinical Practice Guideline for the Use of Systemic Testosterone for
+   Hypoactive Sexual Desire Disorder in Women. J Sex Med. https://pubmed.ncbi.nlm.nih.gov/33814355/
+...
+```
+## Root Cause Analysis
+### Current Implementation (`src/orchestrators/simple.py:448-505`)
+```python
+def _generate_synthesis(
+    self,
+    query: str,
+    evidence: list[Evidence],
+    assessment: JudgeAssessment,
+) -> str:
+    # ❌ NO LLM CALL - Just string templating!
+    drug_list = "\n".join([f"- **{d}**" for d in assessment.details.drug_candidates])
+    findings_list = "\n".join([f"- {f}" for f in assessment.details.key_findings])
+    return f"""{self.domain_config.report_title}
+### Question
+{query}
+### Drug Candidates
+{drug_list}
+...
+"""
+```
+**The problem**: No LLM is ever called to synthesize the report. It's just formatted
+data from the JudgeAssessment.
+### Microsoft Agent Framework Pattern
+From `reference_repos/agent-framework/python/samples/getting_started/workflows/orchestration/concurrent_custom_aggregator.py`:
+```python
+# Define a custom aggregator callback that uses the chat client to SYNTHESIZE
+async def summarize_results(results: list[Any]) -> str:
+    # Collect expert outputs
+    expert_sections: list[str] = []
+    for r in results:
+        messages = getattr(r.agent_run_response, "messages", [])
+        final_text = messages[-1].text if messages else "(no content)"
+        expert_sections.append(f"{r.executor_id}:\n{final_text}")
+    # Ask the MODEL to synthesize
+    system_msg = ChatMessage(
+        Role.SYSTEM,
+        text=(
+            "You are a helpful assistant that consolidates multiple domain expert outputs "
+            "into one cohesive, concise summary with clear takeaways."
+        ),
+    )
+    user_msg = ChatMessage(Role.USER, text="\n\n".join(expert_sections))
+    # ✅ LLM CALL for synthesis
+    response = await chat_client.get_response([system_msg, user_msg])
+    return response.messages[-1].text
+```
+**The pattern**: The aggregator makes an **LLM call** to synthesize, not string concatenation.
+## Solution Design
+### Architecture
+```
+Current:
+  Evidence → Judge → {structured data} → String Template → Bullet Points
+Proposed:
+  Evidence → Judge → {structured data} → SynthesisAgent → Narrative Prose
+                                                ↓
+                                         LLM-based synthesis
+```
+### Components
+#### 1. `SynthesisAgent` (`src/agents/synthesis.py`)
+A new agent dedicated to narrative report generation:
+```python
+from pydantic import BaseModel
+from pydantic_ai import Agent
+class NarrativeReport(BaseModel):
+    """Structured output for narrative report."""
+    executive_summary: str  # 2-3 sentences, key takeaways
+    background: str  # What is this condition, why does it matter
+    evidence_synthesis: str  # Mechanism + Clinical evidence in prose
+    recommendations: list[str]  # Actionable recommendations
+    limitations: str  # Honest limitations
+    references: list[Reference]  # Properly formatted
+class SynthesisAgent:
+    """Generates narrative research reports from structured data."""
+    async def synthesize(
+        self,
+        query: str,
+        evidence: list[Evidence],
+        assessment: JudgeAssessment,
+        domain: ResearchDomain,
+    ) -> NarrativeReport:
+        """Generate narrative prose report."""
+        # Build context
+        context = self._build_synthesis_context(evidence, assessment)
+        # ✅ LLM CALL for synthesis
+        result = await self.agent.run(
+            f"Generate a narrative research report for: {query}",
+            context=context,
+        )
+        return result.data
+```
+#### 2. Updated System Prompt (`src/prompts/synthesis.py`)
+```python
+SYNTHESIS_SYSTEM_PROMPT = """You are a scientific writer specializing in sexual health research.
+Your task is to synthesize research evidence into a clear, narrative report.
+## Writing Style
+- Write in PROSE PARAGRAPHS, not bullet points
+- Use academic but accessible language
+- Be specific about evidence strength (e.g., "in a randomized controlled trial of N=200")
+- Reference specific studies by author name
+- Provide quantitative results where available
+## Report Structure
+### Executive Summary (REQUIRED - 2-3 sentences)
+Summarize the key finding and clinical implication. Start with the bottom line.
+Example: "Testosterone therapy demonstrates consistent efficacy for HSDD in
+postmenopausal women, with transdermal formulations showing the best safety profile."
+### Background (REQUIRED - 1 paragraph)
+Explain the condition, its prevalence, and why this question matters clinically.
+### Evidence Synthesis (REQUIRED - 2-4 paragraphs)
+Weave together the evidence into a coherent narrative:
+- Mechanism of Action: How does the intervention work?
+- Clinical Evidence: What do the trials show? Be specific about effect sizes.
+- Comparative Evidence: How does it compare to alternatives?
+### Recommendations (REQUIRED - 3-5 bullet points)
+Provide actionable clinical recommendations based on the evidence.
+### Limitations (REQUIRED - 1 paragraph)
+Acknowledge gaps, biases, and areas needing more research.
+### References (REQUIRED)
+List the key references in proper academic format.
+## CRITICAL RULES
+1. ONLY cite papers from the provided evidence - NEVER hallucinate references
+2. Write in complete sentences and paragraphs
+3. Avoid lists/bullets except in Recommendations section
+4. Include specific statistics when available (p-values, effect sizes, CIs)
+5. Acknowledge uncertainty honestly
+"""
+```
+#### 3. Updated Orchestrator Integration
+```python
+# In src/orchestrators/simple.py
+async def _generate_synthesis(
+    self,
+    query: str,
+    evidence: list[Evidence],
+    assessment: JudgeAssessment,
+) -> str:
+    """Generate narrative synthesis using LLM."""
+    from src.agents.synthesis import SynthesisAgent
+    synthesis_agent = SynthesisAgent(domain=self.domain)
+    report = await synthesis_agent.synthesize(
+        query=query,
+        evidence=evidence,
+        assessment=assessment,
+        domain=self.domain,
+    )
+    return report.to_markdown()
+```
+### Few-Shot Example (Required for Quality)
+From issue #82, include a concrete example in the prompt:
+```python
+FEW_SHOT_EXAMPLE = """
+## Example: Strong Evidence Synthesis
+INPUT:
+- Query: "Alprostadil for erectile dysfunction"
+- Evidence: 15 papers including meta-analysis of 8 RCTs (N=3,247)
+- Mechanism Score: 9/10
+- Clinical Score: 9/10
+OUTPUT:
+### Executive Summary
+Alprostadil (prostaglandin E1) represents a well-established second-line treatment
+for erectile dysfunction, with meta-analytic evidence demonstrating 87% efficacy
+in achieving erections sufficient for intercourse. It offers a PDE5-independent
+mechanism particularly valuable for patients who do not respond to oral therapies.
+### Background
+Erectile dysfunction affects approximately 30 million men in the United States,
+with prevalence increasing with age. While PDE5 inhibitors (sildenafil, tadalafil)
+remain first-line therapy, approximately 30% of patients are non-responders or
+have contraindications. Alprostadil provides an alternative mechanism of action
+through direct smooth muscle relaxation.
+### Evidence Synthesis
+**Mechanism of Action**
+Alprostadil works through a distinct pathway from PDE5 inhibitors. It binds to
+EP receptors on cavernosal smooth muscle, activating adenylate cyclase and
+increasing intracellular cAMP. This leads to smooth muscle relaxation and
+penile erection independent of nitric oxide signaling. As noted by Smith et al.
+(2019), this mechanism explains its efficacy in patients with endothelial
+dysfunction or nerve damage.
+**Clinical Evidence**
+A meta-analysis by Johnson et al. (2020) pooled data from 8 randomized controlled
+trials (N=3,247) comparing intracavernosal alprostadil to placebo. The primary
+endpoint of erection sufficient for intercourse was achieved in 87% of alprostadil
+patients versus 12% placebo (RR 7.25, 95% CI: 5.8-9.1, p<0.001). The number
+needed to treat (NNT) was 1.3, indicating robust effect size.
+Subgroup analysis revealed consistent efficacy across etiologies:
+- Vascular ED: 85% response rate
+- Neurogenic ED: 91% response rate
+- Post-prostatectomy: 82% response rate
+### Recommendations
+1. Consider alprostadil as second-line therapy when PDE5 inhibitors fail or are contraindicated
+2. Start with 10 μg intracavernosal injection, titrate up to 40 μg based on response
+3. Provide in-office training for self-injection technique
+4. Monitor for penile fibrosis with long-term use (occurs in 3-5% of patients)
+### Limitations
+Long-term data beyond 2 years is limited. Head-to-head comparisons with
+newer therapies (low-intensity shockwave) are lacking. Most trials excluded
+patients with severe cardiovascular disease, limiting generalizability.
+The intraurethral formulation (MUSE) has lower efficacy (43%) than injection.
+### References
+1. Smith AB et al. (2019). Alprostadil mechanism of action in erectile tissue.
+   J Urol. https://pubmed.ncbi.nlm.nih.gov/12345678/
+2. Johnson CD et al. (2020). Meta-analysis of intracavernosal alprostadil.
+   J Sex Med. https://pubmed.ncbi.nlm.nih.gov/23456789/
+"""
+```
+## Implementation Plan
+### Phase 1: Core SynthesisAgent
+1. Create `src/agents/synthesis.py` with:
+   - `SynthesisAgent` class
+   - `NarrativeReport` Pydantic model
+   - LLM-based synthesis method
+2. Create `src/prompts/synthesis.py` with:
+   - `SYNTHESIS_SYSTEM_PROMPT`
+   - `FEW_SHOT_EXAMPLE`
+   - `format_synthesis_context()` helper
+3. Update `src/orchestrators/simple.py`:
+   - Make `_generate_synthesis()` async
+   - Call `SynthesisAgent.synthesize()`
+   - Keep `_generate_partial_synthesis()` as fallback (free tier)
+### Phase 2: Advanced Mode Integration
+4. Update `src/orchestrators/advanced.py`:
+   - Add `SynthesisAgent` to Magentic workflow
+   - Ensure it receives all evidence from prior agents
+### Phase 3: Test Coverage
+5. Create `tests/unit/agents/test_synthesis.py`:
+   - Test narrative output structure
+   - Test reference accuracy (no hallucinated citations)
+   - Test prose vs bullet point ratio
+### Phase 4: Domain Customization
+6. Update `src/config/domain.py`:
+   - Add `synthesis_system_prompt` field to `DomainConfig`
+   - Add `synthesis_few_shot_example` field
+   - Configure for sexual health domain
+## File Changes
+| File | Change |
+|------|--------|
+| `src/agents/synthesis.py` | NEW - SynthesisAgent |
+| `src/prompts/synthesis.py` | NEW - Synthesis prompts |
+| `src/orchestrators/simple.py` | MODIFY - Call SynthesisAgent |
+| `src/orchestrators/advanced.py` | MODIFY - Add to Magentic |
+| `src/config/domain.py` | MODIFY - Add synthesis prompts |
+| `src/utils/models.py` | MODIFY - Add NarrativeReport |
+| `tests/unit/agents/test_synthesis.py` | NEW - Tests |
+| `tests/unit/prompts/test_synthesis.py` | NEW - Tests |
+## Acceptance Criteria
+- [ ] Report contains **paragraph-form prose**, not just bullet points
+- [ ] Report has **executive summary** (2-3 sentences)
+- [ ] Report has **background section** explaining the condition
+- [ ] Report has **synthesized narrative** weaving evidence together
+- [ ] Report has **actionable recommendations**
+- [ ] Report has **limitations** section (honest acknowledgment)
+- [ ] Citations are **properly formatted** (author, year, title, URL)
+- [ ] No hallucinated references (CRITICAL)
+- [ ] Works in both simple and advanced modes
+- [ ] Falls back gracefully on free tier (minimal templating OK)
+## Test Criteria
+```python
+def test_report_is_narrative_not_bullets():
+    """Report should be mostly prose, not bullet points."""
+    report = synthesis_agent.synthesize(...)
+    # Count paragraphs vs bullet points
+    paragraphs = len([p for p in report.split('\n\n') if len(p) > 100])
+    bullets = report.count('\n- ')
+    # Prose should dominate
+    assert paragraphs > bullets, "Report should be narrative, not bullet list"
+def test_references_not_hallucinated():
+    """All references must come from provided evidence."""
+    evidence_urls = {e.citation.url for e in evidence}
+    report = synthesis_agent.synthesize(...)
+    for ref in report.references:
+        assert ref.url in evidence_urls, f"Hallucinated reference: {ref.url}"
+```
+## Related Microsoft Agent Framework Patterns
+| Pattern | Location | Application |
+|---------|----------|-------------|
+| Custom Aggregator | `concurrent_custom_aggregator.py` | LLM-based synthesis |
+| Fan-Out/Fan-In | `fan_out_fan_in_edges.py` | Multi-expert synthesis |
+| Research Assistant | `research_assistant_agent.py` | Tool-based research |
+| Sequential Orchestration | `spec-001-foundry-sdk-alignment.md` | Analyst→Writer→Editor chain |
+## References
+- GitHub Issue #85: Report lacks narrative synthesis
+- GitHub Issue #86: Microsoft Agent Framework patterns
+- LangChain Deep Agents blog: Few-shot examples importance
+- Open Deep Research Architecture: Scoping + Synthesis pattern