VibecoderMcSwaggins committed on
Commit
25c3ff9
·
1 Parent(s): 627c291

docs: add SPEC_12 for narrative report synthesis

Detailed spec for fixing issue #85 - reports output structured
metadata instead of synthesized prose narrative.

Key findings:
- Current `_generate_synthesis()` is string templating with NO LLM call
- Microsoft agent-framework shows custom aggregator pattern
- Need dedicated SynthesisAgent with LLM-based prose generation

Implementation plan:
1. Create SynthesisAgent (`src/agents/synthesis.py`)
2. Add synthesis prompts with few-shot examples
3. Update orchestrators to call SynthesisAgent
4. Add proper test coverage

References Microsoft agent-framework patterns:
- concurrent_custom_aggregator.py
- fan_out_fan_in_edges.py

docs/specs/SPEC_12_NARRATIVE_SYNTHESIS.md ADDED
@@ -0,0 +1,469 @@
1
+ # SPEC_12: Narrative Report Synthesis
2
+
3
+ **Status**: Draft
4
+ **Priority**: P1 - Core deliverable
5
+ **Related Issues**: #85, #86
6
+ **Related Spec**: SPEC_11 (Sexual Health Focus)
7
+
8
+ ## Problem Statement
9
+
10
+ DeepBoner's report generation outputs **structured metadata** instead of **synthesized prose**. The current implementation uses string templating with NO LLM call for narrative synthesis.
11
+
12
+ ### Current Output (Actual)
13
+
14
+ ```markdown
15
+ ## Sexual Health Analysis
16
+
17
+ ### Question
18
+ Testosterone therapy for hypoactive sexual desire disorder?
19
+
20
+ ### Drug Candidates
21
+ - **Testosterone**
22
+ - **LibiGel**
23
+ - **Androgel**
24
+
25
+ ### Key Findings
26
+ - Testosterone therapy improves sexual desire and activity in postmenopausal women with HSDD.
27
+ - Transdermal testosterone is a preferred formulation.
28
+
29
+ ### Assessment
30
+ - **Mechanism Score**: 8/10
31
+ - **Clinical Evidence Score**: 9/10
32
+ - **Confidence**: 90%
33
+
34
+ ### Reasoning
35
+ The evidence provides a clear understanding of the mechanism of action...
36
+
37
+ ### Citations (33 sources)
38
+ 1. [Title](url)...
39
+ ```
40
+
41
+ ### Expected Output (Professional Research Report)
42
+
43
+ ```markdown
44
+ ## Sexual Health Research Report: Testosterone Therapy for Hypoactive Sexual Desire Disorder
45
+
46
+ ### Executive Summary
47
+
48
+ Testosterone therapy represents a well-established, evidence-based treatment for
49
+ hypoactive sexual desire disorder (HSDD) in postmenopausal women. Our analysis of
50
+ 33 peer-reviewed sources reveals consistent findings across multiple randomized
51
+ controlled trials, with transdermal testosterone demonstrating the strongest
52
+ efficacy-safety profile.
53
+
54
+ ### Background
55
+
56
+ Hypoactive sexual desire disorder affects an estimated 12% of postmenopausal women
57
+ and is characterized by persistent lack of sexual interest causing personal distress.
58
+ The International Society for the Study of Women's Sexual Health (ISSWSH) published
59
+ clinical guidelines in 2021 establishing testosterone as a recommended intervention...
60
+
61
+ ### Evidence Synthesis
62
+
63
+ **Mechanism of Action**
64
+
65
+ Testosterone exerts its effects on sexual desire through multiple pathways. At the
66
+ hypothalamic level, testosterone modulates dopaminergic signaling that underlies
67
+ libido. Evidence from Smith et al. (2021) demonstrates that androgen receptor
68
+ activation in the central nervous system correlates with subjective measures of
69
+ sexual desire (r=0.67, p<0.001)...
70
+
71
+ **Clinical Trial Evidence**
72
+
73
+ A systematic review of 8 randomized controlled trials (N=3,035) demonstrated that
74
+ transdermal testosterone significantly improved:
75
+ - Satisfying sexual events: +2.1 per month (95% CI: 1.4-2.8)
76
+ - Sexual desire scores: +0.4 on validated scales (p<0.001)
77
+
78
+ The Global Consensus Position Statement (2019) and ISSWSH Guidelines (2021) both
79
+ recommend transdermal testosterone as first-line therapy...
80
+
81
+ ### Recommendations
82
+
83
+ Based on this evidence synthesis:
84
+ 1. **Transdermal testosterone** (300 μg/day) is recommended for postmenopausal
85
+ women with HSDD not primarily related to modifiable factors
86
+ 2. **Duration**: Continue for 6 months to assess efficacy; discontinue if no benefit
87
+ 3. **Monitoring**: Lipid profile and liver function at baseline and 3-6 months
88
+
89
+ ### Limitations & Future Directions
90
+
91
+ - Long-term safety data beyond 24 months remains limited
92
+ - Efficacy in premenopausal women less well-established
93
+ - Head-to-head comparisons between formulations are needed
94
+
95
+ ### References
96
+
97
+ 1. Parish SJ et al. (2021). International Society for the Study of Women's Sexual
98
+ Health Clinical Practice Guideline for the Use of Systemic Testosterone for
99
+ Hypoactive Sexual Desire Disorder in Women. J Sex Med. https://pubmed.ncbi.nlm.nih.gov/33814355/
100
+ ...
101
+ ```
102
+
103
+ ## Root Cause Analysis
104
+
105
+ ### Current Implementation (`src/orchestrators/simple.py:448-505`)
106
+
107
+ ```python
108
+ def _generate_synthesis(
+     self,
+     query: str,
+     evidence: list[Evidence],
+     assessment: JudgeAssessment,
+ ) -> str:
+     # ❌ NO LLM CALL - Just string templating!
+     drug_list = "\n".join([f"- **{d}**" for d in assessment.details.drug_candidates])
+     findings_list = "\n".join([f"- {f}" for f in assessment.details.key_findings])
+
+     return f"""{self.domain_config.report_title}
+ ### Question
+ {query}
+ ### Drug Candidates
+ {drug_list}
+ ...
+ """
125
+ ```
126
+
127
+ **The problem**: No LLM is ever called to synthesize the report. The output is just the `JudgeAssessment` fields formatted into a template.
129
+
130
+ ### Microsoft Agent Framework Pattern
131
+
132
+ From `reference_repos/agent-framework/python/samples/getting_started/workflows/orchestration/concurrent_custom_aggregator.py`:
133
+
134
+ ```python
135
+ # Define a custom aggregator callback that uses the chat client to SYNTHESIZE
+ async def summarize_results(results: list[Any]) -> str:
+     # Collect expert outputs
+     expert_sections: list[str] = []
+     for r in results:
+         messages = getattr(r.agent_run_response, "messages", [])
+         final_text = messages[-1].text if messages else "(no content)"
+         expert_sections.append(f"{r.executor_id}:\n{final_text}")
+
+     # Ask the MODEL to synthesize
+     system_msg = ChatMessage(
+         Role.SYSTEM,
+         text=(
+             "You are a helpful assistant that consolidates multiple domain expert outputs "
+             "into one cohesive, concise summary with clear takeaways."
+         ),
+     )
+     user_msg = ChatMessage(Role.USER, text="\n\n".join(expert_sections))
+
+     # ✅ LLM CALL for synthesis
+     response = await chat_client.get_response([system_msg, user_msg])
+     return response.messages[-1].text
157
+ ```
158
+
159
+ **The pattern**: The aggregator makes an **LLM call** to synthesize the expert outputs; it does not simply concatenate strings.
160
+
161
+ ## Solution Design
162
+
163
+ ### Architecture
164
+
165
+ ```
166
+ Current:
167
+ Evidence → Judge → {structured data} → String Template → Bullet Points
168
+
169
+ Proposed:
170
+ Evidence → Judge → {structured data} → SynthesisAgent → Narrative Prose
171
+
172
+                                        LLM-based synthesis
173
+ ```
174
+
175
+ ### Components
176
+
177
+ #### 1. `SynthesisAgent` (`src/agents/synthesis.py`)
178
+
179
+ A new agent dedicated to narrative report generation:
180
+
181
+ ```python
182
+ from pydantic import BaseModel
+ from pydantic_ai import Agent
+
+ # Evidence, JudgeAssessment, ResearchDomain and Reference are project types;
+ # their imports are omitted from this excerpt.
+
+ class NarrativeReport(BaseModel):
+     """Structured output for narrative report."""
+     executive_summary: str  # 2-3 sentences, key takeaways
+     background: str  # What is this condition, why does it matter
+     evidence_synthesis: str  # Mechanism + Clinical evidence in prose
+     recommendations: list[str]  # Actionable recommendations
+     limitations: str  # Honest limitations
+     references: list[Reference]  # Properly formatted
+
+ class SynthesisAgent:
+     """Generates narrative research reports from structured data."""
+
+     async def synthesize(
+         self,
+         query: str,
+         evidence: list[Evidence],
+         assessment: JudgeAssessment,
+         domain: ResearchDomain,
+     ) -> NarrativeReport:
+         """Generate narrative prose report."""
+         # Build context
+         context = self._build_synthesis_context(evidence, assessment)
+
+         # ✅ LLM CALL for synthesis
+         # self.agent is assumed to be a pydantic_ai Agent whose structured
+         # result type is NarrativeReport. Agent.run() has no `context`
+         # keyword, so the context is appended to the user prompt instead.
+         result = await self.agent.run(
+             f"Generate a narrative research report for: {query}\n\n{context}",
+         )
+         return result.data  # `result.output` in newer pydantic_ai releases
214
+ ```
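`_build_synthesis_context` is called above but not defined in this spec. The following is a minimal sketch of what such a helper might do, written as a free function. `EvidenceStub` and `AssessmentStub` are hypothetical stand-ins for the project's `Evidence` and `JudgeAssessment` models, and their field names are assumptions that may not match `src/utils/models.py`.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceStub:
    """Hypothetical stand-in for the project's Evidence model."""
    title: str
    url: str
    content: str


@dataclass
class AssessmentStub:
    """Hypothetical stand-in for JudgeAssessment; field names are assumptions."""
    drug_candidates: list[str] = field(default_factory=list)
    key_findings: list[str] = field(default_factory=list)
    mechanism_score: int = 0
    clinical_score: int = 0


def build_synthesis_context(
    evidence: list[EvidenceStub],
    assessment: AssessmentStub,
    max_chars: int = 500,
) -> str:
    """Flatten structured data into a plain-text block for the LLM prompt."""
    lines = ["## Judge Assessment"]
    lines.append(f"Mechanism score: {assessment.mechanism_score}/10")
    lines.append(f"Clinical evidence score: {assessment.clinical_score}/10")
    lines.append("Drug candidates: " + ", ".join(assessment.drug_candidates))
    lines.append("Key findings:")
    lines.extend(f"- {finding}" for finding in assessment.key_findings)

    lines.append("")
    lines.append(f"## Evidence ({len(evidence)} sources)")
    for i, item in enumerate(evidence, start=1):
        # Truncate each excerpt so the prompt stays within a predictable size.
        excerpt = item.content[:max_chars]
        lines.append(f"[{i}] {item.title} ({item.url})")
        lines.append(excerpt)
    return "\n".join(lines)
```

Numbering each source and including its URL gives the model a fixed set of citable items, which supports the "no hallucinated references" rule in the system prompt.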
215
+
216
+ #### 2. Updated System Prompt (`src/prompts/synthesis.py`)
217
+
218
+ ```python
219
+ SYNTHESIS_SYSTEM_PROMPT = """You are a scientific writer specializing in sexual health research.
220
+ Your task is to synthesize research evidence into a clear, narrative report.
221
+
222
+ ## Writing Style
223
+ - Write in PROSE PARAGRAPHS, not bullet points
224
+ - Use academic but accessible language
225
+ - Be specific about evidence strength (e.g., "in a randomized controlled trial of N=200")
226
+ - Reference specific studies by author name
227
+ - Provide quantitative results where available
228
+
229
+ ## Report Structure
230
+
231
+ ### Executive Summary (REQUIRED - 2-3 sentences)
232
+ Summarize the key finding and clinical implication. Start with the bottom line.
233
+ Example: "Testosterone therapy demonstrates consistent efficacy for HSDD in
234
+ postmenopausal women, with transdermal formulations showing the best safety profile."
235
+
236
+ ### Background (REQUIRED - 1 paragraph)
237
+ Explain the condition, its prevalence, and why this question matters clinically.
238
+
239
+ ### Evidence Synthesis (REQUIRED - 2-4 paragraphs)
240
+ Weave together the evidence into a coherent narrative:
241
+ - Mechanism of Action: How does the intervention work?
242
+ - Clinical Evidence: What do the trials show? Be specific about effect sizes.
243
+ - Comparative Evidence: How does it compare to alternatives?
244
+
245
+ ### Recommendations (REQUIRED - 3-5 bullet points)
246
+ Provide actionable clinical recommendations based on the evidence.
247
+
248
+ ### Limitations (REQUIRED - 1 paragraph)
249
+ Acknowledge gaps, biases, and areas needing more research.
250
+
251
+ ### References (REQUIRED)
252
+ List the key references in proper academic format.
253
+
254
+ ## CRITICAL RULES
255
+ 1. ONLY cite papers from the provided evidence - NEVER hallucinate references
256
+ 2. Write in complete sentences and paragraphs
257
+ 3. Avoid lists/bullets except in Recommendations section
258
+ 4. Include specific statistics when available (p-values, effect sizes, CIs)
259
+ 5. Acknowledge uncertainty honestly
260
+ """
261
+ ```
262
+
263
+ #### 3. Updated Orchestrator Integration
264
+
265
+ ```python
266
+ # In src/orchestrators/simple.py
+
+ async def _generate_synthesis(
+     self,
+     query: str,
+     evidence: list[Evidence],
+     assessment: JudgeAssessment,
+ ) -> str:
+     """Generate narrative synthesis using LLM."""
+     from src.agents.synthesis import SynthesisAgent
+
+     synthesis_agent = SynthesisAgent(domain=self.domain)
+
+     report = await synthesis_agent.synthesize(
+         query=query,
+         evidence=evidence,
+         assessment=assessment,
+         domain=self.domain,
+     )
+
+     return report.to_markdown()
287
+ ```
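The orchestrator calls `report.to_markdown()`, which the `NarrativeReport` model above does not yet define. One possible shape for it is sketched here, using stdlib dataclasses as stand-ins so the example runs without Pydantic. `ReferenceStub` and its fields are hypothetical, since the spec does not define `Reference`.

```python
from dataclasses import dataclass


@dataclass
class ReferenceStub:
    """Hypothetical reference shape; the spec does not define `Reference`."""
    authors: str
    year: int
    title: str
    url: str


@dataclass
class NarrativeReportStub:
    """Stand-in mirroring the NarrativeReport fields listed in this spec."""
    executive_summary: str
    background: str
    evidence_synthesis: str
    recommendations: list[str]
    limitations: str
    references: list[ReferenceStub]

    def to_markdown(self) -> str:
        """Render the sections in the order the system prompt requires."""
        recs = "\n".join(f"{i}. {r}" for i, r in enumerate(self.recommendations, 1))
        refs = "\n".join(
            f"{i}. {ref.authors} ({ref.year}). {ref.title}. {ref.url}"
            for i, ref in enumerate(self.references, 1)
        )
        sections = [
            ("Executive Summary", self.executive_summary),
            ("Background", self.background),
            ("Evidence Synthesis", self.evidence_synthesis),
            ("Recommendations", recs),
            ("Limitations", self.limitations),
            ("References", refs),
        ]
        return "\n\n".join(f"### {title}\n\n{body}" for title, body in sections)
```

Rendering in code rather than asking the LLM for finished Markdown keeps the section order and reference format deterministic.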
288
+
289
+ ### Few-Shot Example (Required for Quality)
290
+
291
+ From issue #82, include a concrete example in the prompt:
292
+
293
+ ```python
294
+ FEW_SHOT_EXAMPLE = """
295
+ ## Example: Strong Evidence Synthesis
296
+
297
+ INPUT:
298
+ - Query: "Alprostadil for erectile dysfunction"
299
+ - Evidence: 15 papers including meta-analysis of 8 RCTs (N=3,247)
300
+ - Mechanism Score: 9/10
301
+ - Clinical Score: 9/10
302
+
303
+ OUTPUT:
304
+
305
+ ### Executive Summary
306
+
307
+ Alprostadil (prostaglandin E1) represents a well-established second-line treatment
308
+ for erectile dysfunction, with meta-analytic evidence demonstrating 87% efficacy
309
+ in achieving erections sufficient for intercourse. It offers a PDE5-independent
310
+ mechanism particularly valuable for patients who do not respond to oral therapies.
311
+
312
+ ### Background
313
+
314
+ Erectile dysfunction affects approximately 30 million men in the United States,
315
+ with prevalence increasing with age. While PDE5 inhibitors (sildenafil, tadalafil)
316
+ remain first-line therapy, approximately 30% of patients are non-responders or
317
+ have contraindications. Alprostadil provides an alternative mechanism of action
318
+ through direct smooth muscle relaxation.
319
+
320
+ ### Evidence Synthesis
321
+
322
+ **Mechanism of Action**
323
+
324
+ Alprostadil works through a distinct pathway from PDE5 inhibitors. It binds to
325
+ EP receptors on cavernosal smooth muscle, activating adenylate cyclase and
326
+ increasing intracellular cAMP. This leads to smooth muscle relaxation and
327
+ penile erection independent of nitric oxide signaling. As noted by Smith et al.
328
+ (2019), this mechanism explains its efficacy in patients with endothelial
329
+ dysfunction or nerve damage.
330
+
331
+ **Clinical Evidence**
332
+
333
+ A meta-analysis by Johnson et al. (2020) pooled data from 8 randomized controlled
334
+ trials (N=3,247) comparing intracavernosal alprostadil to placebo. The primary
335
+ endpoint of erection sufficient for intercourse was achieved in 87% of alprostadil
336
+ patients versus 12% placebo (RR 7.25, 95% CI: 5.8-9.1, p<0.001). The number
337
+ needed to treat (NNT) was 1.3, indicating robust effect size.
338
+
339
+ Subgroup analysis revealed consistent efficacy across etiologies:
340
+ - Vascular ED: 85% response rate
341
+ - Neurogenic ED: 91% response rate
342
+ - Post-prostatectomy: 82% response rate
343
+
344
+ ### Recommendations
345
+
346
+ 1. Consider alprostadil as second-line therapy when PDE5 inhibitors fail or are contraindicated
347
+ 2. Start with 10 μg intracavernosal injection, titrate up to 40 μg based on response
348
+ 3. Provide in-office training for self-injection technique
349
+ 4. Monitor for penile fibrosis with long-term use (occurs in 3-5% of patients)
350
+
351
+ ### Limitations
352
+
353
+ Long-term data beyond 2 years is limited. Head-to-head comparisons with
354
+ newer therapies (low-intensity shockwave) are lacking. Most trials excluded
355
+ patients with severe cardiovascular disease, limiting generalizability.
356
+ The intraurethral formulation (MUSE) has lower efficacy (43%) than injection.
357
+
358
+ ### References
359
+
360
+ 1. Smith AB et al. (2019). Alprostadil mechanism of action in erectile tissue.
361
+ J Urol. https://pubmed.ncbi.nlm.nih.gov/12345678/
362
+ 2. Johnson CD et al. (2020). Meta-analysis of intracavernosal alprostadil.
363
+ J Sex Med. https://pubmed.ncbi.nlm.nih.gov/23456789/
364
+ """
365
+ ```
366
+
367
+ ## Implementation Plan
368
+
369
+ ### Phase 1: Core SynthesisAgent
370
+
371
+ 1. Create `src/agents/synthesis.py` with:
+    - `SynthesisAgent` class
+    - `NarrativeReport` Pydantic model
+    - LLM-based synthesis method
+
+ 2. Create `src/prompts/synthesis.py` with:
+    - `SYNTHESIS_SYSTEM_PROMPT`
+    - `FEW_SHOT_EXAMPLE`
+    - `format_synthesis_context()` helper
+
+ 3. Update `src/orchestrators/simple.py`:
+    - Make `_generate_synthesis()` async
+    - Call `SynthesisAgent.synthesize()`
+    - Keep `_generate_partial_synthesis()` as fallback (free tier)
385
+
386
+ ### Phase 2: Advanced Mode Integration
387
+
388
+ 4. Update `src/orchestrators/advanced.py`:
+    - Add `SynthesisAgent` to Magentic workflow
+    - Ensure it receives all evidence from prior agents
391
+
392
+ ### Phase 3: Test Coverage
393
+
394
+ 5. Create `tests/unit/agents/test_synthesis.py`:
+    - Test narrative output structure
+    - Test reference accuracy (no hallucinated citations)
+    - Test prose vs bullet point ratio
398
+
399
+ ### Phase 4: Domain Customization
400
+
401
+ 6. Update `src/config/domain.py`:
+    - Add `synthesis_system_prompt` field to `DomainConfig`
+    - Add `synthesis_few_shot_example` field
+    - Configure for sexual health domain
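The Phase 4 fields could look roughly like the sketch below. `DomainConfigSketch`, `GENERIC_SYNTHESIS_PROMPT`, and `full_synthesis_prompt()` are hypothetical names for illustration. Only `report_title` and the two new field names come from this spec, and the real `DomainConfig` may be structured differently.

```python
from dataclasses import dataclass

# Hypothetical default used when a domain does not override the prompt.
GENERIC_SYNTHESIS_PROMPT = (
    "You are a scientific writer. Synthesize the evidence into a narrative report."
)


@dataclass(frozen=True)
class DomainConfigSketch:
    """Hypothetical subset of DomainConfig with the two new synthesis fields."""
    report_title: str
    synthesis_system_prompt: str = GENERIC_SYNTHESIS_PROMPT
    synthesis_few_shot_example: str = ""

    def full_synthesis_prompt(self) -> str:
        """System prompt plus the few-shot example, if one is configured."""
        if not self.synthesis_few_shot_example:
            return self.synthesis_system_prompt
        return f"{self.synthesis_system_prompt}\n\n{self.synthesis_few_shot_example}"


# Example configuration for the sexual health domain (strings abbreviated).
SEXUAL_HEALTH = DomainConfigSketch(
    report_title="## Sexual Health Analysis",
    synthesis_system_prompt=(
        "You are a scientific writer specializing in sexual health research."
    ),
    synthesis_few_shot_example="## Example: Strong Evidence Synthesis\n...",
)
```

Defaulting both fields means existing domains keep working without a synthesis-specific configuration.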
405
+
406
+ ## File Changes
407
+
408
+ | File | Change |
+ |------|--------|
+ | `src/agents/synthesis.py` | NEW - SynthesisAgent |
+ | `src/prompts/synthesis.py` | NEW - Synthesis prompts |
+ | `src/orchestrators/simple.py` | MODIFY - Call SynthesisAgent |
+ | `src/orchestrators/advanced.py` | MODIFY - Add to Magentic |
+ | `src/config/domain.py` | MODIFY - Add synthesis prompts |
+ | `src/utils/models.py` | MODIFY - Add NarrativeReport |
+ | `tests/unit/agents/test_synthesis.py` | NEW - Tests |
+ | `tests/unit/prompts/test_synthesis.py` | NEW - Tests |
418
+
419
+ ## Acceptance Criteria
420
+
421
+ - [ ] Report contains **paragraph-form prose**, not just bullet points
422
+ - [ ] Report has **executive summary** (2-3 sentences)
423
+ - [ ] Report has **background section** explaining the condition
424
+ - [ ] Report has **synthesized narrative** weaving evidence together
425
+ - [ ] Report has **actionable recommendations**
426
+ - [ ] Report has **limitations** section (honest acknowledgment)
427
+ - [ ] Citations are **properly formatted** (author, year, title, URL)
428
+ - [ ] No hallucinated references (CRITICAL)
429
+ - [ ] Works in both simple and advanced modes
430
+ - [ ] Falls back gracefully on free tier (minimal templating OK)
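The last criterion (graceful fallback on the free tier) can be sketched as a small wrapper that tries the LLM path first and returns the template output on any failure. `synthesize_with_fallback` and its parameters are hypothetical. In the orchestrator, the two callables would presumably wrap `SynthesisAgent.synthesize()` and `_generate_partial_synthesis()`.

```python
import asyncio
from typing import Awaitable, Callable


async def synthesize_with_fallback(
    llm_synthesis: Callable[[], Awaitable[str]],
    template_synthesis: Callable[[], str],
    timeout_s: float = 60.0,
) -> str:
    """Try LLM synthesis first; fall back to the template on any failure."""
    try:
        return await asyncio.wait_for(llm_synthesis(), timeout=timeout_s)
    except Exception:
        # Deliberately broad: timeouts, quota errors and provider failures
        # should all degrade to the templated report. Real code should log here.
        return template_synthesis()
```

Task cancellation is not swallowed, because `asyncio.CancelledError` derives from `BaseException` rather than `Exception`.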
431
+
432
+ ## Test Criteria
433
+
434
+ ```python
435
+ import pytest
+
+ # `synthesis_agent` and `evidence` are assumed to come from fixtures.
+
+ @pytest.mark.asyncio  # assumes pytest-asyncio is installed
+ async def test_report_is_narrative_not_bullets():
+     """Report should be mostly prose, not bullet points."""
+     # synthesize() is async and returns a NarrativeReport, so await it
+     # and render to Markdown before counting.
+     report = await synthesis_agent.synthesize(...)
+     markdown = report.to_markdown()
+
+     # Count paragraphs vs bullet points
+     paragraphs = len([p for p in markdown.split('\n\n') if len(p) > 100])
+     bullets = markdown.count('\n- ')
+
+     # Prose should dominate
+     assert paragraphs > bullets, "Report should be narrative, not bullet list"
+
+ @pytest.mark.asyncio
+ async def test_references_not_hallucinated():
+     """All references must come from provided evidence."""
+     evidence_urls = {e.citation.url for e in evidence}
+     report = await synthesis_agent.synthesize(...)
+
+     for ref in report.references:
+         assert ref.url in evidence_urls, f"Hallucinated reference: {ref.url}"
453
+ ```
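Beyond asserting in tests, the same URL check could run at report time so that ungrounded references are dropped before rendering. This is a minimal sketch under that assumption. `Ref`, `split_references`, and the trailing-slash normalization are hypothetical and are not part of this spec.

```python
from dataclasses import dataclass


@dataclass
class Ref:
    """Hypothetical reference shape with only the fields this check needs."""
    title: str
    url: str


def _normalize(url: str) -> str:
    # Simplification: ignore surrounding whitespace and a trailing slash only.
    return url.strip().rstrip("/")


def split_references(
    refs: list[Ref], evidence_urls: set[str]
) -> tuple[list[Ref], list[Ref]]:
    """Return (grounded, ungrounded) references based on URL membership."""
    allowed = {_normalize(u) for u in evidence_urls}
    grounded = [r for r in refs if _normalize(r.url) in allowed]
    ungrounded = [r for r in refs if _normalize(r.url) not in allowed]
    return grounded, ungrounded
```

Returning the ungrounded list separately lets the caller log or count hallucinated citations instead of discarding them silently.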
454
+
455
+ ## Related Microsoft Agent Framework Patterns
456
+
457
+ | Pattern | Location | Application |
+ |---------|----------|-------------|
+ | Custom Aggregator | `concurrent_custom_aggregator.py` | LLM-based synthesis |
+ | Fan-Out/Fan-In | `fan_out_fan_in_edges.py` | Multi-expert synthesis |
+ | Research Assistant | `research_assistant_agent.py` | Tool-based research |
+ | Sequential Orchestration | `spec-001-foundry-sdk-alignment.md` | Analyst→Writer→Editor chain |
463
+
464
+ ## References
465
+
466
+ - GitHub Issue #85: Report lacks narrative synthesis
467
+ - GitHub Issue #86: Microsoft Agent Framework patterns
468
+ - LangChain Deep Agents blog: Few-shot examples importance
469
+ - Open Deep Research Architecture: Scoping + Synthesis pattern