| # Phase 13 Implementation Spec: Modal Pipeline Integration |
|
|
| **Goal**: Wire existing Modal code execution into the agent pipeline. |
| **Philosophy**: "Sandboxed execution makes AI-generated code trustworthy." |
| **Prerequisite**: Phase 12 complete (MCP server working) |
| **Priority**: P1 - HIGH VALUE ($2,500 Modal Innovation Award) |
| **Estimated Time**: 2-3 hours |
|
|
| --- |
|
|
| ## 1. Why Modal Integration? |
|
|
| ### Current State Analysis |
|
|
| Mario already implemented `src/tools/code_execution.py`: |
|
|
| | Component | Status | Notes | |
| |-----------|--------|-------| |
| | `ModalCodeExecutor` class | Built | Executes Python in Modal sandbox | |
| | `SANDBOX_LIBRARIES` | Defined | pandas, numpy, scipy, etc. | |
| | `execute()` method | Implemented | Stdout/stderr capture | |
| | `execute_with_return()` | Implemented | Returns `result` variable | |
| | `AnalysisAgent` | Built | Uses Modal for statistical analysis | |
| | **Pipeline Integration** | **MISSING** | Not wired into main orchestrator | |
|
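|
| For orientation, a minimal sketch of how that existing executor is driven today (method names are taken from the table above; the exact return shapes are illustrative, not verified against the implementation): |
|
| ```python |
| # Sketch only: exercising the pre-existing ModalCodeExecutor helpers. |
| from src.tools.code_execution import get_code_executor |
|
| executor = get_code_executor() |
|
| # execute(): runs code in the Modal sandbox and captures stdout/stderr |
| output = executor.execute("print(sum(range(10)))") |
| print(output["stdout"])  # expected: "45" |
|
| # execute_with_return(): hands back whatever the snippet assigns to `result` |
| value = executor.execute_with_return("result = 2 ** 10") |
| print(value)  # expected: 1024 |
| ``` |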
|
| ### What's Missing |
|
|
| ```text |
| Current Flow: |
| User Query → Orchestrator → Search → Judge → [Report] → Done |
|
| With Modal: |
| User Query → Orchestrator → Search → Judge → [Analysis*] → Report → Done |
|                                                   ↓ |
|                                       Modal Sandbox Execution |
| ``` |
|
|
| *The AnalysisAgent exists but is NOT called by either orchestrator. |
| |
| --- |
| |
| ## 2. Critical Dependency Analysis |
| |
| ### The Problem (Senior Feedback) |
| |
| ```python |
| # src/agents/analysis_agent.py - Line 8 |
| from agent_framework import ( |
| AgentRunResponse, |
| BaseAgent, |
| ... |
| ) |
| ``` |
| |
| ```toml |
| # pyproject.toml - agent-framework is OPTIONAL |
| [project.optional-dependencies] |
| magentic = [ |
| "agent-framework-core", |
| ] |
| ``` |
| |
| **If we import `AnalysisAgent` in the simple orchestrator without the `magentic` extra installed, the app CRASHES on startup.** |
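|
| To make the failure mode concrete, here is an illustration (module paths are from this repo; the guarded import is only a demonstration of the crash, not the proposed fix): |
|
| ```python |
| # Illustration: a module-level import of AnalysisAgent pulls in agent_framework, |
| # so without the `magentic` extra the import itself raises at startup. |
| try: |
|     from src.agents.analysis_agent import AnalysisAgent  # noqa: F401 |
| except ModuleNotFoundError as exc: |
|     # Representative failure: "No module named 'agent_framework'" |
|     print(f"Startup crash: {exc}") |
| ``` |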
| |
| ### The SOLID Solution |
| |
| **Single Responsibility Principle**: Decouple Modal execution logic from `agent_framework`. |
| |
| ```text |
| BEFORE (Coupled): |
| AnalysisAgent (requires agent_framework) |
|     ↓ |
| ModalCodeExecutor |
|
| AFTER (Decoupled): |
| StatisticalAnalyzer (no agent_framework dependency)   ← Simple mode uses this |
|     ↓ |
| ModalCodeExecutor |
|     ↑ |
| AnalysisAgent (wraps StatisticalAnalyzer)   ← Magentic mode uses this |
| ``` |
| |
| **Key insight**: Create `src/services/statistical_analyzer.py` with ZERO agent_framework imports. |
| |
| --- |
| |
| ## 3. Prize Opportunity |
| |
| ### Modal Innovation Award: $2,500 |
| |
| **Judging Criteria**: |
| 1. **Sandbox Isolation** - Code runs in a container, not on the local machine |
| 2. **Scientific Computing** - Real pandas/scipy analysis |
| 3. **Safety** - Cannot access the local filesystem |
| 4. **Speed** - Modal's fast cold starts |
| |
| ### What We Need to Show |
| |
| ```python |
| # LLM generates analysis code |
| code = """ |
| import pandas as pd |
| import scipy.stats as stats |
| |
| data = pd.DataFrame({ |
| 'study': ['Study1', 'Study2', 'Study3'], |
| 'effect_size': [0.45, 0.52, 0.38], |
| 'sample_size': [120, 85, 200] |
| }) |
| |
| weighted_mean = (data['effect_size'] * data['sample_size']).sum() / data['sample_size'].sum() |
| t_stat, p_value = stats.ttest_1samp(data['effect_size'], 0) |
|
|
| print(f"Weighted Effect Size: {weighted_mean:.3f}") |
| print(f"P-value: {p_value:.4f}") |
|
|
| result = "SUPPORTED" if p_value < 0.05 else "INCONCLUSIVE" |
| """ |
| |
| # Executed SAFELY in Modal sandbox |
| executor = get_code_executor() |
| output = executor.execute(code) # Runs in isolated container! |
| ``` |
| |
| --- |
| |
| ## 4. Technical Specification |
| |
| ### 4.1 Dependencies |
| |
| ```toml |
| # pyproject.toml - NO CHANGES to dependencies |
| # StatisticalAnalyzer uses only: |
| # - pydantic-ai (already in main deps) |
| # - modal (already in main deps) |
| # - src.tools.code_execution (no agent_framework) |
| ``` |
| |
| ### 4.2 Environment Variables |
| |
| ```bash |
| # .env |
| MODAL_TOKEN_ID=your-token-id |
| MODAL_TOKEN_SECRET=your-token-secret |
| ``` |
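|
| Alternatively, credentials can be stored once via the Modal CLI instead of `.env` (standard Modal tooling; verify the flags against your installed `modal` version): |
|
| ```bash |
| # Optional: authenticate via the Modal CLI |
| modal token set --token-id your-token-id --token-secret your-token-secret |
| ``` |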
| |
| ### 4.3 Integration Points |
| |
| | Integration Point | File | Change Required | |
| |-------------------|------|-----------------| |
| | New Service | `src/services/statistical_analyzer.py` | CREATE (no agent_framework) | |
| | Simple Orchestrator | `src/orchestrator.py` | Use `StatisticalAnalyzer` | |
| | Config | `src/utils/config.py` | Add `enable_modal_analysis` setting | |
| | AnalysisAgent | `src/agents/analysis_agent.py` | Refactor to wrap `StatisticalAnalyzer` | |
| | MCP Tool | `src/mcp_tools.py` | Add `analyze_hypothesis` tool | |
|
|
| --- |
|
|
| ## 5. Implementation |
|
|
| ### 5.1 Configuration Update (`src/utils/config.py`) |
|
|
| ```python |
| class Settings(BaseSettings): |
| # ... existing settings ... |
| |
| # Modal Configuration |
| modal_token_id: str | None = None |
| modal_token_secret: str | None = None |
| enable_modal_analysis: bool = False # Opt-in for hackathon demo |
| |
| @property |
| def modal_available(self) -> bool: |
| """Check if Modal credentials are configured.""" |
| return bool(self.modal_token_id and self.modal_token_secret) |
| ``` |
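|
| Because `Settings` is a pydantic-settings model, the new flag can be switched on from `.env` without code changes (env var name assumes the default case-insensitive mapping): |
|
| ```bash |
| # .env — opt in to the Modal analysis step for the demo |
| ENABLE_MODAL_ANALYSIS=true |
| ``` |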
|
|
| ### 5.2 StatisticalAnalyzer Service (`src/services/statistical_analyzer.py`) |
| |
| **This is the key fix - NO agent_framework imports.** |
| |
| ```python |
| """Statistical analysis service using Modal code execution. |
| |
| This module provides Modal-based statistical analysis WITHOUT depending on |
| agent_framework. This allows it to be used in the simple orchestrator mode |
| without requiring the magentic optional dependency. |
|
|
| The AnalysisAgent (in src/agents/) wraps this service for magentic mode. |
| """ |
|
|
| import asyncio |
| import re |
| from functools import partial |
| from typing import Any |
|
|
| from pydantic import BaseModel, Field |
| from pydantic_ai import Agent |
| |
| from src.agent_factory.judges import get_model |
| from src.tools.code_execution import ( |
| CodeExecutionError, |
| get_code_executor, |
| get_sandbox_library_prompt, |
| ) |
| from src.utils.models import Evidence |
| |
|
|
| class AnalysisResult(BaseModel): |
| """Result of statistical analysis.""" |
| |
| verdict: str = Field( |
| description="SUPPORTED, REFUTED, or INCONCLUSIVE", |
| ) |
| confidence: float = Field(ge=0.0, le=1.0, description="Confidence in verdict (0-1)") |
| statistical_evidence: str = Field( |
| description="Summary of statistical findings from code execution" |
| ) |
| code_generated: str = Field(description="Python code that was executed") |
| execution_output: str = Field(description="Output from code execution") |
| key_findings: list[str] = Field(default_factory=list, description="Key takeaways") |
| limitations: list[str] = Field(default_factory=list, description="Limitations") |
| |
|
|
| class StatisticalAnalyzer: |
| """Performs statistical analysis using Modal code execution. |
| |
| This service: |
| 1. Generates Python code for statistical analysis using LLM |
| 2. Executes code in Modal sandbox |
| 3. Interprets results |
| 4. Returns verdict (SUPPORTED/REFUTED/INCONCLUSIVE) |
| |
| Note: This class has NO agent_framework dependency, making it safe |
| to use in the simple orchestrator without the magentic extra. |
| """ |
| |
| def __init__(self) -> None: |
| """Initialize the analyzer.""" |
| self._code_executor: Any = None |
| self._agent: Agent[None, str] | None = None |
| |
| def _get_code_executor(self) -> Any: |
| """Lazy initialization of code executor.""" |
| if self._code_executor is None: |
| self._code_executor = get_code_executor() |
| return self._code_executor |
| |
| def _get_agent(self) -> Agent[None, str]: |
| """Lazy initialization of LLM agent for code generation.""" |
| if self._agent is None: |
| library_versions = get_sandbox_library_prompt() |
| self._agent = Agent( |
| model=get_model(), |
| output_type=str, |
| system_prompt=f"""You are a biomedical data scientist. |
| |
| Generate Python code to analyze research evidence and test hypotheses. |
|
|
| Guidelines: |
| 1. Use pandas, numpy, scipy.stats for analysis |
| 2. Print clear, interpretable results |
| 3. Include statistical tests (t-tests, chi-square, etc.) |
| 4. Calculate effect sizes and confidence intervals |
| 5. Keep code concise (<50 lines) |
| 6. Set 'result' variable to SUPPORTED, REFUTED, or INCONCLUSIVE |
|
|
| Available libraries: |
| {library_versions} |
| |
| Output format: Return ONLY executable Python code, no explanations.""", |
| ) |
| return self._agent |
|
|
| async def analyze( |
| self, |
| query: str, |
| evidence: list[Evidence], |
| hypothesis: dict[str, Any] | None = None, |
| ) -> AnalysisResult: |
| """Run statistical analysis on evidence. |
| |
| Args: |
| query: The research question |
| evidence: List of Evidence objects to analyze |
| hypothesis: Optional hypothesis dict with drug, target, pathway, effect |
| |
| Returns: |
| AnalysisResult with verdict and statistics |
| """ |
| # Build analysis prompt |
| evidence_summary = self._summarize_evidence(evidence[:10]) |
| hypothesis_text = "" |
| if hypothesis: |
| hypothesis_text = f""" |
| Hypothesis: {hypothesis.get('drug', 'Unknown')} → {hypothesis.get('target', '?')} → {hypothesis.get('pathway', '?')} → {hypothesis.get('effect', '?')} |
| Confidence: {hypothesis.get('confidence', 0.5):.0%} |
| """ |
| |
| prompt = f"""Generate Python code to statistically analyze: |
| |
| **Research Question**: {query} |
| {hypothesis_text} |
| |
| **Evidence Summary**: |
| {evidence_summary} |
|
|
| Generate executable Python code to analyze this evidence.""" |
|
|
| try: |
| # Generate code |
| agent = self._get_agent() |
| code_result = await agent.run(prompt) |
| generated_code = code_result.output |
| |
| # Execute in Modal sandbox |
| loop = asyncio.get_running_loop() |
| executor = self._get_code_executor() |
| execution = await loop.run_in_executor( |
| None, partial(executor.execute, generated_code, timeout=120) |
| ) |
| |
| if not execution["success"]: |
| return AnalysisResult( |
| verdict="INCONCLUSIVE", |
| confidence=0.0, |
| statistical_evidence=f"Execution failed: {execution['error']}", |
| code_generated=generated_code, |
| execution_output=execution.get("stderr", ""), |
| key_findings=[], |
| limitations=["Code execution failed"], |
| ) |
| |
| # Interpret results |
| return self._interpret_results(generated_code, execution) |
| |
| except CodeExecutionError as e: |
| return AnalysisResult( |
| verdict="INCONCLUSIVE", |
| confidence=0.0, |
| statistical_evidence=str(e), |
| code_generated="", |
| execution_output="", |
| key_findings=[], |
| limitations=[f"Analysis error: {e}"], |
| ) |
| |
| def _summarize_evidence(self, evidence: list[Evidence]) -> str: |
| """Summarize evidence for code generation prompt.""" |
| if not evidence: |
| return "No evidence available." |
| |
| lines = [] |
| for i, ev in enumerate(evidence[:5], 1): |
| lines.append(f"{i}. {ev.content[:200]}...") |
| lines.append(f" Source: {ev.citation.title}") |
| lines.append(f" Relevance: {ev.relevance:.0%}\n") |
| |
| return "\n".join(lines) |
| |
| def _interpret_results( |
| self, |
| code: str, |
| execution: dict[str, Any], |
| ) -> AnalysisResult: |
| """Interpret code execution results.""" |
| stdout = execution["stdout"] |
| stdout_upper = stdout.upper() |
| |
| # Extract verdict with robust word-boundary matching |
| verdict = "INCONCLUSIVE" |
|         if re.search(r"\bSUPPORTED\b", stdout_upper) and not re.search( |
|             r"\b(?:NOT\s+SUPPORTED|UNSUPPORTED)\b", stdout_upper |
|         ): |
| verdict = "SUPPORTED" |
| elif re.search(r"\bREFUTED\b", stdout_upper): |
| verdict = "REFUTED" |
| |
| # Extract key findings |
| key_findings = [] |
| for line in stdout.split("\n"): |
| line_lower = line.lower() |
| if any(kw in line_lower for kw in ["p-value", "significant", "effect", "mean"]): |
| key_findings.append(line.strip()) |
| |
| # Calculate confidence from p-values |
| confidence = self._calculate_confidence(stdout) |
| |
| return AnalysisResult( |
| verdict=verdict, |
| confidence=confidence, |
| statistical_evidence=stdout.strip(), |
| code_generated=code, |
| execution_output=stdout, |
| key_findings=key_findings[:5], |
| limitations=[ |
| "Analysis based on summary data only", |
| "Limited to available evidence", |
| "Statistical tests assume data independence", |
| ], |
| ) |
| |
| def _calculate_confidence(self, output: str) -> float: |
| """Calculate confidence based on statistical results.""" |
| p_values = re.findall(r"p[-\s]?value[:\s]+(\d+\.?\d*)", output.lower()) |
| |
| if p_values: |
| try: |
| min_p = min(float(p) for p in p_values) |
| if min_p < 0.001: |
| return 0.95 |
| elif min_p < 0.01: |
| return 0.90 |
| elif min_p < 0.05: |
| return 0.80 |
| else: |
| return 0.60 |
| except ValueError: |
| pass |
| |
| return 0.70 # Default |
| |
|
|
| # Singleton for reuse |
| _analyzer: StatisticalAnalyzer | None = None |
| |
| |
| def get_statistical_analyzer() -> StatisticalAnalyzer: |
| """Get or create singleton StatisticalAnalyzer instance.""" |
| global _analyzer |
| if _analyzer is None: |
| _analyzer = StatisticalAnalyzer() |
| return _analyzer |
| ``` |
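|
| For a quick smoke test of the service in isolation (requires Modal and an LLM key; the `Evidence`/`Citation` field names follow the fixtures used later in this spec): |
|
| ```python |
| # Sketch: calling StatisticalAnalyzer directly, outside any orchestrator. |
| import asyncio |
|
| from src.services.statistical_analyzer import get_statistical_analyzer |
| from src.utils.models import Citation, Evidence |
|
|
| async def main() -> None: |
|     evidence = [ |
|         Evidence( |
|             content="Metformin reduced amyloid burden by 30% in a pilot trial.", |
|             citation=Citation( |
|                 source="pubmed", |
|                 title="Pilot trial of metformin", |
|                 url="https://pubmed.ncbi.nlm.nih.gov/12345/", |
|                 date="2024-01-01", |
|                 authors=["Doe J"], |
|             ), |
|             relevance=0.8, |
|         ) |
|     ] |
|     result = await get_statistical_analyzer().analyze( |
|         query="Can metformin treat Alzheimer's disease?", |
|         evidence=evidence, |
|     ) |
|     print(result.verdict, f"({result.confidence:.0%})") |
|
|
| if __name__ == "__main__": |
|     asyncio.run(main()) |
| ``` |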
| |
| ### 5.3 Simple Orchestrator Update (`src/orchestrator.py`) |
|
|
| **Uses `StatisticalAnalyzer` directly - NO agent_framework import.** |
| |
| ```python |
| """Main orchestrator with optional Modal analysis.""" |
| |
| from src.utils.config import settings |
| |
| # ... existing imports ... |
| |
| |
| class Orchestrator: |
| """Search-Judge-Analyze orchestration loop.""" |
| |
| def __init__( |
| self, |
| search_handler: SearchHandlerProtocol, |
| judge_handler: JudgeHandlerProtocol, |
| config: OrchestratorConfig | None = None, |
| enable_analysis: bool = False, # New parameter |
| ) -> None: |
| self.search = search_handler |
| self.judge = judge_handler |
| self.config = config or OrchestratorConfig() |
| self.history: list[dict[str, Any]] = [] |
| self._enable_analysis = enable_analysis and settings.modal_available |
| |
| # Lazy-load analysis (NO agent_framework dependency!) |
| self._analyzer: Any = None |
| |
| def _get_analyzer(self) -> Any: |
| """Lazy initialization of StatisticalAnalyzer. |
| |
| Note: This imports from src.services, NOT src.agents, |
| so it works without the magentic optional dependency. |
| """ |
| if self._analyzer is None: |
| from src.services.statistical_analyzer import get_statistical_analyzer |
| |
| self._analyzer = get_statistical_analyzer() |
| return self._analyzer |
| |
| async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]: |
| """Main orchestration loop with optional Modal analysis.""" |
| # ... existing search/judge loop ... |
| |
| # After judge says "synthesize", optionally run analysis |
| if self._enable_analysis and assessment.recommendation == "synthesize": |
| yield AgentEvent( |
| type="analyzing", |
| message="Running statistical analysis in Modal sandbox...", |
| data={}, |
| iteration=iteration, |
| ) |
| |
| try: |
| analyzer = self._get_analyzer() |
| |
| # Run Modal analysis (no agent_framework needed!) |
| analysis_result = await analyzer.analyze( |
| query=query, |
| evidence=all_evidence, |
| hypothesis=None, # Could add hypothesis generation later |
| ) |
| |
| yield AgentEvent( |
| type="analysis_complete", |
| message=f"Analysis verdict: {analysis_result.verdict}", |
| data=analysis_result.model_dump(), |
| iteration=iteration, |
| ) |
| |
| except Exception as e: |
| yield AgentEvent( |
| type="error", |
| message=f"Modal analysis failed: {e}", |
| data={"error": str(e)}, |
| iteration=iteration, |
| ) |
| |
| # Continue to synthesis... |
| ``` |
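|
| Downstream consumers only need to handle the two new event types; a minimal sketch of a consuming loop (field names follow the `AgentEvent` usage above; the printing is illustrative): |
|
| ```python |
| # Sketch: surfacing the new analysis events wherever run() is consumed. |
| async def stream_events(orchestrator: Orchestrator, query: str) -> None: |
|     async for event in orchestrator.run(query): |
|         if event.type == "analyzing": |
|             print(f"[Modal] {event.message}") |
|         elif event.type == "analysis_complete": |
|             print(f"[Modal] verdict={event.data.get('verdict', 'UNKNOWN')}") |
|         else: |
|             print(event.message) |
| ``` |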
| |
| ### 5.4 Refactor AnalysisAgent (`src/agents/analysis_agent.py`) |
| |
| **Wrap `StatisticalAnalyzer` for magentic mode.** |
|
|
| ```python |
| """Analysis agent for statistical analysis using Modal code execution. |
| |
| This agent wraps StatisticalAnalyzer for use in magentic multi-agent mode. |
| The core logic is in src/services/statistical_analyzer.py to avoid |
| coupling agent_framework to the simple orchestrator. |
| """ |
| |
| from collections.abc import AsyncIterable |
| from typing import TYPE_CHECKING, Any |
| |
| from agent_framework import ( |
| AgentRunResponse, |
| AgentRunResponseUpdate, |
| AgentThread, |
| BaseAgent, |
| ChatMessage, |
| Role, |
| ) |
| |
| from src.services.statistical_analyzer import ( |
| AnalysisResult, |
| get_statistical_analyzer, |
| ) |
| from src.utils.models import Evidence |
| |
| if TYPE_CHECKING: |
| from src.services.embeddings import EmbeddingService |
| |
| |
| class AnalysisAgent(BaseAgent): # type: ignore[misc] |
| """Wraps StatisticalAnalyzer for magentic multi-agent mode.""" |
| |
| def __init__( |
| self, |
| evidence_store: dict[str, Any], |
| embedding_service: "EmbeddingService | None" = None, |
| ) -> None: |
| super().__init__( |
| name="AnalysisAgent", |
| description="Performs statistical analysis using Modal sandbox", |
| ) |
| self._evidence_store = evidence_store |
| self._embeddings = embedding_service |
| self._analyzer = get_statistical_analyzer() |
| |
| async def run( |
| self, |
| messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None, |
| *, |
| thread: AgentThread | None = None, |
| **kwargs: Any, |
| ) -> AgentRunResponse: |
| """Analyze evidence and return verdict.""" |
| query = self._extract_query(messages) |
| hypotheses = self._evidence_store.get("hypotheses", []) |
| evidence = self._evidence_store.get("current", []) |
| |
| if not evidence: |
| return self._error_response("No evidence available.") |
| |
| # Get primary hypothesis if available |
| hypothesis_dict = None |
| if hypotheses: |
| h = hypotheses[0] |
| hypothesis_dict = { |
| "drug": getattr(h, "drug", "Unknown"), |
| "target": getattr(h, "target", "?"), |
| "pathway": getattr(h, "pathway", "?"), |
| "effect": getattr(h, "effect", "?"), |
| "confidence": getattr(h, "confidence", 0.5), |
| } |
| |
| # Delegate to StatisticalAnalyzer |
| result = await self._analyzer.analyze( |
| query=query, |
| evidence=evidence, |
| hypothesis=hypothesis_dict, |
| ) |
| |
| # Store in shared context |
| self._evidence_store["analysis"] = result.model_dump() |
| |
| # Format response |
| response_text = self._format_response(result) |
| |
| return AgentRunResponse( |
| messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)], |
| response_id=f"analysis-{result.verdict.lower()}", |
| additional_properties={"analysis": result.model_dump()}, |
| ) |
| |
| def _format_response(self, result: AnalysisResult) -> str: |
| """Format analysis result as markdown.""" |
| lines = [ |
| "## Statistical Analysis Complete\n", |
| f"### Verdict: **{result.verdict}**", |
| f"**Confidence**: {result.confidence:.0%}\n", |
| "### Key Findings", |
| ] |
| for finding in result.key_findings: |
| lines.append(f"- {finding}") |
| |
| lines.extend([ |
| "\n### Statistical Evidence", |
| "```", |
| result.statistical_evidence, |
| "```", |
| ]) |
| return "\n".join(lines) |
| |
| def _error_response(self, message: str) -> AgentRunResponse: |
| """Create error response.""" |
| return AgentRunResponse( |
| messages=[ChatMessage(role=Role.ASSISTANT, text=f"**Error**: {message}")], |
| response_id="analysis-error", |
| ) |
| |
| def _extract_query( |
| self, messages: str | ChatMessage | list[str] | list[ChatMessage] | None |
| ) -> str: |
| """Extract query from messages.""" |
| if isinstance(messages, str): |
| return messages |
| elif isinstance(messages, ChatMessage): |
| return messages.text or "" |
| elif isinstance(messages, list): |
| for msg in reversed(messages): |
| if isinstance(msg, ChatMessage) and msg.role == Role.USER: |
| return msg.text or "" |
| elif isinstance(msg, str): |
| return msg |
| return "" |
| |
| async def run_stream( |
| self, |
| messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None, |
| *, |
| thread: AgentThread | None = None, |
| **kwargs: Any, |
| ) -> AsyncIterable[AgentRunResponseUpdate]: |
| """Streaming wrapper.""" |
| result = await self.run(messages, thread=thread, **kwargs) |
| yield AgentRunResponseUpdate(messages=result.messages, response_id=result.response_id) |
| ``` |
| |
| ### 5.5 MCP Tool for Modal Analysis (`src/mcp_tools.py`) |
| |
| Add to existing MCP tools: |
| |
| ````python |
| async def analyze_hypothesis( |
| drug: str, |
| condition: str, |
| evidence_summary: str, |
| ) -> str: |
| """Perform statistical analysis of drug repurposing hypothesis using Modal. |
| |
| Executes AI-generated Python code in a secure Modal sandbox to analyze |
| the statistical evidence for a drug repurposing hypothesis. |
| |
| Args: |
| drug: The drug being evaluated (e.g., "metformin") |
| condition: The target condition (e.g., "Alzheimer's disease") |
| evidence_summary: Summary of evidence to analyze |
| |
| Returns: |
| Analysis result with verdict (SUPPORTED/REFUTED/INCONCLUSIVE) and statistics |
| """ |
| from src.services.statistical_analyzer import get_statistical_analyzer |
| from src.utils.config import settings |
| from src.utils.models import Citation, Evidence |
| |
| if not settings.modal_available: |
| return "Error: Modal credentials not configured. Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET." |
| |
| # Create evidence from summary |
| evidence = [ |
| Evidence( |
| content=evidence_summary, |
| citation=Citation( |
| source="pubmed", |
| title=f"Evidence for {drug} in {condition}", |
| url="https://example.com", |
| date="2024-01-01", |
| authors=["User Provided"], |
| ), |
| relevance=0.9, |
| ) |
| ] |
| |
| analyzer = get_statistical_analyzer() |
| result = await analyzer.analyze( |
| query=f"Can {drug} treat {condition}?", |
| evidence=evidence, |
| hypothesis={"drug": drug, "target": "unknown", "pathway": "unknown", "effect": condition}, |
| ) |
| |
| return f"""## Statistical Analysis: {drug} for {condition} |
| |
| ### Verdict: **{result.verdict}** |
| **Confidence**: {result.confidence:.0%} |
|
|
| ### Key Findings |
| {chr(10).join(f"- {f}" for f in result.key_findings) or "- No specific findings extracted"} |
| |
| ### Execution Output |
| ``` |
| {result.execution_output} |
| ``` |
| |
| ### Generated Code |
| ```python |
| {result.code_generated} |
| ``` |
| |
| **Executed in Modal Sandbox** - Isolated, secure, reproducible. |
| """ |
| ```` |
| |
| ### 5.6 Demo Scripts |
| |
| #### `examples/modal_demo/verify_sandbox.py` |
| |
| ```python |
| #!/usr/bin/env python3 |
| """Verify that Modal sandbox is properly isolated. |
| |
| This script proves to judges that code runs in Modal, not locally. |
| NO agent_framework dependency - uses only src.tools.code_execution. |
| |
| Usage: |
| uv run python examples/modal_demo/verify_sandbox.py |
| """ |
| |
| import asyncio |
| from functools import partial |
| |
| from src.tools.code_execution import get_code_executor |
| from src.utils.config import settings |
|
|
|
|
| async def main() -> None: |
| """Verify Modal sandbox isolation.""" |
| if not settings.modal_available: |
| print("Error: Modal credentials not configured.") |
| print("Set MODAL_TOKEN_ID and MODAL_TOKEN_SECRET in .env") |
| return |
| |
| executor = get_code_executor() |
| loop = asyncio.get_running_loop() |
| |
| print("=" * 60) |
| print("Modal Sandbox Isolation Verification") |
| print("=" * 60 + "\n") |
| |
| # Test 1: Hostname |
| print("Test 1: Check hostname (should NOT be your machine)") |
| code1 = "import socket; print(f'Hostname: {socket.gethostname()}')" |
| result1 = await loop.run_in_executor(None, partial(executor.execute, code1)) |
| print(f" {result1['stdout'].strip()}\n") |
| |
| # Test 2: Scientific libraries |
| print("Test 2: Verify scientific libraries") |
| code2 = """ |
| import pandas as pd |
| import numpy as np |
| import scipy |
| print(f"pandas: {pd.__version__}") |
| print(f"numpy: {np.__version__}") |
| print(f"scipy: {scipy.__version__}") |
| """ |
| result2 = await loop.run_in_executor(None, partial(executor.execute, code2)) |
| print(f" {result2['stdout'].strip()}\n") |
| |
| # Test 3: Network blocked |
| print("Test 3: Verify network isolation") |
| code3 = """ |
| import urllib.request |
| try: |
| urllib.request.urlopen("https://google.com", timeout=2) |
| print("Network: ALLOWED (unexpected!)") |
| except Exception: |
| print("Network: BLOCKED (as expected)") |
| """ |
| result3 = await loop.run_in_executor(None, partial(executor.execute, code3)) |
| print(f" {result3['stdout'].strip()}\n") |
| |
| # Test 4: Real statistics |
| print("Test 4: Execute statistical analysis") |
| code4 = """ |
| import pandas as pd |
| import scipy.stats as stats |
| |
| data = pd.DataFrame({'effect': [0.42, 0.38, 0.51]}) |
| mean = data['effect'].mean() |
| t_stat, p_val = stats.ttest_1samp(data['effect'], 0) |
| |
| print(f"Mean Effect: {mean:.3f}") |
| print(f"P-value: {p_val:.4f}") |
| print(f"Verdict: {'SUPPORTED' if p_val < 0.05 else 'INCONCLUSIVE'}") |
| """ |
| result4 = await loop.run_in_executor(None, partial(executor.execute, code4)) |
| print(f" {result4['stdout'].strip()}\n") |
| |
| print("=" * 60) |
| print("All tests complete - Modal sandbox verified!") |
| print("=" * 60) |
| |
| |
| if __name__ == "__main__": |
| asyncio.run(main()) |
| ``` |
| |
| #### `examples/modal_demo/run_analysis.py` |
| |
| ```python |
| #!/usr/bin/env python3 |
| """Demo: Modal-powered statistical analysis. |
| |
| This script uses StatisticalAnalyzer directly (NO agent_framework dependency). |
|
|
| Usage: |
| uv run python examples/modal_demo/run_analysis.py "metformin alzheimer" |
| """ |
| |
| import argparse |
| import asyncio |
| import os |
| import sys |
|
|
| from src.services.statistical_analyzer import get_statistical_analyzer |
| from src.tools.pubmed import PubMedTool |
| from src.utils.config import settings |
| |
| |
| async def main() -> None: |
| """Run the Modal analysis demo.""" |
| parser = argparse.ArgumentParser(description="Modal Analysis Demo") |
| parser.add_argument("query", help="Research query") |
| args = parser.parse_args() |
| |
| if not settings.modal_available: |
| print("Error: Modal credentials not configured.") |
| sys.exit(1) |
| |
| if not (os.getenv("OPENAI_API_KEY") or os.getenv("ANTHROPIC_API_KEY")): |
| print("Error: No LLM API key found.") |
| sys.exit(1) |
| |
| print(f"\n{'=' * 60}") |
| print("DeepBoner Modal Analysis Demo") |
| print(f"Query: {args.query}") |
| print(f"{'=' * 60}\n") |
| |
| # Step 1: Gather Evidence |
| print("Step 1: Gathering evidence from PubMed...") |
| pubmed = PubMedTool() |
| evidence = await pubmed.search(args.query, max_results=5) |
| print(f" Found {len(evidence)} papers\n") |
| |
| # Step 2: Run Modal Analysis |
| print("Step 2: Running statistical analysis in Modal sandbox...") |
| analyzer = get_statistical_analyzer() |
| result = await analyzer.analyze(query=args.query, evidence=evidence) |
| |
| # Step 3: Display Results |
| print("\n" + "=" * 60) |
| print("ANALYSIS RESULTS") |
| print("=" * 60) |
| print(f"\nVerdict: {result.verdict}") |
| print(f"Confidence: {result.confidence:.0%}") |
| print("\nKey Findings:") |
| for finding in result.key_findings: |
| print(f" - {finding}") |
| |
| print("\n[Demo Complete - Code executed in Modal, not locally]") |
| |
|
|
| if __name__ == "__main__": |
| asyncio.run(main()) |
| ``` |
| |
| --- |
|
|
| ## 6. TDD Test Suite |
|
|
| ### 6.1 Unit Tests (`tests/unit/services/test_statistical_analyzer.py`) |
|
|
| ```python |
| """Unit tests for StatisticalAnalyzer service.""" |
| |
| from unittest.mock import AsyncMock, MagicMock, patch |
| |
| import pytest |
| |
| from src.services.statistical_analyzer import ( |
| AnalysisResult, |
| StatisticalAnalyzer, |
| get_statistical_analyzer, |
| ) |
| from src.utils.models import Citation, Evidence |
| |
| |
| @pytest.fixture |
| def sample_evidence() -> list[Evidence]: |
| """Sample evidence for testing.""" |
| return [ |
| Evidence( |
| content="Metformin shows effect size of 0.45.", |
| citation=Citation( |
| source="pubmed", |
| title="Metformin Study", |
| url="https://pubmed.ncbi.nlm.nih.gov/12345/", |
| date="2024-01-15", |
| authors=["Smith J"], |
| ), |
| relevance=0.9, |
| ) |
| ] |
| |
| |
| class TestStatisticalAnalyzer: |
| """Tests for StatisticalAnalyzer (no agent_framework dependency).""" |
| |
|     def test_no_agent_framework_import(self) -> None: |
|         """StatisticalAnalyzer must NOT import agent_framework.""" |
|         from pathlib import Path |
|
|         import src.services.statistical_analyzer as module |
|
|         # The module docstring mentions agent_framework, so inspect import lines only |
|         source = Path(module.__file__).read_text() |
|         import_lines = [ |
|             line for line in source.splitlines() |
|             if line.lstrip().startswith(("import ", "from ")) |
|         ] |
|         assert not any("agent_framework" in line for line in import_lines) |
| |
| @pytest.mark.asyncio |
| async def test_analyze_returns_result( |
| self, sample_evidence: list[Evidence] |
| ) -> None: |
| """analyze() should return AnalysisResult.""" |
| analyzer = StatisticalAnalyzer() |
| |
| with patch.object(analyzer, "_get_agent") as mock_agent, \ |
| patch.object(analyzer, "_get_code_executor") as mock_executor: |
| |
| # Mock LLM |
| mock_agent.return_value.run = AsyncMock( |
| return_value=MagicMock(output="print('SUPPORTED')") |
| ) |
| |
| # Mock Modal |
| mock_executor.return_value.execute.return_value = { |
| "stdout": "SUPPORTED\np-value: 0.01", |
| "stderr": "", |
| "success": True, |
| } |
| |
| result = await analyzer.analyze("test query", sample_evidence) |
| |
| assert isinstance(result, AnalysisResult) |
| assert result.verdict == "SUPPORTED" |
| |
| def test_singleton(self) -> None: |
| """get_statistical_analyzer should return singleton.""" |
| a1 = get_statistical_analyzer() |
| a2 = get_statistical_analyzer() |
| assert a1 is a2 |
| |
| |
| class TestAnalysisResult: |
| """Tests for AnalysisResult model.""" |
| |
| def test_verdict_values(self) -> None: |
| """Verdict should be one of the expected values.""" |
| for verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"]: |
| result = AnalysisResult( |
| verdict=verdict, |
| confidence=0.8, |
| statistical_evidence="test", |
| code_generated="print('test')", |
| execution_output="test", |
| ) |
| assert result.verdict == verdict |
| |
| def test_confidence_bounds(self) -> None: |
| """Confidence must be 0.0-1.0.""" |
| with pytest.raises(ValueError): |
| AnalysisResult( |
| verdict="SUPPORTED", |
| confidence=1.5, # Invalid |
| statistical_evidence="test", |
| code_generated="test", |
| execution_output="test", |
| ) |
| ``` |
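|
| One more test worth adding beyond the list above: exercising the verdict parsing directly, since the word-boundary logic is easy to regress (a sketch using the private helper; the stdout strings are made up): |
|
| ```python |
| class TestVerdictExtraction: |
|     """Sketch: regression tests for _interpret_results verdict parsing.""" |
|
|     def test_not_supported_is_not_misread(self) -> None: |
|         analyzer = StatisticalAnalyzer() |
|         execution = {"stdout": "Result: NOT SUPPORTED (p=0.40)", "stderr": "", "success": True} |
|         result = analyzer._interpret_results("print('x')", execution) |
|         assert result.verdict == "INCONCLUSIVE" |
|
|     def test_refuted_detected(self) -> None: |
|         analyzer = StatisticalAnalyzer() |
|         execution = {"stdout": "Verdict: REFUTED\np-value: 0.002", "stderr": "", "success": True} |
|         result = analyzer._interpret_results("print('x')", execution) |
|         assert result.verdict == "REFUTED" |
|         assert result.confidence >= 0.9 |
| ``` |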
|
|
| ### 6.2 Integration Test (`tests/integration/test_modal.py`) |
| |
| ```python |
| """Integration tests for Modal (requires credentials).""" |
| |
| import pytest |
| |
| from src.utils.config import settings |
| |
| |
| @pytest.mark.integration |
| @pytest.mark.skipif(not settings.modal_available, reason="Modal not configured") |
| class TestModalIntegration: |
| """Integration tests requiring Modal credentials.""" |
| |
| @pytest.mark.asyncio |
| async def test_sandbox_executes_code(self) -> None: |
| """Modal sandbox should execute Python code.""" |
| import asyncio |
| from functools import partial |
| |
| from src.tools.code_execution import get_code_executor |
| |
| executor = get_code_executor() |
| code = "import pandas as pd; print(pd.DataFrame({'a': [1,2,3]})['a'].sum())" |
| |
| loop = asyncio.get_running_loop() |
| result = await loop.run_in_executor( |
| None, partial(executor.execute, code, timeout=30) |
| ) |
| |
| assert result["success"] |
| assert "6" in result["stdout"] |
| |
| @pytest.mark.asyncio |
| async def test_statistical_analyzer_works(self) -> None: |
| """StatisticalAnalyzer should work end-to-end.""" |
| from src.services.statistical_analyzer import get_statistical_analyzer |
| from src.utils.models import Citation, Evidence |
| |
| evidence = [ |
| Evidence( |
| content="Drug shows 40% improvement in trial.", |
| citation=Citation( |
| source="pubmed", |
| title="Test", |
| url="https://test.com", |
| date="2024-01-01", |
| authors=["Test"], |
| ), |
| relevance=0.9, |
| ) |
| ] |
| |
| analyzer = get_statistical_analyzer() |
| result = await analyzer.analyze("test drug efficacy", evidence) |
| |
| assert result.verdict in ["SUPPORTED", "REFUTED", "INCONCLUSIVE"] |
| assert 0.0 <= result.confidence <= 1.0 |
| ``` |
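|
| If the `integration` marker is not already registered, pytest will warn (or fail under `--strict-markers`); a sketch of the registration, assuming pytest is configured in pyproject.toml: |
|
| ```toml |
| # pyproject.toml — only needed if the marker is not already declared |
| [tool.pytest.ini_options] |
| markers = [ |
|     "integration: tests that require external services (Modal, live APIs)", |
| ] |
| ``` |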
| |
| --- |
|
|
| ## 7. Verification Commands |
|
|
| ```bash |
| # 1. Verify NO agent_framework in StatisticalAnalyzer |
| grep -E "^(from|import).*agent_framework" src/services/statistical_analyzer.py |
| # Should return nothing (the docstring mentions agent_framework, but no import statement does) |
| |
| # 2. Run unit tests (no Modal needed) |
| uv run pytest tests/unit/services/test_statistical_analyzer.py -v |
| |
| # 3. Run verification script (requires Modal) |
| uv run python examples/modal_demo/verify_sandbox.py |
| |
| # 4. Run analysis demo (requires Modal + LLM) |
| uv run python examples/modal_demo/run_analysis.py "metformin alzheimer" |
| |
| # 5. Run integration tests |
| uv run pytest tests/integration/test_modal.py -v -m integration |
| |
| # 6. Full test suite |
| make check |
| ``` |
|
|
| --- |
|
|
| ## 8. Definition of Done |
|
|
| Phase 13 is **COMPLETE** when: |
|
|
| - [ ] `src/services/statistical_analyzer.py` created (NO agent_framework) |
| - [ ] `src/utils/config.py` has `enable_modal_analysis` setting |
| - [ ] `src/orchestrator.py` uses `StatisticalAnalyzer` directly |
| - [ ] `src/agents/analysis_agent.py` refactored to wrap `StatisticalAnalyzer` |
| - [ ] `src/mcp_tools.py` has `analyze_hypothesis` tool |
| - [ ] `examples/modal_demo/verify_sandbox.py` working |
| - [ ] `examples/modal_demo/run_analysis.py` working |
| - [ ] Unit tests pass WITHOUT magentic extra installed |
| - [ ] Integration tests pass WITH Modal credentials |
| - [ ] All lints pass |
|
|
| --- |
|
|
| ## 9. Architecture After Phase 13 |
|
|
| ```text |
| MCP Clients (Claude Desktop, Cursor, etc.) |
|         │  MCP Protocol |
|         ▼ |
| Gradio App + MCP Server |
|   MCP Tools: search_pubmed, search_trials, search_europepmc, |
|              search_all, analyze_hypothesis |
|         │ |
|         ├───────────────────────────────────┐ |
|         ▼                                   ▼ |
| Simple Orchestrator                 Magentic Orchestrator |
| (no agent_framework)                (with agent_framework) |
|   SearchHandler                       SearchAgent |
|   JudgeHandler                        JudgeAgent |
|   StatisticalAnalyzer                 AnalysisAgent (wraps StatisticalAnalyzer) |
|         │                                   │ |
|         └─────────────────┬─────────────────┘ |
|                           ▼ |
| StatisticalAnalyzer (src/services/statistical_analyzer.py) |
|   NO agent_framework dependency |
|   1. Generate code with pydantic-ai |
|   2. Execute in Modal sandbox |
|   3. Return AnalysisResult |
|                           │ |
|                           ▼ |
| Modal Sandbox |
|   - pandas, numpy, scipy, sklearn, statsmodels |
|   - Network: BLOCKED |
|   - Filesystem: ISOLATED |
|   - Timeout: ENFORCED |
| ``` |
|
|
| **This is the dependency-safe Modal stack.** |
|
|
| --- |
|
|
| ## 10. Files Summary |
|
|
| | File | Action | Purpose | |
| |------|--------|---------| |
| | `src/services/statistical_analyzer.py` | **CREATE** | Core analysis (no agent_framework) | |
| | `src/utils/config.py` | MODIFY | Add `enable_modal_analysis` | |
| | `src/orchestrator.py` | MODIFY | Use `StatisticalAnalyzer` | |
| | `src/agents/analysis_agent.py` | MODIFY | Wrap `StatisticalAnalyzer` | |
| | `src/mcp_tools.py` | MODIFY | Add `analyze_hypothesis` | |
| | `examples/modal_demo/verify_sandbox.py` | CREATE | Sandbox verification | |
| | `examples/modal_demo/run_analysis.py` | CREATE | Demo script | |
| | `tests/unit/services/test_statistical_analyzer.py` | CREATE | Unit tests | |
| | `tests/integration/test_modal.py` | CREATE | Integration tests | |
|
|
| **Key Fix**: `StatisticalAnalyzer` has ZERO agent_framework imports, making it safe for the simple orchestrator. |
| |