# Phase 7 Implementation Spec: Hypothesis Agent

**Goal**: Add an agent that generates scientific hypotheses to guide targeted searches.
**Philosophy**: "Don't just find evidence; understand the mechanisms."
**Prerequisite**: Phase 6 complete (Embeddings working)

---
## 1. Why Hypothesis Agent?

Current limitation: **Search is reactive, not hypothesis-driven.**

Current flow:

1. User asks about "metformin alzheimer"
2. Search finds papers
3. Judge says "need more evidence"
4. Search again with slightly different keywords

With Hypothesis Agent:

1. User asks about "metformin alzheimer"
2. Search finds initial papers
3. **Hypothesis Agent analyzes**: "Evidence suggests metformin → AMPK activation → autophagy → amyloid clearance"
4. Search can now target: "metformin AMPK", "autophagy neurodegeneration", "amyloid clearance drugs"

**Key insight**: Scientific research is hypothesis-driven. The agent should think like a researcher.

---
## 2. Architecture

### Current (Phase 6)
```
User Query → Magentic Manager
    ├── SearchAgent → Evidence
    └── JudgeAgent → Sufficient? → Synthesize/Continue
```

### Phase 7
```
User Query → Magentic Manager
    ├── SearchAgent → Evidence
    ├── HypothesisAgent → Mechanistic Hypotheses   ← NEW
    └── JudgeAgent → Sufficient? → Synthesize/Continue
            ↓
    Uses hypotheses to guide next search
```
### Shared Context Enhancement
```python
evidence_store = {
    "current": [],
    "embeddings": {},
    "vector_index": None,
    "hypotheses": [],         # NEW: Generated hypotheses
    "tested_hypotheses": [],  # NEW: Hypotheses with supporting/contradicting evidence
}
```
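
A minimal sketch of how a later search round could consume the new `hypotheses` slot. The helper name `next_search_queries` is hypothetical, not part of the spec's API; `to_search_queries()` is defined in Section 3.1:

```python
def next_search_queries(evidence_store: dict) -> list[str]:
    """Turn stored hypotheses into the next round of search queries (sketch)."""
    queries: list[str] = []
    for hypothesis in evidence_store.get("hypotheses", []):
        # MechanismHypothesis.to_search_queries() is defined in Section 3.1
        queries.extend(hypothesis.to_search_queries())
    # De-duplicate while preserving order
    return list(dict.fromkeys(queries))
```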

---

## 3. Hypothesis Model

### 3.1 Data Model (`src/utils/models.py`)

```python
class MechanismHypothesis(BaseModel):
    """A scientific hypothesis about drug mechanism."""

    drug: str = Field(description="The drug being studied")
    target: str = Field(description="Molecular target (e.g., AMPK, mTOR)")
    pathway: str = Field(description="Biological pathway affected")
    effect: str = Field(description="Downstream effect on disease")
    confidence: float = Field(ge=0, le=1, description="Confidence in hypothesis")
    supporting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs supporting this hypothesis"
    )
    contradicting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs contradicting this hypothesis"
    )
    search_suggestions: list[str] = Field(
        default_factory=list,
        description="Suggested searches to test this hypothesis"
    )

    def to_search_queries(self) -> list[str]:
        """Generate search queries to test this hypothesis."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
            *self.search_suggestions,
        ]
```
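
For illustration, the queries derived for the running metformin example (values borrowed from the example format in Section 4.1):

```python
h = MechanismHypothesis(
    drug="Metformin",
    target="AMPK",
    pathway="autophagy activation",
    effect="amyloid clearance",
    confidence=0.7,
    search_suggestions=["metformin AMPK brain"],
)
print(h.to_search_queries())
# ['Metformin AMPK', 'AMPK autophagy activation',
#  'autophagy activation amyloid clearance', 'metformin AMPK brain']
```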

### 3.2 Hypothesis Assessment

```python
class HypothesisAssessment(BaseModel):
    """Assessment of evidence against hypotheses."""

    hypotheses: list[MechanismHypothesis]
    primary_hypothesis: MechanismHypothesis | None = Field(
        default=None,  # optional: there may be no clear front-runner yet
        description="Most promising hypothesis based on current evidence"
    )
    knowledge_gaps: list[str] = Field(
        description="What we don't know yet"
    )
    recommended_searches: list[str] = Field(
        description="Searches to fill knowledge gaps"
    )
```
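
One way the `tested_hypotheses` slot from Section 2 could be maintained. A sketch with a hypothetical helper name; it assumes evidence IDs are PMIDs or URLs, matching the model's field descriptions:

```python
def record_hypothesis_test(
    store: dict,
    hypothesis: MechanismHypothesis,
    evidence_id: str,  # PMID or URL
    supports: bool,
) -> None:
    """Attach one piece of evidence to a hypothesis and mark it tested (sketch)."""
    if supports:
        hypothesis.supporting_evidence.append(evidence_id)
    else:
        hypothesis.contradicting_evidence.append(evidence_id)
    store.setdefault("tested_hypotheses", []).append(hypothesis)
```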

---

## 4. Implementation

### 4.0 Text Utilities (`src/utils/text_utils.py`)

> **Why These Utilities?**
>
> The original spec used arbitrary truncation (`evidence[:10]` and `content[:300]`).
> That discards information arbitrarily: the cut lands mid-sentence, and the first
> ten items are not necessarily the most relevant or varied. These utilities provide:
> 1. **Sentence-aware truncation**: cuts at sentence boundaries, not mid-word
> 2. **Diverse evidence selection**: uses embeddings to select varied evidence (MMR)
```python
"""Text processing utilities for evidence handling."""
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
    from src.utils.models import Evidence


def truncate_at_sentence(text: str, max_chars: int = 300) -> str:
    """Truncate text at a sentence boundary, preserving meaning.

    Args:
        text: The text to truncate
        max_chars: Maximum characters (default 300)

    Returns:
        Text truncated at the last complete sentence within the limit
    """
    if len(text) <= max_chars:
        return text

    # Find truncation point
    truncated = text[:max_chars]

    # Look for sentence endings: . ! ? followed by space or newline
    for sep in ['. ', '! ', '? ', '.\n', '!\n', '?\n']:
        last_sep = truncated.rfind(sep)
        if last_sep > max_chars // 2:  # Don't truncate too aggressively
            return text[:last_sep + 1].strip()

    # Fallback: find last period
    last_period = truncated.rfind('.')
    if last_period > max_chars // 2:
        return text[:last_period + 1].strip()

    # Last resort: truncate at word boundary
    last_space = truncated.rfind(' ')
    if last_space > 0:
        return text[:last_space].strip() + "..."

    return truncated + "..."


async def select_diverse_evidence(
    evidence: list["Evidence"],
    n: int,
    query: str,
    embeddings: "EmbeddingService | None" = None,
) -> list["Evidence"]:
    """Select the n most diverse and relevant evidence items.

    Uses Maximal Marginal Relevance (MMR) when embeddings are available,
    and falls back to relevance_score sorting otherwise.

    Args:
        evidence: All available evidence
        n: Number of items to select
        query: Original query for relevance scoring
        embeddings: Optional EmbeddingService for semantic diversity

    Returns:
        Selected evidence items, diverse and relevant
    """
    if not evidence:
        return []

    if n >= len(evidence):
        return evidence

    # Fallback: sort by relevance score if no embeddings
    if embeddings is None:
        return sorted(
            evidence,
            key=lambda e: e.relevance_score,
            reverse=True,
        )[:n]

    # MMR: Maximal Marginal Relevance for diverse selection
    # Score = λ * relevance - (1 - λ) * max_similarity_to_selected
    lambda_param = 0.7  # Balance relevance vs diversity

    # Get query embedding
    query_emb = await embeddings.embed(query)

    # Get all evidence embeddings
    evidence_embs = await embeddings.embed_batch([e.content for e in evidence])

    # Compute relevance scores (cosine similarity to query)
    import numpy as np

    def cosine(a, b) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance_scores = [cosine(query_emb, emb) for emb in evidence_embs]

    # Greedy MMR selection
    selected_indices: list[int] = []
    remaining = set(range(len(evidence)))

    for _ in range(n):
        best_score = float('-inf')
        best_idx = -1

        for idx in remaining:
            # Relevance component
            relevance = relevance_scores[idx]

            # Diversity component: max similarity to already selected
            if selected_indices:
                max_sim = max(
                    cosine(evidence_embs[idx], evidence_embs[sel])
                    for sel in selected_indices
                )
            else:
                max_sim = 0.0

            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim

            if mmr_score > best_score:
                best_score = mmr_score
                best_idx = idx

        if best_idx >= 0:
            selected_indices.append(best_idx)
            remaining.remove(best_idx)

    return [evidence[i] for i in selected_indices]
```
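
A quick check of the truncation behavior (the output follows from the logic above):

```python
from src.utils.text_utils import truncate_at_sentence

text = "Metformin activates AMPK. This inhibits mTOR. Autophagy increases markedly."
print(truncate_at_sentence(text, max_chars=50))
# Metformin activates AMPK. This inhibits mTOR.
```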

### 4.1 Hypothesis Prompts (`src/prompts/hypothesis.py`)

```python
"""Prompts for Hypothesis Agent."""
from typing import TYPE_CHECKING

from src.utils.text_utils import select_diverse_evidence, truncate_at_sentence

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
    from src.utils.models import Evidence

SYSTEM_PROMPT = """You are a biomedical research scientist specializing in drug repurposing.

Your role is to generate mechanistic hypotheses based on evidence.

A good hypothesis:
1. Proposes a MECHANISM: Drug → Target → Pathway → Effect
2. Is TESTABLE: Can be supported or refuted by literature search
3. Is SPECIFIC: Names actual molecular targets and pathways
4. Generates SEARCH QUERIES: Helps find more evidence

Example hypothesis format:
- Drug: Metformin
- Target: AMPK (AMP-activated protein kinase)
- Pathway: mTOR inhibition → autophagy activation
- Effect: Enhanced clearance of amyloid-beta in Alzheimer's
- Confidence: 0.7
- Search suggestions: ["metformin AMPK brain", "autophagy amyloid clearance"]

Be specific. Use actual gene/protein names when possible."""


async def format_hypothesis_prompt(
    query: str,
    evidence: list["Evidence"],
    embeddings: "EmbeddingService | None" = None,
) -> str:
    """Format the prompt for hypothesis generation.

    Uses smart evidence selection instead of arbitrary truncation.

    Args:
        query: The research query
        evidence: All collected evidence
        embeddings: Optional EmbeddingService for diverse selection
    """
    # Select diverse, relevant evidence (not an arbitrary first 10)
    selected = await select_diverse_evidence(
        evidence, n=10, query=query, embeddings=embeddings
    )

    # Format with sentence-aware truncation
    evidence_text = "\n".join(
        f"- **{e.citation.title}** ({e.citation.source}): {truncate_at_sentence(e.content, 300)}"
        for e in selected
    )

    return f"""Based on the following evidence about "{query}", generate mechanistic hypotheses.

## Evidence ({len(selected)} papers selected for diversity)
{evidence_text}

## Task
1. Identify potential drug targets mentioned in the evidence
2. Propose mechanism hypotheses (Drug → Target → Pathway → Effect)
3. Rate confidence based on evidence strength
4. Suggest searches to test each hypothesis

Generate 2-4 hypotheses, prioritized by confidence."""
```
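
A sketch of rendering the prompt outside the agent, for inspection. With no evidence and no embedding service, only the prompt skeleton prints; this is illustration, not part of the spec:

```python
import asyncio

from src.prompts.hypothesis import format_hypothesis_prompt


async def main() -> None:
    prompt = await format_hypothesis_prompt(
        "metformin alzheimer", evidence=[], embeddings=None
    )
    print(prompt)


asyncio.run(main())
```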

### 4.2 Hypothesis Agent (`src/agents/hypothesis_agent.py`)

```python
"""Hypothesis agent for mechanistic reasoning."""
from collections.abc import AsyncIterable
from typing import TYPE_CHECKING, Any

from agent_framework import (
    AgentRunResponse,
    AgentRunResponseUpdate,
    AgentThread,
    BaseAgent,
    ChatMessage,
    Role,
)
from pydantic_ai import Agent

from src.prompts.hypothesis import SYSTEM_PROMPT, format_hypothesis_prompt
from src.utils.config import settings
from src.utils.models import Evidence, HypothesisAssessment

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService


class HypothesisAgent(BaseAgent):
    """Generates mechanistic hypotheses based on evidence."""

    def __init__(
        self,
        evidence_store: dict[str, list[Evidence]],
        embedding_service: "EmbeddingService | None" = None,  # NEW: for diverse selection
    ) -> None:
        super().__init__(
            name="HypothesisAgent",
            description="Generates scientific hypotheses about drug mechanisms to guide research",
        )
        self._evidence_store = evidence_store
        self._embeddings = embedding_service  # Used for MMR evidence selection
        self._agent = Agent(
            model=settings.llm_provider,  # Uses configured LLM
            output_type=HypothesisAssessment,
            system_prompt=SYSTEM_PROMPT,
        )

    async def run(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AgentRunResponse:
        """Generate hypotheses based on current evidence."""
        # Extract query
        query = self._extract_query(messages)

        # Get current evidence
        evidence = self._evidence_store.get("current", [])

        if not evidence:
            return AgentRunResponse(
                messages=[ChatMessage(
                    role=Role.ASSISTANT,
                    text="No evidence available yet. Search for evidence first."
                )],
                response_id="hypothesis-no-evidence",
            )

        # Generate hypotheses with diverse evidence selection
        # NOTE: format_hypothesis_prompt is async
        prompt = await format_hypothesis_prompt(
            query, evidence, embeddings=self._embeddings
        )
        result = await self._agent.run(prompt)
        assessment = result.output

        # Store hypotheses in shared context
        existing = self._evidence_store.get("hypotheses", [])
        self._evidence_store["hypotheses"] = existing + assessment.hypotheses

        # Format response
        response_text = self._format_response(assessment)

        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
            response_id=f"hypothesis-{len(assessment.hypotheses)}",
            additional_properties={"assessment": assessment.model_dump()},
        )

    def _format_response(self, assessment: HypothesisAssessment) -> str:
        """Format hypothesis assessment as markdown."""
        lines = ["## Generated Hypotheses\n"]

        for i, h in enumerate(assessment.hypotheses, 1):
            lines.append(f"### Hypothesis {i} (Confidence: {h.confidence:.0%})")
            lines.append(f"**Mechanism**: {h.drug} → {h.target} → {h.pathway} → {h.effect}")
            lines.append(f"**Suggested searches**: {', '.join(h.search_suggestions)}\n")

        if assessment.primary_hypothesis:
            lines.append("### Primary Hypothesis")
            h = assessment.primary_hypothesis
            lines.append(f"{h.drug} → {h.target} → {h.pathway} → {h.effect}\n")

        if assessment.knowledge_gaps:
            lines.append("### Knowledge Gaps")
            for gap in assessment.knowledge_gaps:
                lines.append(f"- {gap}")

        if assessment.recommended_searches:
            lines.append("\n### Recommended Next Searches")
            for search in assessment.recommended_searches:
                lines.append(f"- `{search}`")

        return "\n".join(lines)

    def _extract_query(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None,
    ) -> str:
        """Extract the user query from messages."""
        if isinstance(messages, str):
            return messages
        elif isinstance(messages, ChatMessage):
            return messages.text or ""
        elif isinstance(messages, list):
            for msg in reversed(messages):
                if isinstance(msg, ChatMessage) and msg.role == Role.USER:
                    return msg.text or ""
                elif isinstance(msg, str):
                    return msg
        return ""

    async def run_stream(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AsyncIterable[AgentRunResponseUpdate]:
        """Streaming wrapper around run()."""
        result = await self.run(messages, thread=thread, **kwargs)
        yield AgentRunResponseUpdate(
            messages=result.messages,
            response_id=result.response_id,
        )
```
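
Exercising the agent directly, as a sketch. With an empty store the LLM is never called, so the early-return path is what runs (constructing the agent may still require provider credentials, depending on the configured model):

```python
import asyncio

from src.agents.hypothesis_agent import HypothesisAgent


async def main() -> None:
    store = {"current": [], "hypotheses": []}
    agent = HypothesisAgent(store)
    response = await agent.run("metformin alzheimer")
    print(response.messages[0].text)  # "No evidence available yet. ..."


asyncio.run(main())
```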

### 4.3 Update MagenticOrchestrator

Add HypothesisAgent to the workflow:

```python
# In MagenticOrchestrator.__init__
self._hypothesis_agent = HypothesisAgent(self._evidence_store)

# In workflow building
workflow = (
    MagenticBuilder()
    .participants(
        searcher=search_agent,
        hypothesizer=self._hypothesis_agent,  # NEW
        judge=judge_agent,
    )
    .with_standard_manager(...)
    .build()
)

# Update task instruction
task = f"""Research drug repurposing opportunities for: {query}

Workflow:
1. SearchAgent: Find initial evidence from PubMed and web
2. HypothesisAgent: Generate mechanistic hypotheses (Drug → Target → Pathway → Effect)
3. SearchAgent: Use hypothesis-suggested queries for targeted search
4. JudgeAgent: Evaluate if evidence supports hypotheses
5. Repeat until confident or max rounds

Focus on:
- Identifying specific molecular targets
- Understanding mechanism of action
- Finding supporting/contradicting evidence for hypotheses
"""
```
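
If the orchestrator already holds the Phase 6 embedding service, pass it through so MMR evidence selection is active. A sketch; the attribute name `self._embedding_service` is an assumption about the Phase 6 code, so adjust it to match:

```python
# Assumption: the orchestrator kept a reference to the Phase 6 EmbeddingService
# under self._embedding_service (hypothetical name).
self._hypothesis_agent = HypothesisAgent(
    self._evidence_store,
    embedding_service=self._embedding_service,  # enables MMR evidence selection
)
```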

---

## 5. Directory Structure After Phase 7

```
src/
├── agents/
│   ├── search_agent.py
│   ├── judge_agent.py
│   └── hypothesis_agent.py   # NEW
├── prompts/
│   ├── judge.py
│   └── hypothesis.py         # NEW
├── services/
│   └── embeddings.py
└── utils/
    └── models.py             # Updated with hypothesis models
```

---

## 6. Tests

### 6.1 Unit Tests (`tests/unit/agents/test_hypothesis_agent.py`)

```python
"""Unit tests for HypothesisAgent."""
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from src.agents.hypothesis_agent import HypothesisAgent
from src.utils.models import Citation, Evidence, HypothesisAssessment, MechanismHypothesis


@pytest.fixture
def sample_evidence():
    return [
        Evidence(
            content="Metformin activates AMPK, which inhibits mTOR signaling...",
            citation=Citation(
                source="pubmed",
                title="Metformin and AMPK",
                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
                date="2023",
            ),
        )
    ]


@pytest.fixture
def mock_assessment():
    return HypothesisAssessment(
        hypotheses=[
            MechanismHypothesis(
                drug="Metformin",
                target="AMPK",
                pathway="mTOR inhibition",
                effect="Reduced cancer cell proliferation",
                confidence=0.75,
                search_suggestions=["metformin AMPK cancer", "mTOR cancer therapy"],
            )
        ],
        primary_hypothesis=None,
        knowledge_gaps=["Clinical trial data needed"],
        recommended_searches=["metformin clinical trial cancer"],
    )


@pytest.mark.asyncio
async def test_hypothesis_agent_generates_hypotheses(sample_evidence, mock_assessment):
    """HypothesisAgent should generate mechanistic hypotheses."""
    store = {"current": sample_evidence, "hypotheses": []}

    with patch("src.agents.hypothesis_agent.Agent") as MockAgent:
        mock_result = MagicMock()
        mock_result.output = mock_assessment
        MockAgent.return_value.run = AsyncMock(return_value=mock_result)

        agent = HypothesisAgent(store)
        response = await agent.run("metformin cancer")

        assert "AMPK" in response.messages[0].text
        assert len(store["hypotheses"]) == 1


@pytest.mark.asyncio
async def test_hypothesis_agent_no_evidence():
    """HypothesisAgent should handle empty evidence gracefully."""
    store = {"current": [], "hypotheses": []}
    agent = HypothesisAgent(store)

    response = await agent.run("test query")

    assert "No evidence" in response.messages[0].text
```
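
A companion test for the Section 4.0 utilities might look like this (a sketch; the file path `tests/unit/utils/test_text_utils.py` is a suggestion, and the expected output follows from the truncation logic):

```python
"""Unit test sketch for truncate_at_sentence."""
from src.utils.text_utils import truncate_at_sentence


def test_truncate_at_sentence_boundary():
    text = "First sentence. Second sentence. Third one runs long."
    out = truncate_at_sentence(text, max_chars=40)
    assert out == "First sentence. Second sentence."


def test_short_text_unchanged():
    assert truncate_at_sentence("Short.", max_chars=40) == "Short."
```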

---

## 7. Definition of Done

Phase 7 is **COMPLETE** when:

1. `MechanismHypothesis` and `HypothesisAssessment` models implemented
2. `HypothesisAgent` generates hypotheses from evidence
3. Hypotheses stored in shared context
4. Search queries generated from hypotheses
5. Magentic workflow includes HypothesisAgent
6. All unit tests pass

---

## 8. Value Delivered

| Before (Phase 6) | After (Phase 7) |
|------------------|-----------------|
| Reactive search | Hypothesis-driven search |
| Generic queries | Mechanism-targeted queries |
| No scientific reasoning | Drug → Target → Pathway → Effect |
| Judge says "need more" | Hypothesis says "search for X to test Y" |

**Real example improvement:**

- Query: "metformin alzheimer"
- Before: "metformin alzheimer mechanism", "metformin brain"
- After: "metformin AMPK activation", "AMPK autophagy neurodegeneration", "autophagy amyloid clearance"

The search becomes **scientifically targeted** rather than cycling through keyword variations.
|