# Phase 7 Implementation Spec: Hypothesis Agent

**Goal**: Add an agent that generates scientific hypotheses to guide targeted searches.
**Philosophy**: "Don't just find evidence; understand the mechanisms."
**Prerequisite**: Phase 6 complete (Embeddings working)

---
## 1. Why Hypothesis Agent?

Current limitation: **Search is reactive, not hypothesis-driven.**

Current flow:

1. User asks about "metformin alzheimer"
2. Search finds papers
3. Judge says "need more evidence"
4. Search again with slightly different keywords

With Hypothesis Agent:

1. User asks about "metformin alzheimer"
2. Search finds initial papers
3. **Hypothesis Agent analyzes**: "Evidence suggests metformin → AMPK activation → autophagy → amyloid clearance"
4. Search can now target: "metformin AMPK", "autophagy neurodegeneration", "amyloid clearance drugs"

**Key insight**: Scientific research is hypothesis-driven. The agent should think like a researcher.

---
## 2. Architecture

### Current (Phase 6)
```
User Query → Magentic Manager
    ├── SearchAgent → Evidence
    └── JudgeAgent → Sufficient? → Synthesize/Continue
```

### Phase 7
```
User Query → Magentic Manager
    ├── SearchAgent → Evidence
    ├── HypothesisAgent → Mechanistic Hypotheses   ← NEW
    └── JudgeAgent → Sufficient? → Synthesize/Continue
            ↓
    Uses hypotheses to guide next search
```
### Shared Context Enhancement
```python
evidence_store = {
    "current": [],
    "embeddings": {},
    "vector_index": None,
    "hypotheses": [],         # NEW: Generated hypotheses
    "tested_hypotheses": [],  # NEW: Hypotheses with supporting/contradicting evidence
}
```
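
A minimal sketch of how a later search round could consume the new `hypotheses` slot. The helper name `next_search_queries` is hypothetical, not part of the spec's API; `to_search_queries()` is defined in Section 3.1:

```python
def next_search_queries(evidence_store: dict) -> list[str]:
    """Turn stored hypotheses into the next round of search queries (sketch)."""
    queries: list[str] = []
    for hypothesis in evidence_store.get("hypotheses", []):
        # MechanismHypothesis.to_search_queries() is defined in Section 3.1
        queries.extend(hypothesis.to_search_queries())
    # De-duplicate while preserving order
    return list(dict.fromkeys(queries))
```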

---

## 3. Hypothesis Model

### 3.1 Data Model (`src/utils/models.py`)

```python
class MechanismHypothesis(BaseModel):
    """A scientific hypothesis about drug mechanism."""

    drug: str = Field(description="The drug being studied")
    target: str = Field(description="Molecular target (e.g., AMPK, mTOR)")
    pathway: str = Field(description="Biological pathway affected")
    effect: str = Field(description="Downstream effect on disease")
    confidence: float = Field(ge=0, le=1, description="Confidence in hypothesis")
    supporting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs supporting this hypothesis"
    )
    contradicting_evidence: list[str] = Field(
        default_factory=list,
        description="PMIDs or URLs contradicting this hypothesis"
    )
    search_suggestions: list[str] = Field(
        default_factory=list,
        description="Suggested searches to test this hypothesis"
    )

    def to_search_queries(self) -> list[str]:
        """Generate search queries to test this hypothesis."""
        return [
            f"{self.drug} {self.target}",
            f"{self.target} {self.pathway}",
            f"{self.pathway} {self.effect}",
            *self.search_suggestions,
        ]
```
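
For illustration, the queries derived for the running metformin example (values borrowed from the example format in Section 4.1):

```python
h = MechanismHypothesis(
    drug="Metformin",
    target="AMPK",
    pathway="autophagy activation",
    effect="amyloid clearance",
    confidence=0.7,
    search_suggestions=["metformin AMPK brain"],
)
print(h.to_search_queries())
# ['Metformin AMPK', 'AMPK autophagy activation',
#  'autophagy activation amyloid clearance', 'metformin AMPK brain']
```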

### 3.2 Hypothesis Assessment

```python
class HypothesisAssessment(BaseModel):
    """Assessment of evidence against hypotheses."""

    hypotheses: list[MechanismHypothesis]
    primary_hypothesis: MechanismHypothesis | None = Field(
        default=None,  # optional: there may be no clear front-runner yet
        description="Most promising hypothesis based on current evidence"
    )
    knowledge_gaps: list[str] = Field(
        description="What we don't know yet"
    )
    recommended_searches: list[str] = Field(
        description="Searches to fill knowledge gaps"
    )
```
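
One way the `tested_hypotheses` slot from Section 2 could be maintained. A sketch with a hypothetical helper name; it assumes evidence IDs are PMIDs or URLs, matching the model's field descriptions:

```python
def record_hypothesis_test(
    store: dict,
    hypothesis: MechanismHypothesis,
    evidence_id: str,  # PMID or URL
    supports: bool,
) -> None:
    """Attach one piece of evidence to a hypothesis and mark it tested (sketch)."""
    if supports:
        hypothesis.supporting_evidence.append(evidence_id)
    else:
        hypothesis.contradicting_evidence.append(evidence_id)
    store.setdefault("tested_hypotheses", []).append(hypothesis)
```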

---

## 4. Implementation

### 4.0 Text Utilities (`src/utils/text_utils.py`)

> **Why These Utilities?**
>
> The original spec used arbitrary truncation (`evidence[:10]` and `content[:300]`).
> That discards information arbitrarily: the cut lands mid-sentence, and the first
> ten items are not necessarily the most relevant or varied. These utilities provide:
> 1. **Sentence-aware truncation**: cuts at sentence boundaries, not mid-word
> 2. **Diverse evidence selection**: uses embeddings to select varied evidence (MMR)
```python
"""Text processing utilities for evidence handling."""
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
    from src.utils.models import Evidence


def truncate_at_sentence(text: str, max_chars: int = 300) -> str:
    """Truncate text at a sentence boundary, preserving meaning.

    Args:
        text: The text to truncate
        max_chars: Maximum characters (default 300)

    Returns:
        Text truncated at the last complete sentence within the limit
    """
    if len(text) <= max_chars:
        return text

    # Find truncation point
    truncated = text[:max_chars]

    # Look for sentence endings: . ! ? followed by space or newline
    for sep in ['. ', '! ', '? ', '.\n', '!\n', '?\n']:
        last_sep = truncated.rfind(sep)
        if last_sep > max_chars // 2:  # Don't truncate too aggressively
            return text[:last_sep + 1].strip()

    # Fallback: find last period
    last_period = truncated.rfind('.')
    if last_period > max_chars // 2:
        return text[:last_period + 1].strip()

    # Last resort: truncate at word boundary
    last_space = truncated.rfind(' ')
    if last_space > 0:
        return text[:last_space].strip() + "..."

    return truncated + "..."


async def select_diverse_evidence(
    evidence: list["Evidence"],
    n: int,
    query: str,
    embeddings: "EmbeddingService | None" = None,
) -> list["Evidence"]:
    """Select the n most diverse and relevant evidence items.

    Uses Maximal Marginal Relevance (MMR) when embeddings are available,
    and falls back to relevance_score sorting otherwise.

    Args:
        evidence: All available evidence
        n: Number of items to select
        query: Original query for relevance scoring
        embeddings: Optional EmbeddingService for semantic diversity

    Returns:
        Selected evidence items, diverse and relevant
    """
    if not evidence:
        return []

    if n >= len(evidence):
        return evidence

    # Fallback: sort by relevance score if no embeddings
    if embeddings is None:
        return sorted(
            evidence,
            key=lambda e: e.relevance_score,
            reverse=True,
        )[:n]

    # MMR: Maximal Marginal Relevance for diverse selection
    # Score = λ * relevance - (1 - λ) * max_similarity_to_selected
    lambda_param = 0.7  # Balance relevance vs diversity

    # Get query embedding
    query_emb = await embeddings.embed(query)

    # Get all evidence embeddings
    evidence_embs = await embeddings.embed_batch([e.content for e in evidence])

    # Compute relevance scores (cosine similarity to query)
    import numpy as np

    def cosine(a, b) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance_scores = [cosine(query_emb, emb) for emb in evidence_embs]

    # Greedy MMR selection
    selected_indices: list[int] = []
    remaining = set(range(len(evidence)))

    for _ in range(n):
        best_score = float('-inf')
        best_idx = -1

        for idx in remaining:
            # Relevance component
            relevance = relevance_scores[idx]

            # Diversity component: max similarity to already selected
            if selected_indices:
                max_sim = max(
                    cosine(evidence_embs[idx], evidence_embs[sel])
                    for sel in selected_indices
                )
            else:
                max_sim = 0.0

            # MMR score
            mmr_score = lambda_param * relevance - (1 - lambda_param) * max_sim

            if mmr_score > best_score:
                best_score = mmr_score
                best_idx = idx

        if best_idx >= 0:
            selected_indices.append(best_idx)
            remaining.remove(best_idx)

    return [evidence[i] for i in selected_indices]
```
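
A quick check of the truncation behavior (the output follows from the logic above):

```python
from src.utils.text_utils import truncate_at_sentence

text = "Metformin activates AMPK. This inhibits mTOR. Autophagy increases markedly."
print(truncate_at_sentence(text, max_chars=50))
# Metformin activates AMPK. This inhibits mTOR.
```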

### 4.1 Hypothesis Prompts (`src/prompts/hypothesis.py`)

```python
"""Prompts for Hypothesis Agent."""
from typing import TYPE_CHECKING

from src.utils.text_utils import select_diverse_evidence, truncate_at_sentence

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService
    from src.utils.models import Evidence

SYSTEM_PROMPT = """You are a biomedical research scientist specializing in drug repurposing.

Your role is to generate mechanistic hypotheses based on evidence.

A good hypothesis:
1. Proposes a MECHANISM: Drug → Target → Pathway → Effect
2. Is TESTABLE: Can be supported or refuted by literature search
3. Is SPECIFIC: Names actual molecular targets and pathways
4. Generates SEARCH QUERIES: Helps find more evidence

Example hypothesis format:
- Drug: Metformin
- Target: AMPK (AMP-activated protein kinase)
- Pathway: mTOR inhibition → autophagy activation
- Effect: Enhanced clearance of amyloid-beta in Alzheimer's
- Confidence: 0.7
- Search suggestions: ["metformin AMPK brain", "autophagy amyloid clearance"]

Be specific. Use actual gene/protein names when possible."""


async def format_hypothesis_prompt(
    query: str,
    evidence: list["Evidence"],
    embeddings: "EmbeddingService | None" = None,
) -> str:
    """Format the prompt for hypothesis generation.

    Uses smart evidence selection instead of arbitrary truncation.

    Args:
        query: The research query
        evidence: All collected evidence
        embeddings: Optional EmbeddingService for diverse selection
    """
    # Select diverse, relevant evidence (not an arbitrary first 10)
    selected = await select_diverse_evidence(
        evidence, n=10, query=query, embeddings=embeddings
    )

    # Format with sentence-aware truncation
    evidence_text = "\n".join(
        f"- **{e.citation.title}** ({e.citation.source}): {truncate_at_sentence(e.content, 300)}"
        for e in selected
    )

    return f"""Based on the following evidence about "{query}", generate mechanistic hypotheses.

## Evidence ({len(selected)} papers selected for diversity)
{evidence_text}

## Task
1. Identify potential drug targets mentioned in the evidence
2. Propose mechanism hypotheses (Drug → Target → Pathway → Effect)
3. Rate confidence based on evidence strength
4. Suggest searches to test each hypothesis

Generate 2-4 hypotheses, prioritized by confidence."""
```
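
A sketch of rendering the prompt outside the agent, for inspection. With no evidence and no embedding service, only the prompt skeleton prints; this is illustration, not part of the spec:

```python
import asyncio

from src.prompts.hypothesis import format_hypothesis_prompt


async def main() -> None:
    prompt = await format_hypothesis_prompt(
        "metformin alzheimer", evidence=[], embeddings=None
    )
    print(prompt)


asyncio.run(main())
```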

### 4.2 Hypothesis Agent (`src/agents/hypothesis_agent.py`)

```python
"""Hypothesis agent for mechanistic reasoning."""
from collections.abc import AsyncIterable
from typing import TYPE_CHECKING, Any

from agent_framework import (
    AgentRunResponse,
    AgentRunResponseUpdate,
    AgentThread,
    BaseAgent,
    ChatMessage,
    Role,
)
from pydantic_ai import Agent

from src.prompts.hypothesis import SYSTEM_PROMPT, format_hypothesis_prompt
from src.utils.config import settings
from src.utils.models import Evidence, HypothesisAssessment

if TYPE_CHECKING:
    from src.services.embeddings import EmbeddingService


class HypothesisAgent(BaseAgent):
    """Generates mechanistic hypotheses based on evidence."""

    def __init__(
        self,
        evidence_store: dict[str, list[Evidence]],
        embedding_service: "EmbeddingService | None" = None,  # NEW: for diverse selection
    ) -> None:
        super().__init__(
            name="HypothesisAgent",
            description="Generates scientific hypotheses about drug mechanisms to guide research",
        )
        self._evidence_store = evidence_store
        self._embeddings = embedding_service  # Used for MMR evidence selection
        self._agent = Agent(
            model=settings.llm_provider,  # Uses configured LLM
            output_type=HypothesisAssessment,
            system_prompt=SYSTEM_PROMPT,
        )

    async def run(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AgentRunResponse:
        """Generate hypotheses based on current evidence."""
        # Extract query
        query = self._extract_query(messages)

        # Get current evidence
        evidence = self._evidence_store.get("current", [])

        if not evidence:
            return AgentRunResponse(
                messages=[ChatMessage(
                    role=Role.ASSISTANT,
                    text="No evidence available yet. Search for evidence first."
                )],
                response_id="hypothesis-no-evidence",
            )

        # Generate hypotheses with diverse evidence selection
        # NOTE: format_hypothesis_prompt is async
        prompt = await format_hypothesis_prompt(
            query, evidence, embeddings=self._embeddings
        )
        result = await self._agent.run(prompt)
        assessment = result.output

        # Store hypotheses in shared context
        existing = self._evidence_store.get("hypotheses", [])
        self._evidence_store["hypotheses"] = existing + assessment.hypotheses

        # Format response
        response_text = self._format_response(assessment)

        return AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text=response_text)],
            response_id=f"hypothesis-{len(assessment.hypotheses)}",
            additional_properties={"assessment": assessment.model_dump()},
        )

    def _format_response(self, assessment: HypothesisAssessment) -> str:
        """Format hypothesis assessment as markdown."""
        lines = ["## Generated Hypotheses\n"]

        for i, h in enumerate(assessment.hypotheses, 1):
            lines.append(f"### Hypothesis {i} (Confidence: {h.confidence:.0%})")
            lines.append(f"**Mechanism**: {h.drug} → {h.target} → {h.pathway} → {h.effect}")
            lines.append(f"**Suggested searches**: {', '.join(h.search_suggestions)}\n")

        if assessment.primary_hypothesis:
            lines.append("### Primary Hypothesis")
            h = assessment.primary_hypothesis
            lines.append(f"{h.drug} → {h.target} → {h.pathway} → {h.effect}\n")

        if assessment.knowledge_gaps:
            lines.append("### Knowledge Gaps")
            for gap in assessment.knowledge_gaps:
                lines.append(f"- {gap}")

        if assessment.recommended_searches:
            lines.append("\n### Recommended Next Searches")
            for search in assessment.recommended_searches:
                lines.append(f"- `{search}`")

        return "\n".join(lines)

    def _extract_query(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None,
    ) -> str:
        """Extract the user query from messages."""
        if isinstance(messages, str):
            return messages
        elif isinstance(messages, ChatMessage):
            return messages.text or ""
        elif isinstance(messages, list):
            for msg in reversed(messages):
                if isinstance(msg, ChatMessage) and msg.role == Role.USER:
                    return msg.text or ""
                elif isinstance(msg, str):
                    return msg
        return ""

    async def run_stream(
        self,
        messages: str | ChatMessage | list[str] | list[ChatMessage] | None = None,
        *,
        thread: AgentThread | None = None,
        **kwargs: Any,
    ) -> AsyncIterable[AgentRunResponseUpdate]:
        """Streaming wrapper around run()."""
        result = await self.run(messages, thread=thread, **kwargs)
        yield AgentRunResponseUpdate(
            messages=result.messages,
            response_id=result.response_id,
        )
```
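
Exercising the agent directly, as a sketch. With an empty store the LLM is never called, so the early-return path is what runs (constructing the agent may still require provider credentials, depending on the configured model):

```python
import asyncio

from src.agents.hypothesis_agent import HypothesisAgent


async def main() -> None:
    store = {"current": [], "hypotheses": []}
    agent = HypothesisAgent(store)
    response = await agent.run("metformin alzheimer")
    print(response.messages[0].text)  # "No evidence available yet. ..."


asyncio.run(main())
```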

### 4.3 Update MagenticOrchestrator

Add HypothesisAgent to the workflow:

```python
# In MagenticOrchestrator.__init__
self._hypothesis_agent = HypothesisAgent(self._evidence_store)

# In workflow building
workflow = (
    MagenticBuilder()
    .participants(
        searcher=search_agent,
        hypothesizer=self._hypothesis_agent,  # NEW
        judge=judge_agent,
    )
    .with_standard_manager(...)
    .build()
)

# Update task instruction
task = f"""Research drug repurposing opportunities for: {query}

Workflow:
1. SearchAgent: Find initial evidence from PubMed and web
2. HypothesisAgent: Generate mechanistic hypotheses (Drug → Target → Pathway → Effect)
3. SearchAgent: Use hypothesis-suggested queries for targeted search
4. JudgeAgent: Evaluate if evidence supports hypotheses
5. Repeat until confident or max rounds

Focus on:
- Identifying specific molecular targets
- Understanding mechanism of action
- Finding supporting/contradicting evidence for hypotheses
"""
```
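
If the orchestrator already holds the Phase 6 embedding service, pass it through so MMR evidence selection is active. A sketch; the attribute name `self._embedding_service` is an assumption about the Phase 6 code, so adjust it to match:

```python
# Assumption: the orchestrator kept a reference to the Phase 6 EmbeddingService
# under self._embedding_service (hypothetical name).
self._hypothesis_agent = HypothesisAgent(
    self._evidence_store,
    embedding_service=self._embedding_service,  # enables MMR evidence selection
)
```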

---

## 5. Directory Structure After Phase 7

```
src/
├── agents/
│   ├── search_agent.py
│   ├── judge_agent.py
│   └── hypothesis_agent.py   # NEW
├── prompts/
│   ├── judge.py
│   └── hypothesis.py         # NEW
├── services/
│   └── embeddings.py
└── utils/
    └── models.py             # Updated with hypothesis models
```

---

## 6. Tests

### 6.1 Unit Tests (`tests/unit/agents/test_hypothesis_agent.py`)

```python
"""Unit tests for HypothesisAgent."""
from unittest.mock import AsyncMock, MagicMock, patch

import pytest

from src.agents.hypothesis_agent import HypothesisAgent
from src.utils.models import Citation, Evidence, HypothesisAssessment, MechanismHypothesis


@pytest.fixture
def sample_evidence():
    return [
        Evidence(
            content="Metformin activates AMPK, which inhibits mTOR signaling...",
            citation=Citation(
                source="pubmed",
                title="Metformin and AMPK",
                url="https://pubmed.ncbi.nlm.nih.gov/12345/",
                date="2023",
            ),
        )
    ]


@pytest.fixture
def mock_assessment():
    return HypothesisAssessment(
        hypotheses=[
            MechanismHypothesis(
                drug="Metformin",
                target="AMPK",
                pathway="mTOR inhibition",
                effect="Reduced cancer cell proliferation",
                confidence=0.75,
                search_suggestions=["metformin AMPK cancer", "mTOR cancer therapy"],
            )
        ],
        primary_hypothesis=None,
        knowledge_gaps=["Clinical trial data needed"],
        recommended_searches=["metformin clinical trial cancer"],
    )


@pytest.mark.asyncio
async def test_hypothesis_agent_generates_hypotheses(sample_evidence, mock_assessment):
    """HypothesisAgent should generate mechanistic hypotheses."""
    store = {"current": sample_evidence, "hypotheses": []}

    with patch("src.agents.hypothesis_agent.Agent") as MockAgent:
        mock_result = MagicMock()
        mock_result.output = mock_assessment
        MockAgent.return_value.run = AsyncMock(return_value=mock_result)

        agent = HypothesisAgent(store)
        response = await agent.run("metformin cancer")

        assert "AMPK" in response.messages[0].text
        assert len(store["hypotheses"]) == 1


@pytest.mark.asyncio
async def test_hypothesis_agent_no_evidence():
    """HypothesisAgent should handle empty evidence gracefully."""
    store = {"current": [], "hypotheses": []}
    agent = HypothesisAgent(store)

    response = await agent.run("test query")

    assert "No evidence" in response.messages[0].text
```
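
A companion test for the Section 4.0 utilities might look like this (a sketch; the file path `tests/unit/utils/test_text_utils.py` is a suggestion, and the expected output follows from the truncation logic):

```python
"""Unit test sketch for truncate_at_sentence."""
from src.utils.text_utils import truncate_at_sentence


def test_truncate_at_sentence_boundary():
    text = "First sentence. Second sentence. Third one runs long."
    out = truncate_at_sentence(text, max_chars=40)
    assert out == "First sentence. Second sentence."


def test_short_text_unchanged():
    assert truncate_at_sentence("Short.", max_chars=40) == "Short."
```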

---

## 7. Definition of Done

Phase 7 is **COMPLETE** when:

1. `MechanismHypothesis` and `HypothesisAssessment` models implemented
2. `HypothesisAgent` generates hypotheses from evidence
3. Hypotheses stored in shared context
4. Search queries generated from hypotheses
5. Magentic workflow includes HypothesisAgent
6. All unit tests pass

---

## 8. Value Delivered

| Before (Phase 6) | After (Phase 7) |
|------------------|-----------------|
| Reactive search | Hypothesis-driven search |
| Generic queries | Mechanism-targeted queries |
| No scientific reasoning | Drug → Target → Pathway → Effect |
| Judge says "need more" | Hypothesis says "search for X to test Y" |

**Real example improvement:**

- Query: "metformin alzheimer"
- Before: "metformin alzheimer mechanism", "metformin brain"
- After: "metformin AMPK activation", "AMPK autophagy neurodegeneration", "autophagy amyloid clearance"

The search becomes **scientifically targeted** rather than cycling through keyword variations.
|