SPEC_10: Domain-Agnostic Refactor
Status: DRAFT
Priority: P1
Effort: Medium (2-3 hours)
Related Issues: #75, #76
Problem Statement
The codebase has "drug repurposing" hardcoded in 16 locations (originally identified 15, plus 1 found in audit):
src/prompts/report.py:11 - SYSTEM_PROMPT
src/prompts/judge.py:5 - SYSTEM_PROMPT
src/prompts/judge.py:140 - Evidence scoring prompt (inside format_user_prompt)
src/prompts/hypothesis.py:11 - SYSTEM_PROMPT
src/orchestrators/simple.py:476 - Report header
src/orchestrators/simple.py:564 - Report header
src/orchestrators/advanced.py:159 - Task prompt
src/agents/magentic_agents.py:33 - Agent description
src/agents/magentic_agents.py:108 - Agent description
src/agents/search_agent.py:31 - Tool description
src/agents/tools.py:85 - Tool docstring
src/mcp_tools.py:27 - Example query
src/mcp_tools.py:116 - Docstring
src/mcp_tools.py:164 - Function docstring
src/mcp_tools.py:167 - Docstring
src/agent_factory/judges.py:21 - Imports format_user_prompt (needs update)
This violates:
- DRY - Same concept repeated 15+ times
- Open/Closed - Can't add domains without modifying multiple files
- Flexibility - Agent is locked to one domain
Solution: Centralized Domain Configuration
1. Create Domain Config Module
File: src/config/domain.py
"""Centralized domain configuration for research agents.
This module defines research domains and their associated prompts,
allowing the agent to operate in domain-agnostic or domain-specific modes.

Usage:
    from src.config.domain import get_domain_config, ResearchDomain

    # Get default (general) config
    config = get_domain_config()

    # Get specific domain
    config = get_domain_config(ResearchDomain.SEXUAL_HEALTH)

    # Use in prompts
    system_prompt = config.judge_system_prompt
"""
from enum import Enum

from pydantic import BaseModel


class ResearchDomain(str, Enum):
    """Available research domains."""

    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


class DomainConfig(BaseModel):
    """Configuration for a research domain.

    Contains all domain-specific text used across the codebase,
    ensuring consistency and single-source-of-truth.
    """

    # Identity
    name: str
    description: str
    # Report generation
    report_title: str
    report_focus: str
    # Judge prompts
    judge_system_prompt: str
    judge_scoring_prompt: str
    # Hypothesis prompts
    hypothesis_system_prompt: str
    # Report writer prompts
    report_system_prompt: str
    # Search context
    search_description: str
    search_example_query: str
    # Agent descriptions (for Magentic mode)
    search_agent_description: str
    hypothesis_agent_description: str
# ─────────────────────────────────────────────────────────────────
# Domain Definitions
# ─────────────────────────────────────────────────────────────────
GENERAL_CONFIG = DomainConfig(
    name="General Research",
    description="General-purpose biomedical research agent",
    report_title="## Research Analysis",
    report_focus="comprehensive research synthesis",
    judge_system_prompt="""You are an expert research judge.
Your role is to evaluate evidence quality, assess relevance to the research query,
and determine if sufficient evidence exists to synthesize findings.""",
    judge_scoring_prompt="""Score this evidence for research relevance.
Provide ONLY scores and extracted data.""",
    hypothesis_system_prompt="""You are a biomedical research scientist.
Your role is to generate evidence-based hypotheses from the literature,
identifying key mechanisms, targets, and potential therapeutic implications.""",
    report_system_prompt="""You are a scientific writer specializing in research reports.
Your role is to synthesize evidence into clear, well-structured reports with
proper citations and evidence-based conclusions.""",
    search_description="Searches biomedical literature for relevant evidence",
    search_example_query="metformin aging mechanisms",
    search_agent_description="Searches PubMed, ClinicalTrials.gov, and Europe PMC for evidence",
    hypothesis_agent_description="Generates mechanistic hypotheses from evidence",
)
DRUG_REPURPOSING_CONFIG = DomainConfig(
    name="Drug Repurposing",
    description="Drug repurposing research specialist",
    report_title="## Drug Repurposing Analysis",
    report_focus="drug repurposing opportunities",
    judge_system_prompt="""You are an expert drug repurposing research judge.
Your role is to evaluate evidence for drug repurposing potential, assess
mechanism plausibility, and determine if compounds warrant further investigation.""",
    judge_scoring_prompt="""Score this evidence for drug repurposing potential.
Provide ONLY scores and extracted data.""",
    hypothesis_system_prompt="""You are a biomedical research scientist specializing in drug repurposing.
Your role is to generate mechanistic hypotheses for how existing drugs might
treat new indications, based on shared pathways and targets.""",
    report_system_prompt="""You are a scientific writer specializing in drug repurposing research reports.
Your role is to synthesize evidence into actionable drug repurposing recommendations
with clear mechanistic rationale and clinical translation potential.""",
    search_description="Searches biomedical literature for drug repurposing evidence",
    search_example_query="metformin alzheimer repurposing",
    search_agent_description="Searches PubMed for drug repurposing evidence",
    hypothesis_agent_description="Generates mechanistic hypotheses for drug repurposing",
)
SEXUAL_HEALTH_CONFIG = DomainConfig(
    name="Sexual Health Research",
    description="Sexual health and wellness research specialist",
    report_title="## Sexual Health Analysis",
    report_focus="sexual health and wellness interventions",
    judge_system_prompt="""You are an expert sexual health research judge.
Your role is to evaluate evidence for sexual health interventions, assess
efficacy and safety data, and determine clinical applicability.""",
    judge_scoring_prompt="""Score this evidence for sexual health relevance.
Provide ONLY scores and extracted data.""",
    hypothesis_system_prompt="""You are a biomedical research scientist specializing in sexual health.
Your role is to generate evidence-based hypotheses for sexual health interventions,
identifying mechanisms of action and potential therapeutic applications.""",
    report_system_prompt="""You are a scientific writer specializing in sexual health research reports.
Your role is to synthesize evidence into clear recommendations for sexual health
interventions with proper safety considerations.""",
    search_description="Searches biomedical literature for sexual health evidence",
    search_example_query="testosterone therapy female libido",
    search_agent_description="Searches PubMed for sexual health evidence",
    hypothesis_agent_description="Generates hypotheses for sexual health interventions",
)
# ─────────────────────────────────────────────────────────────────
# Domain Registry
# ─────────────────────────────────────────────────────────────────
DOMAIN_CONFIGS: dict[ResearchDomain, DomainConfig] = {
    ResearchDomain.GENERAL: GENERAL_CONFIG,
    ResearchDomain.DRUG_REPURPOSING: DRUG_REPURPOSING_CONFIG,
    ResearchDomain.SEXUAL_HEALTH: SEXUAL_HEALTH_CONFIG,
}
# Default domain
DEFAULT_DOMAIN = ResearchDomain.GENERAL


def get_domain_config(domain: ResearchDomain | str | None = None) -> DomainConfig:
    """Get configuration for a research domain.

    Args:
        domain: The research domain, as an enum member or its string value.
            Defaults to GENERAL if None or if the string is not a known domain.

    Returns:
        DomainConfig for the specified domain.
    """
    if domain is None:
        domain = DEFAULT_DOMAIN
    if isinstance(domain, str):
        try:
            domain = ResearchDomain(domain)
        except ValueError:
            # Unknown strings fall back to the default instead of raising
            domain = DEFAULT_DOMAIN
    return DOMAIN_CONFIGS[domain]
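As a rough illustration of the lookup rules and of the "add a domain by editing only domain.py" goal, the sketch below uses a trimmed-down stand-in for DomainConfig (a frozen dataclass with two fields rather than the pydantic model) so that it runs without third-party packages. The ONCOLOGY domain and its strings are hypothetical and are not part of this spec.

```python
from dataclasses import dataclass
from enum import Enum


class ResearchDomain(str, Enum):
    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    ONCOLOGY = "oncology"  # hypothetical new domain, for illustration only


@dataclass(frozen=True)
class DomainConfig:
    # Stand-in for the pydantic model; only two of its fields are kept.
    name: str
    report_title: str


DOMAIN_CONFIGS = {
    ResearchDomain.GENERAL: DomainConfig("General Research", "## Research Analysis"),
    ResearchDomain.DRUG_REPURPOSING: DomainConfig(
        "Drug Repurposing", "## Drug Repurposing Analysis"
    ),
    # Adding a domain means one enum member plus one registry entry.
    ResearchDomain.ONCOLOGY: DomainConfig("Oncology Research", "## Oncology Analysis"),
}

DEFAULT_DOMAIN = ResearchDomain.GENERAL


def get_domain_config(domain=None):
    """Look up a config; None and unknown strings map to the default."""
    if domain is None:
        domain = DEFAULT_DOMAIN
    if isinstance(domain, str):
        try:
            domain = ResearchDomain(domain)
        except ValueError:
            domain = DEFAULT_DOMAIN
    return DOMAIN_CONFIGS[domain]


print(get_domain_config("oncology").report_title)
print(get_domain_config("not-a-domain").name)
```

One consequence worth noting: a mistyped domain string silently yields the general config. If that turns out to hide configuration mistakes, raising on unknown strings may be the safer choice.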
2. Update Settings to Include Domain
File: src/utils/config.py (add to Settings class)
from src.config.domain import ResearchDomain


class Settings(BaseSettings):
    # ... existing fields ...

    # Domain configuration
    research_domain: ResearchDomain = ResearchDomain.GENERAL
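Note that get_domain_config(None) falls back to DEFAULT_DOMAIN and does not read the new setting, so something has to connect the two. One possible approach is a small resolver that prefers an explicit argument, then the configured value, then the default. The sketch below is an assumption rather than part of this spec: resolve_domain is a hypothetical helper, and a RESEARCH_DOMAIN entry in a plain mapping stands in for settings.research_domain.

```python
import os
from enum import Enum


class ResearchDomain(str, Enum):
    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


DEFAULT_DOMAIN = ResearchDomain.GENERAL


def resolve_domain(explicit=None, env=None):
    """Pick a domain: explicit argument, then configured value, then default.

    `env` is a mapping used in place of settings.research_domain (assumption);
    it defaults to the process environment.
    """
    env = os.environ if env is None else env
    for candidate in (explicit, env.get("RESEARCH_DOMAIN")):
        if candidate is None:
            continue
        try:
            return ResearchDomain(candidate)
        except ValueError:
            continue  # unknown value: try the next source
    return DEFAULT_DOMAIN
```

Callers such as the orchestrators could then pass the resolved value to get_domain_config, which keeps the precedence rule in one place.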
3. Update All Hardcoded Locations
3.1 Prompts Module
src/prompts/report.py:
from src.config.domain import get_domain_config


def get_system_prompt(domain=None):
    config = get_domain_config(domain)
    return config.report_system_prompt


# Keep SYSTEM_PROMPT for backwards compatibility (uses default)
SYSTEM_PROMPT = get_system_prompt()
src/prompts/judge.py:
from src.config.domain import get_domain_config, ResearchDomain


def get_system_prompt(domain=None):
    config = get_domain_config(domain)
    return config.judge_system_prompt


def format_user_prompt(
    question: str,
    evidence: list[Evidence],
    iteration: int = 0,
    max_iterations: int = 10,
    total_evidence_count: int | None = None,
    domain: ResearchDomain | None = None,  # NEW ARGUMENT
) -> str:
    config = get_domain_config(domain)
    # ... existing logic ...
    # Inside f-string:
    return f"""...
{config.judge_scoring_prompt}
DO NOT decide "synthesize" vs "continue" - that decision is made by the system.
...
"""


SYSTEM_PROMPT = get_system_prompt()
src/prompts/hypothesis.py:
from src.config.domain import get_domain_config


def get_system_prompt(domain=None):
    config = get_domain_config(domain)
    return config.hypothesis_system_prompt


SYSTEM_PROMPT = get_system_prompt()
3.2 Judge Factory
src/agent_factory/judges.py:
from src.config.domain import ResearchDomain


class JudgeHandler:
    def __init__(self, model: Any = None, domain: ResearchDomain | None = None) -> None:
        self.model = model or get_model()
        self.domain = domain  # Store domain
        # ...

    async def assess(self, ...):
        # ...
        if evidence:
            user_prompt = format_user_prompt(
                ...,
                domain=self.domain,  # Pass domain
            )
3.3 Orchestrators
src/orchestrators/simple.py:
from src.config.domain import get_domain_config


class SimpleOrchestrator:
    def __init__(self, domain=None, ...):
        self.domain = domain
        self.domain_config = get_domain_config(domain)
        # Pass domain to JudgeHandler
        self.judge = JudgeHandler(domain=domain)

    def _format_report(self, ...):
        return f"""{self.domain_config.report_title}
Query: {query}
...
"""
src/orchestrators/advanced.py:
from src.config.domain import get_domain_config


async def run_research(..., domain=None):
    config = get_domain_config(domain)
    task = f"""Research {config.report_focus} for: {query}
...
"""
3.4 Agents
src/agents/magentic_agents.py:
from src.config.domain import get_domain_config


def create_search_agent(domain=None):
    config = get_domain_config(domain)
    return Agent(
        description=config.search_agent_description,
        ...
    )
src/agents/search_agent.py and src/agents/tools.py:
Similar pattern - inject domain config.
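Since the spec leaves these two files as "similar pattern", here is one way the injection might look for a tool description. This is a sketch under assumptions: build_search_tool_description is a hypothetical helper, the way search_agent.py and tools.py actually register their descriptions is not shown in this spec, and the dataclass is a two-field stand-in for DomainConfig so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainConfig:
    # Stand-in carrying only the two search-related fields of DomainConfig.
    search_description: str
    search_example_query: str


def build_search_tool_description(config: DomainConfig) -> str:
    """Compose a tool description from domain config (hypothetical helper)."""
    return f'{config.search_description}. Example query: "{config.search_example_query}"'


# Values copied from GENERAL_CONFIG in this spec.
general = DomainConfig(
    search_description="Searches biomedical literature for relevant evidence",
    search_example_query="metformin aging mechanisms",
)
description = build_search_tool_description(general)
print(description)
```

Building the text from config fields, rather than keeping it in a docstring, is what removes the last hardcoded domain wording from these modules.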
3.5 MCP Tools
src/mcp_tools.py:
from src.config.domain import get_domain_config


@mcp.tool
async def search_pubmed(query: str, domain: str = "general"):
    """Search PubMed for biomedical literature.

    Args:
        query: Search query (e.g., "metformin alzheimer")
        domain: Research domain (general, drug_repurposing, sexual_health)
    """
    # get_domain_config accepts the raw string and falls back to the
    # default domain for unknown values instead of raising ValueError
    config = get_domain_config(domain)
    # Use config.search_description in responses
4. Update Gradio UI
src/app.py - Add domain selector:
from src.config.domain import ResearchDomain, DOMAIN_CONFIGS

domain_dropdown = gr.Dropdown(
    choices=[d.value for d in ResearchDomain],
    value="general",
    label="Research Domain",
    info="Select research focus area",
)
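The dropdown above shows raw enum values such as drug_repurposing. If friendlier labels are wanted, the choices could be derived from each config's name field. The sketch below only builds the list of (label, value) pairs. Recent Gradio versions are understood to accept such tuples as Dropdown choices, but that should be checked against the installed version. DOMAIN_NAMES is a stand-in for reading name from DOMAIN_CONFIGS, and dropdown_choices is a hypothetical helper.

```python
from enum import Enum


class ResearchDomain(str, Enum):
    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


# Stand-in for DOMAIN_CONFIGS: only the display name of each config is kept.
DOMAIN_NAMES = {
    ResearchDomain.GENERAL: "General Research",
    ResearchDomain.DRUG_REPURPOSING: "Drug Repurposing",
    ResearchDomain.SEXUAL_HEALTH: "Sexual Health Research",
}


def dropdown_choices() -> list[tuple[str, str]]:
    """Return (label, value) pairs in enum definition order."""
    return [(DOMAIN_NAMES[d], d.value) for d in ResearchDomain]
```

Because the list is generated from the enum, a newly registered domain would appear in the UI without touching src/app.py.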
Implementation Checklist
- Create src/config/domain.py with DomainConfig
- Add research_domain to Settings
- Update src/prompts/report.py
- Update src/prompts/judge.py (add domain arg to format_user_prompt)
- Update src/prompts/hypothesis.py
- Update src/agent_factory/judges.py (pass domain to format_user_prompt)
- Update src/orchestrators/simple.py (pass domain to JudgeHandler)
- Update src/orchestrators/advanced.py
- Update src/agents/magentic_agents.py
- Update src/agents/search_agent.py
- Update src/agents/tools.py
- Update src/mcp_tools.py
- Add domain selector to Gradio UI
- Update tests: tests/e2e/test_simple_mode.py contains hardcoded "Drug Repurposing" assertions that will fail with the default "General" domain.
Testing Strategy
Unit Tests
# tests/unit/config/test_domain.py
from src.config.domain import ResearchDomain, get_domain_config


def test_get_domain_config_default():
    config = get_domain_config()
    assert config.name == "General Research"


def test_get_domain_config_drug_repurposing():
    config = get_domain_config(ResearchDomain.DRUG_REPURPOSING)
    assert "drug repurposing" in config.judge_system_prompt.lower()


def test_all_domains_have_required_fields():
    for domain in ResearchDomain:
        config = get_domain_config(domain)
        assert config.report_title
        assert config.judge_system_prompt
        assert config.hypothesis_system_prompt
Integration Tests
# tests/integration/test_domain_switching.py
import pytest

from src.config.domain import ResearchDomain


@pytest.mark.integration
async def test_simple_mode_respects_domain():
    result = await run_simple_mode(
        "metformin aging",
        domain=ResearchDomain.GENERAL,
    )
    assert "## Research Analysis" in result

    result = await run_simple_mode(
        "metformin aging",
        domain=ResearchDomain.DRUG_REPURPOSING,
    )
    assert "## Drug Repurposing Analysis" in result
Migration Path
- Phase 1: Create domain config, add to Settings (no breaking changes)
- Phase 2: Update prompts module to use config (backwards compatible)
- Phase 3: Update JudgeHandler and format_user_prompt (requires careful threading of domain)
- Phase 4: Update orchestrators and agents
- Phase 5: Update UI with domain selector and fix tests
Success Criteria
- Zero hardcoded "drug repurposing" strings in src/ (except domain.py)
- All existing tests pass (after updates)
- New domain can be added by only modifying domain.py
- Default behavior is "General Research"
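The first criterion can be checked mechanically. The following is a minimal sketch, not an existing project script: find_hardcoded is a hypothetical helper that walks a source tree, skips domain.py, and reports every line containing the phrase (case-insensitive). The demo builds a throwaway directory rather than touching the real src/.

```python
import tempfile
from pathlib import Path


def find_hardcoded(
    root: Path,
    needle: str = "drug repurposing",
    allowed: tuple[str, ...] = ("domain.py",),
) -> list[tuple[str, int]]:
    """Return (relative path, line number) for each case-insensitive match."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        if path.name in allowed:
            continue  # the domain config is the one place the phrase may live
        lines = path.read_text(encoding="utf-8").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if needle in line.lower():
                hits.append((path.relative_to(root).as_posix(), lineno))
    return hits


# Demo on a temporary tree: one allowed file, one offending file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "config").mkdir(parents=True)
    (src / "config" / "domain.py").write_text(
        'name = "Drug Repurposing"\n', encoding="utf-8"
    )
    (src / "report.py").write_text(
        'x = 1\nTITLE = "## Drug Repurposing Analysis"\n', encoding="utf-8"
    )
    hits = find_hardcoded(src)

print(hits)
```

Because the search phrase contains a space, identifiers such as ResearchDomain.DRUG_REPURPOSING are not flagged. A check like this could run as a unit test so the criterion stays enforced after the refactor.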