Spaces:

VibecoderMcSwaggins
/

DeepBoner

Paused

File size: 15,419 Bytes

# SPEC_10: Domain-Agnostic Refactor

**Status**: DRAFT
**Priority**: P1
**Effort**: Medium (2-3 hours)
**Related Issues**: #75, #76

## Problem Statement

The codebase has "drug repurposing" hardcoded in **16 locations** (originally identified 15, plus 1 found in audit):

```
src/prompts/report.py:11      - SYSTEM_PROMPT
src/prompts/judge.py:5        - SYSTEM_PROMPT
src/prompts/judge.py:140      - Evidence scoring prompt (inside format_user_prompt)
src/prompts/hypothesis.py:11  - SYSTEM_PROMPT
src/orchestrators/simple.py:476   - Report header
src/orchestrators/simple.py:564   - Report header
src/orchestrators/advanced.py:159 - Task prompt
src/agents/magentic_agents.py:33  - Agent description
src/agents/magentic_agents.py:108 - Agent description
src/agents/search_agent.py:31     - Tool description
src/agents/tools.py:85            - Tool docstring
src/mcp_tools.py:27               - Example query
src/mcp_tools.py:116              - Docstring
src/mcp_tools.py:164              - Function docstring
src/mcp_tools.py:167              - Docstring
src/agent_factory/judges.py:21    - Imports format_user_prompt (needs update)
```

This violates:
- **DRY** - Same concept repeated 15+ times
- **Open/Closed** - Can't add domains without modifying multiple files
- **Flexibility** - Agent is locked to one domain

## Solution: Centralized Domain Configuration

### 1. Create Domain Config Module

**File**: `src/config/domain.py`

```python
"""Centralized domain configuration for research agents.

This module defines research domains and their associated prompts,
allowing the agent to operate in domain-agnostic or domain-specific modes.

Usage:
    from src.config.domain import get_domain_config, ResearchDomain

    # Get default (general) config
    config = get_domain_config()

    # Get specific domain
    config = get_domain_config(ResearchDomain.SEXUAL_HEALTH)

    # Use in prompts
    system_prompt = config.judge_system_prompt
"""

from enum import Enum
from typing import ClassVar

from pydantic import BaseModel


class ResearchDomain(str, Enum):
    """Available research domains."""

    GENERAL = "general"
    DRUG_REPURPOSING = "drug_repurposing"
    SEXUAL_HEALTH = "sexual_health"


class DomainConfig(BaseModel):
    """Configuration for a research domain.

    Contains all domain-specific text used across the codebase,
    ensuring consistency and single-source-of-truth.
    """

    # Identity
    name: str
    description: str

    # Report generation
    report_title: str
    report_focus: str

    # Judge prompts
    judge_system_prompt: str
    judge_scoring_prompt: str

    # Hypothesis prompts
    hypothesis_system_prompt: str

    # Report writer prompts
    report_system_prompt: str

    # Search context
    search_description: str
    search_example_query: str

    # Agent descriptions (for Magentic mode)
    search_agent_description: str
    hypothesis_agent_description: str


# ─────────────────────────────────────────────────────────────────
# Domain Definitions
# ─────────────────────────────────────────────────────────────────

GENERAL_CONFIG = DomainConfig(
    name="General Research",
    description="General-purpose biomedical research agent",

    report_title="## Research Analysis",
    report_focus="comprehensive research synthesis",

    judge_system_prompt="""You are an expert research judge.
Your role is to evaluate evidence quality, assess relevance to the research query,
and determine if sufficient evidence exists to synthesize findings.""",

    judge_scoring_prompt="""Score this evidence for research relevance.
Provide ONLY scores and extracted data.""",

    hypothesis_system_prompt="""You are a biomedical research scientist.
Your role is to generate evidence-based hypotheses from the literature,
identifying key mechanisms, targets, and potential therapeutic implications.""",

    report_system_prompt="""You are a scientific writer specializing in research reports.
Your role is to synthesize evidence into clear, well-structured reports with
proper citations and evidence-based conclusions.""",

    search_description="Searches biomedical literature for relevant evidence",
    search_example_query="metformin aging mechanisms",

    search_agent_description="Searches PubMed, ClinicalTrials.gov, and Europe PMC for evidence",
    hypothesis_agent_description="Generates mechanistic hypotheses from evidence",
)

DRUG_REPURPOSING_CONFIG = DomainConfig(
    name="Drug Repurposing",
    description="Drug repurposing research specialist",

    report_title="## Drug Repurposing Analysis",
    report_focus="drug repurposing opportunities",

    judge_system_prompt="""You are an expert drug repurposing research judge.
Your role is to evaluate evidence for drug repurposing potential, assess
mechanism plausibility, and determine if compounds warrant further investigation.""",

    judge_scoring_prompt="""Score this evidence for drug repurposing potential.
Provide ONLY scores and extracted data.""",

    hypothesis_system_prompt="""You are a biomedical research scientist specializing in drug repurposing.
Your role is to generate mechanistic hypotheses for how existing drugs might
treat new indications, based on shared pathways and targets.""",

    report_system_prompt="""You are a scientific writer specializing in drug repurposing research reports.
Your role is to synthesize evidence into actionable drug repurposing recommendations
with clear mechanistic rationale and clinical translation potential.""",

    search_description="Searches biomedical literature for drug repurposing evidence",
    search_example_query="metformin alzheimer repurposing",

    search_agent_description="Searches PubMed for drug repurposing evidence",
    hypothesis_agent_description="Generates mechanistic hypotheses for drug repurposing",
)

SEXUAL_HEALTH_CONFIG = DomainConfig(
    name="Sexual Health Research",
    description="Sexual health and wellness research specialist",

    report_title="## Sexual Health Analysis",
    report_focus="sexual health and wellness interventions",

    judge_system_prompt="""You are an expert sexual health research judge.
Your role is to evaluate evidence for sexual health interventions, assess
efficacy and safety data, and determine clinical applicability.""",

    judge_scoring_prompt="""Score this evidence for sexual health relevance.
Provide ONLY scores and extracted data.""",

    hypothesis_system_prompt="""You are a biomedical research scientist specializing in sexual health.
Your role is to generate evidence-based hypotheses for sexual health interventions,
identifying mechanisms of action and potential therapeutic applications.""",

    report_system_prompt="""You are a scientific writer specializing in sexual health research reports.
Your role is to synthesize evidence into clear recommendations for sexual health
interventions with proper safety considerations.""",

    search_description="Searches biomedical literature for sexual health evidence",
    search_example_query="testosterone therapy female libido",

    search_agent_description="Searches PubMed for sexual health evidence",
    hypothesis_agent_description="Generates hypotheses for sexual health interventions",
)

# ─────────────────────────────────────────────────────────────────
# Domain Registry
# ─────────────────────────────────────────────────────────────────

DOMAIN_CONFIGS: dict[ResearchDomain, DomainConfig] = {
    ResearchDomain.GENERAL: GENERAL_CONFIG,
    ResearchDomain.DRUG_REPURPOSING: DRUG_REPURPOSING_CONFIG,
    ResearchDomain.SEXUAL_HEALTH: SEXUAL_HEALTH_CONFIG,
}

# Default domain
DEFAULT_DOMAIN = ResearchDomain.GENERAL


def get_domain_config(domain: ResearchDomain | str | None = None) -> DomainConfig:
    """Get configuration for a research domain.

    Args:
        domain: The research domain. Defaults to GENERAL if None.

    Returns:
        DomainConfig for the specified domain.
    """
    if domain is None:
        domain = DEFAULT_DOMAIN
    
    if isinstance(domain, str):
        try:
            domain = ResearchDomain(domain)
        except ValueError:
            domain = DEFAULT_DOMAIN

    return DOMAIN_CONFIGS[domain]
```

### 2. Update Settings to Include Domain

**File**: `src/utils/config.py` (add to Settings class)

```python
from src.config.domain import ResearchDomain

class Settings(BaseSettings):
    # ... existing fields ...

    # Domain configuration
    research_domain: ResearchDomain = ResearchDomain.GENERAL
```

### 3. Update All Hardcoded Locations

#### 3.1 Prompts Module

**`src/prompts/report.py`**:
```python
from src.config.domain import get_domain_config

def get_system_prompt(domain=None):
    config = get_domain_config(domain)
    return config.report_system_prompt

# Keep SYSTEM_PROMPT for backwards compatibility (uses default)
SYSTEM_PROMPT = get_system_prompt()
```

**`src/prompts/judge.py`**:
```python
from src.config.domain import get_domain_config, ResearchDomain

def get_system_prompt(domain=None):
    config = get_domain_config(domain)
    return config.judge_system_prompt

def format_user_prompt(
    question: str,
    evidence: list[Evidence],
    iteration: int = 0,
    max_iterations: int = 10,
    total_evidence_count: int | None = None,
    domain: ResearchDomain | None = None,  # NEW ARGUMENT
) -> str:
    config = get_domain_config(domain)
    # ... existing logic ...
    
    # Inside f-string:
    return f"""...
{config.judge_scoring_prompt}
DO NOT decide "synthesize" vs "continue" - that decision is made by the system.
...
"""

SYSTEM_PROMPT = get_system_prompt()
```

**`src/prompts/hypothesis.py`**:
```python
from src.config.domain import get_domain_config

def get_system_prompt(domain=None):
    config = get_domain_config(domain)
    return config.hypothesis_system_prompt

SYSTEM_PROMPT = get_system_prompt()
```

#### 3.2 Judge Factory

**`src/agent_factory/judges.py`**:
```python
from src.config.domain import ResearchDomain

class JudgeHandler:
    def __init__(self, model: Any = None, domain: ResearchDomain | None = None) -> None:
        self.model = model or get_model()
        self.domain = domain  # Store domain
        # ...

    async def assess(self, ...):
        # ...
        if evidence:
            user_prompt = format_user_prompt(
                ...,
                domain=self.domain  # Pass domain
            )
```

#### 3.3 Orchestrators

**`src/orchestrators/simple.py`**:
```python
from src.config.domain import get_domain_config

class SimpleOrchestrator:
    def __init__(self, domain=None, ...):
        self.domain = domain
        self.domain_config = get_domain_config(domain)
        
        # Pass domain to JudgeHandler
        self.judge = JudgeHandler(domain=domain)

    def _format_report(self, ...):
        return f"""{self.domain_config.report_title}
Query: {query}
...
"""
```

**`src/orchestrators/advanced.py`**:
```python
from src.config.domain import get_domain_config

async def run_research(..., domain=None):
    config = get_domain_config(domain)
    task = f"""Research {config.report_focus} for: {query}
    ...
    """
```

#### 3.4 Agents

**`src/agents/magentic_agents.py`**:
```python
from src.config.domain import get_domain_config

def create_search_agent(domain=None):
    config = get_domain_config(domain)
    return Agent(
        description=config.search_agent_description,
        ...
    )
```

**`src/agents/search_agent.py`** and **`src/agents/tools.py`**:
Similar pattern - inject domain config.

#### 3.5 MCP Tools

**`src/mcp_tools.py`**:
```python
from src.config.domain import get_domain_config, ResearchDomain

@mcp.tool
async def search_pubmed(query: str, domain: str = "general"):
    """Search PubMed for biomedical literature.

    Args:
        query: Search query (e.g., "metformin alzheimer")
        domain: Research domain (general, drug_repurposing, sexual_health)
    """
    config = get_domain_config(ResearchDomain(domain))
    # Use config.search_description in responses
```

### 4. Update Gradio UI

**`src/app.py`** - Add domain selector:

```python
from src.config.domain import ResearchDomain, DOMAIN_CONFIGS

domain_dropdown = gr.Dropdown(
    choices=[d.value for d in ResearchDomain],
    value="general",
    label="Research Domain",
    info="Select research focus area"
)
```

## Implementation Checklist

- [ ] Create `src/config/domain.py` with DomainConfig
- [ ] Add `research_domain` to Settings
- [ ] Update `src/prompts/report.py`
- [ ] Update `src/prompts/judge.py` (Add domain arg to `format_user_prompt`)
- [ ] Update `src/prompts/hypothesis.py`
- [ ] Update `src/agent_factory/judges.py` (Pass domain to `format_user_prompt`)
- [ ] Update `src/orchestrators/simple.py` (Pass domain to `JudgeHandler`)
- [ ] Update `src/orchestrators/advanced.py`
- [ ] Update `src/agents/magentic_agents.py`
- [ ] Update `src/agents/search_agent.py`
- [ ] Update `src/agents/tools.py`
- [ ] Update `src/mcp_tools.py`
- [ ] Add domain selector to Gradio UI
- [ ] **Update Tests**: `tests/e2e/test_simple_mode.py` contains hardcoded "Drug Repurposing" assertions that will fail with default "General" domain.

## Testing Strategy

### Unit Tests

```python
# tests/unit/config/test_domain.py

def test_get_domain_config_default():
    config = get_domain_config()
    assert config.name == "General Research"

def test_get_domain_config_drug_repurposing():
    config = get_domain_config(ResearchDomain.DRUG_REPURPOSING)
    assert "drug repurposing" in config.judge_system_prompt.lower()

def test_all_domains_have_required_fields():
    for domain in ResearchDomain:
        config = get_domain_config(domain)
        assert config.report_title
        assert config.judge_system_prompt
        assert config.hypothesis_system_prompt
```

### Integration Tests

```python
# tests/integration/test_domain_switching.py

@pytest.mark.integration
async def test_simple_mode_respects_domain():
    result = await run_simple_mode(
        "metformin aging",
        domain=ResearchDomain.GENERAL
    )
    assert "## Research Analysis" in result

    result = await run_simple_mode(
        "metformin aging",
        domain=ResearchDomain.DRUG_REPURPOSING
    )
    assert "## Drug Repurposing Analysis" in result
```

## Migration Path

1. **Phase 1**: Create domain config, add to Settings (no breaking changes)
2. **Phase 2**: Update prompts module to use config (backwards compatible)
3. **Phase 3**: Update `JudgeHandler` and `format_user_prompt` (requires careful threading of domain)
4. **Phase 4**: Update orchestrators and agents
5. **Phase 5**: Update UI with domain selector and Fix Tests

## Success Criteria

- [ ] Zero hardcoded "drug repurposing" strings in `src/` (except `domain.py`)
- [ ] All existing tests pass (after updates)
- [ ] New domain can be added by only modifying `domain.py`
- [ ] Default behavior is "General Research"