# Phase 3 Implementation Spec: Judge Vertical Slice

**Goal**: Implement the "Brain" of the agent: evaluating evidence quality.
**Philosophy**: "Structured Output or Bust."
**Estimated Effort**: 3-4 hours
**Prerequisite**: Phase 2 complete

---

## 1. The Slice Definition

This slice covers:

1. **Input**: Question + a list of `Evidence`.
2. **Process**:
   - Construct a prompt from the evidence.
   - Call the LLM (PydanticAI).
   - Parse the response into `JudgeAssessment`.
3. **Output**: `JudgeAssessment` object.

**Files**:

- `src/utils/models.py`: Add Judge models
- `src/prompts/judge.py`: Prompt templates
- `src/agent_factory/judges.py`: Handler logic

---

## 2. Models (`src/utils/models.py`)

Add these to the existing models file (they rely on `Literal` from `typing` and `BaseModel`/`Field` from `pydantic` being imported):

```python
class DrugCandidate(BaseModel):
    """A potential drug repurposing candidate."""

    drug_name: str
    original_indication: str
    proposed_indication: str
    mechanism: str
    evidence_strength: Literal["weak", "moderate", "strong"]


class JudgeAssessment(BaseModel):
    """The judge's assessment."""

    sufficient: bool
    recommendation: Literal["continue", "synthesize"]
    reasoning: str
    overall_quality_score: int
    coverage_score: int
    candidates: list[DrugCandidate] = Field(default_factory=list)
    next_search_queries: list[str] = Field(default_factory=list)
    gaps: list[str] = Field(default_factory=list)
```

---

## 3. Prompts (`src/prompts/judge.py`)

```python
"""Prompt templates for the Judge."""

from typing import List

from src.utils.models import Evidence

JUDGE_SYSTEM_PROMPT = """You are a biomedical research judge..."""


def build_judge_user_prompt(question: str, evidence: List[Evidence]) -> str:
    """Build the user prompt."""
    # ... implementation: present the question, then enumerate the evidence items ...
```

---

## 4. Handler (`src/agent_factory/judges.py`)

```python
"""Judge handler - evaluates evidence quality."""

from typing import List

import structlog
from pydantic_ai import Agent
from tenacity import retry, stop_after_attempt

from src.shared.config import settings
from src.utils.exceptions import JudgeError
from src.utils.models import Evidence, JudgeAssessment
from src.prompts.judge import JUDGE_SYSTEM_PROMPT, build_judge_user_prompt

logger = structlog.get_logger()

# Initialize the PydanticAI agent with structured output.
judge_agent = Agent(
    model=settings.llm_model,  # e.g. 'openai:gpt-4o'
    result_type=JudgeAssessment,
    system_prompt=JUDGE_SYSTEM_PROMPT,
)


class JudgeHandler:
    """Handles evidence assessment."""

    def __init__(self, agent=None):
        self.agent = agent or judge_agent

    @retry(stop=stop_after_attempt(3), reraise=True)  # retry the LLM call on failure
    async def assess(self, question: str, evidence: List[Evidence]) -> JudgeAssessment:
        """Assess evidence sufficiency."""
        prompt = build_judge_user_prompt(question, evidence)
        try:
            result = await self.agent.run(prompt)
            return result.data
        except Exception as e:
            logger.error("judge_assessment_failed", error=str(e))
            raise JudgeError(f"Assessment failed: {e}") from e
```

---

## 5. TDD Workflow

### Test File: `tests/unit/agent_factory/test_judges.py`

```python
"""Unit tests for JudgeHandler."""

from unittest.mock import AsyncMock, MagicMock

import pytest


class TestJudgeHandler:
    @pytest.mark.asyncio
    async def test_assess_returns_assessment(self):
        from src.agent_factory.judges import JudgeHandler
        from src.utils.models import JudgeAssessment

        # Mock the PydanticAI agent result
        mock_result = MagicMock()
        mock_result.data = JudgeAssessment(
            sufficient=True,
            recommendation="synthesize",
            reasoning="Good",
            overall_quality_score=8,
            coverage_score=8,
        )
        mock_agent = AsyncMock()
        mock_agent.run = AsyncMock(return_value=mock_result)

        handler = JudgeHandler(agent=mock_agent)
        result = await handler.assess("q", [])

        assert result.sufficient is True
```
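A failure-path test is also worth covering. The sketch below (added to the same `TestJudgeHandler` class, reusing the module-level `pytest`/`AsyncMock` imports) assumes only what section 4 shows: any exception from `agent.run` is wrapped in `JudgeError`. The `RuntimeError` stand-in is arbitrary.

```python
    @pytest.mark.asyncio
    async def test_assess_wraps_agent_errors(self):
        from src.agent_factory.judges import JudgeHandler
        from src.utils.exceptions import JudgeError

        # Agent that fails on every call (e.g. provider or network error).
        mock_agent = AsyncMock()
        mock_agent.run = AsyncMock(side_effect=RuntimeError("LLM unavailable"))

        handler = JudgeHandler(agent=mock_agent)

        # The handler should surface the failure as a JudgeError.
        with pytest.raises(JudgeError):
            await handler.assess("q", [])
```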
---

## 6. Implementation Checklist

- [ ] Update `src/utils/models.py` with Judge models
- [ ] Create `src/prompts/judge.py`
- [ ] Implement `src/agent_factory/judges.py`
- [ ] Write tests in `tests/unit/agent_factory/test_judges.py`
- [ ] Run `uv run pytest tests/unit/agent_factory/`
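Once the checklist passes, the slice can be smoke-checked end to end against a real model. This is only a sketch: the script path and question are placeholders, it assumes `settings.llm_model` and the matching API key are configured, and a run will call the LLM provider.

```python
# scripts/judge_smoke_check.py (hypothetical path) - manual check, not part of the test suite.
import asyncio

from src.agent_factory.judges import JudgeHandler


async def main() -> None:
    handler = JudgeHandler()
    # With an empty evidence list, the judge is expected to ask for more searching.
    assessment = await handler.assess(
        "Can metformin be repurposed for Alzheimer's disease?", []
    )
    print(assessment.model_dump_json(indent=2))


if __name__ == "__main__":
    asyncio.run(main())
```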