# Clinical Trial Phase-Aware Workflow & Scoring

> **Inspired by:** KubeSRE's triage → investigate → fix → verify phase-order bonus (+0.2) and skip penalty. Bio Experiment's prerequisite rules as hard constraints. KubeSRE's judge persona scaling (junior/senior/principal) with curriculum tier.

## Clinical Trial Workflow Phases

A clinical trial follows a strict sequential workflow. The agent should learn this ordering through reward signal, not hard-coding. Phase-order bonuses and skip penalties teach the agent to follow the correct diagnostic sequence — the same pattern that won 1st place (KubeSRE's triage → investigate → fix → verify).

Our workflow maps KubeSRE's 5-phase SRE pattern to clinical trial design:

| KubeSRE Phase | Our Phase | Why it maps |
|---|---|---|
| Triage | literature_review + hypothesis | Understand the problem before acting |
| Investigation | phase_i_design + phase_i_analysis | Gather data to inform fix |
| Mitigation | phase_ii_design + regulatory | Implement the solution design |
| Fix | enrollment + monitoring | Execute and monitor |
| Verification | analysis + conclusion | Verify the outcome |

### Phase Definitions

| # | Phase | Description | Typical Actions |
|---|---|---|---|
| 0 | `literature_review` | Understand the disease, existing treatments, and target population | Review scenario description, note constraints |
| 1 | `hypothesis` | Form a hypothesis about the drug's mechanism and target population | Estimate expected effect size, identify potential responders |
| 2 | `phase_i_design` | Design Phase I safety/dose-finding study | run_dose_escalation, observe_safety_signal |
| 3 | `phase_i_analysis` | Analyze Phase I results | estimate_effect_size, identify MTD |
| 4 | `phase_ii_design` | Design Phase II efficacy trial based on Phase I findings | set_primary_endpoint, set_sample_size, set_inclusion_criteria, set_exclusion_criteria, set_dosing_schedule, set_control_arm, set_randomization_ratio, set_blinding |
| 5 | `regulatory` | Submit protocol for FDA review | submit_to_fda_review, request_protocol_amendment |
| 6 | `enrollment` | Enroll patients, begin trial execution | (implicit — happens after FDA approval) |
| 7 | `monitoring` | Run interim analyses, monitor safety | run_interim_analysis, modify_sample_size, add_biomarker_stratification |
| 8 | `analysis` | Run final statistical analysis | run_primary_analysis |
| 9 | `conclusion` | Synthesize results and conclusions | synthesize_conclusion |

### Phase Detection Heuristic

```python
def _detect_phase(action: TrialAction, history: list) -> str:
    """Classify action into clinical workflow phase."""
    action_type = action.action_type

    # Phase I
    if action_type in ("run_dose_escalation", "observe_safety_signal"):
        return "phase_i_design"
    if action_type == "estimate_effect_size":
        return "phase_i_analysis"

    # Phase II design
    if action_type in ("set_primary_endpoint", "set_sample_size",
                        "set_inclusion_criteria", "set_exclusion_criteria",
                        "set_dosing_schedule", "set_control_arm",
                        "set_randomization_ratio", "set_blinding"):
        return "phase_ii_design"

    # Regulatory
    if action_type in ("submit_to_fda_review", "request_protocol_amendment"):
        return "regulatory"

    # Monitoring
    if action_type in ("run_interim_analysis", "modify_sample_size",
                        "add_biomarker_stratification"):
        return "monitoring"

    # Analysis
    if action_type == "run_primary_analysis":
        return "analysis"

    # Conclusion
    if action_type == "synthesize_conclusion":
        return "conclusion"

    return "literature_review"
```

### Phase Order Map

```python
PHASE_ORDER = {
    "literature_review": 0,
    "hypothesis": 1,
    "phase_i_design": 2,
    "phase_i_analysis": 3,
    "phase_ii_design": 4,
    "regulatory": 5,
    "enrollment": 6,
    "monitoring": 7,
    "analysis": 8,
    "conclusion": 9,
}
```

## Phase-Order Scoring Rules

### Bonus: Correct Phase Ordering (+0.2)

If the current action's phase is equal to or one step ahead of the highest phase seen so far, the agent gets +0.2 bonus. This rewards following the natural clinical trial workflow.

```python
def _is_phase_order_correct(current_phase: str, history: list) -> bool:
    current_order = PHASE_ORDER[current_phase]
    if not history:
        return current_order <= 2  # literature_review, hypothesis, or phase_i_design are valid starts
    past_phases = [_detect_phase(h["action"], history[:i]) for i, h in enumerate(history)]
    max_past_order = max(PHASE_ORDER[p] for p in past_phases)
    return current_order <= max_past_order + 1
```

### Penalty: Skipping Phases (-0.3)

If the agent jumps ahead by 2+ phases (e.g., going from Phase I directly to analysis without Phase II design), it gets -0.3 penalty per skipped phase. This penalizes shortcutting the clinical workflow.

```python
def _get_skipped_phases(current_phase: str, history: list) -> list[str]:
    current_order = PHASE_ORDER[current_phase]
    if current_order <= 2:
        return []
    past_phases = {_detect_phase(h["action"], history[:i]) for i, h in enumerate(history)}
    past_phases.add(current_phase)
    skipped = []
    for phase, order in PHASE_ORDER.items():
        if order < current_order and phase not in past_phases:
            skipped.append(phase)
    return skipped
```

### Examples

**Good workflow** (total phase bonus: +2.0):
```
Step 1: run_dose_escalation       → phase_i_design  (correct start, +0.2)
Step 2: observe_safety_signal     → phase_i_design  (same phase, +0.2)
Step 3: estimate_effect_size      → phase_i_analysis (+0.2)
Step 4: set_primary_endpoint      → phase_ii_design (+0.2)
Step 5: set_sample_size           → phase_ii_design (same phase, +0.2)
Step 6: set_inclusion_criteria    → phase_ii_design (same phase, +0.2)
Step 7: submit_to_fda_review      → regulatory      (+0.2)
Step 8: run_interim_analysis      → monitoring      (+0.2)
Step 9: run_primary_analysis      → analysis        (+0.2)
Step 10: synthesize_conclusion    → conclusion      (+0.2)
```

**Bad workflow** (total phase penalty: -0.9):
```
Step 1: set_sample_size           → phase_ii_design (skipped phase_i_design, phase_i_analysis → -0.6)
Step 2: run_primary_analysis      → analysis        (skipped regulatory, monitoring → -0.6)
Step 3: synthesize_conclusion     → conclusion      (correct ordering from analysis, +0.2)
Net: -0.6 + -0.6 + 0.2 = -1.0
```

## Prerequisite Hard Constraints

In addition to soft phase-order scoring, some actions have hard prerequisites enforced by the rule engine. These are not reward signals — they block the action entirely and return an error:

| Action | Prerequisite | Rationale |
|---|---|---|
| `estimate_effect_size` | At least 1 `run_dose_escalation` completed | Can't estimate without Phase I data |
| `set_sample_size` | `estimate_effect_size` completed | Need effect size to calculate power |
| `submit_to_fda_review` | `set_primary_endpoint` + `set_sample_size` completed | FDA requires these minimum protocol elements |
| `run_interim_analysis` | `submit_to_fda_review` passed | Can't do interim before FDA approval |
| `run_primary_analysis` | `submit_to_fda_review` passed | Can't analyze before approved protocol |
| `synthesize_conclusion` | `run_primary_analysis` completed | Can't conclude without analysis |
| `modify_sample_size` | `run_interim_analysis` completed | Adaptation requires interim look |
| `add_biomarker_stratification` | `estimate_effect_size` completed | Need data to stratify on |

## Integration with Reward

The phase-order bonus/penalty is one component of the per-step reward:

```
r_step = (
    w_validity  × r_validity     +       # FDA rule compliance
    w_ordering  × r_ordering     +       # Phase-order bonus/penalty (THIS DOC)
    w_info_gain × r_info_gain    +       # Information gained
    w_efficiency × r_efficiency  +       # Budget/time efficiency
    w_novelty   × r_novelty      +       # Action diversity bonus
    w_penalty   × r_penalty      +       # Soft constraint violations
    γ × (φ(s') − φ(s))                  # Potential-based shaping
)
```

Where `r_ordering` = +0.2 if correct order, -0.3 × len(skipped_phases) if phases were skipped.

## Judge Persona Scaling by Difficulty

> **Inspired by:** KubeSRE's judge persona system: junior (lenient, gives hints) at low difficulty, senior (standard) at medium, principal (strict, penalizes inefficiency) at expert.

The phase-order evaluation strictness scales with curriculum tier:

| Tier | Judge Persona | Phase-Order Behavior |
|---|---|---|
| Warmup (0.0–0.25) | **Junior** | Lenient: allows 1 phase skip without penalty. Includes hint in observation (e.g., "Consider running dose escalation before setting sample size"). Phase bonus still +0.2. |
| Beginner (0.25–0.40) | **Junior→Senior** | Standard: phase skip penalty -0.3/skip. No hints. Phase bonus +0.2. |
| Intermediate (0.40–0.60) | **Senior** | Strict: phase skip penalty -0.3/skip. Phase bonus reduced to +0.15 (expects correct ordering as baseline). |
| Advanced (0.60–0.80) | **Senior→Principal** | Very strict: phase skip penalty -0.5/skip. Phase bonus +0.1. Penalizes redundant actions within a phase (-0.1 for repeating already-completed actions). |
| Expert (0.80–0.95) | **Principal** | Harshest: phase skip penalty -0.5/skip. Phase bonus +0.05 (near-zero, expects perfection). Redundancy penalty -0.15. Efficiency penalty for taking >N steps in any phase. |

This mirrors KubeSRE's approach: the junior judge at warmup is lenient and gives hints, while the principal judge at expert actively punishes inefficiency.

## Protocol Amendment & Recovery

> **Inspired by:** KubeSRE's mitigation phase — agents can partially fix before full resolution. Bio Experiment's `design_followup` meta-action.

The `request_protocol_amendment` action allows recovery from earlier mistakes:
- If FDA review fails, agent can amend the protocol and resubmit
- Amendment costs time and budget (realistic consequence)
- Successful recovery after failure gets a +0.3 **recovery bonus** (like KubeSRE's mitigation credit)
- Maximum 2 amendments per episode (prevents infinite retry loops)
- This teaches the agent that mistakes are recoverable but costly — a key real-world lesson