Spaces:

Roopalgn
/

openenv-clinical-trial

Sleeping

App Files Files Community

openenv-clinical-trial / docs /phase_workflow.md

Roopalgn

Add spec/contract files to main for Suyash: ARCHITECTURE, reward spec, scenarios, milestones, phase workflow

863b4cb about 2 months ago

preview code

raw

history blame

10.4 kB

Clinical Trial Phase-Aware Workflow & Scoring

Inspired by: KubeSRE's triage → investigate → fix → verify phase-order bonus (+0.2) and skip penalty. Bio Experiment's prerequisite rules as hard constraints. KubeSRE's judge persona scaling (junior/senior/principal) with curriculum tier.

Clinical Trial Workflow Phases

A clinical trial follows a strict sequential workflow. The agent should learn this ordering through reward signal, not hard-coding. Phase-order bonuses and skip penalties teach the agent to follow the correct diagnostic sequence — the same pattern that won 1st place (KubeSRE's triage → investigate → fix → verify).

Our workflow maps KubeSRE's 5-phase SRE pattern to clinical trial design:

KubeSRE Phase	Our Phase	Why it maps
Triage	literature_review + hypothesis	Understand the problem before acting
Investigation	phase_i_design + phase_i_analysis	Gather data to inform fix
Mitigation	phase_ii_design + regulatory	Implement the solution design
Fix	enrollment + monitoring	Execute and monitor
Verification	analysis + conclusion	Verify the outcome

Phase Definitions

#	Phase	Description	Typical Actions
0	`literature_review`	Understand the disease, existing treatments, and target population	Review scenario description, note constraints
1	`hypothesis`	Form a hypothesis about the drug's mechanism and target population	Estimate expected effect size, identify potential responders
2	`phase_i_design`	Design Phase I safety/dose-finding study	run_dose_escalation, observe_safety_signal
3	`phase_i_analysis`	Analyze Phase I results	estimate_effect_size, identify MTD
4	`phase_ii_design`	Design Phase II efficacy trial based on Phase I findings	set_primary_endpoint, set_sample_size, set_inclusion_criteria, set_exclusion_criteria, set_dosing_schedule, set_control_arm, set_randomization_ratio, set_blinding
5	`regulatory`	Submit protocol for FDA review	submit_to_fda_review, request_protocol_amendment
6	`enrollment`	Enroll patients, begin trial execution	(implicit — happens after FDA approval)
7	`monitoring`	Run interim analyses, monitor safety	run_interim_analysis, modify_sample_size, add_biomarker_stratification
8	`analysis`	Run final statistical analysis	run_primary_analysis
9	`conclusion`	Synthesize results and conclusions	synthesize_conclusion

Phase Detection Heuristic

def _detect_phase(action: TrialAction, history: list) -> str:
    """Classify action into clinical workflow phase."""
    action_type = action.action_type

    # Phase I
    if action_type in ("run_dose_escalation", "observe_safety_signal"):
        return "phase_i_design"
    if action_type == "estimate_effect_size":
        return "phase_i_analysis"

    # Phase II design
    if action_type in ("set_primary_endpoint", "set_sample_size",
                        "set_inclusion_criteria", "set_exclusion_criteria",
                        "set_dosing_schedule", "set_control_arm",
                        "set_randomization_ratio", "set_blinding"):
        return "phase_ii_design"

    # Regulatory
    if action_type in ("submit_to_fda_review", "request_protocol_amendment"):
        return "regulatory"

    # Monitoring
    if action_type in ("run_interim_analysis", "modify_sample_size",
                        "add_biomarker_stratification"):
        return "monitoring"

    # Analysis
    if action_type == "run_primary_analysis":
        return "analysis"

    # Conclusion
    if action_type == "synthesize_conclusion":
        return "conclusion"

    return "literature_review"

Phase Order Map

PHASE_ORDER = {
    "literature_review": 0,
    "hypothesis": 1,
    "phase_i_design": 2,
    "phase_i_analysis": 3,
    "phase_ii_design": 4,
    "regulatory": 5,
    "enrollment": 6,
    "monitoring": 7,
    "analysis": 8,
    "conclusion": 9,
}

Phase-Order Scoring Rules

Bonus: Correct Phase Ordering (+0.2)

If the current action's phase is equal to or one step ahead of the highest phase seen so far, the agent gets +0.2 bonus. This rewards following the natural clinical trial workflow.

def _is_phase_order_correct(current_phase: str, history: list) -> bool:
    current_order = PHASE_ORDER[current_phase]
    if not history:
        return current_order <= 2  # literature_review, hypothesis, or phase_i_design are valid starts
    past_phases = [_detect_phase(h["action"], history[:i]) for i, h in enumerate(history)]
    max_past_order = max(PHASE_ORDER[p] for p in past_phases)
    return current_order <= max_past_order + 1

Penalty: Skipping Phases (-0.3)

If the agent jumps ahead by 2+ phases (e.g., going from Phase I directly to analysis without Phase II design), it gets -0.3 penalty per skipped phase. This penalizes shortcutting the clinical workflow.

def _get_skipped_phases(current_phase: str, history: list) -> list[str]:
    current_order = PHASE_ORDER[current_phase]
    if current_order <= 2:
        return []
    past_phases = {_detect_phase(h["action"], history[:i]) for i, h in enumerate(history)}
    past_phases.add(current_phase)
    skipped = []
    for phase, order in PHASE_ORDER.items():
        if order < current_order and phase not in past_phases:
            skipped.append(phase)
    return skipped

Examples

Good workflow (total phase bonus: +2.0):

Step 1: run_dose_escalation       → phase_i_design  (correct start, +0.2)
Step 2: observe_safety_signal     → phase_i_design  (same phase, +0.2)
Step 3: estimate_effect_size      → phase_i_analysis (+0.2)
Step 4: set_primary_endpoint      → phase_ii_design (+0.2)
Step 5: set_sample_size           → phase_ii_design (same phase, +0.2)
Step 6: set_inclusion_criteria    → phase_ii_design (same phase, +0.2)
Step 7: submit_to_fda_review      → regulatory      (+0.2)
Step 8: run_interim_analysis      → monitoring      (+0.2)
Step 9: run_primary_analysis      → analysis        (+0.2)
Step 10: synthesize_conclusion    → conclusion      (+0.2)

Bad workflow (total phase penalty: -0.9):

Step 1: set_sample_size           → phase_ii_design (skipped phase_i_design, phase_i_analysis → -0.6)
Step 2: run_primary_analysis      → analysis        (skipped regulatory, monitoring → -0.6)
Step 3: synthesize_conclusion     → conclusion      (correct ordering from analysis, +0.2)
Net: -0.6 + -0.6 + 0.2 = -1.0

Prerequisite Hard Constraints

In addition to soft phase-order scoring, some actions have hard prerequisites enforced by the rule engine. These are not reward signals — they block the action entirely and return an error:

Action	Prerequisite	Rationale
`estimate_effect_size`	At least 1 `run_dose_escalation` completed	Can't estimate without Phase I data
`set_sample_size`	`estimate_effect_size` completed	Need effect size to calculate power
`submit_to_fda_review`	`set_primary_endpoint` + `set_sample_size` completed	FDA requires these minimum protocol elements
`run_interim_analysis`	`submit_to_fda_review` passed	Can't do interim before FDA approval
`run_primary_analysis`	`submit_to_fda_review` passed	Can't analyze before approved protocol
`synthesize_conclusion`	`run_primary_analysis` completed	Can't conclude without analysis
`modify_sample_size`	`run_interim_analysis` completed	Adaptation requires interim look
`add_biomarker_stratification`	`estimate_effect_size` completed	Need data to stratify on

Integration with Reward

The phase-order bonus/penalty is one component of the per-step reward:

r_step = (
    w_validity  × r_validity     +       # FDA rule compliance
    w_ordering  × r_ordering     +       # Phase-order bonus/penalty (THIS DOC)
    w_info_gain × r_info_gain    +       # Information gained
    w_efficiency × r_efficiency  +       # Budget/time efficiency
    w_novelty   × r_novelty      +       # Action diversity bonus
    w_penalty   × r_penalty      +       # Soft constraint violations
    γ × (φ(s') − φ(s))                  # Potential-based shaping
)

Where r_ordering = +0.2 if correct order, -0.3 × len(skipped_phases) if phases were skipped.

Judge Persona Scaling by Difficulty

Inspired by: KubeSRE's judge persona system: junior (lenient, gives hints) at low difficulty, senior (standard) at medium, principal (strict, penalizes inefficiency) at expert.

The phase-order evaluation strictness scales with curriculum tier:

Tier	Judge Persona	Phase-Order Behavior
Warmup (0.0–0.25)	Junior	Lenient: allows 1 phase skip without penalty. Includes hint in observation (e.g., "Consider running dose escalation before setting sample size"). Phase bonus still +0.2.
Beginner (0.25–0.40)	Junior→Senior	Standard: phase skip penalty -0.3/skip. No hints. Phase bonus +0.2.
Intermediate (0.40–0.60)	Senior	Strict: phase skip penalty -0.3/skip. Phase bonus reduced to +0.15 (expects correct ordering as baseline).
Advanced (0.60–0.80)	Senior→Principal	Very strict: phase skip penalty -0.5/skip. Phase bonus +0.1. Penalizes redundant actions within a phase (-0.1 for repeating already-completed actions).
Expert (0.80–0.95)	Principal	Harshest: phase skip penalty -0.5/skip. Phase bonus +0.05 (near-zero, expects perfection). Redundancy penalty -0.15. Efficiency penalty for taking >N steps in any phase.

This mirrors KubeSRE's approach: the junior judge at warmup is lenient and gives hints, while the principal judge at expert actively punishes inefficiency.

Protocol Amendment & Recovery

Inspired by: KubeSRE's mitigation phase — agents can partially fix before full resolution. Bio Experiment's design_followup meta-action.

The request_protocol_amendment action allows recovery from earlier mistakes:

If FDA review fails, agent can amend the protocol and resubmit
Amendment costs time and budget (realistic consequence)
Successful recovery after failure gets a +0.3 recovery bonus (like KubeSRE's mitigation credit)
Maximum 2 amendments per episode (prevents infinite retry loops)
This teaches the agent that mistakes are recoverable but costly — a key real-world lesson