FND 08 Frozen JSON Contract

  • Status: completed on 2026-03-08
  • Owners: Person A and Person B
  • Drafted by: Person B (Ayush)
  • Remaining acceptance item: none

Source schema file: replicalab/models.py

Purpose

This document freezes the JSON contract for the shared ReplicaLab data models so downstream work can proceed without schema drift. It is the reference for:

  • Person A validators and environment state handling
  • Person B prompt formatting and action parsing
  • Person C API payload examples
  • Person D frontend and replay mocks

Tool-Capability Addendum

The richer-capability MVP adds bounded search, code-check, and image-inspection support below this frozen contract.

This addendum does not reopen the outward action schema from FND 08. The final outward actions remain ScientistAction and LabManagerAction. Bounded tool use will be represented through scenario or evidence metadata, environment-side tool traces, and StepResult.info or replay payloads rather than new outward action types for the MVP.

Global conventions

  • All JSON keys use snake_case.
  • Enum-like values use lowercase snake_case strings.
  • All top-level keys listed in this document must be present unless explicitly marked nullable.
  • Use null for an absent single object.
  • Use [] for a known empty collection.
  • Use {} only for flexible metadata objects such as info and reward_breakdown.
  • round_number is zero-based. 0 is the state immediately after reset().
  • duration_days and time_limit_days are whole calendar days.
  • difficulty values are easy, medium, or hard.
  • Component scores such as rigor, feasibility, and fidelity are floats in the inclusive range 0.0 to 1.0.
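A few of these conventions are mechanical enough to check in code. The sketch below is illustrative only and is not taken from replicalab/models.py; the helper names (find_non_snake_case_keys, is_component_score, is_difficulty) are hypothetical, and accepting an integer 0 or 1 as a component score is an assumption about how strictly "float" should be read.

```python
import re

# snake_case: lowercase words of letters/digits joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")
DIFFICULTIES = {"easy", "medium", "hard"}


def find_non_snake_case_keys(payload, path=""):
    """Return the dotted paths of every dict key that is not snake_case."""
    bad = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else key
            if not SNAKE_CASE.fullmatch(key):
                bad.append(here)
            bad.extend(find_non_snake_case_keys(value, here))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            bad.extend(find_non_snake_case_keys(item, f"{path}[{index}]"))
    return bad


def is_component_score(value):
    """True for a real number in the inclusive range 0.0 to 1.0 (bool excluded)."""
    return (
        isinstance(value, (int, float))
        and not isinstance(value, bool)
        and 0.0 <= value <= 1.0
    )


def is_difficulty(value):
    """True only for the three lowercase difficulty strings."""
    return isinstance(value, str) and value in DIFFICULTIES
```

Running find_non_snake_case_keys over a decoded payload before any model parsing gives a cheap early signal of schema drift, since it reports the exact path of each offending key.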

Shared nested objects

ConversationEntry

Each item in conversation_history or transcript must use this shape:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| role | str | yes | One of scientist, lab_manager, system |
| message | str | yes | Human-readable turn text |
| round_number | int | yes | Zero-based round index for the message |
| action_type | str \| null | yes | Mirrors the action type when the message comes from an agent, otherwise null |
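As a rough illustration of how a consumer might check one entry against the table above, here is a minimal sketch that works on a plain decoded dict. The function name conversation_entry_errors is hypothetical, and treating a non-null action_type on a system message as an error is one reading of the "otherwise null" note, not something the contract spells out as a validator rule.

```python
ROLES = {"scientist", "lab_manager", "system"}
REQUIRED_KEYS = {"role", "message", "round_number", "action_type"}


def conversation_entry_errors(entry):
    """Return a list of contract violations for one ConversationEntry dict."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    errors = []
    role = entry["role"]
    if not isinstance(role, str) or role not in ROLES:
        errors.append(f"unknown role: {role!r}")
    if not isinstance(entry["message"], str):
        errors.append("message must be a string")

    round_number = entry["round_number"]
    is_int = isinstance(round_number, int) and not isinstance(round_number, bool)
    if not is_int or round_number < 0:
        errors.append("round_number must be a non-negative integer")

    action_type = entry["action_type"]
    if action_type is not None and not isinstance(action_type, str):
        errors.append("action_type must be a string or null")
    # Assumption: system messages do not come from an agent, so they carry null.
    if role == "system" and action_type is not None:
        errors.append("action_type must be null for system messages")
    return errors
```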

Protocol

When current_protocol is not null, it must use this shape:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| sample_size | int | yes | Non-negative integer |
| controls | list[str] | yes | Empty list when no controls are specified yet |
| technique | str | yes | Proposed experimental technique |
| duration_days | int | yes | Whole calendar days |
| required_equipment | list[str] | yes | Empty list when none is needed |
| required_reagents | list[str] | yes | Empty list when none is needed |
| rationale | str | yes | Short explanation for the protocol |
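The Protocol shape maps naturally onto a small typed container. The sketch below uses a standard-library dataclass as a stand-in; it is not the definition in replicalab/models.py, and the class name ProtocolSketch and its from_dict helper are hypothetical. It enforces only what the table states: all seven keys present, sample_size a non-negative integer, and duration_days a whole number.

```python
from dataclasses import asdict, dataclass, fields


def _is_int(value):
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


@dataclass(frozen=True)
class ProtocolSketch:
    sample_size: int
    controls: list
    technique: str
    duration_days: int
    required_equipment: list
    required_reagents: list
    rationale: str

    @classmethod
    def from_dict(cls, data):
        """Build a protocol from a decoded JSON object, rejecting key drift."""
        expected = {f.name for f in fields(cls)}
        if set(data) != expected:
            problem = sorted(expected.symmetric_difference(data))
            raise ValueError(f"missing or unexpected keys: {problem}")
        if not _is_int(data["sample_size"]) or data["sample_size"] < 0:
            raise ValueError("sample_size must be a non-negative integer")
        if not _is_int(data["duration_days"]):
            raise ValueError("duration_days must be a whole number of days")
        return cls(**data)
```

Because every key is required, asdict(ProtocolSketch.from_dict(d)) reproduces d exactly, which makes round-trip tests against the canonical examples straightforward.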

RewardBreakdown

When reward_breakdown is present, it must use this shape:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| rigor | float | yes | Component score in 0.0 to 1.0 |
| feasibility | float | yes | Component score in 0.0 to 1.0 |
| fidelity | float | yes | Component score in 0.0 to 1.0 |
| efficiency_bonus | float | yes | Bonus term, 0.0 if unused |
| communication_bonus | float | yes | Bonus term, 0.0 if unused |
| penalties | dict[str, float] | yes | Per-penalty values keyed by penalty name |

Model contracts

ScientistAction

Action types:

  • propose_protocol
  • revise_protocol
  • request_info
  • accept

Field contract:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| action_type | str | yes | Must be one of the values above |
| sample_size | int | yes | Meaningful for propose_protocol and revise_protocol, otherwise 0 |
| controls | list[str] | yes | Meaningful for propose_protocol and revise_protocol, otherwise [] |
| technique | str | yes | Meaningful for propose_protocol and revise_protocol, otherwise "" |
| duration_days | int | yes | Meaningful for propose_protocol and revise_protocol, otherwise 0 |
| required_equipment | list[str] | yes | Meaningful for propose_protocol and revise_protocol, otherwise [] |
| required_reagents | list[str] | yes | Meaningful for propose_protocol and revise_protocol, otherwise [] |
| questions | list[str] | yes | Meaningful for request_info, otherwise [] |
| rationale | str | yes | Required free-text explanation for protocol proposals and revisions; "" for accept |

Canonical example:

{
  "action_type": "propose_protocol",
  "sample_size": 48,
  "controls": ["vehicle_control", "positive_control"],
  "technique": "wst1_assay",
  "duration_days": 5,
  "required_equipment": ["plate_reader", "co2_incubator"],
  "required_reagents": ["wst1", "dmso", "drug_x"],
  "questions": [],
  "rationale": "Keeps the core readout while using equipment commonly available in teaching labs."
}
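The field contract above ties each field's meaningful values to action_type. A parser could check that coupling along the following lines. This is a sketch under a strict reading of the Notes column (fields that are not meaningful must hold their stated default); the function name scientist_action_errors is hypothetical, and the real validators may be more lenient.

```python
PROTOCOL_ACTIONS = {"propose_protocol", "revise_protocol"}
ACTION_TYPES = PROTOCOL_ACTIONS | {"request_info", "accept"}
# Defaults taken from the "otherwise ..." entries in the field contract.
PROTOCOL_DEFAULTS = {
    "sample_size": 0,
    "controls": [],
    "technique": "",
    "duration_days": 0,
    "required_equipment": [],
    "required_reagents": [],
}
REQUIRED_KEYS = set(PROTOCOL_DEFAULTS) | {"action_type", "questions", "rationale"}


def scientist_action_errors(action):
    """Return contract violations for one ScientistAction dict."""
    missing = REQUIRED_KEYS - action.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    action_type = action["action_type"]
    if action_type not in ACTION_TYPES:
        return [f"unknown action_type: {action_type!r}"]

    errors = []
    if action_type not in PROTOCOL_ACTIONS:
        # Protocol fields are not meaningful here, so they must hold defaults.
        for key, default in PROTOCOL_DEFAULTS.items():
            if action[key] != default:
                errors.append(f"{key} must be {default!r} for {action_type}")
    if action_type != "request_info" and action["questions"] != []:
        errors.append(f"questions must be [] for {action_type}")

    rationale = action["rationale"]
    if not isinstance(rationale, str):
        errors.append("rationale must be a string")
    elif action_type == "accept" and rationale != "":
        errors.append('rationale must be "" for accept')
    elif action_type in PROTOCOL_ACTIONS and not rationale.strip():
        errors.append(f"rationale is required for {action_type}")
    return errors
```

The canonical example above passes this check with an empty error list; an accept action that still carries a non-zero sample_size does not.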

LabManagerAction

Action types:

  • report_feasibility
  • suggest_alternative
  • reject
  • accept

Field contract:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| action_type | str | yes | Must be one of the values above |
| feasible | bool | yes | Overall summary flag equal to the logical AND of the constraint dimension flags |
| budget_ok | bool | yes | Whether the proposed protocol fits remaining budget |
| equipment_ok | bool | yes | Whether required equipment is available in time |
| reagents_ok | bool | yes | Whether required reagents are available |
| schedule_ok | bool | yes | Whether the protocol fits the allowed timeline |
| staff_ok | bool | yes | Whether staffing is sufficient |
| suggested_technique | str | yes | Meaningful for suggest_alternative, otherwise "" |
| suggested_sample_size | int | yes | Meaningful for suggest_alternative, otherwise 0 |
| suggested_controls | list[str] | yes | Meaningful for suggest_alternative, otherwise [] |
| explanation | str | yes | Human-readable explanation of the constraint outcome |

Conditional rules:

  • action_type = accept implies feasible = true and all constraint flags are true.
  • action_type = reject implies feasible = false and at least one constraint flag is false.
  • action_type = suggest_alternative implies feasible = false and at least one of the suggestion fields carries a non-default value.

Canonical example:

{
  "action_type": "suggest_alternative",
  "feasible": false,
  "budget_ok": true,
  "equipment_ok": false,
  "reagents_ok": true,
  "schedule_ok": true,
  "staff_ok": true,
  "suggested_technique": "manual_cell_counting",
  "suggested_sample_size": 32,
  "suggested_controls": ["vehicle_control", "positive_control"],
  "explanation": "The plate reader is fully booked, so use manual counting and reduce the sample size to stay within the timeline."
}
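The conditional rules for LabManagerAction, together with the definition of feasible as the logical AND of the five constraint flags, can be expressed as a small check. The following is a sketch rather than the project's validator; the function name lab_manager_action_errors is hypothetical and the error strings are illustrative.

```python
FLAGS = ("budget_ok", "equipment_ok", "reagents_ok", "schedule_ok", "staff_ok")
SUGGESTION_DEFAULTS = {
    "suggested_technique": "",
    "suggested_sample_size": 0,
    "suggested_controls": [],
}
REQUIRED_KEYS = set(FLAGS) | set(SUGGESTION_DEFAULTS) | {
    "action_type", "feasible", "explanation",
}


def lab_manager_action_errors(action):
    """Return violations of the feasibility and conditional rules."""
    missing = REQUIRED_KEYS - action.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    errors = []
    all_ok = all(action[name] for name in FLAGS)
    feasible = action["feasible"]
    if feasible != all_ok:
        errors.append("feasible must equal the AND of the constraint flags")

    action_type = action["action_type"]
    if action_type == "accept" and not (feasible and all_ok):
        errors.append("accept requires feasible and every flag to be true")
    if action_type == "reject" and (feasible or all_ok):
        errors.append("reject requires feasible = false and a false flag")
    if action_type == "suggest_alternative":
        if feasible:
            errors.append("suggest_alternative requires feasible = false")
        has_suggestion = any(
            action[key] != default for key, default in SUGGESTION_DEFAULTS.items()
        )
        if not has_suggestion:
            errors.append("suggest_alternative needs a non-default suggestion")
    return errors
```

Applied to the canonical example above, the check returns an empty list: equipment_ok is false, so feasible is correctly false, and all three suggestion fields carry non-default values.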

ScientistObservation

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| paper_title | str | yes | Study title |
| paper_hypothesis | str | yes | Core hypothesis being replicated |
| paper_method | str | yes | Short method summary |
| paper_key_finding | str | yes | Main finding being targeted |
| experiment_goal | str | yes | What the scientist is trying to preserve |
| conversation_history | list[ConversationEntry] | yes | Empty list at reset |
| current_protocol | Protocol \| null | yes | null until a protocol exists |
| round_number | int | yes | Zero-based current round |
| max_rounds | int | yes | Max allowed rounds in the episode |

Canonical example:

{
  "paper_title": "Drug X reduces glioblastoma cell viability",
  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
  "conversation_history": [],
  "current_protocol": null,
  "round_number": 0,
  "max_rounds": 6
}

LabManagerObservation

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| budget_total | float | yes | Initial budget for the episode |
| budget_remaining | float | yes | Current remaining budget |
| equipment_available | list[str] | yes | Equipment that can be used |
| equipment_booked | list[str] | yes | Equipment unavailable due to booking |
| reagents_in_stock | list[str] | yes | Available reagents |
| reagents_out_of_stock | list[str] | yes | Required but unavailable reagents |
| staff_count | int | yes | Available staff count |
| time_limit_days | int | yes | Whole calendar days remaining |
| safety_restrictions | list[str] | yes | Constraints such as banned solvents or assay restrictions |
| conversation_history | list[ConversationEntry] | yes | Empty list at reset |
| current_protocol | Protocol \| null | yes | null until a protocol exists |
| round_number | int | yes | Zero-based current round |
| max_rounds | int | yes | Max allowed rounds in the episode |

Canonical example:

{
  "budget_total": 1200.0,
  "budget_remaining": 1200.0,
  "equipment_available": ["co2_incubator", "microscope"],
  "equipment_booked": ["plate_reader"],
  "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
  "reagents_out_of_stock": ["wst1"],
  "staff_count": 2,
  "time_limit_days": 7,
  "safety_restrictions": ["no_radioactive_reagents"],
  "conversation_history": [],
  "current_protocol": null,
  "round_number": 0,
  "max_rounds": 6
}

Observation

Wrapper behavior:

  • Serialized Observation objects always include both top-level keys: scientist and lab_manager.
  • In shared environment state, replay, and API payloads, both branches should normally be populated.
  • When a consumer is intentionally given only one role view, the non-owned branch must be null, not omitted.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| scientist | ScientistObservation \| null | yes | Scientist-side view |
| lab_manager | LabManagerObservation \| null | yes | Lab-manager-side view |
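The "null, not omitted" rule for single-role views is easy to get wrong when filtering a dict. One way to sketch it, assuming observations are handled as plain decoded dicts and using a hypothetical helper name role_view:

```python
import json

ROLE_KEYS = ("scientist", "lab_manager")


def role_view(observation, role):
    """Return a copy of the wrapper with only `role`'s branch populated.

    The other branch is set to None so that it serializes as JSON null
    instead of disappearing from the payload.
    """
    if role not in ROLE_KEYS:
        raise ValueError(f"unknown role: {role!r}")
    return {key: observation[key] if key == role else None for key in ROLE_KEYS}


full = {
    "scientist": {"round_number": 0, "max_rounds": 6},
    "lab_manager": {"round_number": 0, "max_rounds": 6},
}
scientist_only = role_view(full, "scientist")
serialized = json.dumps(scientist_only)
```

Here serialized still contains both top-level keys, with "lab_manager": null, so downstream consumers can rely on the key being present without special-casing a missing branch.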

Canonical example:

{
  "scientist": {
    "paper_title": "Drug X reduces glioblastoma cell viability",
    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
    "conversation_history": [],
    "current_protocol": null,
    "round_number": 0,
    "max_rounds": 6
  },
  "lab_manager": {
    "budget_total": 1200.0,
    "budget_remaining": 1200.0,
    "equipment_available": ["co2_incubator", "microscope"],
    "equipment_booked": ["plate_reader"],
    "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
    "reagents_out_of_stock": ["wst1"],
    "staff_count": 2,
    "time_limit_days": 7,
    "safety_restrictions": ["no_radioactive_reagents"],
    "conversation_history": [],
    "current_protocol": null,
    "round_number": 0,
    "max_rounds": 6
  }
}

StepResult

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| observation | Observation \| null | yes | Present on normal steps and terminal steps; may be null only on hard failure |
| reward | float | yes | Episode reward after the step; terminal reward on final step |
| done | bool | yes | Whether the episode is terminal |
| info | dict | yes | Flexible metadata object |

Reserved info keys:

  • agreement_reached: bool
  • error: str | null
  • reward_breakdown: RewardBreakdown | null
  • judge_notes: str | null
  • verdict: str | null

Canonical example:

{
  "observation": {
    "scientist": null,
    "lab_manager": null
  },
  "reward": 6.72,
  "done": true,
  "info": {
    "agreement_reached": true,
    "error": null,
    "reward_breakdown": {
      "rigor": 0.9,
      "feasibility": 0.8,
      "fidelity": 0.85,
      "efficiency_bonus": 0.25,
      "communication_bonus": 0.15,
      "penalties": {
        "invalid_action": 0.0,
        "timeout": 0.0
      }
    },
    "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
    "verdict": "accept"
  }
}
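Since info is a flexible object, the reserved keys are the only part of it with a fixed type. The sketch below type-checks reserved keys when they appear and ignores everything else. Whether every reserved key must be present on every step is not stated in this contract, so the check deliberately does not require presence; the function name reserved_info_errors is hypothetical.

```python
NoneType = type(None)

# Allowed Python types for each reserved key after JSON decoding.
RESERVED_INFO_TYPES = {
    "agreement_reached": (bool,),
    "error": (str, NoneType),
    "reward_breakdown": (dict, NoneType),
    "judge_notes": (str, NoneType),
    "verdict": (str, NoneType),
}


def reserved_info_errors(info):
    """Return type violations for any reserved keys present in `info`."""
    errors = []
    for key, allowed in RESERVED_INFO_TYPES.items():
        if key in info and not isinstance(info[key], allowed):
            names = " | ".join(
                "null" if t is NoneType else t.__name__ for t in allowed
            )
            errors.append(f"{key} must be {names}")
    return errors
```

Non-reserved keys, such as tool traces added under the Tool-Capability Addendum, pass through untouched, which keeps info open for extension while protecting the keys downstream consumers depend on.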

EpisodeState

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| seed | int | yes | Deterministic episode seed |
| scenario_template | str | yes | Scenario family identifier |
| difficulty | str | yes | easy, medium, or hard |
| paper_title | str | yes | Study title |
| paper_hypothesis | str | yes | Core hypothesis |
| paper_method | str | yes | Method summary |
| paper_key_finding | str | yes | Main finding |
| experiment_goal | str | yes | Goal preserved through negotiation |
| lab_budget_total | float | yes | Initial budget |
| lab_budget_remaining | float | yes | Remaining budget |
| lab_equipment | list[str] | yes | Equipment state |
| lab_reagents | list[str] | yes | Reagent state |
| lab_staff_count | int | yes | Available staff count |
| lab_time_limit_days | int | yes | Whole calendar days remaining |
| current_protocol | Protocol \| null | yes | Current agreed or latest proposed protocol |
| conversation_history | list[ConversationEntry] | yes | Negotiation history |
| round_number | int | yes | Zero-based round counter |
| max_rounds | int | yes | Maximum rounds allowed |
| done | bool | yes | Terminal flag |
| agreement_reached | bool | yes | Whether both sides reached agreement |
| reward | float | yes | Final total reward or 0.0 until terminal scoring |
| rigor_score | float | yes | Final component score or 0.0 until terminal scoring |
| feasibility_score | float | yes | Final component score or 0.0 until terminal scoring |
| fidelity_score | float | yes | Final component score or 0.0 until terminal scoring |

Canonical example:

{
  "seed": 17,
  "scenario_template": "cell_biology",
  "difficulty": "medium",
  "paper_title": "Drug X reduces glioblastoma cell viability",
  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
  "lab_budget_total": 1200.0,
  "lab_budget_remaining": 850.0,
  "lab_equipment": ["co2_incubator", "microscope"],
  "lab_reagents": ["dmso", "drug_x", "culture_media"],
  "lab_staff_count": 2,
  "lab_time_limit_days": 7,
  "current_protocol": {
    "sample_size": 32,
    "controls": ["vehicle_control", "positive_control"],
    "technique": "manual_cell_counting",
    "duration_days": 5,
    "required_equipment": ["microscope", "co2_incubator"],
    "required_reagents": ["dmso", "drug_x", "culture_media"],
    "rationale": "Uses available equipment while preserving control structure."
  },
  "conversation_history": [
    {
      "role": "scientist",
      "message": "I propose a manual counting protocol that keeps both controls.",
      "round_number": 0,
      "action_type": "propose_protocol"
    }
  ],
  "round_number": 1,
  "max_rounds": 6,
  "done": false,
  "agreement_reached": false,
  "reward": 0.0,
  "rigor_score": 0.0,
  "feasibility_score": 0.0,
  "fidelity_score": 0.0
}

EpisodeLog

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| episode_id | str | yes | Stable replay identifier |
| seed | int | yes | Episode seed |
| scenario_template | str | yes | Scenario family identifier |
| difficulty | str | yes | easy, medium, or hard |
| final_state | EpisodeState \| null | yes | Must be populated for completed episodes |
| transcript | list[ConversationEntry] | yes | Replayable transcript |
| reward_breakdown | RewardBreakdown | yes | Final reward components |
| total_reward | float | yes | Final total reward |
| rounds_used | int | yes | Number of completed rounds |
| agreement_reached | bool | yes | Final agreement flag |
| judge_notes | str | yes | Human-readable audit summary |
| verdict | str | yes | One of accept, revise, reject |

Canonical example:

{
  "episode_id": "cell_biology-17-medium-0001",
  "seed": 17,
  "scenario_template": "cell_biology",
  "difficulty": "medium",
  "final_state": {
    "seed": 17,
    "scenario_template": "cell_biology",
    "difficulty": "medium",
    "paper_title": "Drug X reduces glioblastoma cell viability",
    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
    "lab_budget_total": 1200.0,
    "lab_budget_remaining": 850.0,
    "lab_equipment": ["co2_incubator", "microscope"],
    "lab_reagents": ["dmso", "drug_x", "culture_media"],
    "lab_staff_count": 2,
    "lab_time_limit_days": 7,
    "current_protocol": {
      "sample_size": 32,
      "controls": ["vehicle_control", "positive_control"],
      "technique": "manual_cell_counting",
      "duration_days": 5,
      "required_equipment": ["microscope", "co2_incubator"],
      "required_reagents": ["dmso", "drug_x", "culture_media"],
      "rationale": "Uses available equipment while preserving control structure."
    },
    "conversation_history": [
      {
        "role": "scientist",
        "message": "I propose a manual counting protocol that keeps both controls.",
        "round_number": 0,
        "action_type": "propose_protocol"
      },
      {
        "role": "lab_manager",
        "message": "This alternative is feasible with current equipment and budget.",
        "round_number": 0,
        "action_type": "accept"
      }
    ],
    "round_number": 1,
    "max_rounds": 6,
    "done": true,
    "agreement_reached": true,
    "reward": 6.72,
    "rigor_score": 0.9,
    "feasibility_score": 0.8,
    "fidelity_score": 0.85
  },
  "transcript": [
    {
      "role": "scientist",
      "message": "I propose a manual counting protocol that keeps both controls.",
      "round_number": 0,
      "action_type": "propose_protocol"
    },
    {
      "role": "lab_manager",
      "message": "This alternative is feasible with current equipment and budget.",
      "round_number": 0,
      "action_type": "accept"
    }
  ],
  "reward_breakdown": {
    "rigor": 0.9,
    "feasibility": 0.8,
    "fidelity": 0.85,
    "efficiency_bonus": 0.25,
    "communication_bonus": 0.15,
    "penalties": {
      "invalid_action": 0.0,
      "timeout": 0.0
    }
  },
  "total_reward": 6.72,
  "rounds_used": 1,
  "agreement_reached": true,
  "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
  "verdict": "accept"
}
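A replay loader might run a handful of sanity checks on an EpisodeLog before handing it to the frontend. The sketch below covers only rules stated in this document: the verdict and difficulty value sets, the populated final_state for completed episodes, and the 0.0 to 1.0 range for component scores. It assumes every log passed in describes a completed episode, and the function name episode_log_errors is hypothetical.

```python
VERDICTS = {"accept", "revise", "reject"}
DIFFICULTIES = {"easy", "medium", "hard"}
SCORE_KEYS = ("rigor", "feasibility", "fidelity")


def episode_log_errors(log):
    """Return violations of the EpisodeLog rules stated in the contract."""
    errors = []
    if log["verdict"] not in VERDICTS:
        errors.append(f"unknown verdict: {log['verdict']!r}")
    if log["difficulty"] not in DIFFICULTIES:
        errors.append(f"unknown difficulty: {log['difficulty']!r}")
    # Assumption: the log describes a completed episode.
    if log["final_state"] is None:
        errors.append("final_state must be populated for completed episodes")
    for key in SCORE_KEYS:
        score = log["reward_breakdown"][key]
        if not 0.0 <= score <= 1.0:
            errors.append(f"{key} must be in the range 0.0 to 1.0")
    return errors
```

Cross-field consistency, for example whether transcript must match final_state.conversation_history, is not specified here, so the sketch leaves it unchecked.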

Sign-off

| Owner | Status | Notes |
| --- | --- | --- |
| Person B (Ayush) | signed off | Draft matches current stubs and downstream parser needs |
| Person A (Kian) | signed off | Validator and environment-owner review completed; contract is frozen for MOD 01, MOD 03, FND 09, and downstream parser work |