Spaces:

openenv-community
/

replicalab

Running

App Files Files Community

replicalab / docs /fnd08_frozen_json_contract.md

maxxie114

Initial HF Spaces deployment

80d8c84 2 days ago

preview code

raw

history blame contribute delete

19.7 kB

FND 08 Frozen JSON Contract

Status: completed on 2026-03-08 Owners: Person A and Person B Drafted by: Person B (Ayush) Remaining acceptance item: none

Source schema file: replicalab/models.py

Purpose

This document freezes the JSON contract for the shared ReplicaLab data models so downstream work can proceed without schema drift. It is the reference for:

Person A validators and environment state handling
Person B prompt formatting and action parsing
Person C API payload examples
Person D frontend and replay mocks

Tool-Capability Addendum

The richer-capability MVP adds bounded search, code-check, and image-inspection support below this frozen contract.

This addendum does not reopen the outward action schema from FND 08. The final outward actions remain ScientistAction and LabManagerAction. Bounded tool use will be represented through scenario or evidence metadata, environment-side tool traces, and StepResult.info or replay payloads rather than new outward action types for the MVP.

Global conventions

All JSON keys use snake_case.
Enum-like values use lowercase snake_case strings.
All top-level keys listed in this document must be present unless explicitly marked nullable.
Use null for an absent single object.
Use [] for a known empty collection.
Use {} only for flexible metadata objects such as info and reward_breakdown.
round_number is zero-based. 0 is the state immediately after reset().
duration_days and time_limit_days are whole calendar days.
difficulty values are easy, medium, or hard.
Component scores such as rigor, feasibility, and fidelity are floats in the inclusive range 0.0 to 1.0.

Shared nested objects

ConversationEntry

Each item in conversation_history or transcript must use this shape:

Field	Type	Required	Notes
`role`	`str`	yes	One of `scientist`, `lab_manager`, `system`
`message`	`str`	yes	Human-readable turn text
`round_number`	`int`	yes	Zero-based round index for the message
`action_type`	`str \| null`	yes	Mirrors the action type when the message comes from an agent, otherwise `null`

Protocol

When current_protocol is not null, it must use this shape:

Field	Type	Required	Notes
`sample_size`	`int`	yes	Non-negative integer
`controls`	`list[str]`	yes	Empty list when no controls are specified yet
`technique`	`str`	yes	Proposed experimental technique
`duration_days`	`int`	yes	Whole calendar days
`required_equipment`	`list[str]`	yes	Empty list when none is needed
`required_reagents`	`list[str]`	yes	Empty list when none is needed
`rationale`	`str`	yes	Short explanation for the protocol

RewardBreakdown

When reward_breakdown is present, it must use this shape:

Field	Type	Required	Notes
`rigor`	`float`	yes	Component score in `0.0` to `1.0`
`feasibility`	`float`	yes	Component score in `0.0` to `1.0`
`fidelity`	`float`	yes	Component score in `0.0` to `1.0`
`efficiency_bonus`	`float`	yes	Bonus term, `0.0` if unused
`communication_bonus`	`float`	yes	Bonus term, `0.0` if unused
`penalties`	`dict[str, float]`	yes	Per-penalty values keyed by penalty name

Model contracts

ScientistAction

Action types:

propose_protocol
revise_protocol
request_info
accept

Field contract:

Field	Type	Required	Notes
`action_type`	`str`	yes	Must be one of the values above
`sample_size`	`int`	yes	Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0`
`controls`	`list[str]`	yes	Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]`
`technique`	`str`	yes	Meaningful for `propose_protocol` and `revise_protocol`, otherwise `""`
`duration_days`	`int`	yes	Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0`
`required_equipment`	`list[str]`	yes	Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]`
`required_reagents`	`list[str]`	yes	Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]`
`questions`	`list[str]`	yes	Meaningful for `request_info`, otherwise `[]`
`rationale`	`str`	yes	Required free-text explanation for protocol proposals and revisions; `""` for `accept`

Canonical example:

{
  "action_type": "propose_protocol",
  "sample_size": 48,
  "controls": ["vehicle_control", "positive_control"],
  "technique": "wst1_assay",
  "duration_days": 5,
  "required_equipment": ["plate_reader", "co2_incubator"],
  "required_reagents": ["wst1", "dmso", "drug_x"],
  "questions": [],
  "rationale": "Keeps the core readout while using equipment commonly available in teaching labs."
}

LabManagerAction

Action types:

report_feasibility
suggest_alternative
reject
accept

Field contract:

Field	Type	Required	Notes
`action_type`	`str`	yes	Must be one of the values above
`feasible`	`bool`	yes	Overall summary flag equal to the logical AND of the constraint dimension flags
`budget_ok`	`bool`	yes	Whether the proposed protocol fits remaining budget
`equipment_ok`	`bool`	yes	Whether required equipment is available in time
`reagents_ok`	`bool`	yes	Whether required reagents are available
`schedule_ok`	`bool`	yes	Whether the protocol fits the allowed timeline
`staff_ok`	`bool`	yes	Whether staffing is sufficient
`suggested_technique`	`str`	yes	Meaningful for `suggest_alternative`, otherwise `""`
`suggested_sample_size`	`int`	yes	Meaningful for `suggest_alternative`, otherwise `0`
`suggested_controls`	`list[str]`	yes	Meaningful for `suggest_alternative`, otherwise `[]`
`explanation`	`str`	yes	Human-readable explanation of the constraint outcome

Conditional rules:

action_type = accept implies feasible = true and all constraint flags are true.
action_type = reject implies feasible = false and at least one constraint flag is false.
action_type = suggest_alternative implies feasible = false and at least one of the suggestion fields carries a non-default value.

Canonical example:

{
  "action_type": "suggest_alternative",
  "feasible": false,
  "budget_ok": true,
  "equipment_ok": false,
  "reagents_ok": true,
  "schedule_ok": true,
  "staff_ok": true,
  "suggested_technique": "manual_cell_counting",
  "suggested_sample_size": 32,
  "suggested_controls": ["vehicle_control", "positive_control"],
  "explanation": "The plate reader is fully booked, so use manual counting and reduce the sample size to stay within the timeline."
}

ScientistObservation

Field	Type	Required	Notes
`paper_title`	`str`	yes	Study title
`paper_hypothesis`	`str`	yes	Core hypothesis being replicated
`paper_method`	`str`	yes	Short method summary
`paper_key_finding`	`str`	yes	Main finding being targeted
`experiment_goal`	`str`	yes	What the scientist is trying to preserve
`conversation_history`	`list[ConversationEntry]`	yes	Empty list at reset
`current_protocol`	`Protocol \| null`	yes	`null` until a protocol exists
`round_number`	`int`	yes	Zero-based current round
`max_rounds`	`int`	yes	Max allowed rounds in the episode

Canonical example:

{
  "paper_title": "Drug X reduces glioblastoma cell viability",
  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
  "conversation_history": [],
  "current_protocol": null,
  "round_number": 0,
  "max_rounds": 6
}

LabManagerObservation

Field	Type	Required	Notes
`budget_total`	`float`	yes	Initial budget for the episode
`budget_remaining`	`float`	yes	Current remaining budget
`equipment_available`	`list[str]`	yes	Equipment that can be used
`equipment_booked`	`list[str]`	yes	Equipment unavailable due to booking
`reagents_in_stock`	`list[str]`	yes	Available reagents
`reagents_out_of_stock`	`list[str]`	yes	Required but unavailable reagents
`staff_count`	`int`	yes	Available staff count
`time_limit_days`	`int`	yes	Whole calendar days remaining
`safety_restrictions`	`list[str]`	yes	Constraints such as banned solvents or assay restrictions
`conversation_history`	`list[ConversationEntry]`	yes	Empty list at reset
`current_protocol`	`Protocol \| null`	yes	`null` until a protocol exists
`round_number`	`int`	yes	Zero-based current round
`max_rounds`	`int`	yes	Max allowed rounds in the episode

Canonical example:

{
  "budget_total": 1200.0,
  "budget_remaining": 1200.0,
  "equipment_available": ["co2_incubator", "microscope"],
  "equipment_booked": ["plate_reader"],
  "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
  "reagents_out_of_stock": ["wst1"],
  "staff_count": 2,
  "time_limit_days": 7,
  "safety_restrictions": ["no_radioactive_reagents"],
  "conversation_history": [],
  "current_protocol": null,
  "round_number": 0,
  "max_rounds": 6
}

Observation

Wrapper behavior:

Serialized Observation objects always include both top-level keys: scientist and lab_manager.
In shared environment state, replay, and API payloads, both branches should normally be populated.
When a consumer is intentionally given only one role view, the non-owned branch must be null, not omitted.

Field	Type	Required	Notes
`scientist`	`ScientistObservation \| null`	yes	Scientist-side view
`lab_manager`	`LabManagerObservation \| null`	yes	Lab-manager-side view

Canonical example:

{
  "scientist": {
    "paper_title": "Drug X reduces glioblastoma cell viability",
    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
    "conversation_history": [],
    "current_protocol": null,
    "round_number": 0,
    "max_rounds": 6
  },
  "lab_manager": {
    "budget_total": 1200.0,
    "budget_remaining": 1200.0,
    "equipment_available": ["co2_incubator", "microscope"],
    "equipment_booked": ["plate_reader"],
    "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
    "reagents_out_of_stock": ["wst1"],
    "staff_count": 2,
    "time_limit_days": 7,
    "safety_restrictions": ["no_radioactive_reagents"],
    "conversation_history": [],
    "current_protocol": null,
    "round_number": 0,
    "max_rounds": 6
  }
}

StepResult

Field	Type	Required	Notes
`observation`	`Observation \| null`	yes	Present on normal steps and terminal steps; may be `null` only on hard failure
`reward`	`float`	yes	Episode reward after the step; terminal reward on final step
`done`	`bool`	yes	Whether the episode is terminal
`info`	`dict`	yes	Flexible metadata object

Reserved info keys:

agreement_reached: bool
error: str | null
reward_breakdown: RewardBreakdown | null
judge_notes: str | null
verdict: str | null

Canonical example:

{
  "observation": {
    "scientist": null,
    "lab_manager": null
  },
  "reward": 6.72,
  "done": true,
  "info": {
    "agreement_reached": true,
    "error": null,
    "reward_breakdown": {
      "rigor": 0.9,
      "feasibility": 0.8,
      "fidelity": 0.85,
      "efficiency_bonus": 0.25,
      "communication_bonus": 0.15,
      "penalties": {
        "invalid_action": 0.0,
        "timeout": 0.0
      }
    },
    "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
    "verdict": "accept"
  }
}

EpisodeState

Field	Type	Required	Notes
`seed`	`int`	yes	Deterministic episode seed
`scenario_template`	`str`	yes	Scenario family identifier
`difficulty`	`str`	yes	`easy`, `medium`, or `hard`
`paper_title`	`str`	yes	Study title
`paper_hypothesis`	`str`	yes	Core hypothesis
`paper_method`	`str`	yes	Method summary
`paper_key_finding`	`str`	yes	Main finding
`experiment_goal`	`str`	yes	Goal preserved through negotiation
`lab_budget_total`	`float`	yes	Initial budget
`lab_budget_remaining`	`float`	yes	Remaining budget
`lab_equipment`	`list[str]`	yes	Equipment state
`lab_reagents`	`list[str]`	yes	Reagent state
`lab_staff_count`	`int`	yes	Available staff count
`lab_time_limit_days`	`int`	yes	Whole calendar days remaining
`current_protocol`	`Protocol \| null`	yes	Current agreed or latest proposed protocol
`conversation_history`	`list[ConversationEntry]`	yes	Negotiation history
`round_number`	`int`	yes	Zero-based round counter
`max_rounds`	`int`	yes	Maximum rounds allowed
`done`	`bool`	yes	Terminal flag
`agreement_reached`	`bool`	yes	Whether both sides reached agreement
`reward`	`float`	yes	Final total reward or `0.0` until terminal scoring
`rigor_score`	`float`	yes	Final component score or `0.0` until terminal scoring
`feasibility_score`	`float`	yes	Final component score or `0.0` until terminal scoring
`fidelity_score`	`float`	yes	Final component score or `0.0` until terminal scoring

Canonical example:

{
  "seed": 17,
  "scenario_template": "cell_biology",
  "difficulty": "medium",
  "paper_title": "Drug X reduces glioblastoma cell viability",
  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
  "lab_budget_total": 1200.0,
  "lab_budget_remaining": 850.0,
  "lab_equipment": ["co2_incubator", "microscope"],
  "lab_reagents": ["dmso", "drug_x", "culture_media"],
  "lab_staff_count": 2,
  "lab_time_limit_days": 7,
  "current_protocol": {
    "sample_size": 32,
    "controls": ["vehicle_control", "positive_control"],
    "technique": "manual_cell_counting",
    "duration_days": 5,
    "required_equipment": ["microscope", "co2_incubator"],
    "required_reagents": ["dmso", "drug_x", "culture_media"],
    "rationale": "Uses available equipment while preserving control structure."
  },
  "conversation_history": [
    {
      "role": "scientist",
      "message": "I propose a manual counting protocol that keeps both controls.",
      "round_number": 0,
      "action_type": "propose_protocol"
    }
  ],
  "round_number": 1,
  "max_rounds": 6,
  "done": false,
  "agreement_reached": false,
  "reward": 0.0,
  "rigor_score": 0.0,
  "feasibility_score": 0.0,
  "fidelity_score": 0.0
}

EpisodeLog

Field	Type	Required	Notes
`episode_id`	`str`	yes	Stable replay identifier
`seed`	`int`	yes	Episode seed
`scenario_template`	`str`	yes	Scenario family identifier
`difficulty`	`str`	yes	`easy`, `medium`, or `hard`
`final_state`	`EpisodeState \| null`	yes	Must be populated for completed episodes
`transcript`	`list[ConversationEntry]`	yes	Replayable transcript
`reward_breakdown`	`RewardBreakdown`	yes	Final reward components
`total_reward`	`float`	yes	Final total reward
`rounds_used`	`int`	yes	Number of completed rounds
`agreement_reached`	`bool`	yes	Final agreement flag
`judge_notes`	`str`	yes	Human-readable audit summary
`verdict`	`str`	yes	One of `accept`, `revise`, `reject`

Canonical example:

{
  "episode_id": "cell_biology-17-medium-0001",
  "seed": 17,
  "scenario_template": "cell_biology",
  "difficulty": "medium",
  "final_state": {
    "seed": 17,
    "scenario_template": "cell_biology",
    "difficulty": "medium",
    "paper_title": "Drug X reduces glioblastoma cell viability",
    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
    "lab_budget_total": 1200.0,
    "lab_budget_remaining": 850.0,
    "lab_equipment": ["co2_incubator", "microscope"],
    "lab_reagents": ["dmso", "drug_x", "culture_media"],
    "lab_staff_count": 2,
    "lab_time_limit_days": 7,
    "current_protocol": {
      "sample_size": 32,
      "controls": ["vehicle_control", "positive_control"],
      "technique": "manual_cell_counting",
      "duration_days": 5,
      "required_equipment": ["microscope", "co2_incubator"],
      "required_reagents": ["dmso", "drug_x", "culture_media"],
      "rationale": "Uses available equipment while preserving control structure."
    },
    "conversation_history": [
      {
        "role": "scientist",
        "message": "I propose a manual counting protocol that keeps both controls.",
        "round_number": 0,
        "action_type": "propose_protocol"
      },
      {
        "role": "lab_manager",
        "message": "This alternative is feasible with current equipment and budget.",
        "round_number": 0,
        "action_type": "accept"
      }
    ],
    "round_number": 1,
    "max_rounds": 6,
    "done": true,
    "agreement_reached": true,
    "reward": 6.72,
    "rigor_score": 0.9,
    "feasibility_score": 0.8,
    "fidelity_score": 0.85
  },
  "transcript": [
    {
      "role": "scientist",
      "message": "I propose a manual counting protocol that keeps both controls.",
      "round_number": 0,
      "action_type": "propose_protocol"
    },
    {
      "role": "lab_manager",
      "message": "This alternative is feasible with current equipment and budget.",
      "round_number": 0,
      "action_type": "accept"
    }
  ],
  "reward_breakdown": {
    "rigor": 0.9,
    "feasibility": 0.8,
    "fidelity": 0.85,
    "efficiency_bonus": 0.25,
    "communication_bonus": 0.15,
    "penalties": {
      "invalid_action": 0.0,
      "timeout": 0.0
    }
  },
  "total_reward": 6.72,
  "rounds_used": 1,
  "agreement_reached": true,
  "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
  "verdict": "accept"
}

Sign-off

Owner	Status	Notes
Person B (Ayush)	signed off	Draft matches current stubs and downstream parser needs
Kian (Person A)	signed off	Validator and environment-owner review completed; contract is frozen for `MOD 01`, `MOD 03`, `FND 09`, and downstream parser work