	# FND 08 Frozen JSON Contract

	Status: completed on 2026-03-08
	Owners: Person A and Person B
	Drafted by: Person B (Ayush)
	Remaining acceptance item: none

	Source schema file: `replicalab/models.py`

	## Purpose

	This document freezes the JSON contract for the shared ReplicaLab data models so downstream work can proceed without schema drift. It is the reference for:

	- Person A validators and environment state handling
	- Person B prompt formatting and action parsing
	- Person C API payload examples
	- Person D frontend and replay mocks

	## Tool-Capability Addendum

	The richer-capability MVP adds bounded search, code-check, and image-inspection
	support as a layer beneath this frozen contract.

	This addendum does not reopen the outward action schema from `FND 08`.
	The final outward actions remain `ScientistAction` and `LabManagerAction`.
	Bounded tool use will be represented through scenario or evidence metadata,
	environment-side tool traces, and `StepResult.info` or replay payloads rather
	than new outward action types for the MVP.

	## Global conventions

	- All JSON keys use `snake_case`.
	- Enum-like values use lowercase snake_case strings.
	- All top-level keys listed in this document must be present. Keys marked nullable are still emitted, with the value `null`, rather than omitted.
	- Use `null` for an absent single object.
	- Use `[]` for a known empty collection.
	- Use `{}` only for flexible metadata objects such as `info` and `reward_breakdown`.
	- `round_number` is zero-based. `0` is the state immediately after `reset()`.
	- `duration_days` and `time_limit_days` are whole calendar days.
	- `difficulty` values are `easy`, `medium`, or `hard`.
	- Component scores such as rigor, feasibility, and fidelity are floats in the inclusive range `0.0` to `1.0`.
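Two of the conventions above, `snake_case` keys and the `0.0` to `1.0` score range, can be checked mechanically. The following is a minimal sketch using only the standard library; the helper names `find_non_snake_case_keys` and `is_valid_component_score` are hypothetical and are not taken from `replicalab/models.py`.

```python
import re

# Lowercase words separated by single underscores, e.g. "round_number".
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def find_non_snake_case_keys(payload, path=""):
    """Return the paths of all dict keys in `payload` that are not snake_case."""
    bad = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            here = f"{path}.{key}" if path else key
            if not SNAKE_CASE.match(key):
                bad.append(here)
            bad.extend(find_non_snake_case_keys(value, here))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            bad.extend(find_non_snake_case_keys(item, f"{path}[{index}]"))
    return bad


def is_valid_component_score(value):
    """Component scores are floats in the inclusive range 0.0 to 1.0."""
    # Strict float check (assumption): an integer such as 1 is rejected.
    return isinstance(value, float) and 0.0 <= value <= 1.0
```

A payload passes the naming check when `find_non_snake_case_keys(payload)` returns an empty list.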

	## Shared nested objects

	### ConversationEntry

	Each item in `conversation_history` or `transcript` must use this shape:

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `role` | `str` | yes | One of `scientist`, `lab_manager`, `system` |
	| `message` | `str` | yes | Human-readable turn text |
	| `round_number` | `int` | yes | Zero-based round index for the message |
	| `action_type` | `str \| null` | yes | Mirrors the action type when the message comes from an agent, otherwise `null` |

	### Protocol

	When `current_protocol` is not `null`, it must use this shape:

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `sample_size` | `int` | yes | Non-negative integer |
	| `controls` | `list[str]` | yes | Empty list when no controls are specified yet |
	| `technique` | `str` | yes | Proposed experimental technique |
	| `duration_days` | `int` | yes | Whole calendar days |
	| `required_equipment` | `list[str]` | yes | Empty list when none is needed |
	| `required_reagents` | `list[str]` | yes | Empty list when none is needed |
	| `rationale` | `str` | yes | Short explanation for the protocol |

	### RewardBreakdown

	When `reward_breakdown` is present, it must use this shape:

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `rigor` | `float` | yes | Component score in `0.0` to `1.0` |
	| `feasibility` | `float` | yes | Component score in `0.0` to `1.0` |
	| `fidelity` | `float` | yes | Component score in `0.0` to `1.0` |
	| `efficiency_bonus` | `float` | yes | Bonus term, `0.0` if unused |
	| `communication_bonus` | `float` | yes | Bonus term, `0.0` if unused |
	| `penalties` | `dict[str, float]` | yes | Per-penalty values keyed by penalty name |

	## Model contracts

	### ScientistAction

	Action types:

	- `propose_protocol`
	- `revise_protocol`
	- `request_info`
	- `accept`

	Field contract:

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `action_type` | `str` | yes | Must be one of the values above |
	| `sample_size` | `int` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0` |
	| `controls` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
	| `technique` | `str` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `""` |
	| `duration_days` | `int` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0` |
	| `required_equipment` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
	| `required_reagents` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
	| `questions` | `list[str]` | yes | Meaningful for `request_info`, otherwise `[]` |
	| `rationale` | `str` | yes | Required free-text explanation for protocol proposals and revisions; `""` for `accept` |

	Canonical example:

	```json
	{
	  "action_type": "propose_protocol",
	  "sample_size": 48,
	  "controls": ["vehicle_control", "positive_control"],
	  "technique": "wst1_assay",
	  "duration_days": 5,
	  "required_equipment": ["plate_reader", "co2_incubator"],
	  "required_reagents": ["wst1", "dmso", "drug_x"],
	  "questions": [],
	  "rationale": "Keeps the core readout while using equipment commonly available in teaching labs."
	}
	```
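Because every field is required, actions such as `request_info` and `accept` still carry the protocol fields at their default values. A minimal sketch of a builder that fills those defaults is shown here; the function name `make_scientist_action` and the constants are hypothetical, and the real models in `replicalab/models.py` may be structured differently.

```python
SCIENTIST_ACTION_TYPES = {"propose_protocol", "revise_protocol", "request_info", "accept"}

# Default value for each field, taken from the "otherwise" column of the table.
SCIENTIST_ACTION_DEFAULTS = {
    "sample_size": 0,
    "controls": [],
    "technique": "",
    "duration_days": 0,
    "required_equipment": [],
    "required_reagents": [],
    "questions": [],
    "rationale": "",
}


def make_scientist_action(action_type, **fields):
    """Build a ScientistAction dict in which every contract key is present."""
    if action_type not in SCIENTIST_ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    unknown = set(fields) - set(SCIENTIST_ACTION_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    action = {"action_type": action_type}
    for key, default in SCIENTIST_ACTION_DEFAULTS.items():
        value = fields.get(key, default)
        # Copy lists so that callers never share the module-level defaults.
        action[key] = list(value) if isinstance(value, list) else value
    return action
```

For example, `make_scientist_action("request_info", questions=["Is the plate reader free this week?"])` yields a payload with all nine keys, where only `action_type` and `questions` differ from the defaults.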

	### LabManagerAction

	Action types:

	- `report_feasibility`
	- `suggest_alternative`
	- `reject`
	- `accept`

	Field contract:

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `action_type` | `str` | yes | Must be one of the values above |
	| `feasible` | `bool` | yes | Overall summary flag equal to the logical AND of the constraint dimension flags |
	| `budget_ok` | `bool` | yes | Whether the proposed protocol fits remaining budget |
	| `equipment_ok` | `bool` | yes | Whether required equipment is available in time |
	| `reagents_ok` | `bool` | yes | Whether required reagents are available |
	| `schedule_ok` | `bool` | yes | Whether the protocol fits the allowed timeline |
	| `staff_ok` | `bool` | yes | Whether staffing is sufficient |
	| `suggested_technique` | `str` | yes | Meaningful for `suggest_alternative`, otherwise `""` |
	| `suggested_sample_size` | `int` | yes | Meaningful for `suggest_alternative`, otherwise `0` |
	| `suggested_controls` | `list[str]` | yes | Meaningful for `suggest_alternative`, otherwise `[]` |
	| `explanation` | `str` | yes | Human-readable explanation of the constraint outcome |

	Conditional rules:

	- `action_type = accept` implies `feasible = true` and all constraint flags are `true`.
	- `action_type = reject` implies `feasible = false` and at least one constraint flag is `false`.
	- `action_type = suggest_alternative` implies `feasible = false` and at least one of the suggestion fields carries a non-default value.

	Canonical example:

	```json
	{
	  "action_type": "suggest_alternative",
	  "feasible": false,
	  "budget_ok": true,
	  "equipment_ok": false,
	  "reagents_ok": true,
	  "schedule_ok": true,
	  "staff_ok": true,
	  "suggested_technique": "manual_cell_counting",
	  "suggested_sample_size": 32,
	  "suggested_controls": ["vehicle_control", "positive_control"],
	  "explanation": "The plate reader is fully booked, so use manual counting and reduce the sample size to stay within the timeline."
	}
	```
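The `feasible` definition and the conditional rules for `LabManagerAction` can be expressed as a small checker. This is a sketch under the assumption that the action has already been parsed into a plain dict with every key present; `check_lab_manager_action` is a hypothetical name, not a function from the source schema file.

```python
CONSTRAINT_FLAGS = ("budget_ok", "equipment_ok", "reagents_ok", "schedule_ok", "staff_ok")


def check_lab_manager_action(action):
    """Return a list of rule violations; an empty list means the action passes."""
    problems = []
    all_ok = all(action[name] for name in CONSTRAINT_FLAGS)
    kind = action["action_type"]

    # feasible is defined as the logical AND of the five constraint flags.
    if action["feasible"] != all_ok:
        problems.append("feasible must equal the logical AND of the constraint flags")
    if kind == "accept" and not (action["feasible"] and all_ok):
        problems.append("accept requires feasible = true and every flag true")
    if kind == "reject" and (action["feasible"] or all_ok):
        problems.append("reject requires feasible = false and at least one false flag")
    if kind == "suggest_alternative":
        has_suggestion = (
            action["suggested_technique"] != ""
            or action["suggested_sample_size"] != 0
            or action["suggested_controls"] != []
        )
        if action["feasible"] or not has_suggestion:
            problems.append(
                "suggest_alternative requires feasible = false and a non-default suggestion"
            )
    return problems
```

Applied to the canonical example above, the checker returns an empty list: `equipment_ok` is `false`, so `feasible` is correctly `false`, and all three suggestion fields carry non-default values.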

	### ScientistObservation

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `paper_title` | `str` | yes | Study title |
	| `paper_hypothesis` | `str` | yes | Core hypothesis being replicated |
	| `paper_method` | `str` | yes | Short method summary |
	| `paper_key_finding` | `str` | yes | Main finding being targeted |
	| `experiment_goal` | `str` | yes | What the scientist is trying to preserve |
	| `conversation_history` | `list[ConversationEntry]` | yes | Empty list at reset |
	| `current_protocol` | `Protocol \| null` | yes | `null` until a protocol exists |
	| `round_number` | `int` | yes | Zero-based current round |
	| `max_rounds` | `int` | yes | Max allowed rounds in the episode |

	Canonical example:

	```json
	{
	  "paper_title": "Drug X reduces glioblastoma cell viability",
	  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
	  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
	  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
	  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
	  "conversation_history": [],
	  "current_protocol": null,
	  "round_number": 0,
	  "max_rounds": 6
	}
	```

	### LabManagerObservation

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `budget_total` | `float` | yes | Initial budget for the episode |
	| `budget_remaining` | `float` | yes | Current remaining budget |
	| `equipment_available` | `list[str]` | yes | Equipment that can be used |
	| `equipment_booked` | `list[str]` | yes | Equipment unavailable due to booking |
	| `reagents_in_stock` | `list[str]` | yes | Available reagents |
	| `reagents_out_of_stock` | `list[str]` | yes | Required but unavailable reagents |
	| `staff_count` | `int` | yes | Available staff count |
	| `time_limit_days` | `int` | yes | Whole calendar days remaining |
	| `safety_restrictions` | `list[str]` | yes | Constraints such as banned solvents or assay restrictions |
	| `conversation_history` | `list[ConversationEntry]` | yes | Empty list at reset |
	| `current_protocol` | `Protocol \| null` | yes | `null` until a protocol exists |
	| `round_number` | `int` | yes | Zero-based current round |
	| `max_rounds` | `int` | yes | Max allowed rounds in the episode |

	Canonical example:

	```json
	{
	  "budget_total": 1200.0,
	  "budget_remaining": 1200.0,
	  "equipment_available": ["co2_incubator", "microscope"],
	  "equipment_booked": ["plate_reader"],
	  "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
	  "reagents_out_of_stock": ["wst1"],
	  "staff_count": 2,
	  "time_limit_days": 7,
	  "safety_restrictions": ["no_radioactive_reagents"],
	  "conversation_history": [],
	  "current_protocol": null,
	  "round_number": 0,
	  "max_rounds": 6
	}
	```

	### Observation

	Wrapper behavior:

	- Serialized `Observation` objects always include both top-level keys: `scientist` and `lab_manager`.
	- In shared environment state, replay, and API payloads, both branches should normally be populated.
	- When a consumer is intentionally given only one role view, the non-owned branch must be `null`, not omitted.

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `scientist` | `ScientistObservation \| null` | yes | Scientist-side view |
	| `lab_manager` | `LabManagerObservation \| null` | yes | Lab-manager-side view |

	Canonical example:

	```json
	{
	  "scientist": {
	    "paper_title": "Drug X reduces glioblastoma cell viability",
	    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
	    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
	    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
	    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
	    "conversation_history": [],
	    "current_protocol": null,
	    "round_number": 0,
	    "max_rounds": 6
	  },
	  "lab_manager": {
	    "budget_total": 1200.0,
	    "budget_remaining": 1200.0,
	    "equipment_available": ["co2_incubator", "microscope"],
	    "equipment_booked": ["plate_reader"],
	    "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
	    "reagents_out_of_stock": ["wst1"],
	    "staff_count": 2,
	    "time_limit_days": 7,
	    "safety_restrictions": ["no_radioactive_reagents"],
	    "conversation_history": [],
	    "current_protocol": null,
	    "round_number": 0,
	    "max_rounds": 6
	  }
	}
	```
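The single-role rule from the wrapper behavior (the non-owned branch is `null`, never omitted) could be implemented along these lines. This is an illustrative sketch; `role_view` is a hypothetical helper, and `full` is an abbreviated stand-in for a complete `Observation`, not a valid payload.

```python
import json


def role_view(observation, role):
    """Return a new wrapper that exposes only the branch owned by `role`."""
    if role not in ("scientist", "lab_manager"):
        raise ValueError(f"unknown role: {role!r}")
    return {
        # Both keys are always emitted; the non-owned branch becomes None,
        # which json.dumps serializes as null.
        "scientist": observation["scientist"] if role == "scientist" else None,
        "lab_manager": observation["lab_manager"] if role == "lab_manager" else None,
    }


# Abbreviated stand-in for a full Observation (assumption, for brevity).
full = {"scientist": {"round_number": 0}, "lab_manager": {"round_number": 0}}
print(json.dumps(role_view(full, "scientist")))
# {"scientist": {"round_number": 0}, "lab_manager": null}
```

Building the wrapper explicitly, rather than deleting a key from a copy, keeps the two-key shape intact for every consumer.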

	### StepResult

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `observation` | `Observation \| null` | yes | Present on normal steps and terminal steps; may be `null` only on hard failure |
	| `reward` | `float` | yes | Episode reward after the step; terminal reward on final step |
	| `done` | `bool` | yes | Whether the episode is terminal |
	| `info` | `dict` | yes | Flexible metadata object |

	Reserved `info` keys:

	- `agreement_reached`: `bool`
	- `error`: `str | null`
	- `reward_breakdown`: `RewardBreakdown | null`
	- `judge_notes`: `str | null`
	- `verdict`: `str | null`

	Canonical example:

	```json
	{
	  "observation": {
	    "scientist": null,
	    "lab_manager": null
	  },
	  "reward": 6.72,
	  "done": true,
	  "info": {
	    "agreement_reached": true,
	    "error": null,
	    "reward_breakdown": {
	      "rigor": 0.9,
	      "feasibility": 0.8,
	      "fidelity": 0.85,
	      "efficiency_bonus": 0.25,
	      "communication_bonus": 0.15,
	      "penalties": {
	        "invalid_action": 0.0,
	        "timeout": 0.0
	      }
	    },
	    "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
	    "verdict": "accept"
	  }
	}
	```
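One way to keep the reserved `info` keys predictable for consumers is to start every `info` dict from a fixed set of defaults. The sketch below assumes that producers emit all five reserved keys on every step, which this contract does not explicitly require; `build_info` and the `tool_trace` key used in the test of this idea are hypothetical.

```python
# Reserved keys with neutral defaults (assumption: always emitted).
RESERVED_INFO_DEFAULTS = {
    "agreement_reached": False,
    "error": None,
    "reward_breakdown": None,
    "judge_notes": None,
    "verdict": None,
}


def build_info(**extra):
    """Return an info dict with every reserved key present.

    Keyword arguments override reserved defaults or add extra metadata,
    since info is a flexible object.
    """
    info = dict(RESERVED_INFO_DEFAULTS)
    info.update(extra)
    return info
```

A terminal step could then call `build_info(agreement_reached=True, verdict="accept")` and leave the remaining reserved keys as `null` in the serialized payload.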

	### EpisodeState

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `seed` | `int` | yes | Deterministic episode seed |
	| `scenario_template` | `str` | yes | Scenario family identifier |
	| `difficulty` | `str` | yes | `easy`, `medium`, or `hard` |
	| `paper_title` | `str` | yes | Study title |
	| `paper_hypothesis` | `str` | yes | Core hypothesis |
	| `paper_method` | `str` | yes | Method summary |
	| `paper_key_finding` | `str` | yes | Main finding |
	| `experiment_goal` | `str` | yes | Goal preserved through negotiation |
	| `lab_budget_total` | `float` | yes | Initial budget |
	| `lab_budget_remaining` | `float` | yes | Remaining budget |
	| `lab_equipment` | `list[str]` | yes | Equipment state |
	| `lab_reagents` | `list[str]` | yes | Reagent state |
	| `lab_staff_count` | `int` | yes | Available staff count |
	| `lab_time_limit_days` | `int` | yes | Whole calendar days remaining |
	| `current_protocol` | `Protocol \| null` | yes | Current agreed or latest proposed protocol |
	| `conversation_history` | `list[ConversationEntry]` | yes | Negotiation history |
	| `round_number` | `int` | yes | Zero-based round counter |
	| `max_rounds` | `int` | yes | Maximum rounds allowed |
	| `done` | `bool` | yes | Terminal flag |
	| `agreement_reached` | `bool` | yes | Whether both sides reached agreement |
	| `reward` | `float` | yes | Final total reward or `0.0` until terminal scoring |
	| `rigor_score` | `float` | yes | Final component score or `0.0` until terminal scoring |
	| `feasibility_score` | `float` | yes | Final component score or `0.0` until terminal scoring |
	| `fidelity_score` | `float` | yes | Final component score or `0.0` until terminal scoring |

	Canonical example:

	```json
	{
	  "seed": 17,
	  "scenario_template": "cell_biology",
	  "difficulty": "medium",
	  "paper_title": "Drug X reduces glioblastoma cell viability",
	  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
	  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
	  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
	  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
	  "lab_budget_total": 1200.0,
	  "lab_budget_remaining": 850.0,
	  "lab_equipment": ["co2_incubator", "microscope"],
	  "lab_reagents": ["dmso", "drug_x", "culture_media"],
	  "lab_staff_count": 2,
	  "lab_time_limit_days": 7,
	  "current_protocol": {
	    "sample_size": 32,
	    "controls": ["vehicle_control", "positive_control"],
	    "technique": "manual_cell_counting",
	    "duration_days": 5,
	    "required_equipment": ["microscope", "co2_incubator"],
	    "required_reagents": ["dmso", "drug_x", "culture_media"],
	    "rationale": "Uses available equipment while preserving control structure."
	  },
	  "conversation_history": [
	    {
	      "role": "scientist",
	      "message": "I propose a manual counting protocol that keeps both controls.",
	      "round_number": 0,
	      "action_type": "propose_protocol"
	    }
	  ],
	  "round_number": 1,
	  "max_rounds": 6,
	  "done": false,
	  "agreement_reached": false,
	  "reward": 0.0,
	  "rigor_score": 0.0,
	  "feasibility_score": 0.0,
	  "fidelity_score": 0.0
	}
	```

	### EpisodeLog

	| Field | Type | Required | Notes |
	| --- | --- | --- | --- |
	| `episode_id` | `str` | yes | Stable replay identifier |
	| `seed` | `int` | yes | Episode seed |
	| `scenario_template` | `str` | yes | Scenario family identifier |
	| `difficulty` | `str` | yes | `easy`, `medium`, or `hard` |
	| `final_state` | `EpisodeState \| null` | yes | Must be populated for completed episodes |
	| `transcript` | `list[ConversationEntry]` | yes | Replayable transcript |
	| `reward_breakdown` | `RewardBreakdown` | yes | Final reward components |
	| `total_reward` | `float` | yes | Final total reward |
	| `rounds_used` | `int` | yes | Number of completed rounds |
	| `agreement_reached` | `bool` | yes | Final agreement flag |
	| `judge_notes` | `str` | yes | Human-readable audit summary |
	| `verdict` | `str` | yes | One of `accept`, `revise`, `reject` |

	Canonical example:

	```json
	{
	  "episode_id": "cell_biology-17-medium-0001",
	  "seed": 17,
	  "scenario_template": "cell_biology",
	  "difficulty": "medium",
	  "final_state": {
	    "seed": 17,
	    "scenario_template": "cell_biology",
	    "difficulty": "medium",
	    "paper_title": "Drug X reduces glioblastoma cell viability",
	    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
	    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
	    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
	    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
	    "lab_budget_total": 1200.0,
	    "lab_budget_remaining": 850.0,
	    "lab_equipment": ["co2_incubator", "microscope"],
	    "lab_reagents": ["dmso", "drug_x", "culture_media"],
	    "lab_staff_count": 2,
	    "lab_time_limit_days": 7,
	    "current_protocol": {
	      "sample_size": 32,
	      "controls": ["vehicle_control", "positive_control"],
	      "technique": "manual_cell_counting",
	      "duration_days": 5,
	      "required_equipment": ["microscope", "co2_incubator"],
	      "required_reagents": ["dmso", "drug_x", "culture_media"],
	      "rationale": "Uses available equipment while preserving control structure."
	    },
	    "conversation_history": [
	      {
	        "role": "scientist",
	        "message": "I propose a manual counting protocol that keeps both controls.",
	        "round_number": 0,
	        "action_type": "propose_protocol"
	      },
	      {
	        "role": "lab_manager",
	        "message": "This alternative is feasible with current equipment and budget.",
	        "round_number": 0,
	        "action_type": "accept"
	      }
	    ],
	    "round_number": 1,
	    "max_rounds": 6,
	    "done": true,
	    "agreement_reached": true,
	    "reward": 6.72,
	    "rigor_score": 0.9,
	    "feasibility_score": 0.8,
	    "fidelity_score": 0.85
	  },
	  "transcript": [
	    {
	      "role": "scientist",
	      "message": "I propose a manual counting protocol that keeps both controls.",
	      "round_number": 0,
	      "action_type": "propose_protocol"
	    },
	    {
	      "role": "lab_manager",
	      "message": "This alternative is feasible with current equipment and budget.",
	      "round_number": 0,
	      "action_type": "accept"
	    }
	  ],
	  "reward_breakdown": {
	    "rigor": 0.9,
	    "feasibility": 0.8,
	    "fidelity": 0.85,
	    "efficiency_bonus": 0.25,
	    "communication_bonus": 0.15,
	    "penalties": {
	      "invalid_action": 0.0,
	      "timeout": 0.0
	    }
	  },
	  "total_reward": 6.72,
	  "rounds_used": 1,
	  "agreement_reached": true,
	  "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
	  "verdict": "accept"
	}
	```
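A replay consumer may want to confirm that an `EpisodeLog` payload carries every top-level key and uses only the allowed enum-like values before rendering it. The following is a minimal sketch of such a check; `check_episode_log` is a hypothetical helper, and it deliberately validates only the top level, not the nested `EpisodeState` or `RewardBreakdown` shapes.

```python
# Top-level keys from the EpisodeLog field table.
EPISODE_LOG_KEYS = (
    "episode_id", "seed", "scenario_template", "difficulty", "final_state",
    "transcript", "reward_breakdown", "total_reward", "rounds_used",
    "agreement_reached", "judge_notes", "verdict",
)
DIFFICULTIES = {"easy", "medium", "hard"}
VERDICTS = {"accept", "revise", "reject"}


def check_episode_log(log):
    """Return a list of top-level problems; an empty list means the log passes."""
    problems = [f"missing key: {key}" for key in EPISODE_LOG_KEYS if key not in log]
    if problems:
        # Stop early so the value checks below never hit a missing key.
        return problems
    if log["difficulty"] not in DIFFICULTIES:
        problems.append(f"unexpected difficulty: {log['difficulty']!r}")
    if log["verdict"] not in VERDICTS:
        problems.append(f"unexpected verdict: {log['verdict']!r}")
    return problems
```

After `log = json.loads(raw_text)`, a frontend or replay mock can call `check_episode_log(log)` and refuse to render when the returned list is non-empty.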

	## Sign-off

	| Owner | Status | Notes |
	| --- | --- | --- |
	| Person B (Ayush) | signed off | Draft matches current stubs and downstream parser needs |
	| Person A (Kian) | signed off | Validator and environment-owner review completed; contract is frozen for `MOD 01`, `MOD 03`, `FND 09`, and downstream parser work |