replicalab / docs /fnd08_frozen_json_contract.md
maxxie114's picture
Initial HF Spaces deployment
80d8c84
# FND 08 Frozen JSON Contract
Status: completed on 2026-03-08
Owners: Person A and Person B
Drafted by: Person B (Ayush)
Remaining acceptance item: none
Source schema file: `replicalab/models.py`
## Purpose
This document freezes the JSON contract for the shared ReplicaLab data models so downstream work can proceed without schema drift. It is the reference for:
- Person A validators and environment state handling
- Person B prompt formatting and action parsing
- Person C API payload examples
- Person D frontend and replay mocks
## Tool-Capability Addendum
The richer-capability MVP adds bounded search, code-check, and image-inspection
support below this frozen contract.
This addendum does **not** reopen the outward action schema from `FND 08`.
The final outward actions remain `ScientistAction` and `LabManagerAction`.
Bounded tool use will be represented through scenario or evidence metadata,
environment-side tool traces, and `StepResult.info` or replay payloads rather
than new outward action types for the MVP.
## Global conventions
- All JSON keys use `snake_case`.
- Enum-like values use lowercase snake_case strings.
- All top-level keys listed in this document must be present unless explicitly marked nullable.
- Use `null` for an absent single object.
- Use `[]` for a known empty collection.
- Use `{}` only for flexible metadata objects such as `info` and `reward_breakdown`.
- `round_number` is zero-based. `0` is the state immediately after `reset()`.
- `duration_days` and `time_limit_days` are whole calendar days.
- `difficulty` values are `easy`, `medium`, or `hard`.
- Component scores such as rigor, feasibility, and fidelity are floats in the inclusive range `0.0` to `1.0`.
## Shared nested objects
### ConversationEntry
Each item in `conversation_history` or `transcript` must use this shape:
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `role` | `str` | yes | One of `scientist`, `lab_manager`, `system` |
| `message` | `str` | yes | Human-readable turn text |
| `round_number` | `int` | yes | Zero-based round index for the message |
| `action_type` | `str \| null` | yes | Mirrors the action type when the message comes from an agent, otherwise `null` |
### Protocol
When `current_protocol` is not `null`, it must use this shape:
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `sample_size` | `int` | yes | Non-negative integer |
| `controls` | `list[str]` | yes | Empty list when no controls are specified yet |
| `technique` | `str` | yes | Proposed experimental technique |
| `duration_days` | `int` | yes | Whole calendar days |
| `required_equipment` | `list[str]` | yes | Empty list when none is needed |
| `required_reagents` | `list[str]` | yes | Empty list when none is needed |
| `rationale` | `str` | yes | Short explanation for the protocol |
### RewardBreakdown
When `reward_breakdown` is present, it must use this shape:
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `rigor` | `float` | yes | Component score in `0.0` to `1.0` |
| `feasibility` | `float` | yes | Component score in `0.0` to `1.0` |
| `fidelity` | `float` | yes | Component score in `0.0` to `1.0` |
| `efficiency_bonus` | `float` | yes | Bonus term, `0.0` if unused |
| `communication_bonus` | `float` | yes | Bonus term, `0.0` if unused |
| `penalties` | `dict[str, float]` | yes | Per-penalty values keyed by penalty name |
## Model contracts
### ScientistAction
Action types:
- `propose_protocol`
- `revise_protocol`
- `request_info`
- `accept`
Field contract:
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `action_type` | `str` | yes | Must be one of the values above |
| `sample_size` | `int` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0` |
| `controls` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
| `technique` | `str` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `""` |
| `duration_days` | `int` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0` |
| `required_equipment` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
| `required_reagents` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
| `questions` | `list[str]` | yes | Meaningful for `request_info`, otherwise `[]` |
| `rationale` | `str` | yes | Required free-text explanation for protocol proposals and revisions; `""` for `accept` |
Canonical example:
```json
{
"action_type": "propose_protocol",
"sample_size": 48,
"controls": ["vehicle_control", "positive_control"],
"technique": "wst1_assay",
"duration_days": 5,
"required_equipment": ["plate_reader", "co2_incubator"],
"required_reagents": ["wst1", "dmso", "drug_x"],
"questions": [],
"rationale": "Keeps the core readout while using equipment commonly available in teaching labs."
}
```
### LabManagerAction
Action types:
- `report_feasibility`
- `suggest_alternative`
- `reject`
- `accept`
Field contract:
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `action_type` | `str` | yes | Must be one of the values above |
| `feasible` | `bool` | yes | Overall summary flag equal to the logical AND of the constraint dimension flags |
| `budget_ok` | `bool` | yes | Whether the proposed protocol fits remaining budget |
| `equipment_ok` | `bool` | yes | Whether required equipment is available in time |
| `reagents_ok` | `bool` | yes | Whether required reagents are available |
| `schedule_ok` | `bool` | yes | Whether the protocol fits the allowed timeline |
| `staff_ok` | `bool` | yes | Whether staffing is sufficient |
| `suggested_technique` | `str` | yes | Meaningful for `suggest_alternative`, otherwise `""` |
| `suggested_sample_size` | `int` | yes | Meaningful for `suggest_alternative`, otherwise `0` |
| `suggested_controls` | `list[str]` | yes | Meaningful for `suggest_alternative`, otherwise `[]` |
| `explanation` | `str` | yes | Human-readable explanation of the constraint outcome |
Conditional rules:
- `action_type = accept` implies `feasible = true` and all constraint flags are `true`.
- `action_type = reject` implies `feasible = false` and at least one constraint flag is `false`.
- `action_type = suggest_alternative` implies `feasible = false` and at least one of the suggestion fields carries a non-default value.
Canonical example:
```json
{
"action_type": "suggest_alternative",
"feasible": false,
"budget_ok": true,
"equipment_ok": false,
"reagents_ok": true,
"schedule_ok": true,
"staff_ok": true,
"suggested_technique": "manual_cell_counting",
"suggested_sample_size": 32,
"suggested_controls": ["vehicle_control", "positive_control"],
"explanation": "The plate reader is fully booked, so use manual counting and reduce the sample size to stay within the timeline."
}
```
### ScientistObservation
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `paper_title` | `str` | yes | Study title |
| `paper_hypothesis` | `str` | yes | Core hypothesis being replicated |
| `paper_method` | `str` | yes | Short method summary |
| `paper_key_finding` | `str` | yes | Main finding being targeted |
| `experiment_goal` | `str` | yes | What the scientist is trying to preserve |
| `conversation_history` | `list[ConversationEntry]` | yes | Empty list at reset |
| `current_protocol` | `Protocol \| null` | yes | `null` until a protocol exists |
| `round_number` | `int` | yes | Zero-based current round |
| `max_rounds` | `int` | yes | Max allowed rounds in the episode |
Canonical example:
```json
{
"paper_title": "Drug X reduces glioblastoma cell viability",
"paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
"paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
"paper_key_finding": "The highest dose reduced viability by about 40 percent.",
"experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
"conversation_history": [],
"current_protocol": null,
"round_number": 0,
"max_rounds": 6
}
```
### LabManagerObservation
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `budget_total` | `float` | yes | Initial budget for the episode |
| `budget_remaining` | `float` | yes | Current remaining budget |
| `equipment_available` | `list[str]` | yes | Equipment that can be used |
| `equipment_booked` | `list[str]` | yes | Equipment unavailable due to booking |
| `reagents_in_stock` | `list[str]` | yes | Available reagents |
| `reagents_out_of_stock` | `list[str]` | yes | Required but unavailable reagents |
| `staff_count` | `int` | yes | Available staff count |
| `time_limit_days` | `int` | yes | Whole calendar days remaining |
| `safety_restrictions` | `list[str]` | yes | Constraints such as banned solvents or assay restrictions |
| `conversation_history` | `list[ConversationEntry]` | yes | Empty list at reset |
| `current_protocol` | `Protocol \| null` | yes | `null` until a protocol exists |
| `round_number` | `int` | yes | Zero-based current round |
| `max_rounds` | `int` | yes | Max allowed rounds in the episode |
Canonical example:
```json
{
"budget_total": 1200.0,
"budget_remaining": 1200.0,
"equipment_available": ["co2_incubator", "microscope"],
"equipment_booked": ["plate_reader"],
"reagents_in_stock": ["dmso", "drug_x", "culture_media"],
"reagents_out_of_stock": ["wst1"],
"staff_count": 2,
"time_limit_days": 7,
"safety_restrictions": ["no_radioactive_reagents"],
"conversation_history": [],
"current_protocol": null,
"round_number": 0,
"max_rounds": 6
}
```
### Observation
Wrapper behavior:
- Serialized `Observation` objects always include both top-level keys: `scientist` and `lab_manager`.
- In shared environment state, replay, and API payloads, both branches should normally be populated.
- When a consumer is intentionally given only one role view, the non-owned branch must be `null`, not omitted.
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `scientist` | `ScientistObservation \| null` | yes | Scientist-side view |
| `lab_manager` | `LabManagerObservation \| null` | yes | Lab-manager-side view |
Canonical example:
```json
{
"scientist": {
"paper_title": "Drug X reduces glioblastoma cell viability",
"paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
"paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
"paper_key_finding": "The highest dose reduced viability by about 40 percent.",
"experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
"conversation_history": [],
"current_protocol": null,
"round_number": 0,
"max_rounds": 6
},
"lab_manager": {
"budget_total": 1200.0,
"budget_remaining": 1200.0,
"equipment_available": ["co2_incubator", "microscope"],
"equipment_booked": ["plate_reader"],
"reagents_in_stock": ["dmso", "drug_x", "culture_media"],
"reagents_out_of_stock": ["wst1"],
"staff_count": 2,
"time_limit_days": 7,
"safety_restrictions": ["no_radioactive_reagents"],
"conversation_history": [],
"current_protocol": null,
"round_number": 0,
"max_rounds": 6
}
}
```
### StepResult
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `observation` | `Observation \| null` | yes | Present on normal steps and terminal steps; may be `null` only on hard failure |
| `reward` | `float` | yes | Episode reward after the step; terminal reward on final step |
| `done` | `bool` | yes | Whether the episode is terminal |
| `info` | `dict` | yes | Flexible metadata object |
Reserved `info` keys:
- `agreement_reached`: `bool`
- `error`: `str | null`
- `reward_breakdown`: `RewardBreakdown | null`
- `judge_notes`: `str | null`
- `verdict`: `str | null`
Canonical example:
```json
{
"observation": {
"scientist": null,
"lab_manager": null
},
"reward": 6.72,
"done": true,
"info": {
"agreement_reached": true,
"error": null,
"reward_breakdown": {
"rigor": 0.9,
"feasibility": 0.8,
"fidelity": 0.85,
"efficiency_bonus": 0.25,
"communication_bonus": 0.15,
"penalties": {
"invalid_action": 0.0,
"timeout": 0.0
}
},
"judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
"verdict": "accept"
}
}
```
### EpisodeState
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `seed` | `int` | yes | Deterministic episode seed |
| `scenario_template` | `str` | yes | Scenario family identifier |
| `difficulty` | `str` | yes | `easy`, `medium`, or `hard` |
| `paper_title` | `str` | yes | Study title |
| `paper_hypothesis` | `str` | yes | Core hypothesis |
| `paper_method` | `str` | yes | Method summary |
| `paper_key_finding` | `str` | yes | Main finding |
| `experiment_goal` | `str` | yes | Goal preserved through negotiation |
| `lab_budget_total` | `float` | yes | Initial budget |
| `lab_budget_remaining` | `float` | yes | Remaining budget |
| `lab_equipment` | `list[str]` | yes | Equipment state |
| `lab_reagents` | `list[str]` | yes | Reagent state |
| `lab_staff_count` | `int` | yes | Available staff count |
| `lab_time_limit_days` | `int` | yes | Whole calendar days remaining |
| `current_protocol` | `Protocol \| null` | yes | Current agreed or latest proposed protocol |
| `conversation_history` | `list[ConversationEntry]` | yes | Negotiation history |
| `round_number` | `int` | yes | Zero-based round counter |
| `max_rounds` | `int` | yes | Maximum rounds allowed |
| `done` | `bool` | yes | Terminal flag |
| `agreement_reached` | `bool` | yes | Whether both sides reached agreement |
| `reward` | `float` | yes | Final total reward or `0.0` until terminal scoring |
| `rigor_score` | `float` | yes | Final component score or `0.0` until terminal scoring |
| `feasibility_score` | `float` | yes | Final component score or `0.0` until terminal scoring |
| `fidelity_score` | `float` | yes | Final component score or `0.0` until terminal scoring |
Canonical example:
```json
{
"seed": 17,
"scenario_template": "cell_biology",
"difficulty": "medium",
"paper_title": "Drug X reduces glioblastoma cell viability",
"paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
"paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
"paper_key_finding": "The highest dose reduced viability by about 40 percent.",
"experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
"lab_budget_total": 1200.0,
"lab_budget_remaining": 850.0,
"lab_equipment": ["co2_incubator", "microscope"],
"lab_reagents": ["dmso", "drug_x", "culture_media"],
"lab_staff_count": 2,
"lab_time_limit_days": 7,
"current_protocol": {
"sample_size": 32,
"controls": ["vehicle_control", "positive_control"],
"technique": "manual_cell_counting",
"duration_days": 5,
"required_equipment": ["microscope", "co2_incubator"],
"required_reagents": ["dmso", "drug_x", "culture_media"],
"rationale": "Uses available equipment while preserving control structure."
},
"conversation_history": [
{
"role": "scientist",
"message": "I propose a manual counting protocol that keeps both controls.",
"round_number": 0,
"action_type": "propose_protocol"
}
],
"round_number": 1,
"max_rounds": 6,
"done": false,
"agreement_reached": false,
"reward": 0.0,
"rigor_score": 0.0,
"feasibility_score": 0.0,
"fidelity_score": 0.0
}
```
### EpisodeLog
| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `episode_id` | `str` | yes | Stable replay identifier |
| `seed` | `int` | yes | Episode seed |
| `scenario_template` | `str` | yes | Scenario family identifier |
| `difficulty` | `str` | yes | `easy`, `medium`, or `hard` |
| `final_state` | `EpisodeState \| null` | yes | Must be populated for completed episodes |
| `transcript` | `list[ConversationEntry]` | yes | Replayable transcript |
| `reward_breakdown` | `RewardBreakdown` | yes | Final reward components |
| `total_reward` | `float` | yes | Final total reward |
| `rounds_used` | `int` | yes | Number of completed rounds |
| `agreement_reached` | `bool` | yes | Final agreement flag |
| `judge_notes` | `str` | yes | Human-readable audit summary |
| `verdict` | `str` | yes | One of `accept`, `revise`, `reject` |
Canonical example:
```json
{
"episode_id": "cell_biology-17-medium-0001",
"seed": 17,
"scenario_template": "cell_biology",
"difficulty": "medium",
"final_state": {
"seed": 17,
"scenario_template": "cell_biology",
"difficulty": "medium",
"paper_title": "Drug X reduces glioblastoma cell viability",
"paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
"paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
"paper_key_finding": "The highest dose reduced viability by about 40 percent.",
"experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
"lab_budget_total": 1200.0,
"lab_budget_remaining": 850.0,
"lab_equipment": ["co2_incubator", "microscope"],
"lab_reagents": ["dmso", "drug_x", "culture_media"],
"lab_staff_count": 2,
"lab_time_limit_days": 7,
"current_protocol": {
"sample_size": 32,
"controls": ["vehicle_control", "positive_control"],
"technique": "manual_cell_counting",
"duration_days": 5,
"required_equipment": ["microscope", "co2_incubator"],
"required_reagents": ["dmso", "drug_x", "culture_media"],
"rationale": "Uses available equipment while preserving control structure."
},
"conversation_history": [
{
"role": "scientist",
"message": "I propose a manual counting protocol that keeps both controls.",
"round_number": 0,
"action_type": "propose_protocol"
},
{
"role": "lab_manager",
"message": "This alternative is feasible with current equipment and budget.",
"round_number": 0,
"action_type": "accept"
}
],
"round_number": 1,
"max_rounds": 6,
"done": true,
"agreement_reached": true,
"reward": 6.72,
"rigor_score": 0.9,
"feasibility_score": 0.8,
"fidelity_score": 0.85
},
"transcript": [
{
"role": "scientist",
"message": "I propose a manual counting protocol that keeps both controls.",
"round_number": 0,
"action_type": "propose_protocol"
},
{
"role": "lab_manager",
"message": "This alternative is feasible with current equipment and budget.",
"round_number": 0,
"action_type": "accept"
}
],
"reward_breakdown": {
"rigor": 0.9,
"feasibility": 0.8,
"fidelity": 0.85,
"efficiency_bonus": 0.25,
"communication_bonus": 0.15,
"penalties": {
"invalid_action": 0.0,
"timeout": 0.0
}
},
"total_reward": 6.72,
"rounds_used": 1,
"agreement_reached": true,
"judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
"verdict": "accept"
}
```
## Sign-off
| Owner | Status | Notes |
| --- | --- | --- |
| Person B (Ayush) | signed off | Draft matches current stubs and downstream parser needs |
| Kian (Person A) | signed off | Validator and environment-owner review completed; contract is frozen for `MOD 01`, `MOD 03`, `FND 09`, and downstream parser work |