# FND 08 Frozen JSON Contract

Status: completed on 2026-03-08
Owners: Person A and Person B
Drafted by: Person B (Ayush)
Remaining acceptance item: none

Source schema file: `replicalab/models.py`

## Purpose

This document freezes the JSON contract for the shared ReplicaLab data models so downstream work can proceed without schema drift. It is the reference for:

- Person A validators and environment state handling
- Person B prompt formatting and action parsing
- Person C API payload examples
- Person D frontend and replay mocks

## Tool-Capability Addendum

The richer-capability MVP layers bounded search, code-check, and image-inspection
support beneath this frozen contract.

This addendum does **not** reopen the outward action schema from `FND 08`.
The final outward actions remain `ScientistAction` and `LabManagerAction`.
Bounded tool use will be represented through scenario or evidence metadata,
environment-side tool traces, and `StepResult.info` or replay payloads rather
than new outward action types for the MVP.

## Global conventions

- All JSON keys use `snake_case`.
- Enum-like values use lowercase `snake_case` strings.
- All top-level keys listed in this document must be present; keys marked nullable must still appear, with the value `null`.
- Use `null` for an absent single object.
- Use `[]` for a known empty collection.
- Use `{}` only for flexible metadata objects such as `info` and `reward_breakdown`.
- `round_number` is zero-based. `0` is the state immediately after `reset()`.
- `duration_days` and `time_limit_days` are whole calendar days.
- `difficulty` values are `easy`, `medium`, or `hard`.
- Component scores such as rigor, feasibility, and fidelity are floats in the inclusive range `0.0` to `1.0`.
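
Two of these conventions are easy to check mechanically. The sketch below is illustrative only: the helper names `non_snake_case_keys` and `is_component_score` are hypothetical and are not taken from `replicalab/models.py`. It assumes the payload has already been parsed from JSON into Python dicts and lists.

```python
import re

# Hypothetical helpers for illustration; not part of replicalab/models.py.
_SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def non_snake_case_keys(payload):
    """Return every key, at any nesting depth, that is not snake_case."""
    bad = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            if not _SNAKE_CASE.match(key):
                bad.append(key)
            bad.extend(non_snake_case_keys(value))
    elif isinstance(payload, list):
        for item in payload:
            bad.extend(non_snake_case_keys(item))
    return bad


def is_component_score(value):
    """True for a number in the inclusive range 0.0 to 1.0."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0.0 <= value <= 1.0
```

Note that the key check also descends into flexible objects such as `info`, which may be stricter than intended if free-form metadata keys are allowed there.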

## Shared nested objects

### ConversationEntry

Each item in `conversation_history` or `transcript` must use this shape:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `role` | `str` | yes | One of `scientist`, `lab_manager`, `system` |
| `message` | `str` | yes | Human-readable turn text |
| `round_number` | `int` | yes | Zero-based round index for the message |
| `action_type` | `str \| null` | yes | Mirrors the action type when the message comes from an agent, otherwise `null` |
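
As a rough illustration of the table above, the following sketch checks one parsed entry against the field contract. The function name `conversation_entry_errors` is hypothetical, and the final rule (that `system` messages always carry `null`) is an interpretation of "otherwise `null`", not something the contract states outright.

```python
ROLES = {"scientist", "lab_manager", "system"}
ENTRY_KEYS = {"role", "message", "round_number", "action_type"}


def conversation_entry_errors(entry):
    """Return a list of violations; an empty list means the entry conforms."""
    missing = ENTRY_KEYS - set(entry)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    errors = []
    if entry["role"] not in ROLES:
        errors.append("role must be scientist, lab_manager, or system")
    if not isinstance(entry["message"], str):
        errors.append("message must be a string")
    round_number = entry["round_number"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(round_number, bool) or not isinstance(round_number, int) or round_number < 0:
        errors.append("round_number must be a zero-based integer")
    action_type = entry["action_type"]
    if action_type is not None and not isinstance(action_type, str):
        errors.append("action_type must be a string or null")
    # Assumption: `system` is the only non-agent role, so its action_type is null.
    if entry["role"] == "system" and action_type is not None:
        errors.append("system messages must carry action_type = null")
    return errors
```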

### Protocol

When `current_protocol` is not `null`, it must use this shape:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `sample_size` | `int` | yes | Non-negative integer |
| `controls` | `list[str]` | yes | Empty list when no controls are specified yet |
| `technique` | `str` | yes | Proposed experimental technique |
| `duration_days` | `int` | yes | Whole calendar days |
| `required_equipment` | `list[str]` | yes | Empty list when none is needed |
| `required_reagents` | `list[str]` | yes | Empty list when none is needed |
| `rationale` | `str` | yes | Short explanation for the protocol |
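
The `Protocol` fields are the same as the protocol-carrying fields of `ScientistAction` (defined later in this document), minus `action_type` and `questions`. The sketch below shows one way a `Protocol`-shaped dict could be extracted from a proposal or revision. Whether the environment actually derives `current_protocol` this way is an assumption, and `protocol_from_action` is a hypothetical name.

```python
PROTOCOL_FIELDS = (
    "sample_size",
    "controls",
    "technique",
    "duration_days",
    "required_equipment",
    "required_reagents",
    "rationale",
)


def protocol_from_action(action):
    """Return a Protocol-shaped dict, or None if the action carries no protocol."""
    if action["action_type"] not in ("propose_protocol", "revise_protocol"):
        return None
    # Copy only the seven Protocol fields; action_type and questions are dropped.
    return {name: action[name] for name in PROTOCOL_FIELDS}
```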

### RewardBreakdown

When `reward_breakdown` is present, it must use this shape:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `rigor` | `float` | yes | Component score in `0.0` to `1.0` |
| `feasibility` | `float` | yes | Component score in `0.0` to `1.0` |
| `fidelity` | `float` | yes | Component score in `0.0` to `1.0` |
| `efficiency_bonus` | `float` | yes | Bonus term, `0.0` if unused |
| `communication_bonus` | `float` | yes | Bonus term, `0.0` if unused |
| `penalties` | `dict[str, float]` | yes | Per-penalty values keyed by penalty name |
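
A minimal sketch of a shape check for this object follows. The function name `reward_breakdown_errors` is hypothetical. Only the three component scores are range-checked, because the contract does not state bounds for the bonus terms or the penalty values.

```python
SCORE_KEYS = ("rigor", "feasibility", "fidelity")
BONUS_KEYS = ("efficiency_bonus", "communication_bonus")


def _is_number(value):
    # JSON numbers parse to int or float; bool is excluded because it subclasses int.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def reward_breakdown_errors(breakdown):
    """Return a list of violations; an empty list means the shape conforms."""
    required = SCORE_KEYS + BONUS_KEYS + ("penalties",)
    missing = [key for key in required if key not in breakdown]
    if missing:
        return [f"missing keys: {missing}"]
    errors = []
    for key in SCORE_KEYS:
        value = breakdown[key]
        if not _is_number(value) or not 0.0 <= value <= 1.0:
            errors.append(f"{key} must be a number in [0.0, 1.0]")
    for key in BONUS_KEYS:
        if not _is_number(breakdown[key]):
            errors.append(f"{key} must be a number")
    penalties = breakdown["penalties"]
    if not isinstance(penalties, dict) or not all(
        isinstance(name, str) and _is_number(amount)
        for name, amount in penalties.items()
    ):
        errors.append("penalties must map penalty names to numbers")
    return errors
```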

## Model contracts

### ScientistAction

Action types:

- `propose_protocol`
- `revise_protocol`
- `request_info`
- `accept`

Field contract:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `action_type` | `str` | yes | Must be one of the values above |
| `sample_size` | `int` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0` |
| `controls` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
| `technique` | `str` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `""` |
| `duration_days` | `int` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `0` |
| `required_equipment` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
| `required_reagents` | `list[str]` | yes | Meaningful for `propose_protocol` and `revise_protocol`, otherwise `[]` |
| `questions` | `list[str]` | yes | Meaningful for `request_info`, otherwise `[]` |
| `rationale` | `str` | yes | Required free-text explanation for protocol proposals and revisions; `""` for `accept` |

Canonical example:

```json
{
  "action_type": "propose_protocol",
  "sample_size": 48,
  "controls": ["vehicle_control", "positive_control"],
  "technique": "wst1_assay",
  "duration_days": 5,
  "required_equipment": ["plate_reader", "co2_incubator"],
  "required_reagents": ["wst1", "dmso", "drug_x"],
  "questions": [],
  "rationale": "Keeps the core readout while using equipment commonly available in teaching labs."
}
```
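
Because every field is required regardless of `action_type`, non-protocol actions still serialize all nine keys with their default values. The sketch below shows one way to fill those defaults; `make_scientist_action` is a hypothetical helper, not an API from `replicalab/models.py`. The contract only specifies `""` as the `rationale` for `accept`, so defaulting it to `""` for `request_info` as well is an assumption.

```python
import copy

SCIENTIST_ACTION_TYPES = ("propose_protocol", "revise_protocol", "request_info", "accept")

# Default values taken from the "otherwise" column of the field contract.
_DEFAULTS = {
    "sample_size": 0,
    "controls": [],
    "technique": "",
    "duration_days": 0,
    "required_equipment": [],
    "required_reagents": [],
    "questions": [],
    "rationale": "",
}


def make_scientist_action(action_type, **overrides):
    """Build a full nine-key action dict, filling unspecified fields with defaults."""
    if action_type not in SCIENTIST_ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    unknown = set(overrides) - set(_DEFAULTS)
    if unknown:
        raise ValueError(f"fields outside the contract: {sorted(unknown)}")
    action = {"action_type": action_type}
    # Deep-copy so each action gets its own list objects.
    action.update(copy.deepcopy(_DEFAULTS))
    action.update(overrides)
    return action
```

For example, `make_scientist_action("request_info", questions=["Is the plate reader free this week?"])` yields a dict with the question list set and every protocol field at its default.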

### LabManagerAction

Action types:

- `report_feasibility`
- `suggest_alternative`
- `reject`
- `accept`

Field contract:

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `action_type` | `str` | yes | Must be one of the values above |
| `feasible` | `bool` | yes | Overall summary flag equal to the logical AND of the constraint dimension flags |
| `budget_ok` | `bool` | yes | Whether the proposed protocol fits remaining budget |
| `equipment_ok` | `bool` | yes | Whether required equipment is available in time |
| `reagents_ok` | `bool` | yes | Whether required reagents are available |
| `schedule_ok` | `bool` | yes | Whether the protocol fits the allowed timeline |
| `staff_ok` | `bool` | yes | Whether staffing is sufficient |
| `suggested_technique` | `str` | yes | Meaningful for `suggest_alternative`, otherwise `""` |
| `suggested_sample_size` | `int` | yes | Meaningful for `suggest_alternative`, otherwise `0` |
| `suggested_controls` | `list[str]` | yes | Meaningful for `suggest_alternative`, otherwise `[]` |
| `explanation` | `str` | yes | Human-readable explanation of the constraint outcome |

Conditional rules:

- `action_type = accept` implies `feasible = true` and all constraint flags are `true`.
- `action_type = reject` implies `feasible = false` and at least one constraint flag is `false`.
- `action_type = suggest_alternative` implies `feasible = false` and at least one of the suggestion fields carries a non-default value.

Canonical example:

```json
{
  "action_type": "suggest_alternative",
  "feasible": false,
  "budget_ok": true,
  "equipment_ok": false,
  "reagents_ok": true,
  "schedule_ok": true,
  "staff_ok": true,
  "suggested_technique": "manual_cell_counting",
  "suggested_sample_size": 32,
  "suggested_controls": ["vehicle_control", "positive_control"],
  "explanation": "The plate reader is fully booked, so use manual counting and reduce the sample size to stay within the timeline."
}
```
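
The `feasible` definition and the conditional rules above can be expressed as a small checker. This is a sketch under two assumptions: the "constraint dimension flags" are the five `*_ok` fields, and the action dict already contains every required key. The name `lab_manager_rule_violations` is hypothetical.

```python
CONSTRAINT_FLAGS = ("budget_ok", "equipment_ok", "reagents_ok", "schedule_ok", "staff_ok")


def lab_manager_rule_violations(action):
    """Return the conditional rules that the action breaks, as a list of strings."""
    flags_all_true = all(action[name] for name in CONSTRAINT_FLAGS)
    violations = []
    if action["feasible"] != flags_all_true:
        violations.append("feasible must equal the AND of the constraint flags")
    kind = action["action_type"]
    if kind == "accept" and not (action["feasible"] and flags_all_true):
        violations.append("accept requires feasible = true and every flag true")
    if kind == "reject" and (action["feasible"] or flags_all_true):
        violations.append("reject requires feasible = false and at least one false flag")
    if kind == "suggest_alternative":
        # A suggestion field is non-default when it differs from "", 0, or [].
        has_suggestion = (
            action["suggested_technique"] != ""
            or action["suggested_sample_size"] != 0
            or action["suggested_controls"] != []
        )
        if action["feasible"] or not has_suggestion:
            violations.append(
                "suggest_alternative requires feasible = false and a non-default suggestion"
            )
    return violations
```

The canonical example above passes this check: `equipment_ok` is `false`, so `feasible` is `false`, and all three suggestion fields carry non-default values.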

### ScientistObservation

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `paper_title` | `str` | yes | Study title |
| `paper_hypothesis` | `str` | yes | Core hypothesis being replicated |
| `paper_method` | `str` | yes | Short method summary |
| `paper_key_finding` | `str` | yes | Main finding being targeted |
| `experiment_goal` | `str` | yes | What the scientist is trying to preserve |
| `conversation_history` | `list[ConversationEntry]` | yes | Empty list at reset |
| `current_protocol` | `Protocol \| null` | yes | `null` until a protocol exists |
| `round_number` | `int` | yes | Zero-based current round |
| `max_rounds` | `int` | yes | Max allowed rounds in the episode |

Canonical example:

```json
{
  "paper_title": "Drug X reduces glioblastoma cell viability",
  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
  "conversation_history": [],
  "current_protocol": null,
  "round_number": 0,
  "max_rounds": 6
}
```

### LabManagerObservation

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `budget_total` | `float` | yes | Initial budget for the episode |
| `budget_remaining` | `float` | yes | Current remaining budget |
| `equipment_available` | `list[str]` | yes | Equipment that can be used |
| `equipment_booked` | `list[str]` | yes | Equipment unavailable due to booking |
| `reagents_in_stock` | `list[str]` | yes | Available reagents |
| `reagents_out_of_stock` | `list[str]` | yes | Required but unavailable reagents |
| `staff_count` | `int` | yes | Available staff count |
| `time_limit_days` | `int` | yes | Whole calendar days remaining |
| `safety_restrictions` | `list[str]` | yes | Constraints such as banned solvents or assay restrictions |
| `conversation_history` | `list[ConversationEntry]` | yes | Empty list at reset |
| `current_protocol` | `Protocol \| null` | yes | `null` until a protocol exists |
| `round_number` | `int` | yes | Zero-based current round |
| `max_rounds` | `int` | yes | Max allowed rounds in the episode |

Canonical example:

```json
{
  "budget_total": 1200.0,
  "budget_remaining": 1200.0,
  "equipment_available": ["co2_incubator", "microscope"],
  "equipment_booked": ["plate_reader"],
  "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
  "reagents_out_of_stock": ["wst1"],
  "staff_count": 2,
  "time_limit_days": 7,
  "safety_restrictions": ["no_radioactive_reagents"],
  "conversation_history": [],
  "current_protocol": null,
  "round_number": 0,
  "max_rounds": 6
}
```

### Observation

Wrapper behavior:

- Serialized `Observation` objects always include both top-level keys: `scientist` and `lab_manager`.
- In shared environment state, replay, and API payloads, both branches should normally be populated.
- When a consumer is intentionally given only one role view, the non-owned branch must be `null`, not omitted.

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `scientist` | `ScientistObservation \| null` | yes | Scientist-side view |
| `lab_manager` | `LabManagerObservation \| null` | yes | Lab-manager-side view |

Canonical example:

```json
{
  "scientist": {
    "paper_title": "Drug X reduces glioblastoma cell viability",
    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
    "conversation_history": [],
    "current_protocol": null,
    "round_number": 0,
    "max_rounds": 6
  },
  "lab_manager": {
    "budget_total": 1200.0,
    "budget_remaining": 1200.0,
    "equipment_available": ["co2_incubator", "microscope"],
    "equipment_booked": ["plate_reader"],
    "reagents_in_stock": ["dmso", "drug_x", "culture_media"],
    "reagents_out_of_stock": ["wst1"],
    "staff_count": 2,
    "time_limit_days": 7,
    "safety_restrictions": ["no_radioactive_reagents"],
    "conversation_history": [],
    "current_protocol": null,
    "round_number": 0,
    "max_rounds": 6
  }
}
```
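
To illustrate the "`null`, not omitted" rule for single-role views, here is a minimal sketch that narrows a full `Observation` dict to one role. The function name `role_view` is hypothetical.

```python
import json


def role_view(observation, role):
    """Return an Observation dict where the non-owned branch is None (JSON null)."""
    if role not in ("scientist", "lab_manager"):
        raise ValueError(f"unknown role: {role!r}")
    # Both top-level keys are always emitted; only the value changes.
    return {
        "scientist": observation["scientist"] if role == "scientist" else None,
        "lab_manager": observation["lab_manager"] if role == "lab_manager" else None,
    }


if __name__ == "__main__":
    full = {"scientist": {"round_number": 0}, "lab_manager": {"staff_count": 2}}
    print(json.dumps(role_view(full, "scientist")))
```

Serializing with `json.dumps` turns `None` into `null`, so the hidden branch stays visible as a key in the payload.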

### StepResult

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `observation` | `Observation \| null` | yes | Present on normal steps and terminal steps; may be `null` only on hard failure |
| `reward` | `float` | yes | Episode reward after the step; terminal reward on final step |
| `done` | `bool` | yes | Whether the episode is terminal |
| `info` | `dict` | yes | Flexible metadata object |

Reserved `info` keys:

- `agreement_reached`: `bool`
- `error`: `str | null`
- `reward_breakdown`: `RewardBreakdown | null`
- `judge_notes`: `str | null`
- `verdict`: `str | null`

Canonical example:

```json
{
  "observation": {
    "scientist": null,
    "lab_manager": null
  },
  "reward": 6.72,
  "done": true,
  "info": {
    "agreement_reached": true,
    "error": null,
    "reward_breakdown": {
      "rigor": 0.9,
      "feasibility": 0.8,
      "fidelity": 0.85,
      "efficiency_bonus": 0.25,
      "communication_bonus": 0.15,
      "penalties": {
        "invalid_action": 0.0,
        "timeout": 0.0
      }
    },
    "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
    "verdict": "accept"
  }
}
```
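
The reserved `info` keys can be type-checked without constraining the rest of the flexible object. The sketch below only checks reserved keys that are present, because the contract does not say whether every reserved key must appear on every step. The name `reserved_info_type_errors` is hypothetical.

```python
NoneType = type(None)

# Allowed Python types for each reserved key after JSON parsing.
RESERVED_INFO_TYPES = {
    "agreement_reached": (bool,),
    "error": (str, NoneType),
    "reward_breakdown": (dict, NoneType),
    "judge_notes": (str, NoneType),
    "verdict": (str, NoneType),
}


def reserved_info_type_errors(info):
    """Return one message per reserved key whose value has the wrong type."""
    errors = []
    for key, allowed in RESERVED_INFO_TYPES.items():
        # Non-reserved keys are ignored; info stays a flexible metadata object.
        if key in info and not isinstance(info[key], allowed):
            errors.append(f"{key} has unexpected type {type(info[key]).__name__}")
    return errors
```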

### EpisodeState

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `seed` | `int` | yes | Deterministic episode seed |
| `scenario_template` | `str` | yes | Scenario family identifier |
| `difficulty` | `str` | yes | `easy`, `medium`, or `hard` |
| `paper_title` | `str` | yes | Study title |
| `paper_hypothesis` | `str` | yes | Core hypothesis |
| `paper_method` | `str` | yes | Method summary |
| `paper_key_finding` | `str` | yes | Main finding |
| `experiment_goal` | `str` | yes | Goal preserved through negotiation |
| `lab_budget_total` | `float` | yes | Initial budget |
| `lab_budget_remaining` | `float` | yes | Remaining budget |
| `lab_equipment` | `list[str]` | yes | Equipment state |
| `lab_reagents` | `list[str]` | yes | Reagent state |
| `lab_staff_count` | `int` | yes | Available staff count |
| `lab_time_limit_days` | `int` | yes | Whole calendar days remaining |
| `current_protocol` | `Protocol \| null` | yes | Current agreed or latest proposed protocol |
| `conversation_history` | `list[ConversationEntry]` | yes | Negotiation history |
| `round_number` | `int` | yes | Zero-based round counter |
| `max_rounds` | `int` | yes | Maximum rounds allowed |
| `done` | `bool` | yes | Terminal flag |
| `agreement_reached` | `bool` | yes | Whether both sides reached agreement |
| `reward` | `float` | yes | Final total reward or `0.0` until terminal scoring |
| `rigor_score` | `float` | yes | Final component score or `0.0` until terminal scoring |
| `feasibility_score` | `float` | yes | Final component score or `0.0` until terminal scoring |
| `fidelity_score` | `float` | yes | Final component score or `0.0` until terminal scoring |

Canonical example:

```json
{
  "seed": 17,
  "scenario_template": "cell_biology",
  "difficulty": "medium",
  "paper_title": "Drug X reduces glioblastoma cell viability",
  "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
  "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
  "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
  "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
  "lab_budget_total": 1200.0,
  "lab_budget_remaining": 850.0,
  "lab_equipment": ["co2_incubator", "microscope"],
  "lab_reagents": ["dmso", "drug_x", "culture_media"],
  "lab_staff_count": 2,
  "lab_time_limit_days": 7,
  "current_protocol": {
    "sample_size": 32,
    "controls": ["vehicle_control", "positive_control"],
    "technique": "manual_cell_counting",
    "duration_days": 5,
    "required_equipment": ["microscope", "co2_incubator"],
    "required_reagents": ["dmso", "drug_x", "culture_media"],
    "rationale": "Uses available equipment while preserving control structure."
  },
  "conversation_history": [
    {
      "role": "scientist",
      "message": "I propose a manual counting protocol that keeps both controls.",
      "round_number": 0,
      "action_type": "propose_protocol"
    }
  ],
  "round_number": 1,
  "max_rounds": 6,
  "done": false,
  "agreement_reached": false,
  "reward": 0.0,
  "rigor_score": 0.0,
  "feasibility_score": 0.0,
  "fidelity_score": 0.0
}
```
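
The table states that `reward` and the three component scores stay at `0.0` until terminal scoring. A rough sketch of that check follows. It assumes that `done = false` means terminal scoring has not happened yet, which the contract implies but does not state. The name `pre_terminal_score_errors` is hypothetical.

```python
SCORE_FIELDS = ("reward", "rigor_score", "feasibility_score", "fidelity_score")


def pre_terminal_score_errors(state):
    """Return the score fields that are non-zero while the episode is still running."""
    if state["done"]:
        # Terminal states may carry final scores; nothing to check here.
        return []
    return [
        f"{name} must be 0.0 before terminal scoring"
        for name in SCORE_FIELDS
        if state[name] != 0.0
    ]
```

The canonical example above passes: `done` is `false` and all four score fields are `0.0`.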

### EpisodeLog

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `episode_id` | `str` | yes | Stable replay identifier |
| `seed` | `int` | yes | Episode seed |
| `scenario_template` | `str` | yes | Scenario family identifier |
| `difficulty` | `str` | yes | `easy`, `medium`, or `hard` |
| `final_state` | `EpisodeState \| null` | yes | Must be populated for completed episodes |
| `transcript` | `list[ConversationEntry]` | yes | Replayable transcript |
| `reward_breakdown` | `RewardBreakdown` | yes | Final reward components |
| `total_reward` | `float` | yes | Final total reward |
| `rounds_used` | `int` | yes | Number of completed rounds |
| `agreement_reached` | `bool` | yes | Final agreement flag |
| `judge_notes` | `str` | yes | Human-readable audit summary |
| `verdict` | `str` | yes | One of `accept`, `revise`, `reject` |

Canonical example:

```json
{
  "episode_id": "cell_biology-17-medium-0001",
  "seed": 17,
  "scenario_template": "cell_biology",
  "difficulty": "medium",
  "final_state": {
    "seed": 17,
    "scenario_template": "cell_biology",
    "difficulty": "medium",
    "paper_title": "Drug X reduces glioblastoma cell viability",
    "paper_hypothesis": "Drug X reduces viability in a dose-dependent manner.",
    "paper_method": "96-well viability assay with 24h incubation and absorbance readout.",
    "paper_key_finding": "The highest dose reduced viability by about 40 percent.",
    "experiment_goal": "Replicate the dose-response trend without dropping essential controls.",
    "lab_budget_total": 1200.0,
    "lab_budget_remaining": 850.0,
    "lab_equipment": ["co2_incubator", "microscope"],
    "lab_reagents": ["dmso", "drug_x", "culture_media"],
    "lab_staff_count": 2,
    "lab_time_limit_days": 7,
    "current_protocol": {
      "sample_size": 32,
      "controls": ["vehicle_control", "positive_control"],
      "technique": "manual_cell_counting",
      "duration_days": 5,
      "required_equipment": ["microscope", "co2_incubator"],
      "required_reagents": ["dmso", "drug_x", "culture_media"],
      "rationale": "Uses available equipment while preserving control structure."
    },
    "conversation_history": [
      {
        "role": "scientist",
        "message": "I propose a manual counting protocol that keeps both controls.",
        "round_number": 0,
        "action_type": "propose_protocol"
      },
      {
        "role": "lab_manager",
        "message": "This alternative is feasible with current equipment and budget.",
        "round_number": 0,
        "action_type": "accept"
      }
    ],
    "round_number": 1,
    "max_rounds": 6,
    "done": true,
    "agreement_reached": true,
    "reward": 6.72,
    "rigor_score": 0.9,
    "feasibility_score": 0.8,
    "fidelity_score": 0.85
  },
  "transcript": [
    {
      "role": "scientist",
      "message": "I propose a manual counting protocol that keeps both controls.",
      "round_number": 0,
      "action_type": "propose_protocol"
    },
    {
      "role": "lab_manager",
      "message": "This alternative is feasible with current equipment and budget.",
      "round_number": 0,
      "action_type": "accept"
    }
  ],
  "reward_breakdown": {
    "rigor": 0.9,
    "feasibility": 0.8,
    "fidelity": 0.85,
    "efficiency_bonus": 0.25,
    "communication_bonus": 0.15,
    "penalties": {
      "invalid_action": 0.0,
      "timeout": 0.0
    }
  },
  "total_reward": 6.72,
  "rounds_used": 1,
  "agreement_reached": true,
  "judge_notes": "Controls were preserved and the substitutions remained scientifically acceptable.",
  "verdict": "accept"
}
```
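
In the canonical example, several `EpisodeLog` fields repeat values from `final_state`. The sketch below cross-checks those mirrored fields and the `verdict` value. Treating the mirrored fields as required to match is an assumption drawn from the example, not a rule the contract states. The name `episode_log_errors` is hypothetical.

```python
VERDICTS = {"accept", "revise", "reject"}
MIRRORED_KEYS = ("seed", "scenario_template", "difficulty", "agreement_reached")


def episode_log_errors(log):
    """Return a list of inconsistencies between a log and its final_state."""
    errors = []
    if log["verdict"] not in VERDICTS:
        errors.append("verdict must be accept, revise, or reject")
    state = log["final_state"]
    if state is None:
        # Nothing to cross-check when no final state was recorded.
        return errors
    # Assumption: fields that appear in both objects should agree.
    for key in MIRRORED_KEYS:
        if log[key] != state[key]:
            errors.append(f"{key} differs from final_state")
    if log["total_reward"] != state["reward"]:
        errors.append("total_reward differs from final_state.reward")
    return errors
```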

## Sign-off

| Owner | Status | Notes |
| --- | --- | --- |
| Person B (Ayush) | signed off | Draft matches current stubs and downstream parser needs |
| Person A (Kian) | signed off | Validator and environment-owner review completed; contract is frozen for `MOD 01`, `MOD 03`, `FND 09`, and downstream parser work |