---
title: SimLab — Lab Automation RL Environment
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.0.0"
app_port: 7860
pinned: false
---

# SimLab — Lab Automation RL Environment

A self-contained Gymnasium-style reinforcement learning environment that
simulates wet-lab experiment workflows. The experiment type is defined by an
**ExperimentSpec** (protocol presets, inventory, rewards, outcome model). The
default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom
assays, or any other protocol-discovery task. Every task runs under real-world
constraints: limited time, limited budget, and a finite reagent inventory.

Built for the **OpenEnv** ecosystem so it can be wrapped as an HTTP-served,
sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.

**Integrations:** [OpenEnv](https://meta-pytorch.github.io/OpenEnv/) · [Hugging Face](https://huggingface.co/openenv)

---

## What the Environment Simulates

Each episode represents a scientist at the bench trying to get a successful
result. The environment:

- **Samples a hidden optimal protocol** on every `reset()` — the agent never
  sees it directly.
- Offers **protocol presets** (defined in the spec) the agent can choose from.
- Lets the agent **run assays** that consume reagents and time, returning
  outcomes (e.g. success / partial / fail) from the spec’s outcome model.
- **Custom protocols:** Specs that define `evaluate_custom_protocol` (PCR,
  ELISA) accept arbitrary protocol parameters via
  `env.run_assay_with_protocol(protocol_dict)`, so agents can generate and try
  any valid parameters, not just presets.
- Allows **ordering more reagents** (costs money and time) and **waiting**.
- Terminates when the agent calls **finish**, runs out of time/budget, or
  exhausts inventory with no way to reorder.

**Default (PCR):** 12 presets (3 temps × 2 cycle counts × 2 reagent ratios);
probabilistic success based on distance to hidden optimum. Other experiments
use their own presets and outcome logic via a custom `ExperimentSpec`.
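
The 3 × 2 × 2 grid and the distance-based outcome can be sketched as follows. This is a minimal illustration, not the code in `lab_env/spec.py`: the concrete temperatures, cycle counts, ratio names, and the `success_probability` formula are all assumptions chosen for the example.

```python
import itertools
import math

# Hypothetical grid values; the real presets live in the spec.
TEMPS_C = (55.0, 58.0, 61.0)
CYCLES = (25, 35)
RATIOS = ("conservative", "aggressive")

# 3 temps x 2 cycle counts x 2 ratios = 12 presets.
PRESETS = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in itertools.product(TEMPS_C, CYCLES, RATIOS)
]


def success_probability(preset, optimum, temp_scale=3.0, cycle_scale=10.0):
    """Assumed outcome model: probability decays with distance to the optimum."""
    d_temp = (preset["temp"] - optimum["temp"]) / temp_scale
    d_cycles = (preset["cycles"] - optimum["cycles"]) / cycle_scale
    penalty = 0.0 if preset["ratio"] == optimum["ratio"] else 1.0
    distance_sq = d_temp ** 2 + d_cycles ** 2 + penalty
    return math.exp(-distance_sq)


# The agent never sees this; it is fixed here only to show the scoring.
hidden_optimum = {"temp": 58.0, "cycles": 35, "ratio": "conservative"}
best = max(PRESETS, key=lambda p: success_probability(p, hidden_optimum))
print(len(PRESETS), best)
```

Because the optimum is resampled on every `reset()`, the preset that scores best changes from episode to episode, which is why the agent has to explore rather than memorize one answer.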

### Reward structure (default PCR)

The reward encodes real lab trade-offs (all configurable per spec):

| Signal | Value |
|---|---|
| Immediate assay result: success | +15 |
| Immediate assay result: partial | +5 |
| Per-assay cost penalty | -3 |
| Terminal bonus (best = success) | +60 |
| Terminal bonus (best = partial) | +25 |
| Terminal penalty (no success/partial) | -20 |
| Time penalty | -0.25 per minute elapsed |

A good agent learns to explore efficiently — try a few presets, read the
signals from partial/success outcomes, and converge on the best protocol before
finishing.
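
As a worked example, the table's values can be combined into a single episode return. The sketch below assumes the signals are simply added together, that a failed assay earns no immediate reward, and that the time penalty applies to the total elapsed minutes; the environment may apply them per step instead.

```python
# Values copied from the reward table above (default PCR spec).
IMMEDIATE = {"success": 15.0, "partial": 5.0, "fail": 0.0}  # fail = 0 is assumed
ASSAY_COST = -3.0
TERMINAL = {"success": 60.0, "partial": 25.0, "fail": -20.0}
TIME_PENALTY_PER_MIN = -0.25
RANK = {"fail": 0, "partial": 1, "success": 2}


def episode_return(assay_results, elapsed_minutes):
    """Sum the table's signals for one episode (assumed to be additive)."""
    immediate = sum(IMMEDIATE[r] for r in assay_results)
    cost = ASSAY_COST * len(assay_results)
    # The terminal bonus depends on the best outcome seen in the episode.
    best = max(assay_results, key=RANK.__getitem__, default="fail")
    return immediate + cost + TERMINAL[best] + TIME_PENALTY_PER_MIN * elapsed_minutes


# Three assays (fail, partial, success) over 60 minutes:
# immediate 0 + 5 + 15 = 20, cost 3 * -3 = -9, terminal +60, time -15.
print(episode_return(["fail", "partial", "success"], 60))  # 56.0
```

Under these assumptions the terminal bonus dominates, so reaching one success quickly is worth more than collecting several partial results.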

---

## Architecture

```
simlab/
├── pyproject.toml              # Package metadata & dependencies
├── README.md
├── lab_env/
│   ├── __init__.py
│   ├── spec.py                 # ExperimentSpec, pcr_experiment_spec()
│   ├── env.py                  # LabEnv (Gymnasium interface, any experiment)
│   └── openenv_adapter.py      # OpenEnv types, LabEnvironment, HTTP app
├── agents/
│   ├── __init__.py
│   ├── naive_agent.py          # Random-preset baseline
│   ├── rl_agent.py             # REINFORCE policy-gradient agent (PyTorch)
│   ├── research_llm_agent.py   # LLM researcher: presets + research
│   └── research_generate_agent.py  # Research → generate any protocol → run → learn from feedback
├── knowledge/
│   └── pcr_protocols.json      # Fake “papers” for web_search tool (demo)
├── demo/
│   └── streamlit_app.py        # Live research dashboard + 3-agent comparison
└── scripts/
    ├── run_naive_baseline.py   # Evaluate the naive agent
    ├── train_and_eval_agent.py # Train REINFORCE & compare both agents
    ├── compare_all_agents.py   # Benchmark Naive vs RL vs Research LLM
    ├── run_research_generate_agent.py  # Research → generate protocol → run → learn (any protocol)
    └── demo_research_agent.py  # Terminal demo of research agent
```

### Defining a new experiment

Implement an `ExperimentSpec` in `lab_env/spec.py` (or your own module) with:

- **presets** — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
- **inventory_items** / **orderable_items** — what the lab tracks and can reorder.
- **initial_inventory**, **order_costs**, **result_labels**.
- **sample_hidden_optimum(rng)** — returns hidden optimal state (e.g. ideal temp/cycles).
- **sample_assay_result(hidden, preset_idx, presets, rng)** — returns outcome label.
- **evaluate_custom_protocol(hidden, protocol_dict, rng)** (optional) — score an arbitrary protocol dict so agents can run any params via `env.run_assay_with_protocol(protocol_dict)`.
- **protocol_param_schema** (optional) — dict describing params for codegen/LLM (e.g. `{"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}`).

Then use `LabEnv(spec=my_spec)` or pass `spec` into the OpenEnv `LabEnvironment(spec=my_spec)`.
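
To make the field list concrete, here is a self-contained sketch that uses a stand-in dataclass with the same field names. `ToySpec` is hypothetical and is not the real `ExperimentSpec`, whose constructor is not shown in this README. The incubation-time assay, the item names, and the use of `random.Random` as the `rng` type are assumptions as well.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToySpec:
    """Stand-in with the fields listed above; NOT the real ExperimentSpec."""
    presets: List[Dict[str, Any]]
    inventory_items: List[str]
    orderable_items: List[str]
    initial_inventory: Dict[str, int]
    order_costs: Dict[str, float]
    result_labels: List[str]
    sample_hidden_optimum: Callable[[random.Random], Dict[str, Any]]
    sample_assay_result: Callable[..., str]


def sample_hidden_optimum(rng):
    # Hidden state: the ideal incubation time (hypothetical parameter).
    return {"minutes": rng.choice([30, 60])}


def sample_assay_result(hidden, preset_idx, presets, rng):
    # An exact match usually succeeds; anything else is partial or fail.
    if presets[preset_idx]["minutes"] == hidden["minutes"]:
        return "success" if rng.random() < 0.9 else "partial"
    return "partial" if rng.random() < 0.3 else "fail"


toy_spec = ToySpec(
    presets=[{"minutes": 30}, {"minutes": 60}],
    inventory_items=["buffer", "plates"],
    orderable_items=["buffer"],
    initial_inventory={"buffer": 4, "plates": 4},
    order_costs={"buffer": 20.0},
    result_labels=["fail", "partial", "success"],
    sample_hidden_optimum=sample_hidden_optimum,
    sample_assay_result=sample_assay_result,
)

rng = random.Random(0)
hidden = toy_spec.sample_hidden_optimum(rng)
outcome = toy_spec.sample_assay_result(hidden, 0, toy_spec.presets, rng)
print(hidden, outcome)
```

Keeping the two sampling functions as plain callables means a new experiment only has to supply data plus two small functions; the environment loop itself stays unchanged.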

### Agent design

The **REINFORCE agent** decomposes the problem into a learned and a scripted
part:

- **Learned:** an MLP with two hidden layers (14 → 64 → 64 → 12) maps the
  observation to a distribution over the 12 protocol presets. It is trained
  with REINFORCE plus an entropy bonus and a running-mean baseline.
- **Scripted:** the episode loop (setup → run assay → check result → order
  if needed → finish on success) is fixed, so the agent focuses on the hard
  decision: *which* preset to try.

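This decomposition lets training converge in ~2000 episodes (a few seconds on
CPU) while beating the random-preset naive baseline: in the sample run under
"Sample output (train & eval)", average reward is 15.7 vs 5.0 and success rate
is 53% vs 43%.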
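
The update rule behind the learned part can be sketched without PyTorch. The example below is a simplification, not the code in `agents/rl_agent.py`: it drops the observation input and the MLP and learns a single softmax over 12 presets, with a fixed hypothetical best preset and a 0/1 reward. It does keep the three ingredients named above: the REINFORCE gradient, an entropy bonus, and a running-mean baseline. All hyperparameters are assumptions.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_preset_policy(reward_fn, n_presets=12, episodes=2000,
                        lr=0.1, entropy_coef=0.01, baseline_decay=0.9, seed=0):
    """REINFORCE on a softmax over presets (no observation input)."""
    rng = random.Random(seed)
    logits = [0.0] * n_presets
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(n_presets), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        for k, p in enumerate(probs):
            # d log pi(action) / d logit_k for a softmax policy
            grad_logp = (1.0 if k == action else 0.0) - p
            # d entropy / d logit_k for a softmax policy
            grad_entropy = -p * (math.log(p) + entropy) if p > 0.0 else 0.0
            logits[k] += lr * (advantage * grad_logp + entropy_coef * grad_entropy)
        # Running-mean baseline reduces the variance of the gradient estimate.
        baseline = baseline_decay * baseline + (1.0 - baseline_decay) * reward
    return softmax(logits)


BEST_PRESET = 7  # hypothetical hidden optimum, fixed for this demo
final_probs = train_preset_policy(lambda a: 1.0 if a == BEST_PRESET else 0.0)
print(max(range(12), key=final_probs.__getitem__))
```

In the real environment the optimum changes on every reset, so the policy has to condition on the observation; that is the role of the MLP in place of the bare logits used here.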

The **Research LLM agent** adds a self-improving lab scientist: it researches
protocols (via a `web_search` tool over a local knowledge base), hypothesizes
new parameter combinations (mapped to presets), runs experiments in LabEnv, and
updates internal knowledge from results.

The **Research & Generate agent** (`research_generate_agent.py`) goes further: it
**researches** (web_search), **generates** protocol parameters for **any** valid
values (not limited to presets), **runs** them via `env.run_assay_with_protocol(protocol_dict)`,
and **learns from feedback** — each run's (protocol, result, reward) is passed
into the next trial so the agent improves over the episode. Works with any spec
that has `evaluate_custom_protocol` (PCR, ELISA). Run it with:

```bash
export OPENAI_API_KEY=your_key
python scripts/run_research_generate_agent.py --episodes 5 --verbose
```

Use `--workflow elisa-readout` for ELISA. Add `knowledge/{name}_protocols.json`
for more experiment types so research has literature to search.
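
The learn-from-feedback step amounts to carrying a history of `(protocol, result, reward)` records into the next proposal. The sketch below shows one way to do that. The `Trial` type, the `summarize_history` helper, and the text format are hypothetical, since the agent's actual prompt format is not documented here, and the reward values are illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Trial:
    protocol: Dict[str, Any]
    result: str
    reward: float


def summarize_history(history: List[Trial]) -> str:
    """Render earlier trials as text that can be prepended to the next prompt."""
    if not history:
        return "No trials yet."
    lines = [
        f"Trial {i}: protocol={t.protocol} result={t.result} reward={t.reward:+.1f}"
        for i, t in enumerate(history, start=1)
    ]
    best = max(history, key=lambda t: t.reward)
    lines.append(f"Best so far: {best.protocol} ({best.result})")
    return "\n".join(lines)


# Illustrative rewards: immediate result plus the per-assay cost.
history = [
    Trial({"temp": 55.0, "cycles": 30, "ratio": "conservative"}, "fail", -3.0),
    Trial({"temp": 57.5, "cycles": 32, "ratio": "conservative"}, "partial", 2.0),
]
print(summarize_history(history))
```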

### Training on different protocol sets

Each **protocol** (PCR, ELISA, or a custom spec) has its own **presets** and outcome model. The RL agent can be trained on any of them, yielding one policy per protocol set.

- **One agent per protocol:** Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
- **Script:** `scripts/train_per_protocol.py` trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. `checkpoints/pcr-amplification.pt`, `checkpoints/elisa-readout.pt`):

  ```bash
  python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
  ```

- **Using agents to create different protocol sets:** You can define new protocol sets in two ways:
  1. **In code:** Add a new `ExperimentSpec` in `lab_env/spec.py` (or your own module): define `presets`, `sample_hidden_optimum`, `sample_assay_result`, and optionally `evaluate_custom_protocol` + `protocol_param_schema`. Register it in `get_spec_for_workflow()` and run `train_per_protocol.py --workflows your-workflow-id`.
  2. **Generated presets:** Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an `ExperimentSpec` and train an agent with `ReinforceAgent(spec=my_spec)` on `LabEnv(spec=my_spec)`. The Research & Generate agent already “creates” protocols at run time (arbitrary params); to **train** on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.
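
Before a generated set becomes fixed presets, it helps to validate and deduplicate it. The sketch below checks candidate dicts against a schema shaped like the `protocol_param_schema` example. The `"ratio"` entry, the set of supported type names, and both helper functions are assumptions for illustration.

```python
from typing import Any, Dict, List

# Same shape as the protocol_param_schema example; the "ratio" entry is assumed.
SCHEMA = {
    "temp": {"type": "number"},
    "cycles": {"type": "integer"},
    "ratio": {"type": "string"},
}

# bool is a subclass of int in Python, so it is excluded explicitly.
_TYPE_CHECKS = {
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def is_valid(protocol: Dict[str, Any], schema: Dict[str, Dict[str, str]]) -> bool:
    """True if the protocol has exactly the schema's keys with matching types."""
    if set(protocol) != set(schema):
        return False
    return all(_TYPE_CHECKS[schema[k]["type"]](v) for k, v in protocol.items())


def to_presets(candidates: List[Dict[str, Any]], schema) -> List[Dict[str, Any]]:
    """Keep valid candidates, dropping duplicates while preserving order."""
    seen, presets = set(), []
    for proto in candidates:
        key = tuple(sorted(proto.items()))
        if is_valid(proto, schema) and key not in seen:
            seen.add(key)
            presets.append(proto)
    return presets


generated = [
    {"temp": 57.5, "cycles": 32, "ratio": "conservative"},
    {"temp": 57.5, "cycles": 32, "ratio": "conservative"},   # duplicate
    {"temp": "hot", "cycles": 32, "ratio": "conservative"},  # wrong type
    {"temp": 60, "cycles": 28, "ratio": "aggressive"},
]
presets = to_presets(generated, SCHEMA)
print(len(presets))  # 2
```

The resulting list has a fixed length, which matters because the REINFORCE policy's output size is tied to the number of presets in the spec.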

---

## Quick Start

### Install

```bash
pip install -e .
```

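Or, to run only the core environment and the RL agent scripts, ensure `numpy`, `torch`, and `gymnasium` are installed. The OpenEnv HTTP adapter additionally needs `openenv-core`.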

### Run the naive baseline

```bash
python scripts/run_naive_baseline.py --episodes 200
```

### Train the REINFORCE agent and compare

```bash
python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100
```

### Next.js UI + API server (general UI)

Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):

```bash
# Terminal 1: Python API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2: Next.js frontend (v0ap)
cd v0ap && pnpm dev
```

Then open the workflow run page (e.g. `/workflows/pcr-amplification`). The UI shows **Run with AI Agent**, **Run Research Agent** (research → hypothesize → experiment → learn), and **Run Naive Baseline**. The timeline displays which agent was used and each step (Research, Hypothesis, Run Assay, Learn for the research agent). Set `OPENAI_API_KEY` if you use the Research agent.

---

## Hackathon / live demo — how to show the RL

**Pitch in one line:** *“We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”*

### Setup (do this before going on stage)

1. **Start both servers** (two terminals):
   ```bash
   # Terminal 1 — API (agents + LabEnv)
   uvicorn server.app:app --host 0.0.0.0 --port 8000

   # Terminal 2 — UI
   cd v0ap && pnpm dev
   ```
2. Open **http://localhost:3000** (or the URL Next.js prints).
3. Optional: set `OPENAI_API_KEY` if you want to demo Research / Research & Generate.

### Demo flow A — “Watch the RL agent learn” (~2 min)

1. Go to **Training** (`/training`).
2. Say: *“This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”*
3. Set **episodes to 500** (slider) for a short run — training finishes in under a minute on a laptop.
4. Click **Start Training**. Point at:
   - **Progress** and “Episode X of 500”.
   - **Chart**: reward and success rate climbing over episodes.
5. When it finishes: *“Here’s the comparison: REINFORCE vs random baseline.”* Show the table (success rate, reward, time).

### Demo flow B — “Compare agents in the lab” (~1–2 min)

1. Go to **PCR Amplification** (`/workflows/pcr-amplification`).
2. Say: *“Each run is one scientist trying to get a successful experiment under time and budget.”*
3. Click **Run Naive Baseline** — timeline fills with random preset choices and results.
4. Then click **Run with AI Agent** (uses the policy you trained in flow A, or a default). Point at the timeline: *“The learned agent picks protocols more purposefully and often gets success sooner.”*
5. If you have an API key: click **Research & Generate (any protocol)** — *“This one researches, proposes parameters, runs them, and learns from feedback.”*

### Tips

- **Keep training short on stage:** 500 episodes is enough to show learning; 1000 if you have time.
- **If the UI is slow:** Run a quick train in the background before the demo, then only show “Run with AI Agent” and the comparison table.
- **Backup:** Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
- **Talking points:** Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.

### Demo script (optional)

From repo root, run `./scripts/demo_hackathon.sh` for a short checklist and the option to start the API in that terminal. Or start both manually:

```bash
# Terminal 1
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2
cd v0ap && pnpm dev
# Open http://localhost:3000 → /training or /workflows/pcr-amplification
```

---

### Research LLM agent (optional, Streamlit)

Install demo dependencies (`openai`, `streamlit`) and set `OPENAI_API_KEY`:

```bash
pip install -e ".[demo]"
export OPENAI_API_KEY=your_key
streamlit run demo/streamlit_app.py
```

The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:

```bash
python scripts/compare_all_agents.py --eval-episodes 50
```

### Sample output (train & eval)

```
Metric                  REINFORCE        Naive
----------------------------------------------
Avg reward                   15.7          5.0
Success rate                53.0%        43.0%
Partial rate                19.0%        15.0%
Avg time                    62.8m        63.0m
Avg cost                     $0.0         $0.0
Avg steps                     7.0          7.0
----------------------------------------------
```

---

## OpenEnv & Hugging Face — How to show and use

SimLab is built for the **OpenEnv** ecosystem and can be served over HTTP and deployed to **Hugging Face** as a standardized agentic environment.

### How SimLab uses OpenEnv

- **`openenv-core`** is a required dependency (`pyproject.toml`).
- **`lab_env/openenv_adapter.py`** wraps `LabEnv` in the OpenEnv `Environment` interface:
  - **Types:** `LabAction`, `LabObservation`, `LabState`, `LabEnvironment`
  - **`create_app(LabEnvironment, LabAction, LabObservation, ...)`** — FastAPI app with OpenEnv endpoints

### Run the OpenEnv HTTP server

```bash
uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
```

This exposes standard OpenEnv endpoints:

| Endpoint        | Description                    |
|----------------|--------------------------------|
| `POST /reset`  | Reset environment, get initial observation |
| `POST /step`   | Send action, get next observation & reward |
| `GET /state`   | Current state snapshot        |
| `GET /metadata`| Environment name, version, docs |
| WebSocket `/ws`| Persistent session (optional)  |

Up to 4 concurrent sessions are supported (`max_concurrent_envs=4`).

### Call the OpenEnv server (show usage)

From another process or machine, you can drive SimLab over HTTP:

```bash
# Reset (start new episode)
curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .

# Step (e.g. action 0 = setup preset 0)
curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .

# Get current state
curl -s http://localhost:8000/state | jq .
```

From Python (e.g. for demos or integration):

```python
import requests

BASE = "http://localhost:8000"

# Reset
r = requests.post(f"{BASE}/reset", json={"seed": 42})
obs = r.json()  # observation with metadata (obs_vector, info, etc.)

# Step: setup preset 0, then run assay (action 12 for PCR)
requests.post(f"{BASE}/step", json={"action": 0})
r = requests.post(f"{BASE}/step", json={"action": 12})
print(r.json())  # observation, reward, done

# State
state = requests.get(f"{BASE}/state").json()
print(state["step_count"], state["best_result"])
```

### Deploy to Hugging Face

To **show SimLab on the Hugging Face Hub** as an OpenEnv environment:

1. **Option A — Hugging Face Space (Docker)**  
   Create a Space with **Docker** as the SDK. Use a `Dockerfile` that installs SimLab and runs:
   ```dockerfile
   CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 7860
   ```
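   Point the Space to your repo and set `app_port: 7860` in the Space's README front matter (7860 is the Spaces default). The Space page lives at e.g. `https://huggingface.co/spaces/your-username/simlab-env`; the running app, and therefore the public OpenEnv endpoint, is served from the Space's direct URL (e.g. `https://your-username-simlab-env.hf.space`).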

2. **Option B — OpenEnv CLI (if you adopt the full OpenEnv layout)**  
   The [OpenEnv Packaging & Deploying](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) guide uses `openenv init`, `openenv build`, and **`openenv push`** to deploy to the Hub. SimLab currently uses `openenv-core` and a custom adapter; to use `openenv push`, you would add the expected layout (e.g. `openenv.yaml`, `server/` with Dockerfile) and wire the existing `LabEnvironment` + `create_app` into that structure.

3. **Link your repo on the Hub**  
   In your SimLab repo or any Hugging Face model/Space card, set the **Repository** and **Documentation** URLs to your GitHub repo and add a tag or short description such as: *"OpenEnv-compatible lab automation environment; run with `uvicorn lab_env.openenv_adapter:app` and connect via POST /reset, POST /step."*

### References

- [OpenEnv documentation](https://meta-pytorch.github.io/OpenEnv/) — framework overview and APIs  
- [OpenEnv on Hugging Face](https://huggingface.co/openenv) — OpenEnv org and environments  
- [Packaging & Deploying (OpenEnv)](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) — build, validate, push to Hub

---

## Environment API Reference

```python
from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec

# Default: PCR experiment
env = LabEnv()
# Or any experiment from a spec:
# env = LabEnv(spec=my_experiment_spec)

obs, info = env.reset(seed=42)

# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
#   [0]    step_index (normalised)
#   [1]    elapsed_minutes (normalised)
#   [2]    remaining_budget (normalised)
#   [3..]  inventory (one per spec.inventory_items, normalised)
#   [...]  last_result one-hot (len(spec.result_labels))
#   [...]  has_setup, current_preset_idx (norm), best_result_score

# Actions (Discrete, from spec):
#   0 .. num_presets-1   setup_reaction(preset_index)
#   num_presets          run_assay
#   num_presets+1 ..     order_reagents (one per orderable_items)
#   ...                  wait, finish

obs, reward, terminated, truncated, info = env.step(0)    # setup preset 0
obs, reward, terminated, truncated, info = env.step(12)   # run assay (PCR)
obs, reward, terminated, truncated, info = env.step(17)   # finish (PCR)

# Custom protocol (any params; spec must have evaluate_custom_protocol)
obs, reward, term, trunc, info = env.run_assay_with_protocol({"temp": 57.5, "cycles": 32, "ratio": "conservative"})
```
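
The action layout in the comments above can be turned into a small decoder. This sketch follows that layout (presets, then `run_assay`, then one order action per orderable item, then `wait` and `finish`). The helper names and the item names are hypothetical; only the counts come from the PCR example, where 12 presets and 18 actions imply 3 orderable items.

```python
from typing import Optional, Sequence, Tuple


def action_count(num_presets: int, num_orderable: int) -> int:
    # presets + run_assay + one order action per item + wait + finish
    return num_presets + 1 + num_orderable + 2


def decode_action(idx: int, num_presets: int,
                  orderable_items: Sequence[str]) -> Tuple[str, Optional[object]]:
    """Map a discrete action index to (name, argument) using the layout above."""
    run_idx = num_presets
    wait_idx = num_presets + 1 + len(orderable_items)
    if 0 <= idx < num_presets:
        return ("setup_reaction", idx)
    if idx == run_idx:
        return ("run_assay", None)
    if run_idx < idx < wait_idx:
        return ("order_reagents", orderable_items[idx - run_idx - 1])
    if idx == wait_idx:
        return ("wait", None)
    if idx == wait_idx + 1:
        return ("finish", None)
    raise ValueError(f"action index out of range: {idx}")


# The item names here are placeholders, not the spec's real names.
ITEMS = ("item_a", "item_b", "item_c")
print(action_count(12, len(ITEMS)))   # 18
print(decode_action(12, 12, ITEMS))   # ('run_assay', None)
print(decode_action(17, 12, ITEMS))   # ('finish', None)
```

The same arithmetic applies to the observation: 3 leading scalars, one entry per inventory item, one per result label, and 3 trailing scalars, so the 14-dim PCR observation leaves 8 entries for inventory and result labels combined.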

---

## License

MIT