title: SimLab — Lab Automation RL Environment
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 4.0.0
app_port: 7860
pinned: false
SimLab — Lab Automation RL Environment
A self-contained Gymnasium-style reinforcement learning environment that simulates any wet-lab experiment workflow. The experiment type is defined by an ExperimentSpec (protocol presets, inventory, rewards, outcome model). The default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom assays, or any protocol-discovery task under real-world constraints: limited time, budget, and finite reagent inventory.
Built for the OpenEnv ecosystem so it can be wrapped as an HTTP-served, sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.
Integrations: OpenEnv · Hugging Face
What the Environment Simulates
Each episode represents a scientist at the bench trying to get a successful result. The environment:
- Samples a hidden optimal protocol on every reset(); the agent never sees it directly.
- Offers protocol presets (defined in the spec) the agent can choose from.
- Lets the agent run assays that consume reagents and time, returning outcomes (e.g. success / partial / fail) from the spec’s outcome model.
- Custom protocols: Specs with evaluate_custom_protocol (PCR, ELISA) allow arbitrary protocol parameters via env.run_assay_with_protocol(protocol_dict), so agents can generate and try any valid params, not just presets.
- Allows ordering more reagents (costs money and time) and waiting.
- Terminates when the agent calls finish, runs out of time/budget, or exhausts inventory with no way to reorder.
Default (PCR): 12 presets (3 temps × 2 cycle counts × 2 reagent ratios);
probabilistic success based on distance to hidden optimum. Other experiments
use their own presets and outcome logic via a custom ExperimentSpec.
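As a rough illustration of the default setup, the sketch below enumerates a 3 × 2 × 2 preset grid and turns the distance to a hidden optimum into outcome probabilities. The specific temperatures, cycle counts, ratio names, scaling constants, and the distance-to-probability rule are assumptions made for illustration; the actual values and outcome model live in lab_env/spec.py and may differ.

```python
import itertools
import math
import random

# Hypothetical grid values; the real presets are defined by the spec.
TEMPS = [55.0, 58.0, 61.0]
CYCLES = [25, 35]
RATIOS = ["conservative", "aggressive"]

PRESETS = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in itertools.product(TEMPS, CYCLES, RATIOS)
]


def distance(protocol, hidden):
    """Scaled distance between a protocol and the hidden optimum."""
    d_temp = abs(protocol["temp"] - hidden["temp"]) / 6.0
    d_cycles = abs(protocol["cycles"] - hidden["cycles"]) / 10.0
    d_ratio = 0.0 if protocol["ratio"] == hidden["ratio"] else 1.0
    return d_temp + d_cycles + d_ratio


def sample_result(protocol, hidden, rng):
    """Draw success / partial / fail; closer protocols succeed more often."""
    p_success = math.exp(-2.0 * distance(protocol, hidden))
    p_partial = (1.0 - p_success) * 0.5
    u = rng.random()
    if u < p_success:
        return "success"
    if u < p_success + p_partial:
        return "partial"
    return "fail"


rng = random.Random(0)
hidden = rng.choice(PRESETS)  # the agent never observes this
results = [sample_result(p, hidden, rng) for p in PRESETS]
```

Under this toy rule a preset that matches the hidden optimum always succeeds, and the success probability decays exponentially as the protocol moves away from it.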
Reward structure (default PCR)
The reward encodes real lab trade-offs (all configurable per spec):
| Signal | Value |
|---|---|
| Immediate assay result: success | +15 |
| Immediate assay result: partial | +5 |
| Per-assay cost penalty | -3 |
| Terminal bonus (best = success) | +60 |
| Terminal bonus (best = partial) | +25 |
| Terminal penalty (no success/partial) | -20 |
| Time penalty | -0.25 per minute elapsed |
A good agent learns to explore efficiently — try a few presets, read the signals from partial/success outcomes, and converge on the best protocol before finishing.
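To make the trade-offs concrete, here is a small sketch that totals the reward for one episode using the values from the table. It assumes a failed assay earns no immediate reward beyond the per-assay cost and that the time penalty is charged on total elapsed minutes; the environment may distribute these terms across steps differently.

```python
# Values copied from the reward table (default PCR spec).
IMMEDIATE = {"success": 15.0, "partial": 5.0, "fail": 0.0}  # "fail": 0.0 is assumed
ASSAY_COST = -3.0
TERMINAL = {"success": 60.0, "partial": 25.0, "fail": -20.0}
TIME_PENALTY_PER_MIN = -0.25
RANK = {"fail": 0, "partial": 1, "success": 2}


def episode_return(results, elapsed_minutes):
    """Total reward for a list of assay outcomes plus elapsed time."""
    # Each assay pays its immediate reward and the per-assay cost.
    total = sum(IMMEDIATE[r] + ASSAY_COST for r in results)
    # The terminal term depends on the best outcome seen in the episode.
    best = max(results, key=RANK.get, default="fail")
    total += TERMINAL[best]
    total += TIME_PENALTY_PER_MIN * elapsed_minutes
    return total


print(episode_return(["fail", "partial", "success"], elapsed_minutes=60))  # 56.0
```

In this example the three assays contribute 11 points, the terminal bonus adds 60, and an hour at the bench costs 15, so an agent that reaches success in fewer assays and less time keeps more of the bonus.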
Architecture
simlab/
├── pyproject.toml # Package metadata & dependencies
├── README.md
├── lab_env/
│ ├── __init__.py
│ ├── spec.py # ExperimentSpec, pcr_experiment_spec()
│ ├── env.py # LabEnv (Gymnasium interface, any experiment)
│ └── openenv_adapter.py # OpenEnv types, LabEnvironment, HTTP app
├── agents/
│ ├── __init__.py
│ ├── naive_agent.py # Random-preset baseline
│ ├── rl_agent.py # REINFORCE policy-gradient agent (PyTorch)
│ ├── research_llm_agent.py # LLM researcher: presets + research
│ └── research_generate_agent.py # Research → generate any protocol → run → learn from feedback
├── knowledge/
│ └── pcr_protocols.json # Fake “papers” for web_search tool (demo)
├── demo/
│ └── streamlit_app.py # Live research dashboard + 3-agent comparison
└── scripts/
├── run_naive_baseline.py # Evaluate the naive agent
├── train_and_eval_agent.py # Train REINFORCE & compare both agents
├── compare_all_agents.py # Benchmark Naive vs RL vs Research LLM
├── run_research_generate_agent.py # Research → generate protocol → run → learn (any protocol)
└── demo_research_agent.py # Terminal demo of research agent
Defining a new experiment
Implement an ExperimentSpec in lab_env/spec.py (or your own module) with:
- presets — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
- inventory_items / orderable_items — what the lab tracks and can reorder.
- initial_inventory, order_costs, result_labels.
- sample_hidden_optimum(rng) — returns hidden optimal state (e.g. ideal temp/cycles).
- sample_assay_result(hidden, preset_idx, presets, rng) — returns outcome label.
- evaluate_custom_protocol(hidden, protocol_dict, rng) (optional): scores an arbitrary protocol dict so agents can run any params via env.run_assay_with_protocol(protocol_dict).
- protocol_param_schema (optional): dict describing params for codegen/LLM (e.g. {"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}).
Then use LabEnv(spec=my_spec) or pass spec into the OpenEnv LabEnvironment(spec=my_spec).
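The field list above can be pictured with a toy stand-in. The sketch below uses a hypothetical ToySpec dataclass that only mirrors the field names listed in this section; it is not the real ExperimentSpec, whose constructor and defaults may differ, and the ELISA-style presets, inventory, and costs are invented for illustration.

```python
import random
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToySpec:
    """Stand-in mirroring the fields listed above; NOT the real ExperimentSpec."""
    presets: List[Dict[str, Any]]
    inventory_items: List[str]
    orderable_items: List[str]
    initial_inventory: Dict[str, int]
    order_costs: Dict[str, float]
    result_labels: List[str]
    sample_hidden_optimum: Callable[[random.Random], Dict[str, Any]]
    sample_assay_result: Callable[..., str]


def sample_hidden_optimum(rng):
    # Hidden state: the ideal incubation time for this episode.
    return {"incubation_min": rng.choice([30, 60, 90])}


def sample_assay_result(hidden, preset_idx, presets, rng):
    # Outcome depends on how far the chosen preset is from the optimum.
    gap = abs(presets[preset_idx]["incubation_min"] - hidden["incubation_min"])
    if gap == 0:
        return "success"
    return "partial" if gap <= 30 and rng.random() < 0.5 else "fail"


toy_spec = ToySpec(
    presets=[{"incubation_min": m} for m in (30, 60, 90)],
    inventory_items=["antibody", "substrate", "plates"],
    orderable_items=["antibody", "substrate"],
    initial_inventory={"antibody": 4, "substrate": 4, "plates": 6},
    order_costs={"antibody": 120.0, "substrate": 40.0},
    result_labels=["fail", "partial", "success"],
    sample_hidden_optimum=sample_hidden_optimum,
    sample_assay_result=sample_assay_result,
)

rng = random.Random(1)
hidden = toy_spec.sample_hidden_optimum(rng)
outcomes = [
    toy_spec.sample_assay_result(hidden, i, toy_spec.presets, rng)
    for i in range(len(toy_spec.presets))
]
```

Keeping the hidden optimum and the outcome rule as plain callables that take an rng makes each episode reproducible from a seed, which matches the signatures listed above.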
Agent design
The REINFORCE agent decomposes the problem into a learned and a scripted part:
- Learned: an MLP with two 64-unit hidden layers (14 → 64 → 64 → 12) maps the observation to a distribution over the 12 protocol presets. Trained with REINFORCE + entropy bonus + running-mean baseline.
- Scripted: the episode loop (setup → run assay → check result → order if needed → finish on success) is fixed so the agent focuses on the hard decision: which preset to try.
This decomposition lets training converge in ~2000 episodes (a few seconds on CPU) while clearly beating the random-preset naive baseline.
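The update rule behind the learned part can be sketched without PyTorch. The example below is a deliberately simplified stand-in, not the code in agents/rl_agent.py: it replaces the MLP with a plain table of 12 logits, reduces the episode to a single preset choice, and uses an invented toy_reward function. The learning rate, entropy coefficient, and baseline step size are assumed values. It does show the three ingredients named above: the REINFORCE gradient, an entropy bonus, and a running-mean baseline.

```python
import math
import random

NUM_PRESETS = 12


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def train(reward_fn, episodes=2000, lr=0.1, entropy_coef=0.01, seed=0):
    """REINFORCE on a one-step preset choice with a running-mean baseline."""
    rng = random.Random(seed)
    logits = [0.0] * NUM_PRESETS
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(NUM_PRESETS), weights=probs)[0]
        reward = reward_fn(action, rng)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)  # running mean of rewards
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        for k in range(NUM_PRESETS):
            # d log pi(action) / d logit_k for a softmax policy
            grad_logp = (1.0 if k == action else 0.0) - probs[k]
            # d entropy / d logit_k = -p_k * (log p_k + entropy)
            grad_ent = 0.0
            if probs[k] > 0.0:
                grad_ent = -probs[k] * (math.log(probs[k]) + entropy)
            logits[k] += lr * (advantage * grad_logp + entropy_coef * grad_ent)
    return softmax(logits)


def toy_reward(action, rng, best=7):
    # Hypothetical bandit: preset 7 succeeds far more often than the rest.
    p_success = 0.9 if action == best else 0.1
    return 1.0 if rng.random() < p_success else 0.0


probs = train(toy_reward)
```

Subtracting the baseline does not change the expected gradient but reduces its variance, and the entropy term keeps the policy from collapsing onto one preset before it has tried the others.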
The Research LLM agent adds a self-improving lab scientist: it researches
protocols (via a web_search tool over a local knowledge base), hypothesizes
new parameter combinations (mapped to presets), runs experiments in LabEnv, and
updates internal knowledge from results.
The Research & Generate agent (research_generate_agent.py) goes further: it
researches (web_search), generates protocol parameters for any valid
values (not limited to presets), runs them via env.run_assay_with_protocol(protocol_dict),
and learns from feedback — each run's (protocol, result, reward) is passed
into the next trial so the agent improves over the episode. Works with any spec
that has evaluate_custom_protocol (PCR, ELISA). Run it with:
export OPENAI_API_KEY=your_key
python scripts/run_research_generate_agent.py --episodes 5 --verbose
Use --workflow elisa-readout for ELISA. Add knowledge/{name}_protocols.json
for more experiment types so research has literature to search.
Training on different protocol sets
Each protocol (PCR, ELISA, or a custom spec) has its own presets and outcome model. The RL agent can train on any of them so you get one policy per protocol set.
One agent per protocol: Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
Script: scripts/train_per_protocol.py trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. checkpoints/pcr-amplification.pt, checkpoints/elisa-readout.pt):
python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
Using agents to create different protocol sets: You can define new protocol sets in two ways:
- In code: Add a new ExperimentSpec in lab_env/spec.py (or your own module): define presets, sample_hidden_optimum, sample_assay_result, and optionally evaluate_custom_protocol + protocol_param_schema. Register it in get_spec_for_workflow() and run train_per_protocol.py --workflows your-workflow-id.
- Generated presets: Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an ExperimentSpec and train an agent with ReinforceAgent(spec=my_spec) on LabEnv(spec=my_spec). The Research & Generate agent already “creates” protocols at run time (arbitrary params); to train on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.
Quick Start
Install
pip install -e .
Or just ensure numpy, torch, and gymnasium are installed.
Run the naive baseline
python scripts/run_naive_baseline.py --episodes 200
Train the REINFORCE agent and compare
python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100
Next.js UI + API server (general UI)
Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):
# Terminal 1: Python API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Terminal 2: Next.js frontend (v0ap)
cd v0ap && pnpm dev
Then open the workflow run page (e.g. /workflows/pcr-amplification). The UI shows Run with AI Agent, Run Research Agent (research → hypothesize → experiment → learn), and Run Naive Baseline. The timeline displays which agent was used and each step (Research, Hypothesis, Run Assay, Learn for the research agent). Set OPENAI_API_KEY if you use the Research agent.
Hackathon / live demo — how to show the RL
Pitch in one line: “We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”
Setup (do this before going on stage)
- Start both servers (two terminals):
# Terminal 1: API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Terminal 2: UI
cd v0ap && pnpm dev
- Open http://localhost:3000 (or the URL Next.js prints).
- Optional: set OPENAI_API_KEY if you want to demo Research / Research & Generate.
Demo flow A — “Watch the RL agent learn” (~2 min)
- Go to Training (/training).
- Say: “This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”
- Set episodes to 500 (slider) for a short run — training finishes in under a minute on a laptop.
- Click Start Training. Point at:
- Progress and “Episode X of 500”.
- Chart: reward and success rate climbing over episodes.
- When it finishes: “Here’s the comparison: REINFORCE vs random baseline.” Show the table (success rate, reward, time).
Demo flow B — “Compare agents in the lab” (~1–2 min)
- Go to PCR Amplification (/workflows/pcr-amplification).
- Say: “Each run is one scientist trying to get a successful experiment under time and budget.”
- Click Run Naive Baseline — timeline fills with random preset choices and results.
- Then click Run with AI Agent (uses the policy you trained in flow A, or a default). Point at the timeline: “The learned agent picks protocols more purposefully and often gets success sooner.”
- If you have an API key: click Research & Generate (any protocol) — “This one researches, proposes parameters, runs them, and learns from feedback.”
Tips
- Keep training short on stage: 500 episodes is enough to show learning; 1000 if you have time.
- If the UI is slow: Run a quick train in the background before the demo, then only show “Run with AI Agent” and the comparison table.
- Backup: Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
- Talking points: Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.
Demo script (optional)
From repo root, run ./scripts/demo_hackathon.sh for a short checklist and the option to start the API in that terminal. Or start both manually:
# Terminal 1
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Terminal 2
cd v0ap && pnpm dev
# Open http://localhost:3000 → /training or /workflows/pcr-amplification
Research LLM agent (optional, Streamlit)
Install demo dependencies (openai, streamlit) and set OPENAI_API_KEY:
pip install -e ".[demo]"
export OPENAI_API_KEY=your_key
streamlit run demo/streamlit_app.py
The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:
python scripts/compare_all_agents.py --eval-episodes 50
Sample output (train & eval)
Metric                  REINFORCE        Naive
----------------------------------------------
Avg reward                   15.7          5.0
Success rate                53.0%        43.0%
Partial rate                19.0%        15.0%
Avg time                    62.8m        63.0m
Avg cost                     $0.0         $0.0
Avg steps                     7.0          7.0
----------------------------------------------
OpenEnv & Hugging Face — How to show and use
SimLab is built for the OpenEnv ecosystem and can be served over HTTP and deployed to Hugging Face as a standardized agentic environment.
How SimLab uses OpenEnv
- openenv-core is a required dependency (pyproject.toml).
- lab_env/openenv_adapter.py wraps LabEnv in the OpenEnv Environment interface:
  - Types: LabAction, LabObservation, LabState, LabEnvironment
  - create_app(LabEnvironment, LabAction, LabObservation, ...): FastAPI app with OpenEnv endpoints
Run the OpenEnv HTTP server
uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
This exposes standard OpenEnv endpoints:
| Endpoint | Description |
|---|---|
| POST /reset | Reset environment, get initial observation |
| POST /step | Send action, get next observation & reward |
| GET /state | Current state snapshot |
| GET /metadata | Environment name, version, docs |
| WebSocket /ws | Persistent session (optional) |
Up to max_concurrent_envs=4 sessions are supported.
Call the OpenEnv server (show usage)
From another process or machine, you can drive SimLab over HTTP:
# Reset (start new episode)
curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .
# Step (e.g. action 0 = setup preset 0)
curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .
# Get current state
curl -s http://localhost:8000/state | jq .
From Python (e.g. for demos or integration):
import requests
BASE = "http://localhost:8000"
# Reset
r = requests.post(f"{BASE}/reset", json={"seed": 42})
obs = r.json() # observation with metadata (obs_vector, info, etc.)
# Step: setup preset 0, then run assay (action 12 for PCR)
requests.post(f"{BASE}/step", json={"action": 0})
r = requests.post(f"{BASE}/step", json={"action": 12})
print(r.json()) # observation, reward, done
# State
state = requests.get(f"{BASE}/state").json()
print(state["step_count"], state["best_result"])
Deploy to Hugging Face
To show SimLab on the Hugging Face Hub as an OpenEnv environment:
Option A: Hugging Face Space (Docker)
Create a Space with Docker as the SDK. Use a Dockerfile that installs SimLab and runs:
CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 7860
Point the Space to your repo and set the port to 7860 (or the port HF expects). Your Space URL (e.g. https://huggingface.co/spaces/your-username/simlab-env) is then the public OpenEnv endpoint.
Option B: OpenEnv CLI (if you adopt the full OpenEnv layout)
The OpenEnv Packaging & Deploying guide uses openenv init, openenv build, and openenv push to deploy to the Hub. SimLab currently uses openenv-core and a custom adapter; to use openenv push, you would add the expected layout (e.g. openenv.yaml, server/ with Dockerfile) and wire the existing LabEnvironment + create_app into that structure.
Link your repo on the Hub
In your SimLab repo or any Hugging Face model/Space card, set the Repository and Documentation URLs to your GitHub repo and add a tag or short description such as: "OpenEnv-compatible lab automation environment; run with uvicorn lab_env.openenv_adapter:app and connect via POST /reset, POST /step."
References
- OpenEnv documentation — framework overview and APIs
- OpenEnv on Hugging Face — OpenEnv org and environments
- Packaging & Deploying (OpenEnv) — build, validate, push to Hub
Environment API Reference
from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec
# Default: PCR experiment (same as before)
env = LabEnv()
# Or any experiment from a spec:
# env = LabEnv(spec=my_experiment_spec)
obs, info = env.reset(seed=42)
# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
# [0] step_index (normalised)
# [1] elapsed_minutes (normalised)
# [2] remaining_budget (normalised)
# [3..] inventory (one per spec.inventory_items, normalised)
# [...] last_result one-hot (len(spec.result_labels))
# [...] has_setup, current_preset_idx (norm), best_result_score
# Actions (Discrete, from spec):
# 0 .. num_presets-1 setup_reaction(preset_index)
# num_presets run_assay
# num_presets+1 .. order_reagents (one per orderable_items)
# ... wait, finish
obs, reward, terminated, truncated, info = env.step(0) # setup preset 0
obs, reward, terminated, truncated, info = env.step(12) # run assay (PCR)
obs, reward, terminated, truncated, info = env.step(17) # finish (PCR)
# Custom protocol (any params; spec must have evaluate_custom_protocol)
obs, reward, term, trunc, info = env.run_assay_with_protocol({"temp": 57.5, "cycles": 32, "ratio": "conservative"})
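The index arithmetic in the comments above can be checked with a small standalone helper. This is a sketch that does not import lab_env: action_layout and obs_dim are hypothetical names, and the count of three orderable items for PCR is inferred from the stated 18 actions with finish at index 17 rather than read from the spec. The split between inventory items and result labels is not given here, only that together they account for 8 of the 14 observation entries.

```python
def action_layout(num_presets, num_orderable):
    """Map action names to discrete indices in the order listed above."""
    run_assay = num_presets
    first_order = num_presets + 1
    wait = first_order + num_orderable
    finish = wait + 1
    return {
        "setup": range(0, num_presets),
        "run_assay": run_assay,
        "order": range(first_order, wait),
        "wait": wait,
        "finish": finish,
        "num_actions": finish + 1,
    }


def obs_dim(num_inventory, num_result_labels):
    """3 leading scalars + inventory + last-result one-hot + 3 trailing scalars."""
    return 3 + num_inventory + num_result_labels + 3


# PCR: 12 presets; three orderable items is an inference, not a documented value.
pcr = action_layout(num_presets=12, num_orderable=3)
print(pcr["run_assay"], pcr["finish"], pcr["num_actions"])  # 12 17 18
```

Deriving indices from the spec sizes rather than hard-coding 12 and 17 keeps agent code valid when a spec has a different number of presets or orderable items.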
License
MIT