---
title: SimLab — Lab Automation RL Environment
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.0.0"
app_port: 7860
pinned: false
---

# SimLab — Lab Automation RL Environment

A self-contained Gymnasium-style reinforcement learning environment that
simulates **any** wet-lab experiment workflow. The experiment type is defined by
an **ExperimentSpec** (protocol presets, inventory, rewards, outcome model). The
default spec is PCR (Polymerase Chain Reaction). You can plug in ELISA, custom
assays, or any other protocol-discovery task. Every experiment runs under
real-world constraints: limited time, a limited budget, and a finite reagent
inventory.

Built for the **OpenEnv** ecosystem so it can be wrapped as an HTTP-served,
sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.

**Integrations:** [OpenEnv](https://meta-pytorch.github.io/OpenEnv/) · [Hugging Face](https://huggingface.co/openenv)

---

## What the Environment Simulates

Each episode represents a scientist at the bench trying to get a successful
result. The environment:

- **Samples a hidden optimal protocol** on every `reset()` — the agent never
  sees it directly.
- Offers **protocol presets** (defined in the spec) the agent can choose from.
- Lets the agent **run assays** that consume reagents and time, returning
  outcomes (e.g. success / partial / fail) from the spec’s outcome model.
- **Custom protocols:** Specs with `evaluate_custom_protocol` (PCR, ELISA) allow
  **arbitrary** protocol parameters via `env.run_assay_with_protocol(protocol_dict)` — agents can generate and try any valid params, not just presets.
- Allows **ordering more reagents** (costs money and time) and **waiting**.
- Terminates when the agent calls **finish**, runs out of time/budget, or
  exhausts inventory with no way to reorder.

**Default (PCR):** 12 presets (3 temperatures × 2 cycle counts × 2 reagent
ratios). The probability of success depends on how far the chosen preset is
from the hidden optimum. Other experiments use their own presets and outcome
logic via a custom `ExperimentSpec`.

### Reward structure (default PCR)

The reward encodes real lab trade-offs (all configurable per spec):

| Signal | Value |
|---|---|
| Immediate assay result: success | +15 |
| Immediate assay result: partial | +5 |
| Per-assay cost penalty | -3 |
| Terminal bonus (best result = success) | +60 |
| Terminal bonus (best result = partial) | +25 |
| Terminal penalty (no success or partial) | -20 |
| Time penalty | -0.25 per minute elapsed |

A good agent learns to explore efficiently — try a few presets, read the
signals from partial/success outcomes, and converge on the best protocol before
finishing.
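The sketch below totals one episode's reward from the default PCR values in the table, to make the trade-offs concrete. It assumes the signals simply add up:

- each assay earns its outcome reward plus the -3 cost;
- the terminal term is applied once, when the episode ends;
- the time penalty scales with total elapsed minutes.

`PCR_REWARDS` and `episode_return` are hypothetical names, not part of `lab_env`. The real environment may hand these terms out step by step rather than as one sum.

```python
# Hypothetical helper: default PCR reward values copied from the table above.
# Assumes the terms are simply summed; LabEnv may distribute them across steps.
PCR_REWARDS = {
    "success": 15.0,           # immediate reward for a successful assay
    "partial": 5.0,            # immediate reward for a partial result
    "assay_cost": -3.0,        # charged on every assay
    "terminal_success": 60.0,  # best result in the episode was a success
    "terminal_partial": 25.0,  # best result in the episode was a partial
    "terminal_none": -20.0,    # no success or partial at all
    "per_minute": -0.25,       # time penalty per elapsed minute
}


def episode_return(assay_results, elapsed_minutes, rewards=PCR_REWARDS):
    """Total reward for an episode given its assay outcomes and elapsed time."""
    total = 0.0
    for result in assay_results:
        # Every assay pays the cost; "fail" earns nothing on top of it.
        total += rewards["assay_cost"]
        if result in ("success", "partial"):
            total += rewards[result]
    # Terminal term depends on the best result seen in the episode.
    if "success" in assay_results:
        total += rewards["terminal_success"]
    elif "partial" in assay_results:
        total += rewards["terminal_partial"]
    else:
        total += rewards["terminal_none"]
    total += rewards["per_minute"] * elapsed_minutes
    return total


# fail, partial, success over 60 minutes:
# assays (-3) + (-3 + 5) + (-3 + 15) = 11, terminal +60, time -15
print(episode_return(["fail", "partial", "success"], elapsed_minutes=60))  # 56.0
```

Under these assumptions, a single success dominates the return. Even so, every extra assay (-3) and every extra 4 minutes (-1) still chip away at it, and that pressure is what pushes an agent to converge quickly.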

---

## Architecture

```
simlab/
├── pyproject.toml                      # Package metadata & dependencies
├── README.md
├── lab_env/
│   ├── __init__.py
│   ├── spec.py                         # ExperimentSpec, pcr_experiment_spec()
│   ├── env.py                          # LabEnv (Gymnasium interface, any experiment)
│   └── openenv_adapter.py              # OpenEnv types, LabEnvironment, HTTP app
├── agents/
│   ├── __init__.py
│   ├── naive_agent.py                  # Random-preset baseline
│   ├── rl_agent.py                     # REINFORCE policy-gradient agent (PyTorch)
│   ├── research_llm_agent.py           # LLM researcher: presets + research
│   └── research_generate_agent.py      # Research → generate any protocol → run → learn from feedback
├── knowledge/
│   └── pcr_protocols.json              # Fake “papers” for the web_search tool (demo)
├── demo/
│   └── streamlit_app.py                # Live research dashboard + 3-agent comparison
├── server/
│   └── app.py                          # FastAPI backend (agents + LabEnv) for the Next.js UI
├── v0ap/                               # Next.js frontend
└── scripts/
    ├── run_naive_baseline.py           # Evaluate the naive agent
    ├── train_and_eval_agent.py         # Train REINFORCE & compare both agents
    ├── train_per_protocol.py           # Train one REINFORCE agent per workflow
    ├── compare_all_agents.py           # Benchmark Naive vs RL vs Research LLM
    ├── run_research_generate_agent.py  # Research → generate protocol → run → learn (any protocol)
    ├── demo_research_agent.py          # Terminal demo of research agent
    └── demo_hackathon.sh               # Live-demo checklist; can start the API
```

### Defining a new experiment

Implement an `ExperimentSpec` in `lab_env/spec.py` (or your own module) with:

- **presets** — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
- **inventory_items** / **orderable_items** — what the lab tracks and can reorder.
- **initial_inventory**, **order_costs**, **result_labels**.
- **sample_hidden_optimum(rng)** — returns hidden optimal state (e.g. ideal temp/cycles).
- **sample_assay_result(hidden, preset_idx, presets, rng)** — returns outcome label.
- **evaluate_custom_protocol(hidden, protocol_dict, rng)** (optional) — score an arbitrary protocol dict so agents can run any params via `env.run_assay_with_protocol(protocol_dict)`.
- **protocol_param_schema** (optional) — dict describing params for codegen/LLM (e.g. `{"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}`).

Then use `LabEnv(spec=my_spec)` or pass `spec` into the OpenEnv `LabEnvironment(spec=my_spec)`.
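The sketch below shows a toy spec for a hypothetical pH/incubation screen. It is duck-typed to the attribute and method names in the list above. The real `ExperimentSpec` constructor in `lab_env/spec.py` may take these as arguments instead, so treat `PhScreenSpec`, its parameter names and its outcome model as assumptions to adapt, not as the library's API. The code only calls `rng.random()` and `rng.uniform()`, which exist on both `random.Random` and a NumPy `Generator`.

```python
import math
import random
from itertools import product


class PhScreenSpec:
    """Hypothetical toy spec: find the pH / incubation time that works best.

    Attribute and method names follow the ExperimentSpec field list above;
    adapt them to the real ExperimentSpec constructor before using LabEnv.
    """

    result_labels = ["fail", "partial", "success"]
    inventory_items = ["buffer", "substrate"]
    orderable_items = ["buffer", "substrate"]
    initial_inventory = {"buffer": 5, "substrate": 5}
    order_costs = {"buffer": 20.0, "substrate": 35.0}
    protocol_param_schema = {
        "ph": {"type": "number", "minimum": 5.0, "maximum": 9.0},
        "incubation_min": {"type": "integer", "minimum": 10, "maximum": 60},
    }

    def __init__(self):
        # 3 pH levels x 2 incubation times = 6 presets.
        self.presets = [
            {"ph": ph, "incubation_min": minutes}
            for ph, minutes in product([6.0, 7.0, 8.0], [20, 40])
        ]

    def sample_hidden_optimum(self, rng):
        # Drawn once per reset(); the agent never sees it.
        return {"ph": rng.uniform(6.0, 8.0), "incubation_min": rng.uniform(20.0, 40.0)}

    @staticmethod
    def _distance(hidden, protocol):
        # 1 pH unit and 20 minutes each count as one unit of distance.
        d_ph = protocol["ph"] - hidden["ph"]
        d_time = (protocol["incubation_min"] - hidden["incubation_min"]) / 20.0
        return math.hypot(d_ph, d_time)

    def evaluate_custom_protocol(self, hidden, protocol_dict, rng):
        # Success probability decays with distance; otherwise 50/50 partial or fail.
        p_success = math.exp(-2.0 * self._distance(hidden, protocol_dict) ** 2)
        u = rng.random()
        if u < p_success:
            return "success"
        if u < p_success + 0.5 * (1.0 - p_success):
            return "partial"
        return "fail"

    def sample_assay_result(self, hidden, preset_idx, presets, rng):
        # Presets are just fixed protocols, so reuse the custom-protocol model.
        return self.evaluate_custom_protocol(hidden, presets[preset_idx], rng)


spec = PhScreenSpec()
rng = random.Random(0)
hidden = spec.sample_hidden_optimum(rng)
print(len(spec.presets), spec.sample_assay_result(hidden, 0, spec.presets, rng))
```

Here `sample_assay_result` is a thin wrapper over `evaluate_custom_protocol`. Presets and arbitrary protocols are therefore scored by the same outcome model, which is one simple way to keep the two paths consistent.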

### Agent design

The **REINFORCE agent** decomposes the problem into a learned part and a
scripted part:

- **Learned** — an MLP with two 64-unit hidden layers (14 → 64 → 64 → 12) maps
  the observation to a distribution over the 12 protocol presets. It is trained
  with REINFORCE plus an entropy bonus and a running-mean baseline.
- **Scripted** — the episode loop (setup → run assay → check result → order
  if needed → finish on success) is fixed, so the agent focuses on the hard
  decision: *which* preset to try.

This decomposition lets training converge in ~2000 episodes (a few seconds on
CPU) while outperforming the random-preset naive baseline (53% vs 43% success
rate in the sample train & eval output below).
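The update rule behind the learned part can be sketched without PyTorch. The version below is a simplified stand-in, not `agents/rl_agent.py`:

- It swaps the MLP for a plain table of 12 logits.
- It collapses each episode into a single preset choice, scored by a caller-supplied `reward_fn`. This is a hypothetical stand-in for running the scripted episode loop.

It keeps the three ingredients named above:

- the REINFORCE gradient of log π(a);
- an entropy bonus;
- a running-mean baseline, implemented here as an exponential moving average.

```python
import math
import random


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_presets(reward_fn, num_presets=12, episodes=2000, lr=0.5,
                      entropy_coef=0.01, baseline_rate=0.05, seed=0):
    """Tabular REINFORCE over presets with an entropy bonus and a running-mean baseline.

    reward_fn(preset_index) -> float stands in for one scripted episode.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_presets
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(num_presets), weights=probs)[0]
        ret = reward_fn(action)
        advantage = ret - baseline
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        for k in range(num_presets):
            # d/d(logit_k) of log pi(action) for a softmax policy.
            grad_logp = (1.0 if k == action else 0.0) - probs[k]
            # d/d(logit_k) of the policy entropy.
            grad_ent = -probs[k] * (math.log(probs[k]) + entropy) if probs[k] > 0 else 0.0
            logits[k] += lr * (advantage * grad_logp + entropy_coef * grad_ent)
        # Exponential running mean of returns, used as the baseline.
        baseline += baseline_rate * (ret - baseline)
    return softmax(logits)


best = 7  # hypothetical hidden optimum: only preset 7 pays off
probs = reinforce_presets(lambda a: 1.0 if a == best else 0.0)
print(max(range(len(probs)), key=probs.__getitem__))  # 7
```

Subtracting the baseline leaves the expected gradient unchanged and only reduces its variance. The entropy bonus discourages the policy from collapsing onto one preset too early.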

The **Research LLM agent** acts as a self-improving lab scientist. It researches
protocols (via a `web_search` tool over a local knowledge base), hypothesizes
new parameter combinations (mapped to presets), runs experiments in LabEnv, and
updates its internal knowledge from the results.
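This agent acts through presets, so each LLM hypothesis has to be snapped to the closest preset before it can run. One simple way to do that is a weighted nearest-neighbour match, sketched below. Several details are illustrative assumptions:

- the preset values (55/60/65 °C, 25/35 cycles);
- an `"aggressive"` ratio alongside the `"conservative"` one used in the API example;
- the distance scales.

The real grid lives in `pcr_experiment_spec()`, and the agent's actual mapping may differ.

```python
from itertools import product

# Hypothetical PCR-like grid: 3 temps x 2 cycle counts x 2 ratios = 12 presets.
# The real values come from pcr_experiment_spec(); "aggressive" is an assumed label.
PRESETS = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in product([55.0, 60.0, 65.0], [25, 35], ["conservative", "aggressive"])
]


def nearest_preset(hypothesis, presets, temp_scale=5.0, cycle_scale=10.0, ratio_penalty=1.0):
    """Index of the preset closest to an LLM-proposed {temp, cycles, ratio} dict."""
    def distance(preset):
        d = ((hypothesis["temp"] - preset["temp"]) / temp_scale) ** 2
        d += ((hypothesis["cycles"] - preset["cycles"]) / cycle_scale) ** 2
        # Categorical parameter: fixed penalty on mismatch.
        d += 0.0 if hypothesis["ratio"] == preset["ratio"] else ratio_penalty
        return d

    return min(range(len(presets)), key=lambda i: distance(presets[i]))


idx = nearest_preset({"temp": 61.0, "cycles": 33, "ratio": "conservative"}, PRESETS)
print(idx, PRESETS[idx])  # 6 {'temp': 60.0, 'cycles': 35, 'ratio': 'conservative'}
```

Scaling each numeric axis stops a 1 °C miss from being weighted the same as a 1-cycle miss. A categorical mismatch simply adds a fixed penalty. Ties go to the lower preset index because `min` returns the first minimum.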

The **Research & Generate agent** (`research_generate_agent.py`) goes further: it
**researches** (web_search), **generates** **any** valid protocol parameters
(not limited to presets), **runs** them via `env.run_assay_with_protocol(protocol_dict)`,
and **learns from feedback** — each run's (protocol, result, reward) is passed
into the next trial so the agent improves over the episode. It works with any spec
that has `evaluate_custom_protocol` (PCR, ELISA). Run it with:

```bash
export OPENAI_API_KEY=your_key
python scripts/run_research_generate_agent.py --episodes 5 --verbose
```

Use `--workflow elisa-readout` for ELISA. Add `knowledge/{name}_protocols.json`
for more experiment types so the research step has literature to search.
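The feedback loop itself is short to express, as the sketch below shows:

- `propose` stands in for the LLM call.
- `run` stands in for `env.run_assay_with_protocol` and is assumed to return a `(result, reward)` pair.

Both are caller-supplied, so the loop makes no assumption about the OpenAI client or the exact return shape of `LabEnv`. The toy proposer and the toy optimum at 61 °C are hypothetical and exist only to exercise the loop.

```python
def research_generate_loop(propose, run, max_trials=10):
    """Propose -> run -> record, feeding the full history back into each proposal."""
    history = []
    for _ in range(max_trials):
        protocol = propose(history)     # real agent: prompt the LLM with the history
        result, reward = run(protocol)  # real agent: env.run_assay_with_protocol(...)
        history.append({"protocol": protocol, "result": result, "reward": reward})
        if result == "success":
            break
    return history


def toy_propose(history):
    # Deliberately naive stand-in for the LLM: step up from the best protocol so far,
    # shrinking the step each time a trial fails to improve on it.
    if not history:
        return {"temp": 55.0}
    best_i = max(range(len(history)), key=lambda i: history[i]["reward"])
    misses = len(history) - 1 - best_i  # trials since the last improvement
    return {"temp": history[best_i]["protocol"]["temp"] + 2.0 / (1 + misses)}


def toy_run(protocol, optimum=61.0):
    # Toy outcome model: reward is minus the distance to the hidden optimum.
    gap = abs(protocol["temp"] - optimum)
    result = "success" if gap < 0.5 else "partial" if gap < 3.0 else "fail"
    return result, -gap


for step in research_generate_loop(toy_propose, toy_run):
    print(step["protocol"]["temp"], step["result"])
# 55.0 fail
# 57.0 fail
# 59.0 partial
# 61.0 success
```

Passing the whole history, not just the last result, is what lets the proposer reason about which changes helped. In the LLM agent, that history is what gets carried into the next trial.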

### Training on different protocol sets

Each **protocol** (PCR, ELISA, or a custom spec) has its own **presets** and outcome model. The RL agent can train on any of them, so you get one policy per protocol set.

- **One agent per protocol:** Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
- **Script:** `scripts/train_per_protocol.py` trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. `checkpoints/pcr-amplification.pt`, `checkpoints/elisa-readout.pt`):

  ```bash
  python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
  ```

- **Using agents to create different protocol sets:** You can define new protocol sets in two ways:
  1. **In code:** Add a new `ExperimentSpec` in `lab_env/spec.py` (or your own module): define `presets`, `sample_hidden_optimum`, `sample_assay_result`, and optionally `evaluate_custom_protocol` + `protocol_param_schema`. Register it in `get_spec_for_workflow()` and run `train_per_protocol.py --workflows your-workflow-id`.
  2. **Generated presets:** Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an `ExperimentSpec` and train an agent with `ReinforceAgent(spec=my_spec)` on `LabEnv(spec=my_spec)`. The Research & Generate agent already “creates” protocols at run time (arbitrary params); to **train** on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.
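For the generated-presets route, LLM output needs filtering before it becomes a spec's `presets`. The sketch below checks candidates against a `protocol_param_schema`-style dict and drops duplicates. It assumes JSON-Schema-like keys (`type`, `minimum`, `maximum`, `enum`), which goes beyond the `type`-only example shown for `protocol_param_schema` earlier. `validate_protocol` and `build_presets` are hypothetical helpers, not part of `lab_env`.

```python
def _is_type(value, kind):
    # bool is a subclass of int in Python, so exclude it for numeric types.
    if kind == "integer":
        return isinstance(value, int) and not isinstance(value, bool)
    if kind == "number":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if kind == "string":
        return isinstance(value, str)
    if kind == "boolean":
        return isinstance(value, bool)
    return True  # missing or unrecognised type: accept


def validate_protocol(protocol, schema):
    """True if protocol has exactly the schema's params and each one satisfies its rules."""
    if set(protocol) != set(schema):
        return False
    for name, rules in schema.items():
        value = protocol[name]
        if not _is_type(value, rules.get("type")):
            return False
        if "enum" in rules and value not in rules["enum"]:
            return False
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            if value < rules.get("minimum", float("-inf")):
                return False
            if value > rules.get("maximum", float("inf")):
                return False
    return True


def build_presets(candidates, schema):
    """Keep valid candidates, drop duplicates, preserve the original order."""
    presets = []
    for cand in candidates:
        # List membership compares dicts by value, so no hashing is needed.
        if validate_protocol(cand, schema) and cand not in presets:
            presets.append(dict(cand))
    return presets


schema = {
    "temp": {"type": "number", "minimum": 50, "maximum": 70},
    "cycles": {"type": "integer", "minimum": 20, "maximum": 40},
    "ratio": {"type": "string", "enum": ["conservative", "aggressive"]},
}
candidates = [  # e.g. parsed from an LLM response
    {"temp": 58.0, "cycles": 30, "ratio": "conservative"},
    {"temp": 58.0, "cycles": 30, "ratio": "conservative"},  # duplicate
    {"temp": 90.0, "cycles": 30, "ratio": "conservative"},  # temp out of range
    {"temp": 62.5, "cycles": 32.5, "ratio": "aggressive"},  # non-integer cycles
    {"temp": 62.5, "cycles": 32, "ratio": "aggressive"},
]
print(build_presets(candidates, schema))
# [{'temp': 58.0, 'cycles': 30, 'ratio': 'conservative'}, {'temp': 62.5, 'cycles': 32, 'ratio': 'aggressive'}]
```

The resulting list could then serve as the `presets` of a new spec, and REINFORCE would train on it like any other protocol set.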

---

## Quick Start

### Install

```bash
pip install -e .
```

Or install the dependencies manually: `numpy`, `torch`, `gymnasium`, and `openenv-core` (listed as required in `pyproject.toml`).

### Run the naive baseline

```bash
python scripts/run_naive_baseline.py --episodes 200
```

### Train the REINFORCE agent and compare

```bash
python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100
```

### Next.js UI + API server (general UI)

Run the FastAPI backend, then the Next.js frontend (which proxies API calls to the backend):

```bash
# Terminal 1: Python API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2: Next.js frontend (v0ap)
cd v0ap && pnpm dev
```

Then open a workflow run page in the Next.js app (e.g. `/workflows/pcr-amplification` on http://localhost:3000, or whatever URL Next.js prints). The UI offers **Run with AI Agent**, **Run Research Agent** (research → hypothesize → experiment → learn), and **Run Naive Baseline**. The timeline shows which agent was used and each step it took (for the research agent: Research, Hypothesis, Run Assay, Learn). Set `OPENAI_API_KEY` in the API server's environment (Terminal 1) before using the Research agent.

---

## Hackathon / live demo — how to show the RL

**Pitch in one line:** *“We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”*

### Setup (do this before going on stage)

1. **Start both servers** (two terminals):

   ```bash
   # Terminal 1 — API (agents + LabEnv)
   uvicorn server.app:app --host 0.0.0.0 --port 8000

   # Terminal 2 — UI
   cd v0ap && pnpm dev
   ```

2. Open **http://localhost:3000** (or the URL Next.js prints).
3. Optional: set `OPENAI_API_KEY` if you want to demo Research / Research & Generate.

### Demo flow A — “Watch the RL agent learn” (~2 min)

1. Go to **Training** (`/training`).
2. Say: *“This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”*
3. Set **episodes to 500** (slider) for a short run — training finishes in under a minute on a laptop.
4. Click **Start Training**. Point at:
   - **Progress** and “Episode X of 500”.
   - **Chart**: reward and success rate climbing over episodes.
5. When it finishes: *“Here’s the comparison: REINFORCE vs random baseline.”* Show the table (success rate, reward, time).

### Demo flow B — “Compare agents in the lab” (~1–2 min)

1. Go to **PCR Amplification** (`/workflows/pcr-amplification`).
2. Say: *“Each run is one scientist trying to get a successful experiment under time and budget.”*
3. Click **Run Naive Baseline** — timeline fills with random preset choices and results.
4. Then click **Run with AI Agent** (uses the policy you trained in flow A, or a default). Point at the timeline: *“The learned agent picks protocols more purposefully and often gets success sooner.”*
5. If you have an API key: click **Research & Generate (any protocol)** — *“This one researches, proposes parameters, runs them, and learns from feedback.”*

### Tips

- **Keep training short on stage:** 500 episodes is enough to show learning; 1000 if you have time.
- **If the UI is slow:** Run a quick train in the background before the demo, then only show “Run with AI Agent” and the comparison table.
- **Backup:** Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
- **Talking points:** Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.

### Demo script (optional)

From the repo root, run `./scripts/demo_hackathon.sh` for a short checklist and the option to start the API in that terminal. Or start both servers manually:

```bash
# Terminal 1
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2
cd v0ap && pnpm dev
# Open http://localhost:3000 → /training or /workflows/pcr-amplification
```

---

### Research LLM agent (optional, Streamlit)

Install demo dependencies (`openai`, `streamlit`) and set `OPENAI_API_KEY`:

```bash
pip install -e ".[demo]"
export OPENAI_API_KEY=your_key
streamlit run demo/streamlit_app.py
```

The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:

```bash
python scripts/compare_all_agents.py --eval-episodes 50
```

### Sample output (train & eval)

```
Metric                  REINFORCE        Naive
----------------------------------------------
Avg reward                   15.7          5.0
Success rate                53.0%        43.0%
Partial rate                19.0%        15.0%
Avg time                    62.8m        63.0m
Avg cost                     $0.0         $0.0
Avg steps                     7.0          7.0
----------------------------------------------
```

---

## OpenEnv & Hugging Face — How to show and use

SimLab is built for the **OpenEnv** ecosystem and can be served over HTTP and deployed to **Hugging Face** as a standardized agentic environment.

### How SimLab uses OpenEnv

- **`openenv-core`** is a required dependency (`pyproject.toml`).
- **`lab_env/openenv_adapter.py`** wraps `LabEnv` in the OpenEnv `Environment` interface:
  - **Types:** `LabAction`, `LabObservation`, `LabState`, `LabEnvironment`
  - **`create_app(LabEnvironment, LabAction, LabObservation, ...)`** — FastAPI app with OpenEnv endpoints

### Run the OpenEnv HTTP server

```bash
uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
```

This exposes standard OpenEnv endpoints:

| Endpoint | Description |
|---|---|
| `POST /reset` | Reset environment, get initial observation |
| `POST /step` | Send action, get next observation & reward |
| `GET /state` | Current state snapshot |
| `GET /metadata` | Environment name, version, docs |
| WebSocket `/ws` | Persistent session (optional) |

Up to `max_concurrent_envs=4` sessions are supported.

### Call the OpenEnv server (show usage)

From another process or machine, you can drive SimLab over HTTP:

```bash
# Reset (start new episode)
curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .

# Step (e.g. action 0 = setup preset 0)
curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .

# Get current state
curl -s http://localhost:8000/state | jq .
```

From Python (e.g. for demos or integration):

```python
import requests

BASE = "http://localhost:8000"

# Reset
r = requests.post(f"{BASE}/reset", json={"seed": 42})
obs = r.json()  # observation with metadata (obs_vector, info, etc.)

# Step: setup preset 0, then run assay (action 12 for PCR)
requests.post(f"{BASE}/step", json={"action": 0})
r = requests.post(f"{BASE}/step", json={"action": 12})
print(r.json())  # observation, reward, done

# State
state = requests.get(f"{BASE}/state").json()
print(state["step_count"], state["best_result"])
```

### Deploy to Hugging Face

To **show SimLab on the Hugging Face Hub** as an OpenEnv environment:

1. **Option A — Hugging Face Space (Docker)**
   Create a Space with **Docker** as the SDK. Use a `Dockerfile` that installs SimLab and runs:

   ```dockerfile
   CMD ["uvicorn", "lab_env.openenv_adapter:app", "--host", "0.0.0.0", "--port", "7860"]
   ```

   Push the repo (including the `Dockerfile`) to the Space's git repository. Hugging Face routes traffic to the `app_port` declared in this README's front matter (7860), so the server must listen on that port. The Space page is `https://huggingface.co/spaces/your-username/simlab-env`. The app itself, and therefore the public OpenEnv endpoint, is served from `https://your-username-simlab-env.hf.space`.

2. **Option B — OpenEnv CLI (if you adopt the full OpenEnv layout)**
   The [OpenEnv Packaging & Deploying](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) guide uses `openenv init`, `openenv build`, and **`openenv push`** to deploy to the Hub. SimLab currently uses `openenv-core` and a custom adapter; to use `openenv push`, you would add the expected layout (e.g. `openenv.yaml`, `server/` with Dockerfile) and wire the existing `LabEnvironment` + `create_app` into that structure.

3. **Link your repo on the Hub**
   In your SimLab repo or any Hugging Face model/Space card, set the **Repository** and **Documentation** URLs to your GitHub repo and add a tag or short description such as: *"OpenEnv-compatible lab automation environment; run with `uvicorn lab_env.openenv_adapter:app` and connect via POST /reset, POST /step."*

### References

- [OpenEnv documentation](https://meta-pytorch.github.io/OpenEnv/) — framework overview and APIs
- [OpenEnv on Hugging Face](https://huggingface.co/openenv) — OpenEnv org and environments
- [Packaging & Deploying (OpenEnv)](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) — build, validate, push to Hub

---

## Environment API Reference

```python
from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec

# Default: the PCR experiment
env = LabEnv()
# Or any experiment from a spec:
# env = LabEnv(spec=my_experiment_spec)

obs, info = env.reset(seed=42)

# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
#   [0]    step_index (normalised)
#   [1]    elapsed_minutes (normalised)
#   [2]    remaining_budget (normalised)
#   [3..]  inventory (one per spec.inventory_items, normalised)
#   [...]  last_result one-hot (len(spec.result_labels))
#   [...]  has_setup, current_preset_idx (normalised), best_result_score

# Actions (Discrete, from spec):
#   0 .. num_presets-1   setup_reaction(preset_index)
#   num_presets          run_assay
#   num_presets+1 ..     order_reagents (one per orderable_items)
#   ...                  wait, finish

obs, reward, terminated, truncated, info = env.step(0)   # setup preset 0
obs, reward, terminated, truncated, info = env.step(12)  # run assay (PCR)

# Custom protocol (any params; spec must have evaluate_custom_protocol).
# Run it before finishing, because finish ends the episode.
obs, reward, terminated, truncated, info = env.run_assay_with_protocol(
    {"temp": 57.5, "cycles": 32, "ratio": "conservative"}
)

obs, reward, terminated, truncated, info = env.step(17)  # finish (PCR)
```
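The action indices used above (12 = run assay, 17 = finish) follow from the layout in the comments, and a small helper makes that arithmetic explicit. It assumes exactly what the comments list:

- one order action per orderable item;
- followed by single `wait` and `finish` actions.

`action_layout` and `obs_dim` are hypothetical helpers, not `lab_env` functions. For PCR, 18 actions imply 3 orderable items (18 - 12 - 1 - 2). The 14-dim observation implies that inventory items plus result labels account for 8 entries.

```python
def action_layout(num_presets, num_orderable):
    """Discrete action indices implied by the layout comments above (assumed order)."""
    run_assay = num_presets
    orders = list(range(num_presets + 1, num_presets + 1 + num_orderable))
    wait = num_presets + 1 + num_orderable
    finish = wait + 1
    return {
        "setup": list(range(num_presets)),
        "run_assay": run_assay,
        "order": orders,
        "wait": wait,
        "finish": finish,
        "n_actions": finish + 1,
    }


def obs_dim(num_inventory_items, num_result_labels):
    """3 leading scalars + inventory + last-result one-hot + 3 trailing features."""
    return 3 + num_inventory_items + num_result_labels + 3


layout = action_layout(num_presets=12, num_orderable=3)
print(layout["run_assay"], layout["wait"], layout["finish"], layout["n_actions"])  # 12 16 17 18
```

If a spec changes the number of presets or orderable items, indices such as 12 and 17 shift accordingly. That is why the policy's input/output sizes are taken from the spec rather than hard-coded.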

---

## License

MIT