---
title: SimLab - Lab Automation RL Environment
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.0.0"
app_port: 7860
pinned: false
---

# SimLab - Lab Automation RL Environment

A self-contained Gymnasium-style reinforcement learning environment that simulates **any** wet-lab experiment workflow. The experiment type is defined by an **ExperimentSpec** (protocol presets, inventory, rewards, outcome model). The default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom assays, or any protocol-discovery task under real-world constraints: limited time, budget, and finite reagent inventory.

Built for the **OpenEnv** ecosystem so it can be wrapped as an HTTP-served, sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.

**Integrations:** [OpenEnv](https://meta-pytorch.github.io/OpenEnv/) · [Hugging Face](https://huggingface.co/openenv)

---

## What the Environment Simulates

Each episode represents a scientist at the bench trying to get a successful result. The environment:

- **Samples a hidden optimal protocol** on every `reset()`; the agent never sees it directly.
- Offers **protocol presets** (defined in the spec) the agent can choose from.
- Lets the agent **run assays** that consume reagents and time, returning outcomes (e.g. success / partial / fail) from the spec’s outcome model.
- **Custom protocols:** Specs with `evaluate_custom_protocol` (PCR, ELISA) allow **arbitrary** protocol parameters via `env.run_assay_with_protocol(protocol_dict)`; agents can generate and try any valid params, not just presets.
- Allows **ordering more reagents** (costs money and time) and **waiting**.
- Terminates when the agent calls **finish**, runs out of time/budget, or exhausts inventory with no way to reorder.

**Default (PCR):** 12 presets (3 temps × 2 cycle counts × 2 reagent ratios); probabilistic success based on distance to hidden optimum.
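As a rough illustration of the default PCR setup described above, the sketch below builds a 3 × 2 × 2 preset grid and draws outcomes whose success probability falls with distance from a hidden optimum. The grid values, the `aggressive` ratio label, the distance formula, and the probability thresholds are all assumptions made for illustration; the actual logic lives in `pcr_experiment_spec()` and may differ.

```python
import itertools
import random

# Hypothetical grid values: the text only states 3 temps x 2 cycle counts x 2 ratios.
TEMPS = [55.0, 60.0, 65.0]
CYCLES = [25, 35]
RATIOS = ["conservative", "aggressive"]

PRESETS = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in itertools.product(TEMPS, CYCLES, RATIOS)
]


def sample_hidden_optimum(rng):
    """Draw the hidden optimum that the agent never observes directly."""
    return {
        "temp": rng.uniform(TEMPS[0], TEMPS[-1]),
        "cycles": rng.choice(CYCLES),
        "ratio": rng.choice(RATIOS),
    }


def distance(hidden, protocol):
    """Illustrative distance: scaled temperature gap plus mismatch penalties."""
    d = abs(hidden["temp"] - protocol["temp"]) / 10.0
    d += 0.5 * (hidden["cycles"] != protocol["cycles"])
    d += 0.5 * (hidden["ratio"] != protocol["ratio"])
    return d


def sample_result(hidden, protocol, rng):
    """Closer protocols succeed more often (thresholds are assumptions)."""
    p_success = max(0.0, 0.9 - distance(hidden, protocol))
    p_partial = min(1.0 - p_success, 0.3)
    u = rng.random()
    if u < p_success:
        return "success"
    if u < p_success + p_partial:
        return "partial"
    return "fail"


rng = random.Random(42)
hidden = sample_hidden_optimum(rng)
results = [sample_result(hidden, preset, rng) for preset in PRESETS]
print(len(PRESETS), "presets;", results.count("success"), "successes")
```

Because `sample_result` takes a plain protocol dict, the same function can score a preset or an arbitrary parameter combination, which mirrors the split between presets and `run_assay_with_protocol` in the text.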
Other experiments use their own presets and outcome logic via a custom `ExperimentSpec`.

### Reward structure (default PCR)

The reward encodes real lab trade-offs (all configurable per spec):

| Signal | Value |
|---|---|
| Immediate assay result: success | +15 |
| Immediate assay result: partial | +5 |
| Per-assay cost penalty | -3 |
| Terminal bonus (best = success) | +60 |
| Terminal bonus (best = partial) | +25 |
| Terminal penalty (no success/partial) | -20 |
| Time penalty | -0.25 per minute elapsed |

A good agent learns to explore efficiently: try a few presets, read the signals from partial/success outcomes, and converge on the best protocol before finishing.

---

## Architecture

```
simlab/
├── pyproject.toml                      # Package metadata & dependencies
├── README.md
├── lab_env/
│   ├── __init__.py
│   ├── spec.py                         # ExperimentSpec, pcr_experiment_spec()
│   ├── env.py                          # LabEnv (Gymnasium interface, any experiment)
│   └── openenv_adapter.py              # OpenEnv types, LabEnvironment, HTTP app
├── agents/
│   ├── __init__.py
│   ├── naive_agent.py                  # Random-preset baseline
│   ├── rl_agent.py                     # REINFORCE policy-gradient agent (PyTorch)
│   ├── research_llm_agent.py           # LLM researcher: presets + research
│   └── research_generate_agent.py      # Research → generate any protocol → run → learn from feedback
├── knowledge/
│   └── pcr_protocols.json              # Fake “papers” for web_search tool (demo)
├── demo/
│   └── streamlit_app.py                # Live research dashboard + 3-agent comparison
└── scripts/
    ├── run_naive_baseline.py           # Evaluate the naive agent
    ├── train_and_eval_agent.py         # Train REINFORCE & compare both agents
    ├── compare_all_agents.py           # Benchmark Naive vs RL vs Research LLM
    ├── run_research_generate_agent.py  # Research → generate protocol → run → learn (any protocol)
    └── demo_research_agent.py          # Terminal demo of research agent
```

### Defining a new experiment

Implement an
`ExperimentSpec` in `lab_env/spec.py` (or your own module) with:

- **presets**: list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
- **inventory_items** / **orderable_items**: what the lab tracks and can reorder.
- **initial_inventory**, **order_costs**, **result_labels**.
- **sample_hidden_optimum(rng)**: returns the hidden optimal state (e.g. ideal temp/cycles).
- **sample_assay_result(hidden, preset_idx, presets, rng)**: returns an outcome label.
- **evaluate_custom_protocol(hidden, protocol_dict, rng)** (optional): scores an arbitrary protocol dict so agents can run any params via `env.run_assay_with_protocol(protocol_dict)`.
- **protocol_param_schema** (optional): dict describing params for codegen/LLM (e.g. `{"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}`).

Then use `LabEnv(spec=my_spec)` or pass `spec` into the OpenEnv `LabEnvironment(spec=my_spec)`.

### Agent design

The **REINFORCE agent** decomposes the problem into a learned part and a scripted part:

- **Learned**: an MLP with two hidden layers (14 → 64 → 64 → 12) maps the observation to a distribution over the 12 protocol presets. It is trained with REINFORCE, an entropy bonus, and a running-mean baseline.
- **Scripted**: the episode loop (setup → run assay → check result → order if needed → finish on success) is fixed so the agent focuses on the hard decision: *which* preset to try.

This decomposition lets training converge in ~2000 episodes (a few seconds on CPU) while clearly beating the random-preset naive baseline.

The **Research LLM agent** adds a self-improving lab scientist: it researches protocols (via a `web_search` tool over a local knowledge base), hypothesizes new parameter combinations (mapped to presets), runs experiments in LabEnv, and updates internal knowledge from results.
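To make the learned part of the REINFORCE agent concrete, here is a minimal, dependency-free sketch of REINFORCE with a running-mean baseline and an entropy bonus. It is not the code in `agents/rl_agent.py`: it replaces the MLP with state-independent logits over the 12 presets, treats each episode as a single assay, and uses a made-up `toy_reward` (one preset succeeds 80% of the time, the rest 20%) built from the +15 success and -3 per-assay values in the reward table. The learning rate and entropy coefficient are also assumptions.

```python
import math
import random

NUM_PRESETS = 12  # default PCR preset count from the text


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(action, rng, best=7):
    """Hypothetical one-assay reward: +15 on success, minus the -3 assay cost."""
    p_success = 0.8 if action == best else 0.2
    return (15.0 if rng.random() < p_success else 0.0) - 3.0


def train(reward_fn, episodes=2000, lr=0.02, entropy_coef=0.01, seed=0):
    """REINFORCE on state-independent logits; returns the final preset distribution."""
    rng = random.Random(seed)
    logits = [0.0] * NUM_PRESETS
    baseline = 0.0
    for episode in range(1, episodes + 1):
        probs = softmax(logits)
        action = rng.choices(range(NUM_PRESETS), weights=probs)[0]
        reward = reward_fn(action, rng)
        advantage = reward - baseline
        baseline += (reward - baseline) / episode  # running mean of rewards
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        for k in range(NUM_PRESETS):
            # Gradient of log pi(action) with respect to logit k (softmax policy).
            grad_logp = (1.0 if k == action else 0.0) - probs[k]
            # Gradient of the entropy with respect to logit k: -p_k * (log p_k + H).
            if probs[k] > 0.0:
                grad_entropy = -probs[k] * (math.log(probs[k]) + entropy)
            else:
                grad_entropy = 0.0
            logits[k] += lr * (advantage * grad_logp + entropy_coef * grad_entropy)
    return softmax(logits)


probs = train(toy_reward)
best_preset = max(range(NUM_PRESETS), key=lambda k: probs[k])
print("most likely preset:", best_preset, "with probability", round(probs[best_preset], 3))
```

Subtracting the running mean leaves the expected gradient unchanged while reducing its variance, and the entropy term slows premature collapse onto one preset; both effects are easier to see in this bandit-sized version than in the full agent.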
The **Research & Generate agent** (`research_generate_agent.py`) goes further: it **researches** (web_search), **generates** protocol parameters for **any** valid values (not limited to presets), **runs** them via `env.run_assay_with_protocol(protocol_dict)`, and **learns from feedback**: each run's (protocol, result, reward) is passed into the next trial so the agent improves over the episode. Works with any spec that has `evaluate_custom_protocol` (PCR, ELISA). Run it with:

```bash
export OPENAI_API_KEY=your_key
python scripts/run_research_generate_agent.py --episodes 5 --verbose
```

Use `--workflow elisa-readout` for ELISA. Add `knowledge/{name}_protocols.json` for more experiment types so research has literature to search.

### Training on different protocol sets

Each **protocol** (PCR, ELISA, or a custom spec) has its own **presets** and outcome model. The RL agent can train on any of them so you get one policy per protocol set.

- **One agent per protocol:** Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
- **Script:** `scripts/train_per_protocol.py` trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. `checkpoints/pcr-amplification.pt`, `checkpoints/elisa-readout.pt`):

  ```bash
  python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
  ```

- **Using agents to create different protocol sets:** You can define new protocol sets in two ways:
  1. **In code:** Add a new `ExperimentSpec` in `lab_env/spec.py` (or your own module): define `presets`, `sample_hidden_optimum`, `sample_assay_result`, and optionally `evaluate_custom_protocol` + `protocol_param_schema`. Register it in `get_spec_for_workflow()` and run `train_per_protocol.py --workflows your-workflow-id`.
  2. **Generated presets:** Use an LLM or script to produce a list of protocol dicts (e.g.
different temps/cycles) and a simple outcome rule; wrap them in an `ExperimentSpec` and train an agent with `ReinforceAgent(spec=my_spec)` on `LabEnv(spec=my_spec)`.

The Research & Generate agent already “creates” protocols at run time (arbitrary params); to **train** on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.

---

## Quick Start

### Install

```bash
pip install -e .
```

Or just ensure `numpy`, `torch`, and `gymnasium` are installed.

### Run the naive baseline

```bash
python scripts/run_naive_baseline.py --episodes 200
```

### Train the REINFORCE agent and compare

```bash
python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100
```

### Next.js UI + API server (general UI)

Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):

```bash
# Terminal 1: Python API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2: Next.js frontend (v0ap)
cd v0ap && pnpm dev
```

Then open the workflow run page (e.g. `/workflows/pcr-amplification`). The UI shows **Run with AI Agent**, **Run Research Agent** (research → hypothesize → experiment → learn), and **Run Naive Baseline**. The timeline displays which agent was used and each step (Research, Hypothesis, Run Assay, Learn for the research agent). Set `OPENAI_API_KEY` if you use the Research agent.

---

## Hackathon / live demo: how to show the RL

**Pitch in one line:** *“We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”*

### Setup (do this before going on stage)

1. **Start both servers** (two terminals):

   ```bash
   # Terminal 1: API (agents + LabEnv)
   uvicorn server.app:app --host 0.0.0.0 --port 8000

   # Terminal 2: UI
   cd v0ap && pnpm dev
   ```

2. Open **http://localhost:3000** (or the URL Next.js prints).
3. Optional: set `OPENAI_API_KEY` if you want to demo Research / Research & Generate.
### Demo flow A: “Watch the RL agent learn” (~2 min)

1. Go to **Training** (`/training`).
2. Say: *“This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”*
3. Set **episodes to 500** (slider) for a short run; training finishes in under a minute on a laptop.
4. Click **Start Training**. Point at:
   - **Progress** and “Episode X of 500”.
   - **Chart**: reward and success rate climbing over episodes.
5. When it finishes: *“Here’s the comparison: REINFORCE vs random baseline.”* Show the table (success rate, reward, time).

### Demo flow B: “Compare agents in the lab” (~1-2 min)

1. Go to **PCR Amplification** (`/workflows/pcr-amplification`).
2. Say: *“Each run is one scientist trying to get a successful experiment under time and budget.”*
3. Click **Run Naive Baseline**: the timeline fills with random preset choices and results.
4. Then click **Run with AI Agent** (uses the policy you trained in flow A, or a default). Point at the timeline: *“The learned agent picks protocols more purposefully and often gets success sooner.”*
5. If you have an API key: click **Research & Generate (any protocol)**. *“This one researches, proposes parameters, runs them, and learns from feedback.”*

### Tips

- **Keep training short on stage:** 500 episodes is enough to show learning; 1000 if you have time.
- **If the UI is slow:** Run a quick train in the background before the demo, then only show “Run with AI Agent” and the comparison table.
- **Backup:** Pre-record a 1-minute screen capture of training + one workflow run; use it if WiFi or the live run fails.
- **Talking points:** Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.

### Demo script (optional)

From repo root, run `./scripts/demo_hackathon.sh` for a short checklist and the option to start the API in that terminal.
Or start both manually:

```bash
# Terminal 1
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2
cd v0ap && pnpm dev

# Open http://localhost:3000 → /training or /workflows/pcr-amplification
```

---

### Research LLM agent (optional, Streamlit)

Install demo dependencies (`openai`, `streamlit`) and set `OPENAI_API_KEY`:

```bash
pip install -e ".[demo]"
export OPENAI_API_KEY=your_key
streamlit run demo/streamlit_app.py
```

The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table.

To benchmark all agents from the terminal:

```bash
python scripts/compare_all_agents.py --eval-episodes 50
```

### Sample output (train & eval)

```
Metric               REINFORCE      Naive
----------------------------------------------
Avg reward                15.7        5.0
Success rate             53.0%      43.0%
Partial rate             19.0%      15.0%
Avg time                 62.8m      63.0m
Avg cost                  $0.0       $0.0
Avg steps                  7.0        7.0
----------------------------------------------
```

---

## OpenEnv & Hugging Face: How to show and use

SimLab is built for the **OpenEnv** ecosystem and can be served over HTTP and deployed to **Hugging Face** as a standardized agentic environment.

### How SimLab uses OpenEnv

- **`openenv-core`** is a required dependency (`pyproject.toml`).
- **`lab_env/openenv_adapter.py`** wraps `LabEnv` in the OpenEnv `Environment` interface:
  - **Types:** `LabAction`, `LabObservation`, `LabState`, `LabEnvironment`
  - **`create_app(LabEnvironment, LabAction, LabObservation, ...)`**: FastAPI app with OpenEnv endpoints

### Run the OpenEnv HTTP server

```bash
uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
```

This exposes standard OpenEnv endpoints:

| Endpoint | Description |
|----------------|--------------------------------|
| `POST /reset` | Reset environment, get initial observation |
| `POST /step` | Send action, get next observation & reward |
| `GET /state` | Current state snapshot |
| `GET /metadata`| Environment name, version, docs |
| WebSocket `/ws`| Persistent session (optional) |

Up to `max_concurrent_envs=4` sessions are supported.

### Call the OpenEnv server (show usage)

From another process or machine, you can drive SimLab over HTTP:

```bash
# Reset (start new episode)
curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .

# Step (e.g. action 0 = setup preset 0)
curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .

# Get current state
curl -s http://localhost:8000/state | jq .
```

From Python (e.g. for demos or integration):

```python
import requests

BASE = "http://localhost:8000"

# Reset
r = requests.post(f"{BASE}/reset", json={"seed": 42})
obs = r.json()  # observation with metadata (obs_vector, info, etc.)

# Step: setup preset 0, then run assay (action 12 for PCR)
requests.post(f"{BASE}/step", json={"action": 0})
r = requests.post(f"{BASE}/step", json={"action": 12})
print(r.json())  # observation, reward, done

# State
state = requests.get(f"{BASE}/state").json()
print(state["step_count"], state["best_result"])
```

### Deploy to Hugging Face

To **show SimLab on the Hugging Face Hub** as an OpenEnv environment:

1. **Option A: Hugging Face Space (Docker)**
   Create a Space with **Docker** as the SDK. Use a `Dockerfile` that installs SimLab and runs:

   ```dockerfile
   CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 7860
   ```

   Point the Space to your repo and set the port to 7860 (or the port HF expects). Your Space URL (e.g. `https://huggingface.co/spaces/your-username/simlab-env`) is then the public OpenEnv endpoint.

2. **Option B: OpenEnv CLI (if you adopt the full OpenEnv layout)**
   The [OpenEnv Packaging & Deploying](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) guide uses `openenv init`, `openenv build`, and **`openenv push`** to deploy to the Hub. SimLab currently uses `openenv-core` and a custom adapter; to use `openenv push`, you would add the expected layout (e.g. `openenv.yaml`, `server/` with Dockerfile) and wire the existing `LabEnvironment` + `create_app` into that structure.

3. **Link your repo on the Hub**
   In your SimLab repo or any Hugging Face model/Space card, set the **Repository** and **Documentation** URLs to your GitHub repo and add a tag or short description such as: *"OpenEnv-compatible lab automation environment; run with `uvicorn lab_env.openenv_adapter:app` and connect via POST /reset, POST /step."*

### References

- [OpenEnv documentation](https://meta-pytorch.github.io/OpenEnv/): framework overview and APIs
- [OpenEnv on Hugging Face](https://huggingface.co/openenv): OpenEnv org and environments
- [Packaging & Deploying (OpenEnv)](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html): build, validate, push to Hub

---

## Environment API Reference

```python
from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec

# Default: PCR experiment (same as before)
env = LabEnv()

# Or any experiment from a spec:
# env = LabEnv(spec=my_experiment_spec)

obs, info = env.reset(seed=42)
# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
#   [0]   step_index (normalised)
#   [1]   elapsed_minutes (normalised)
#   [2]   remaining_budget (normalised)
#   [3..] inventory (one per spec.inventory_items, normalised)
#   [...] last_result one-hot (len(spec.result_labels))
#   [...] has_setup, current_preset_idx (norm), best_result_score

# Actions (Discrete, from spec):
#   0 .. num_presets-1    setup_reaction(preset_index)
#   num_presets           run_assay
#   num_presets+1 ..      order_reagents (one per orderable_items)
#   ...                   wait, finish

obs, reward, terminated, truncated, info = env.step(0)   # setup preset 0
obs, reward, terminated, truncated, info = env.step(12)  # run assay (PCR)
obs, reward, terminated, truncated, info = env.step(17)  # finish (PCR)

# Custom protocol (any params; spec must have evaluate_custom_protocol)
obs, reward, term, trunc, info = env.run_assay_with_protocol(
    {"temp": 57.5, "cycles": 32, "ratio": "conservative"}
)
```

---

## License

MIT