title: SimLab - Lab Automation RL Environment
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 4.0.0
app_port: 7860
pinned: false

SimLab — Lab Automation RL Environment

A self-contained Gymnasium-style reinforcement learning environment that simulates wet-lab experiment workflows. Each experiment type is defined by an ExperimentSpec (protocol presets, inventory, rewards, outcome model). The default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom assays, or any other protocol-discovery task that runs under real-world constraints: limited time, limited budget, and a finite reagent inventory.

Built for the OpenEnv ecosystem so it can be wrapped as an HTTP-served, sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.

Integrations: OpenEnv · Hugging Face


What the Environment Simulates

Each episode represents a scientist at the bench trying to get a successful result. The environment:

  • Samples a hidden optimal protocol on every reset() — the agent never sees it directly.
  • Offers protocol presets (defined in the spec) the agent can choose from.
  • Lets the agent run assays that consume reagents and time, returning outcomes (e.g. success / partial / fail) from the spec’s outcome model.
  • Custom protocols: Specs with evaluate_custom_protocol (PCR, ELISA) allow arbitrary protocol parameters via env.run_assay_with_protocol(protocol_dict) — agents can generate and try any valid params, not just presets.
  • Allows ordering more reagents (costs money and time) and waiting.
  • Terminates when the agent calls finish, runs out of time/budget, or exhausts inventory with no way to reorder.
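The three stopping conditions in the last bullet can be sketched as a single predicate. This is only an illustration: the function name, its arguments, and the reading of "exhausts inventory" as "every tracked item is at zero" are assumptions, not the logic in lab_env/env.py.

```python
def is_terminal(finished, minutes_left, budget_left, inventory, can_reorder):
    """Return True when any of the stopping conditions listed above holds."""
    # Condition 2: time or budget has run out.
    out_of_resources = minutes_left <= 0 or budget_left <= 0
    # Condition 3 (assumed reading): nothing left on the shelf and no reorder possible.
    stuck = all(count <= 0 for count in inventory.values()) and not can_reorder
    # Condition 1: the agent explicitly called finish.
    return finished or out_of_resources or stuck
```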

Default (PCR): 12 presets (3 temps × 2 cycle counts × 2 reagent ratios); probabilistic success based on distance to hidden optimum. Other experiments use their own presets and outcome logic via a custom ExperimentSpec.
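As a rough sketch of the default PCR setup described above, the snippet below enumerates a 3 × 2 × 2 preset grid and draws an outcome whose success probability falls with distance to a hidden optimum. The grid values, the "aggressive" ratio name, the distance metric, and the exponential falloff are all illustrative assumptions; the real values and outcome model live in lab_env/spec.py.

```python
import itertools
import math
import random

# Hypothetical grid values, chosen only to give 3 x 2 x 2 = 12 presets.
TEMPS = [55.0, 60.0, 65.0]
CYCLES = [25, 35]
RATIOS = ["conservative", "aggressive"]

def build_presets():
    """Enumerate every temp x cycles x ratio combination."""
    return [
        {"temp": t, "cycles": c, "ratio": r}
        for t, c, r in itertools.product(TEMPS, CYCLES, RATIOS)
    ]

def distance(protocol, optimum):
    """Assumed metric: scaled temp and cycle gaps, plus 1 for a ratio mismatch."""
    d = abs(protocol["temp"] - optimum["temp"]) / 5.0
    d += abs(protocol["cycles"] - optimum["cycles"]) / 10.0
    d += 0.0 if protocol["ratio"] == optimum["ratio"] else 1.0
    return d

def sample_outcome(protocol, optimum, rng):
    """Closer protocols succeed more often (assumed exponential falloff)."""
    p_success = math.exp(-distance(protocol, optimum))
    p_partial = (1.0 - p_success) * 0.5
    u = rng.random()
    if u < p_success:
        return "success"
    if u < p_success + p_partial:
        return "partial"
    return "fail"

presets = build_presets()
rng = random.Random(0)
optimum = rng.choice(presets)  # stands in for the hidden optimum sampled on reset()
```

Under these assumptions a protocol that matches the optimum exactly always succeeds, and every step away from it lowers the success probability.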

Reward structure (default PCR)

The reward encodes real lab trade-offs (all configurable per spec):

Signal                                      Value
-------------------------------------------------
Immediate assay result: success             +15
Immediate assay result: partial             +5
Per-assay cost penalty                      -3
Terminal bonus (best = success)             +60
Terminal bonus (best = partial)             +25
Terminal penalty (no success/partial)       -20
Time penalty                                -0.25 per minute elapsed
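To make the arithmetic concrete, here is a small worked sketch that totals these signals for one finished episode. It assumes the signals are simply added together, that a failed assay earns 0 before the cost penalty, and that the time penalty applies to total elapsed minutes; the environment may apply them step by step instead.

```python
# Values copied from the reward list above; "fail": 0.0 is an assumption.
ASSAY_REWARD = {"success": 15.0, "partial": 5.0, "fail": 0.0}
ASSAY_COST = -3.0
TERMINAL = {"success": 60.0, "partial": 25.0, None: -20.0}
TIME_PENALTY_PER_MIN = -0.25
RANK = {"fail": 0, "partial": 1, "success": 2}

def episode_return(outcomes, elapsed_minutes):
    """Sum all reward signals for one finished episode (assumed additive)."""
    total = 0.0
    best = None  # best non-fail outcome seen so far
    for outcome in outcomes:
        total += ASSAY_REWARD[outcome] + ASSAY_COST
        if outcome != "fail" and (best is None or RANK[outcome] > RANK[best]):
            best = outcome
    total += TERMINAL[best]
    total += TIME_PENALTY_PER_MIN * elapsed_minutes
    return total

# fail, partial, success in 60 minutes:
# (0 - 3) + (5 - 3) + (15 - 3) + 60 - 0.25 * 60 = 56.0
example = episode_return(["fail", "partial", "success"], 60)
```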

A good agent learns to explore efficiently — try a few presets, read the signals from partial/success outcomes, and converge on the best protocol before finishing.


Architecture

simlab/
├── pyproject.toml              # Package metadata & dependencies
├── README.md
├── lab_env/
│   ├── __init__.py
│   ├── spec.py                 # ExperimentSpec, pcr_experiment_spec()
│   ├── env.py                  # LabEnv (Gymnasium interface, any experiment)
│   └── openenv_adapter.py      # OpenEnv types, LabEnvironment, HTTP app
├── agents/
│   ├── __init__.py
│   ├── naive_agent.py          # Random-preset baseline
│   ├── rl_agent.py             # REINFORCE policy-gradient agent (PyTorch)
│   ├── research_llm_agent.py   # LLM researcher: presets + research
│   └── research_generate_agent.py  # Research → generate any protocol → run → learn from feedback
├── knowledge/
│   └── pcr_protocols.json      # Fake “papers” for web_search tool (demo)
├── demo/
│   └── streamlit_app.py        # Live research dashboard + 3-agent comparison
└── scripts/
    ├── run_naive_baseline.py   # Evaluate the naive agent
    ├── train_and_eval_agent.py # Train REINFORCE & compare both agents
    ├── compare_all_agents.py   # Benchmark Naive vs RL vs Research LLM
    ├── run_research_generate_agent.py  # Research → generate protocol → run → learn (any protocol)
    └── demo_research_agent.py  # Terminal demo of research agent

Defining a new experiment

Implement an ExperimentSpec in lab_env/spec.py (or your own module) with:

  • presets — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
  • inventory_items / orderable_items — what the lab tracks and can reorder.
  • initial_inventory, order_costs, result_labels.
  • sample_hidden_optimum(rng) — returns hidden optimal state (e.g. ideal temp/cycles).
  • sample_assay_result(hidden, preset_idx, presets, rng) — returns outcome label.
  • evaluate_custom_protocol(hidden, protocol_dict, rng) (optional) — score an arbitrary protocol dict so agents can run any params via env.run_assay_with_protocol(protocol_dict).
  • protocol_param_schema (optional) — dict describing params for codegen/LLM (e.g. {"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}).

Then use LabEnv(spec=my_spec) or pass spec into the OpenEnv LabEnvironment(spec=my_spec).
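The shape of such a spec can be sketched with a stand-in class. ToySpec below is hypothetical: it only mirrors the field and method names listed above and does not import or subclass the real ExperimentSpec, whose constructor is not shown in this README. The incubation-time presets and the "neighbouring preset is partial" rule are invented for illustration.

```python
import random
from dataclasses import dataclass, field

@dataclass
class ToySpec:
    """Hypothetical stand-in mirroring the fields listed above."""
    presets: list
    inventory_items: list
    orderable_items: list
    initial_inventory: dict
    order_costs: dict
    result_labels: list = field(
        default_factory=lambda: ["fail", "partial", "success"]
    )

    def sample_hidden_optimum(self, rng):
        # Hidden state: index of the preset that works best this episode.
        return {"best_idx": rng.randrange(len(self.presets))}

    def sample_assay_result(self, hidden, preset_idx, presets, rng):
        # Assumed rule: exact match succeeds, adjacent presets are partial.
        gap = abs(preset_idx - hidden["best_idx"])
        if gap == 0:
            return "success"
        return "partial" if gap == 1 else "fail"

spec = ToySpec(
    presets=[{"incubation_min": m} for m in (30, 60, 90)],
    inventory_items=["antibody", "substrate"],
    orderable_items=["antibody"],
    initial_inventory={"antibody": 4, "substrate": 4},
    order_costs={"antibody": 50.0},
)
rng = random.Random(1)
hidden = spec.sample_hidden_optimum(rng)
result = spec.sample_assay_result(hidden, hidden["best_idx"], spec.presets, rng)
```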

Agent design

The REINFORCE agent decomposes the problem into a learned and a scripted part:

  • Learned: an MLP with two hidden layers (14 → 64 → 64 → 12) maps the observation to a distribution over the 12 protocol presets. It is trained with REINFORCE plus an entropy bonus and a running-mean baseline.
  • Scripted: the episode loop (setup → run assay → check result → order if needed → finish on success) is fixed, so the agent focuses on the hard decision: which preset to try.

This decomposition lets training converge in about 2000 episodes (a few seconds on CPU) while beating the random-preset naive baseline (53% vs. 43% success rate in the sample run shown under "Sample output (train & eval)").
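The learned part can be illustrated with a much smaller stand-in. The sketch below runs REINFORCE with a running-mean baseline on a 12-armed bandit, using a table of logits instead of the MLP, no observation input, and no entropy bonus. The 1/0 reward, learning rate, and baseline decay are assumptions chosen for the toy; this is not the code in agents/rl_agent.py.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_reinforce(best_arm, n_arms=12, episodes=2000, lr=0.2, seed=0):
    """REINFORCE on a toy bandit: one logit per preset, running-mean baseline."""
    rng = random.Random(seed)
    logits = [0.0] * n_arms
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = 1.0 if arm == best_arm else 0.0  # toy reward, not the lab reward
        advantage = reward - baseline
        # Gradient of log pi(arm) with respect to the logits is onehot(arm) - probs.
        for i in range(n_arms):
            grad = (1.0 if i == arm else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        # Exponential running mean of rewards serves as the baseline.
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)

probs = train_reinforce(best_arm=7)
```

After training, most of the probability mass sits on the rewarded preset. In the real agent the logits would come from the MLP applied to the 14-dimensional observation rather than from a fixed table.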

The Research LLM agent adds a self-improving lab scientist: it researches protocols (via a web_search tool over a local knowledge base), hypothesizes new parameter combinations (mapped to presets), runs experiments in LabEnv, and updates internal knowledge from results.

The Research & Generate agent (research_generate_agent.py) goes further: it researches (web_search), generates protocol parameters for any valid values (not limited to presets), runs them via env.run_assay_with_protocol(protocol_dict), and learns from feedback — each run's (protocol, result, reward) is passed into the next trial so the agent improves over the episode. Works with any spec that has evaluate_custom_protocol (PCR, ELISA). Run it with:

export OPENAI_API_KEY=your_key
python scripts/run_research_generate_agent.py --episodes 5 --verbose

Use --workflow elisa-readout for ELISA. Add knowledge/{name}_protocols.json for more experiment types so research has literature to search.
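The feedback idea, where each run's (protocol, result, reward) is handed to the next trial, can be sketched without an LLM. Everything below is a stand-in: propose replaces the LLM call with a random perturbation of the best protocol so far, and evaluate replaces env.run_assay_with_protocol with a made-up reward that peaks at a hidden 60 °C.

```python
import random

def run_feedback_loop(propose, evaluate, trials):
    """Feed every (protocol, result, reward) tuple back into the next proposal."""
    history = []
    for _ in range(trials):
        protocol = propose(history)
        result, reward = evaluate(protocol)
        history.append((protocol, result, reward))
    return history

def make_proposer(rng):
    """Stand-in for the LLM: perturb the best protocol seen so far."""
    def propose(history):
        if not history:
            return {"temp": 55.0, "cycles": 30}  # hypothetical starting guess
        best_protocol, _, _ = max(history, key=lambda h: h[2])
        return {
            "temp": best_protocol["temp"] + rng.uniform(-2.0, 2.0),
            "cycles": best_protocol["cycles"],
        }
    return propose

def evaluate(protocol):
    """Stand-in for the environment: reward is the negative gap to 60 C."""
    gap = abs(protocol["temp"] - 60.0)
    result = "success" if gap < 1.0 else "partial" if gap < 3.0 else "fail"
    return result, -gap

history = run_feedback_loop(make_proposer(random.Random(0)), evaluate, trials=20)
best_reward = max(h[2] for h in history)
```

The design point is that the proposer only ever sees the history list, so swapping the random perturbation for an LLM prompt built from the same tuples would not change the loop.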

Training on different protocol sets

Each protocol (PCR, ELISA, or a custom spec) has its own presets and outcome model. The RL agent can train on any of them so you get one policy per protocol set.

  • One agent per protocol: Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).

  • Script: scripts/train_per_protocol.py trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. checkpoints/pcr-amplification.pt, checkpoints/elisa-readout.pt):

    python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
    
  • Using agents to create different protocol sets: You can define new protocol sets in two ways:

    1. In code: Add a new ExperimentSpec in lab_env/spec.py (or your own module): define presets, sample_hidden_optimum, sample_assay_result, and optionally evaluate_custom_protocol + protocol_param_schema. Register it in get_spec_for_workflow() and run train_per_protocol.py --workflows your-workflow-id.
    2. Generated presets: Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an ExperimentSpec and train an agent with ReinforceAgent(spec=my_spec) on LabEnv(spec=my_spec). The Research & Generate agent already “creates” protocols at run time (arbitrary params); to train on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.

Quick Start

Install

pip install -e .

Or just ensure numpy, torch, and gymnasium are installed.

Run the naive baseline

python scripts/run_naive_baseline.py --episodes 200

Train the REINFORCE agent and compare

python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100

Next.js UI + API server (general UI)

Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):

# Terminal 1: Python API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2: Next.js frontend (v0ap)
cd v0ap && pnpm dev

Then open the workflow run page (e.g. /workflows/pcr-amplification). The UI shows Run with AI Agent, Run Research Agent (research → hypothesize → experiment → learn), and Run Naive Baseline. The timeline displays which agent was used and each step (Research, Hypothesis, Run Assay, Learn for the research agent). Set OPENAI_API_KEY if you use the Research agent.


Hackathon / live demo — how to show the RL

Pitch in one line: “We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”

Setup (do this before going on stage)

  1. Start both servers (two terminals):
    # Terminal 1 — API (agents + LabEnv)
    uvicorn server.app:app --host 0.0.0.0 --port 8000
    
    # Terminal 2 — UI
    cd v0ap && pnpm dev
    
  2. Open http://localhost:3000 (or the URL Next.js prints).
  3. Optional: set OPENAI_API_KEY if you want to demo Research / Research & Generate.

Demo flow A — “Watch the RL agent learn” (~2 min)

  1. Go to Training (/training).
  2. Say: “This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”
  3. Set episodes to 500 (slider) for a short run — training finishes in under a minute on a laptop.
  4. Click Start Training. Point at:
    • Progress and “Episode X of 500”.
    • Chart: reward and success rate climbing over episodes.
  5. When it finishes: “Here’s the comparison: REINFORCE vs random baseline.” Show the table (success rate, reward, time).

Demo flow B — “Compare agents in the lab” (~1–2 min)

  1. Go to PCR Amplification (/workflows/pcr-amplification).
  2. Say: “Each run is one scientist trying to get a successful experiment under time and budget.”
  3. Click Run Naive Baseline — timeline fills with random preset choices and results.
  4. Then click Run with AI Agent (uses the policy you trained in flow A, or a default). Point at the timeline: “The learned agent picks protocols more purposefully and often gets success sooner.”
  5. If you have an API key: click Research & Generate (any protocol) and say: “This one researches, proposes parameters, runs them, and learns from feedback.”

Tips

  • Keep training short on stage: 500 episodes is enough to show learning; 1000 if you have time.
  • If the UI is slow: run a short training session in the background before the demo, then show only “Run with AI Agent” and the comparison table.
  • Backup: Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
  • Talking points: Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.

Demo script (optional)

From repo root, run ./scripts/demo_hackathon.sh for a short checklist and the option to start the API in that terminal. Or start both manually:

# Terminal 1
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2
cd v0ap && pnpm dev
# Open http://localhost:3000 → /training or /workflows/pcr-amplification

Research LLM agent (optional, Streamlit)

Install demo dependencies (openai, streamlit) and set OPENAI_API_KEY:

pip install -e ".[demo]"
export OPENAI_API_KEY=your_key
streamlit run demo/streamlit_app.py

The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:

python scripts/compare_all_agents.py --eval-episodes 50

Sample output (train & eval)

Metric                  REINFORCE        Naive
----------------------------------------------
Avg reward                   15.7          5.0
Success rate                53.0%        43.0%
Partial rate                19.0%        15.0%
Avg time                    62.8m        63.0m
Avg cost                     $0.0         $0.0
Avg steps                     7.0          7.0
----------------------------------------------

OpenEnv & Hugging Face — How to show and use

SimLab is built for the OpenEnv ecosystem and can be served over HTTP and deployed to Hugging Face as a standardized agentic environment.

How SimLab uses OpenEnv

  • openenv-core is a required dependency (pyproject.toml).
  • lab_env/openenv_adapter.py wraps LabEnv in the OpenEnv Environment interface:
    • Types: LabAction, LabObservation, LabState, LabEnvironment
    • create_app(LabEnvironment, LabAction, LabObservation, ...) — FastAPI app with OpenEnv endpoints

Run the OpenEnv HTTP server

uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000

This exposes standard OpenEnv endpoints:

Endpoint          Description
------------------------------------------------------------
POST /reset       Reset environment, get initial observation
POST /step        Send action, get next observation & reward
GET /state        Current state snapshot
GET /metadata     Environment name, version, docs
WebSocket /ws     Persistent session (optional)

The server supports up to 4 concurrent sessions (max_concurrent_envs=4).

Call the OpenEnv server (show usage)

From another process or machine, you can drive SimLab over HTTP:

# Reset (start new episode)
curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .

# Step (e.g. action 0 = setup preset 0)
curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .

# Get current state
curl -s http://localhost:8000/state | jq .

From Python (e.g. for demos or integration):

import requests

BASE = "http://localhost:8000"

# Reset
r = requests.post(f"{BASE}/reset", json={"seed": 42})
obs = r.json()  # observation with metadata (obs_vector, info, etc.)

# Step: setup preset 0, then run assay (action 12 for PCR)
requests.post(f"{BASE}/step", json={"action": 0})
r = requests.post(f"{BASE}/step", json={"action": 12})
print(r.json())  # observation, reward, done

# State
state = requests.get(f"{BASE}/state").json()
print(state["step_count"], state["best_result"])

Deploy to Hugging Face

To show SimLab on the Hugging Face Hub as an OpenEnv environment:

  1. Option A — Hugging Face Space (Docker)
    Create a Space with Docker as the SDK. Use a Dockerfile that installs SimLab and runs:

    CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 7860
    

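    Point the Space to your repo and set app_port: 7860 in the Space's README metadata (7860 is also the default port for Docker Spaces). The Space page lives at a URL such as https://huggingface.co/spaces/your-username/simlab-env; the running app, and therefore the public OpenEnv endpoint, is served from the Space's direct URL (e.g. https://your-username-simlab-env.hf.space).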

  2. Option B — OpenEnv CLI (if you adopt the full OpenEnv layout)
    The OpenEnv Packaging & Deploying guide uses openenv init, openenv build, and openenv push to deploy to the Hub. SimLab currently uses openenv-core and a custom adapter; to use openenv push, you would add the expected layout (e.g. openenv.yaml, server/ with Dockerfile) and wire the existing LabEnvironment + create_app into that structure.

  3. Link your repo on the Hub
    In your SimLab repo or any Hugging Face model/Space card, set the Repository and Documentation URLs to your GitHub repo and add a tag or short description such as: "OpenEnv-compatible lab automation environment; run with uvicorn lab_env.openenv_adapter:app and connect via POST /reset, POST /step."


Environment API Reference

from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec

# Default: PCR experiment
env = LabEnv()
# Or any experiment from a spec:
# env = LabEnv(spec=my_experiment_spec)

obs, info = env.reset(seed=42)

# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
#   [0]    step_index (normalised)
#   [1]    elapsed_minutes (normalised)
#   [2]    remaining_budget (normalised)
#   [3..]  inventory (one per spec.inventory_items, normalised)
#   [...]  last_result one-hot (len(spec.result_labels))
#   [...]  has_setup, current_preset_idx (norm), best_result_score

# Actions (Discrete, from spec):
#   0 .. num_presets-1   setup_reaction(preset_index)
#   num_presets          run_assay
#   num_presets+1 ..     order_reagents (one per orderable_items)
#   ...                  wait, finish

obs, reward, terminated, truncated, info = env.step(0)    # setup preset 0
obs, reward, terminated, truncated, info = env.step(12)   # run assay (PCR)
obs, reward, terminated, truncated, info = env.step(17)   # finish (PCR)

# Custom protocol (any params; spec must have evaluate_custom_protocol)
obs, reward, term, trunc, info = env.run_assay_with_protocol({"temp": 57.5, "cycles": 32, "ratio": "conservative"})
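The index arithmetic behind the action comments above can be sketched as a small helper. The function and key names are hypothetical, and num_orderable=3 for PCR is inferred from the stated total of 18 actions (12 presets + run_assay + 3 order actions + wait + finish) rather than read from the spec.

```python
def action_layout(num_presets, num_orderable):
    """Map action names to indices, following the ordering in the comments above."""
    layout = {f"setup_{i}": i for i in range(num_presets)}
    layout["run_assay"] = num_presets
    for j in range(num_orderable):
        layout[f"order_{j}"] = num_presets + 1 + j
    layout["wait"] = num_presets + 1 + num_orderable
    layout["finish"] = num_presets + 2 + num_orderable
    return layout

# PCR: 12 presets; 3 orderable items is an inference from "18 actions".
pcr = action_layout(num_presets=12, num_orderable=3)
```

With these inputs run_assay lands on 12 and finish on 17, matching the env.step(12) and env.step(17) calls in the reference above.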

License

MIT