---
title: SimLab Lab Automation RL Environment
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: "4.0.0"
app_port: 7860
pinned: false
---
# SimLab — Lab Automation RL Environment
A self-contained Gymnasium-style reinforcement learning environment that
simulates **any** wet-lab experiment workflow. The experiment type is defined by
an **ExperimentSpec** (protocol presets, inventory, rewards, outcome model). The
default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom
assays, or any protocol-discovery task under real-world constraints: limited
time, budget, and finite reagent inventory.
Built for the **OpenEnv** ecosystem so it can be wrapped as an HTTP-served,
sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.
**Integrations:** [OpenEnv](https://meta-pytorch.github.io/OpenEnv/) · [Hugging Face](https://huggingface.co/openenv)
---
## What the Environment Simulates
Each episode represents a scientist at the bench trying to get a successful
result. The environment:
- **Samples a hidden optimal protocol** on every `reset()` — the agent never
sees it directly.
- Offers **protocol presets** (defined in the spec) the agent can choose from.
- Lets the agent **run assays** that consume reagents and time, returning
outcomes (e.g. success / partial / fail) from the spec’s outcome model.
- **Custom protocols:** Specs with `evaluate_custom_protocol` (PCR, ELISA) allow
**arbitrary** protocol parameters via `env.run_assay_with_protocol(protocol_dict)` — agents can generate and try any valid params, not just presets.
- Allows **ordering more reagents** (costs money and time) and **waiting**.
- Terminates when the agent calls **finish**, runs out of time/budget, or
exhausts inventory with no way to reorder.
**Default (PCR):** 12 presets (3 temps × 2 cycle counts × 2 reagent ratios);
probabilistic success based on distance to hidden optimum. Other experiments
use their own presets and outcome logic via a custom `ExperimentSpec`.
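As a rough illustration of the default PCR setup, the sketch below builds a 3 × 2 × 2 preset grid and draws an outcome whose success probability falls off with distance from a hidden optimum. The concrete temperatures, cycle counts, ratio names, and the probability formula are assumptions made for this example; the actual values live in `pcr_experiment_spec()`.

```python
import itertools
import random

# Hypothetical parameter values; the real ones are defined by the PCR spec.
TEMPS = [55.0, 58.0, 61.0]
CYCLES = [28, 32]
RATIOS = ["conservative", "aggressive"]

# 3 temps x 2 cycle counts x 2 reagent ratios = 12 presets.
PRESETS = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in itertools.product(TEMPS, CYCLES, RATIOS)
]

def distance(hidden, preset):
    """Assumed distance: scaled temp gap + scaled cycle gap + ratio mismatch."""
    return (
        abs(hidden["temp"] - preset["temp"]) / 6.0
        + abs(hidden["cycles"] - preset["cycles"]) / 4.0
        + (0.0 if hidden["ratio"] == preset["ratio"] else 1.0)
    )

def sample_outcome(hidden, preset, rng):
    """Draw success / partial / fail; closer presets succeed more often."""
    p_success = max(0.0, 0.9 - 0.4 * distance(hidden, preset))
    p_partial = 0.5 * (1.0 - p_success)
    u = rng.random()
    if u < p_success:
        return "success"
    if u < p_success + p_partial:
        return "partial"
    return "fail"

rng = random.Random(0)
hidden = rng.choice(PRESETS)  # the agent never observes this
outcomes = [sample_outcome(hidden, p, rng) for p in PRESETS]
```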
### Reward structure (default PCR)
The reward encodes real lab trade-offs (all configurable per spec):
| Signal | Value |
|---|---|
| Immediate assay result: success | +15 |
| Immediate assay result: partial | +5 |
| Per-assay cost penalty | -3 |
| Terminal bonus (best = success) | +60 |
| Terminal bonus (best = partial) | +25 |
| Terminal penalty (no success/partial) | -20 |
| Time penalty | -0.25 per minute elapsed |
A good agent learns to explore efficiently — try a few presets, read the
signals from partial/success outcomes, and converge on the best protocol before
finishing.
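To make the table concrete, here is a small worked calculation of one episode's total reward. It assumes the signals simply add up, that the time penalty applies to total elapsed minutes, and that the terminal term depends on the best result seen; the environment's exact bookkeeping may differ.

```python
# Reward constants copied from the table above (default PCR spec).
ASSAY_REWARD = {"success": 15.0, "partial": 5.0, "fail": 0.0}
ASSAY_COST = -3.0
TERMINAL = {"success": 60.0, "partial": 25.0, "fail": -20.0}
TIME_PENALTY_PER_MIN = -0.25
RANK = {"fail": 0, "partial": 1, "success": 2}

def episode_return(assay_results, elapsed_minutes):
    """Sum the reward signals for one finished episode (illustrative only)."""
    total = 0.0
    for result in assay_results:
        # Every assay pays the cost penalty, whatever its outcome.
        total += ASSAY_REWARD[result] + ASSAY_COST
    # The terminal bonus or penalty is keyed on the best outcome reached.
    best = max(assay_results, key=RANK.__getitem__, default="fail")
    total += TERMINAL[best]
    total += TIME_PENALTY_PER_MIN * elapsed_minutes
    return total

# Three assays (fail, partial, success) over 60 minutes:
# (0 - 3) + (5 - 3) + (15 - 3) + 60 - 0.25 * 60 = 56.0
example = episode_return(["fail", "partial", "success"], elapsed_minutes=60)
```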
---
## Architecture
```
simlab/
├── pyproject.toml # Package metadata & dependencies
├── README.md
├── lab_env/
│ ├── __init__.py
│ ├── spec.py # ExperimentSpec, pcr_experiment_spec()
│ ├── env.py # LabEnv (Gymnasium interface, any experiment)
│ └── openenv_adapter.py # OpenEnv types, LabEnvironment, HTTP app
├── agents/
│ ├── __init__.py
│ ├── naive_agent.py # Random-preset baseline
│ ├── rl_agent.py # REINFORCE policy-gradient agent (PyTorch)
│ ├── research_llm_agent.py # LLM researcher: presets + research
│ └── research_generate_agent.py # Research → generate any protocol → run → learn from feedback
├── knowledge/
│ └── pcr_protocols.json # Fake “papers” for web_search tool (demo)
├── demo/
│ └── streamlit_app.py # Live research dashboard + 3-agent comparison
└── scripts/
├── run_naive_baseline.py # Evaluate the naive agent
├── train_and_eval_agent.py # Train REINFORCE & compare both agents
├── compare_all_agents.py # Benchmark Naive vs RL vs Research LLM
├── run_research_generate_agent.py # Research → generate protocol → run → learn (any protocol)
└── demo_research_agent.py # Terminal demo of research agent
```
### Defining a new experiment
Implement an `ExperimentSpec` in `lab_env/spec.py` (or your own module) with:
- **presets** — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
- **inventory_items** / **orderable_items** — what the lab tracks and can reorder.
- **initial_inventory**, **order_costs**, **result_labels**.
- **sample_hidden_optimum(rng)** — returns hidden optimal state (e.g. ideal temp/cycles).
- **sample_assay_result(hidden, preset_idx, presets, rng)** — returns outcome label.
- **evaluate_custom_protocol(hidden, protocol_dict, rng)** (optional) — score an arbitrary protocol dict so agents can run any params via `env.run_assay_with_protocol(protocol_dict)`.
- **protocol_param_schema** (optional) — dict describing params for codegen/LLM (e.g. `{"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}`).
Then use `LabEnv(spec=my_spec)` or pass `spec` into the OpenEnv `LabEnvironment(spec=my_spec)`.
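The fields above can be sketched with a stand-in dataclass. `SketchSpec` is hypothetical and only mirrors the names in the list; the real `ExperimentSpec` constructor may take different arguments, and the incubation-time experiment and its reagent names are invented for the example.

```python
import random
from dataclasses import dataclass
from typing import Callable

@dataclass
class SketchSpec:
    """Stand-in with the fields listed above; not the real ExperimentSpec."""
    presets: list
    inventory_items: list
    orderable_items: list
    initial_inventory: dict
    order_costs: dict
    result_labels: list
    sample_hidden_optimum: Callable
    sample_assay_result: Callable

def sample_hidden_optimum(rng):
    # Hypothetical hidden state: the ideal incubation time in minutes.
    return {"incubation_min": rng.choice([30, 45, 60])}

def sample_assay_result(hidden, preset_idx, presets, rng):
    # Toy deterministic rule (rng unused): an exact match succeeds,
    # a miss of up to 15 minutes is partial, anything further fails.
    gap = abs(presets[preset_idx]["incubation_min"] - hidden["incubation_min"])
    if gap == 0:
        return "success"
    return "partial" if gap <= 15 else "fail"

toy_spec = SketchSpec(
    presets=[{"incubation_min": m} for m in (30, 45, 60)],
    inventory_items=["antibody", "substrate"],
    orderable_items=["antibody"],
    initial_inventory={"antibody": 4, "substrate": 4},
    order_costs={"antibody": 50.0},
    result_labels=["fail", "partial", "success"],
    sample_hidden_optimum=sample_hidden_optimum,
    sample_assay_result=sample_assay_result,
)

rng = random.Random(0)
hidden = toy_spec.sample_hidden_optimum(rng)
labels = [
    toy_spec.sample_assay_result(hidden, i, toy_spec.presets, rng)
    for i in range(len(toy_spec.presets))
]
```

With the real package, the equivalent object would be passed as `LabEnv(spec=my_spec)`.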
### Agent design
The **REINFORCE agent** decomposes the problem into a learned and a scripted
part:
- **Learned:** an MLP with two hidden layers (14 → 64 → 64 → 12) maps the
observation to a distribution over the 12 protocol presets. It is trained with
REINFORCE plus an entropy bonus and a running-mean baseline.
- **Scripted:** the episode loop (setup → run assay → check result → order
if needed → finish on success) is fixed, so the agent focuses on the hard
decision: *which* preset to try.
This decomposition lets training converge in ~2000 episodes (a few seconds on
CPU) and beat the random-preset naive baseline (53% vs. 43% success rate in the
sample output later in this README).
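The update rule can be sketched without any dependencies. This is a deliberate simplification, not the code in `rl_agent.py`: the MLP is replaced by a single vector of state-free logits over the 12 presets, plain Python stands in for PyTorch, and the reward is a toy signal in which one hypothetical preset always pays off. The REINFORCE gradient, entropy bonus, and running-mean baseline are the parts being illustrated.

```python
import math
import random

NUM_PRESETS = 12
BEST_PRESET = 3  # hypothetical: the only preset that pays off
LR, ENTROPY_COEF, BASELINE_DECAY = 0.1, 0.01, 0.9

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train(steps, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * NUM_PRESETS
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(NUM_PRESETS), weights=probs)[0]
        reward = 1.0 if action == BEST_PRESET else 0.0
        advantage = reward - baseline
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        for i, p in enumerate(probs):
            # d log pi(action) / d logit_i for a softmax policy.
            grad_logp = (1.0 if i == action else 0.0) - p
            # d entropy / d logit_i = -p_i * (log p_i + entropy).
            grad_ent = -p * (math.log(p) + entropy) if p > 0.0 else 0.0
            logits[i] += LR * (advantage * grad_logp + ENTROPY_COEF * grad_ent)
        # Running-mean baseline, updated after the policy step.
        baseline = BASELINE_DECAY * baseline + (1.0 - BASELINE_DECAY) * reward
    return softmax(logits)

probs = train(steps=3000)
```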
The **Research LLM agent** adds a self-improving lab scientist: it researches
protocols (via a `web_search` tool over a local knowledge base), hypothesizes
new parameter combinations (mapped to presets), runs experiments in LabEnv, and
updates internal knowledge from results.
The **Research & Generate agent** (`research_generate_agent.py`) goes further: it
**researches** (web_search), **generates** protocol parameters for **any** valid
values (not limited to presets), **runs** them via `env.run_assay_with_protocol(protocol_dict)`,
and **learns from feedback** — each run's (protocol, result, reward) is passed
into the next trial so the agent improves over the episode. Works with any spec
that has `evaluate_custom_protocol` (PCR, ELISA). Run it with:
```bash
export OPENAI_API_KEY=your_key
python scripts/run_research_generate_agent.py --episodes 5 --verbose
```
Pass `--workflow elisa-readout` to run ELISA instead. To support other
experiment types, add a `knowledge/{name}_protocols.json` file so the research
step has literature to search.
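The feedback loop described above (each run's protocol, result, and reward passed into the next trial) can be sketched as follows. Both functions are stand-ins invented for this example: `toy_run_assay` replaces `env.run_assay_with_protocol`, and `propose` replaces the LLM call with a trivial rule that only steps the temperature upward. The reward values assume the immediate assay reward and the per-assay cost from the reward table are combined.

```python
TARGET_TEMP = 58.0  # hypothetical hidden optimum, unknown to the proposer

def toy_run_assay(protocol):
    """Stand-in for env.run_assay_with_protocol: returns (result, reward)."""
    gap = abs(protocol["temp"] - TARGET_TEMP)
    result = "success" if gap <= 1.0 else "partial" if gap <= 3.0 else "fail"
    # Assay reward minus the per-assay cost from the reward table.
    reward = {"success": 12.0, "partial": 2.0, "fail": -3.0}[result]
    return result, reward

def propose(history):
    """Stand-in for the LLM call: reads prior feedback, returns new params."""
    if not history:
        return {"temp": 52.0}
    last = history[-1]
    if last["result"] == "success":
        return dict(last["protocol"])  # keep what worked
    return {"temp": last["protocol"]["temp"] + 2.0}  # otherwise keep searching

history = []
for _ in range(5):
    protocol = propose(history)
    result, reward = toy_run_assay(protocol)
    # Each (protocol, result, reward) triple is fed into the next proposal.
    history.append({"protocol": protocol, "result": result, "reward": reward})

results = [h["result"] for h in history]
```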
### Training on different protocol sets
Each **protocol** (PCR, ELISA, or a custom spec) has its own **presets** and outcome model. The RL agent can train on any of them so you get one policy per protocol set.
- **One agent per protocol:** Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
- **Script:** `scripts/train_per_protocol.py` trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. `checkpoints/pcr-amplification.pt`, `checkpoints/elisa-readout.pt`):
```bash
python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
```
- **Using agents to create different protocol sets:** You can define new protocol sets in two ways:
1. **In code:** Add a new `ExperimentSpec` in `lab_env/spec.py` (or your own module): define `presets`, `sample_hidden_optimum`, `sample_assay_result`, and optionally `evaluate_custom_protocol` + `protocol_param_schema`. Register it in `get_spec_for_workflow()` and run `train_per_protocol.py --workflows your-workflow-id`.
2. **Generated presets:** Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an `ExperimentSpec` and train an agent with `ReinforceAgent(spec=my_spec)` on `LabEnv(spec=my_spec)`. The Research & Generate agent already “creates” protocols at run time (arbitrary params); to **train** on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.
---
## Quick Start
### Install
```bash
pip install -e .
```
Or just ensure `numpy`, `torch`, and `gymnasium` are installed.
### Run the naive baseline
```bash
python scripts/run_naive_baseline.py --episodes 200
```
### Train the REINFORCE agent and compare
```bash
python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100
```
### Next.js UI + API server (general UI)
Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):
```bash
# Terminal 1: Python API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Terminal 2: Next.js frontend (v0ap)
cd v0ap && pnpm dev
```
Then open the workflow run page (e.g. `/workflows/pcr-amplification`). The UI offers three actions: **Run with AI Agent**, **Run Research Agent** (research → hypothesize → experiment → learn), and **Run Naive Baseline**. The timeline shows which agent was used and each step it took (for the research agent: Research, Hypothesis, Run Assay, Learn). Set `OPENAI_API_KEY` if you use the Research agent.
---
## Hackathon / live demo — how to show the RL
**Pitch in one line:** *“We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”*
### Setup (do this before going on stage)
1. **Start both servers** (two terminals):
```bash
# Terminal 1 — API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Terminal 2 — UI
cd v0ap && pnpm dev
```
2. Open **http://localhost:3000** (or the URL Next.js prints).
3. Optional: set `OPENAI_API_KEY` if you want to demo Research / Research & Generate.
### Demo flow A — “Watch the RL agent learn” (~2 min)
1. Go to **Training** (`/training`).
2. Say: *“This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”*
3. Set **episodes to 500** (slider) for a short run — training finishes in under a minute on a laptop.
4. Click **Start Training**. Point at:
- **Progress** and “Episode X of 500”.
- **Chart**: reward and success rate climbing over episodes.
5. When it finishes: *“Here’s the comparison: REINFORCE vs random baseline.”* Show the table (success rate, reward, time).
### Demo flow B — “Compare agents in the lab” (~1–2 min)
1. Go to **PCR Amplification** (`/workflows/pcr-amplification`).
2. Say: *“Each run is one scientist trying to get a successful experiment under time and budget.”*
3. Click **Run Naive Baseline** — timeline fills with random preset choices and results.
4. Then click **Run with AI Agent** (uses the policy you trained in flow A, or a default). Point at the timeline: *“The learned agent picks protocols more purposefully and often gets success sooner.”*
5. If you have an API key, click **Research & Generate (any protocol)**. Say: *“This one researches, proposes parameters, runs them, and learns from feedback.”*
### Tips
- **Keep training short on stage:** 500 episodes is enough to show learning; 1000 if you have time.
- **If the UI is slow:** Run a quick train in the background before the demo, then only show “Run with AI Agent” and the comparison table.
- **Backup:** Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
- **Talking points:** Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.
### Demo script (optional)
From repo root, run `./scripts/demo_hackathon.sh` for a short checklist and the option to start the API in that terminal. Or start both manually:
```bash
# Terminal 1
uvicorn server.app:app --host 0.0.0.0 --port 8000
# Terminal 2
cd v0ap && pnpm dev
# Open http://localhost:3000 → /training or /workflows/pcr-amplification
```
---
### Research LLM agent (optional, Streamlit)
Install demo dependencies (`openai`, `streamlit`) and set `OPENAI_API_KEY`:
```bash
pip install -e ".[demo]"
export OPENAI_API_KEY=your_key
streamlit run demo/streamlit_app.py
```
The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:
```bash
python scripts/compare_all_agents.py --eval-episodes 50
```
### Sample output (train & eval)
```
Metric REINFORCE Naive
----------------------------------------------
Avg reward 15.7 5.0
Success rate 53.0% 43.0%
Partial rate 19.0% 15.0%
Avg time 62.8m 63.0m
Avg cost $0.0 $0.0
Avg steps 7.0 7.0
----------------------------------------------
```
---
## OpenEnv & Hugging Face — How to show and use
SimLab is built for the **OpenEnv** ecosystem and can be served over HTTP and deployed to **Hugging Face** as a standardized agentic environment.
### How SimLab uses OpenEnv
- **`openenv-core`** is a required dependency (`pyproject.toml`).
- **`lab_env/openenv_adapter.py`** wraps `LabEnv` in the OpenEnv `Environment` interface:
- **Types:** `LabAction`, `LabObservation`, `LabState`, `LabEnvironment`
- **`create_app(LabEnvironment, LabAction, LabObservation, ...)`** — FastAPI app with OpenEnv endpoints
### Run the OpenEnv HTTP server
```bash
uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
```
This exposes standard OpenEnv endpoints:
| Endpoint | Description |
|----------------|--------------------------------|
| `POST /reset` | Reset environment, get initial observation |
| `POST /step` | Send action, get next observation & reward |
| `GET /state` | Current state snapshot |
| `GET /metadata`| Environment name, version, docs |
| WebSocket `/ws`| Persistent session (optional) |
The server supports up to four concurrent sessions (`max_concurrent_envs=4`).
### Call the OpenEnv server (show usage)
From another process or machine, you can drive SimLab over HTTP:
```bash
# Reset (start new episode)
curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .
# Step (e.g. action 0 = setup preset 0)
curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .
# Get current state
curl -s http://localhost:8000/state | jq .
```
From Python (e.g. for demos or integration):
```python
import requests
BASE = "http://localhost:8000"
# Reset
r = requests.post(f"{BASE}/reset", json={"seed": 42})
obs = r.json() # observation with metadata (obs_vector, info, etc.)
# Step: setup preset 0, then run assay (action 12 for PCR)
requests.post(f"{BASE}/step", json={"action": 0})
r = requests.post(f"{BASE}/step", json={"action": 12})
print(r.json()) # observation, reward, done
# State
state = requests.get(f"{BASE}/state").json()
print(state["step_count"], state["best_result"])
```
### Deploy to Hugging Face
To **show SimLab on the Hugging Face Hub** as an OpenEnv environment:
1. **Option A — Hugging Face Space (Docker)**
Create a Space with **Docker** as the SDK. Use a `Dockerfile` that installs SimLab and runs:
```dockerfile
CMD ["uvicorn", "lab_env.openenv_adapter:app", "--host", "0.0.0.0", "--port", "7860"]
```
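Point the Space to your repo and set `app_port: 7860` in the Space's README front matter so Hugging Face routes traffic to that port. The Space page (e.g. `https://huggingface.co/spaces/your-username/simlab-env`) embeds the app; API clients should call the Space's direct URL, which typically has the form `https://your-username-simlab-env.hf.space`, as the public OpenEnv endpoint.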
2. **Option B — OpenEnv CLI (if you adopt the full OpenEnv layout)**
The [OpenEnv Packaging & Deploying](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) guide uses `openenv init`, `openenv build`, and **`openenv push`** to deploy to the Hub. SimLab currently uses `openenv-core` and a custom adapter; to use `openenv push`, you would add the expected layout (e.g. `openenv.yaml`, `server/` with Dockerfile) and wire the existing `LabEnvironment` + `create_app` into that structure.
3. **Link your repo on the Hub**
In your SimLab repo or any Hugging Face model/Space card, set the **Repository** and **Documentation** URLs to your GitHub repo and add a tag or short description such as: *"OpenEnv-compatible lab automation environment; run with `uvicorn lab_env.openenv_adapter:app` and connect via POST /reset, POST /step."*
### References
- [OpenEnv documentation](https://meta-pytorch.github.io/OpenEnv/) — framework overview and APIs
- [OpenEnv on Hugging Face](https://huggingface.co/openenv) — OpenEnv org and environments
- [Packaging & Deploying (OpenEnv)](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) — build, validate, push to Hub
---
## Environment API Reference
```python
from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec
# Default: PCR experiment
env = LabEnv()
# Or any experiment from a spec:
# env = LabEnv(spec=my_experiment_spec)
obs, info = env.reset(seed=42)
# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
# [0] step_index (normalised)
# [1] elapsed_minutes (normalised)
# [2] remaining_budget (normalised)
# [3..] inventory (one per spec.inventory_items, normalised)
# [...] last_result one-hot (len(spec.result_labels))
# [...] has_setup, current_preset_idx (norm), best_result_score
# Actions (Discrete, from spec):
# 0 .. num_presets-1 setup_reaction(preset_index)
# num_presets run_assay
# num_presets+1 .. order_reagents (one per orderable_items)
# ... wait, finish
obs, reward, terminated, truncated, info = env.step(0) # setup preset 0
obs, reward, terminated, truncated, info = env.step(12) # run assay (PCR)
obs, reward, terminated, truncated, info = env.step(17) # finish (PCR)
# Custom protocol (any params; spec must have evaluate_custom_protocol)
obs, reward, term, trunc, info = env.run_assay_with_protocol({"temp": 57.5, "cycles": 32, "ratio": "conservative"})
```
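The index layout in the comments above can be written as a small helper. This is a sketch rather than part of `lab_env`: both function names are hypothetical, and the figure of three orderable items for PCR is inferred from the 18-action total rather than stated anywhere in this README.

```python
def action_layout(num_presets, num_orderable):
    """Map action names to indices, following the ordering described above."""
    run_assay = num_presets
    first_order = num_presets + 1
    wait = first_order + num_orderable
    return {
        "setup": range(0, num_presets),
        "run_assay": run_assay,
        "order": range(first_order, wait),
        "wait": wait,
        "finish": wait + 1,
        "num_actions": wait + 2,
    }

def obs_dim(num_inventory_items, num_result_labels):
    # step, elapsed, budget + inventory + last-result one-hot
    # + has_setup, current_preset_idx, best_result_score
    return 3 + num_inventory_items + num_result_labels + 3

# PCR: 12 presets and (inferred) 3 orderable items give 18 actions,
# with run_assay at index 12 and finish at index 17.
pcr = action_layout(num_presets=12, num_orderable=3)
```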
---
## License
MIT