title: SimLab — Lab Automation RL Environment
emoji: 🧪
colorFrom: blue
colorTo: green
sdk: docker
sdk_version: 4.0.0
app_port: 7860
pinned: false

SimLab — Lab Automation RL Environment

A self-contained Gymnasium-style reinforcement learning environment that simulates wet-lab experiment workflows. Each experiment type is defined by an ExperimentSpec (protocol presets, inventory, rewards, outcome model). The default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom assays, or other protocol-discovery tasks under real-world constraints: limited time, limited budget, and finite reagent inventory.

Built for the OpenEnv ecosystem so it can be wrapped as an HTTP-served, sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.

Integrations: OpenEnv · Hugging Face

What the Environment Simulates

Each episode represents a scientist at the bench trying to get a successful result. The environment:

Samples a hidden optimal protocol on every reset() — the agent never sees it directly.
Offers protocol presets (defined in the spec) the agent can choose from.
Lets the agent run assays that consume reagents and time, returning outcomes (e.g. success / partial / fail) from the spec’s outcome model.
Custom protocols: Specs with evaluate_custom_protocol (PCR, ELISA) allow arbitrary protocol parameters via env.run_assay_with_protocol(protocol_dict) — agents can generate and try any valid params, not just presets.
Allows ordering more reagents (costs money and time) and waiting.
Terminates when the agent calls finish, runs out of time/budget, or exhausts inventory with no way to reorder.

Default (PCR): 12 presets (3 temps × 2 cycle counts × 2 reagent ratios); probabilistic success based on distance to hidden optimum. Other experiments use their own presets and outcome logic via a custom ExperimentSpec.
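
A minimal sketch of how a 3 × 2 × 2 preset grid and a distance-based outcome could fit together. Only the grid shape and the parameter names (temp, cycles, ratio) come from this README; the concrete values, the distance function, and the probability formula are illustrative assumptions, not the contents of pcr_experiment_spec().

```python
import itertools
import math
import random

# Hypothetical grid values; only the 3 x 2 x 2 shape comes from the README.
TEMPS = [55.0, 60.0, 65.0]
CYCLES = [25, 35]
RATIOS = ["conservative", "aggressive"]

PRESETS = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in itertools.product(TEMPS, CYCLES, RATIOS)
]


def distance(protocol, hidden):
    """Normalised distance to the hidden optimum (assumed form)."""
    d_temp = abs(protocol["temp"] - hidden["temp"]) / 10.0
    d_cycles = abs(protocol["cycles"] - hidden["cycles"]) / 10.0
    d_ratio = 0.0 if protocol["ratio"] == hidden["ratio"] else 1.0
    return math.sqrt(d_temp ** 2 + d_cycles ** 2 + d_ratio ** 2)


def sample_outcome(protocol, hidden, rng):
    """Closer protocols succeed more often; the decay rate is made up."""
    p_success = math.exp(-2.0 * distance(protocol, hidden))
    roll = rng.random()
    if roll < p_success:
        return "success"
    if roll < p_success + 0.5 * (1.0 - p_success):
        return "partial"
    return "fail"


rng = random.Random(0)
hidden = rng.choice(PRESETS)  # stand-in for the optimum sampled in reset()
print(len(PRESETS), sample_outcome(hidden, hidden, rng))  # 12 success
```

Running the hidden optimum itself always succeeds here because its distance is zero; every other preset gets a mix of success, partial, and fail that shifts toward fail as the distance grows.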

Reward structure (default PCR)

The reward encodes real lab trade-offs (all configurable per spec):

| Signal | Value |
| --- | --- |
| Immediate assay result: success | +15 |
| Immediate assay result: partial | +5 |
| Per-assay cost penalty | -3 |
| Terminal bonus (best = success) | +60 |
| Terminal bonus (best = partial) | +25 |
| Terminal penalty (no success/partial) | -20 |
| Time penalty | -0.25 per minute elapsed |

A good agent learns to explore efficiently — try a few presets, read the signals from partial/success outcomes, and converge on the best protocol before finishing.
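
As a worked example, the reward table can be turned into a small calculator. The numbers are copied from the table; the assumption that the signals simply add up, and that the time penalty applies to total elapsed minutes, is mine and may differ from how LabEnv accumulates reward step by step.

```python
# Values copied from the reward table; how they combine is an assumption.
IMMEDIATE = {"success": 15.0, "partial": 5.0, "fail": 0.0}
ASSAY_COST = -3.0
TERMINAL = {"success": 60.0, "partial": 25.0, "fail": -20.0}
TIME_PENALTY_PER_MIN = -0.25
RANK = {"fail": 0, "partial": 1, "success": 2}


def episode_return(results, elapsed_minutes):
    """Sum the table's signals for one episode, assuming they are additive."""
    immediate = sum(IMMEDIATE[r] for r in results)
    cost = ASSAY_COST * len(results)
    # The terminal term depends on the best result seen during the episode.
    best = max(results, key=RANK.__getitem__, default="fail")
    terminal = TERMINAL[best]
    time_penalty = TIME_PENALTY_PER_MIN * elapsed_minutes
    return immediate + cost + terminal + time_penalty


# Three assays (fail, partial, success) over 60 minutes:
# 20 immediate - 9 cost + 60 terminal - 15 time = 56
print(episode_return(["fail", "partial", "success"], 60))  # 56.0
```

Under these assumptions the terminal bonus dominates, so an extra assay at -3 is cheap compared with finishing on a partial instead of a success.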

Architecture

simlab/
├── pyproject.toml              # Package metadata & dependencies
├── README.md
├── lab_env/
│   ├── __init__.py
│   ├── spec.py                 # ExperimentSpec, pcr_experiment_spec()
│   ├── env.py                  # LabEnv (Gymnasium interface, any experiment)
│   └── openenv_adapter.py      # OpenEnv types, LabEnvironment, HTTP app
├── agents/
│   ├── __init__.py
│   ├── naive_agent.py          # Random-preset baseline
│   ├── rl_agent.py             # REINFORCE policy-gradient agent (PyTorch)
│   ├── research_llm_agent.py   # LLM researcher: presets + research
│   └── research_generate_agent.py  # Research → generate any protocol → run → learn from feedback
├── knowledge/
│   └── pcr_protocols.json      # Fake “papers” for web_search tool (demo)
├── demo/
│   └── streamlit_app.py        # Live research dashboard + 3-agent comparison
└── scripts/
    ├── run_naive_baseline.py   # Evaluate the naive agent
    ├── train_and_eval_agent.py # Train REINFORCE & compare both agents
    ├── compare_all_agents.py   # Benchmark Naive vs RL vs Research LLM
    ├── run_research_generate_agent.py  # Research → generate protocol → run → learn (any protocol)
    └── demo_research_agent.py  # Terminal demo of research agent

Defining a new experiment

Implement an ExperimentSpec in lab_env/spec.py (or your own module) with:

presets — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
inventory_items / orderable_items — what the lab tracks and can reorder.
initial_inventory, order_costs, result_labels.
sample_hidden_optimum(rng) — returns hidden optimal state (e.g. ideal temp/cycles).
sample_assay_result(hidden, preset_idx, presets, rng) — returns outcome label.
evaluate_custom_protocol(hidden, protocol_dict, rng) (optional) — score an arbitrary protocol dict so agents can run any params via env.run_assay_with_protocol(protocol_dict).
protocol_param_schema (optional) — dict describing params for codegen/LLM (e.g. {"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}).

Then use LabEnv(spec=my_spec) or pass spec into the OpenEnv LabEnvironment(spec=my_spec).
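
The sketch below shows one way the pieces listed above could fit together. ToyElisaSpec is a hypothetical stand-in, not the real ExperimentSpec class: its constructor, the ELISA parameter names, and the outcome rule are assumptions chosen for illustration. Only the attribute names and the two method signatures follow the list above.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyElisaSpec:
    """Stand-in with the attributes listed above; not the real ExperimentSpec."""

    presets: list = field(default_factory=lambda: [
        {"incubation_min": m, "dilution": d}
        for m in (30, 60, 90)
        for d in (100, 500)
    ])
    inventory_items: tuple = ("antibody", "substrate", "plates")
    orderable_items: tuple = ("antibody", "substrate")
    initial_inventory: dict = field(
        default_factory=lambda: {"antibody": 4, "substrate": 4, "plates": 6}
    )
    order_costs: dict = field(
        default_factory=lambda: {"antibody": 120.0, "substrate": 40.0}
    )
    result_labels: tuple = ("fail", "partial", "success")

    def sample_hidden_optimum(self, rng):
        # The hidden state is simply the index of the best preset here.
        return {"best_idx": rng.randrange(len(self.presets))}

    def sample_assay_result(self, hidden, preset_idx, presets, rng):
        # Exact match succeeds, neighbours are sometimes partial, the rest fail.
        gap = abs(preset_idx - hidden["best_idx"])
        if gap == 0:
            return "success"
        return "partial" if gap == 1 and rng.random() < 0.7 else "fail"


spec = ToyElisaSpec()
rng = random.Random(0)
hidden = spec.sample_hidden_optimum(rng)
print(spec.sample_assay_result(hidden, hidden["best_idx"], spec.presets, rng))  # success
```

A real spec would subclass or instantiate ExperimentSpec from lab_env/spec.py instead; check that module for the exact field names and types before copying this shape.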

Agent design

The REINFORCE agent decomposes the problem into a learned and a scripted part:

Learned: an MLP with two hidden layers (14 → 64 → 64 → 12) maps the observation to a distribution over the 12 protocol presets. It is trained with REINFORCE plus an entropy bonus and a running-mean baseline.
Scripted: the episode loop (setup → run assay → check result → order if needed → finish on success) is fixed, so the agent focuses on the hard decision: which preset to try.

This decomposition lets training converge in ~2000 episodes (a few seconds on CPU) while beating the random-preset naive baseline (53% vs 43% success rate in the sample output below).
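
To make the learned part concrete, here is a deliberately simplified REINFORCE sketch in pure Python. It is not the code in agents/rl_agent.py: the MLP is replaced by a table of 12 logits (so the observation is ignored), the entropy bonus is omitted, the baseline is an exponential running mean, and the reward is a made-up deterministic signal with a hypothetical best preset.

```python
import math
import random

NUM_PRESETS = 12
BEST_PRESET = 7  # hypothetical hidden optimum for this toy problem


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=2000, lr=0.5, baseline_decay=0.9, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * NUM_PRESETS  # tabular policy instead of the MLP
    baseline = 0.0                # exponential running mean of past rewards
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(NUM_PRESETS), weights=probs)[0]
        reward = 1.0 if action == BEST_PRESET else 0.0
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. the logits is onehot(action) - probs.
        for i in range(NUM_PRESETS):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        baseline = baseline_decay * baseline + (1.0 - baseline_decay) * reward
    return softmax(logits)


probs = train()
best = max(range(NUM_PRESETS), key=probs.__getitem__)
print(best, round(probs[best], 3))
```

The baseline only reduces variance: subtracting it leaves the expected gradient unchanged, which is why the update can use reward minus baseline in place of the raw reward.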

The Research LLM agent adds a self-improving lab scientist: it researches protocols (via a web_search tool over a local knowledge base), hypothesizes new parameter combinations (mapped to presets), runs experiments in LabEnv, and updates internal knowledge from results.

The Research & Generate agent (research_generate_agent.py) goes further: it researches (web_search), generates protocol parameters for any valid values (not limited to presets), runs them via env.run_assay_with_protocol(protocol_dict), and learns from feedback — each run's (protocol, result, reward) is passed into the next trial so the agent improves over the episode. Works with any spec that has evaluate_custom_protocol (PCR, ELISA). Run it with:

export OPENAI_API_KEY=your_key
python scripts/run_research_generate_agent.py --episodes 5 --verbose

Use --workflow elisa-readout for ELISA. To support more experiment types, add knowledge/{name}_protocols.json so the research step has literature to search.
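
The feedback loop described above (each trial sees the earlier (protocol, result, reward) tuples) can be sketched without an LLM or the real environment. Everything here is a stand-in: run_assay replaces env.run_assay_with_protocol, propose replaces the LLM call with a simple perturbation rule, and the hidden optimum, thresholds, and rewards are invented for the example.

```python
import random

HIDDEN = {"temp": 58.0, "cycles": 30}  # hypothetical optimum, unknown to propose()


def run_assay(protocol):
    """Toy stand-in for env.run_assay_with_protocol: returns (result, reward)."""
    gap = (abs(protocol["temp"] - HIDDEN["temp"])
           + abs(protocol["cycles"] - HIDDEN["cycles"]))
    if gap <= 2:
        return "success", 15.0
    if gap <= 8:
        return "partial", 5.0
    return "fail", 0.0


def propose(history, rng):
    """Stub for the LLM call: perturb the best protocol seen so far."""
    if not history:
        return {"temp": 62.0, "cycles": 26}
    # reversed() makes ties resolve to the most recent best trial.
    best_protocol, _, _ = max(reversed(history), key=lambda item: item[2])
    return {
        "temp": best_protocol["temp"] + rng.choice([-2.0, 0.0, 2.0]),
        "cycles": best_protocol["cycles"] + rng.choice([-3, 0, 3]),
    }


def run_episode(trials=8, seed=0):
    rng = random.Random(seed)
    history = []  # (protocol, result, reward) tuples fed into every later proposal
    for _ in range(trials):
        protocol = propose(history, rng)
        result, reward = run_assay(protocol)
        history.append((protocol, result, reward))
    return history


for protocol, result, reward in run_episode():
    print(protocol, result, reward)
```

In the real agent the history would be serialised into the prompt, so the model can reason about which parameter changes helped instead of perturbing at random.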

Training on different protocol sets

Each protocol (PCR, ELISA, or a custom spec) has its own presets and outcome model. The RL agent can train on any of them so you get one policy per protocol set.

One agent per protocol: Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
Script: scripts/train_per_protocol.py trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. checkpoints/pcr-amplification.pt, checkpoints/elisa-readout.pt):
```
python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
```
Using agents to create different protocol sets: You can define new protocol sets in two ways:
1. In code: Add a new ExperimentSpec in lab_env/spec.py (or your own module): define presets, sample_hidden_optimum, sample_assay_result, and optionally evaluate_custom_protocol + protocol_param_schema. Register it in get_spec_for_workflow() and run train_per_protocol.py --workflows your-workflow-id.
2. Generated presets: Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an ExperimentSpec and train an agent with ReinforceAgent(spec=my_spec) on LabEnv(spec=my_spec). The Research & Generate agent already “creates” protocols at run time (arbitrary params); to train on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.

Quick Start

Install

pip install -e .

Or just ensure numpy, torch, and gymnasium are installed.

Run the naive baseline

python scripts/run_naive_baseline.py --episodes 200

Train the REINFORCE agent and compare

python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100

Next.js UI + API server (general UI)

Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):

# Terminal 1: Python API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2: Next.js frontend (v0ap)
cd v0ap && pnpm dev

Then open the workflow run page (e.g. /workflows/pcr-amplification). The UI shows Run with AI Agent, Run Research Agent (research → hypothesize → experiment → learn), and Run Naive Baseline. The timeline displays which agent was used and each step (Research, Hypothesis, Run Assay, Learn for the research agent). Set OPENAI_API_KEY if you use the Research agent.

Hackathon / live demo — how to show the RL

Pitch in one line: “We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”

Setup (do this before going on stage)

Start both servers (two terminals):

# Terminal 1 — API (agents + LabEnv)
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2 — UI
cd v0ap && pnpm dev

Open http://localhost:3000 (or the URL Next.js prints).
Optional: set OPENAI_API_KEY if you want to demo Research / Research & Generate.

Demo flow A — “Watch the RL agent learn” (~2 min)

Go to Training (/training).
Say: “This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”
Set episodes to 500 (slider) for a short run — training finishes in under a minute on a laptop.
Click Start Training. Point at:
- Progress and “Episode X of 500”.
- Chart: reward and success rate climbing over episodes.
When it finishes: “Here’s the comparison: REINFORCE vs random baseline.” Show the table (success rate, reward, time).

Demo flow B — “Compare agents in the lab” (~1–2 min)

Go to PCR Amplification (/workflows/pcr-amplification).
Say: “Each run is one scientist trying to get a successful experiment under time and budget.”
Click Run Naive Baseline — timeline fills with random preset choices and results.
Then click Run with AI Agent (uses the policy you trained in flow A, or a default). Point at the timeline: “The learned agent picks protocols more purposefully and often gets success sooner.”
If you have an API key: click Research & Generate (any protocol) — “This one researches, proposes parameters, runs them, and learns from feedback.”

Tips

Keep training short on stage: 500 episodes is enough to show learning; 1000 if you have time.
If the UI is slow: run a short training job in the background before the demo, then show only “Run with AI Agent” and the comparison table.
Backup: Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
Talking points: Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.

Demo script (optional)

From repo root, run ./scripts/demo_hackathon.sh for a short checklist and the option to start the API in that terminal. Or start both manually:

# Terminal 1
uvicorn server.app:app --host 0.0.0.0 --port 8000

# Terminal 2
cd v0ap && pnpm dev
# Open http://localhost:3000 → /training or /workflows/pcr-amplification

Research LLM agent (optional, Streamlit)

Install demo dependencies (openai, streamlit) and set OPENAI_API_KEY:

pip install -e ".[demo]"
export OPENAI_API_KEY=your_key
streamlit run demo/streamlit_app.py

The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:

python scripts/compare_all_agents.py --eval-episodes 50

Sample output (train & eval)

Metric                  REINFORCE        Naive
----------------------------------------------
Avg reward                   15.7          5.0
Success rate                53.0%        43.0%
Partial rate                19.0%        15.0%
Avg time                    62.8m        63.0m
Avg cost                     $0.0         $0.0
Avg steps                     7.0          7.0
----------------------------------------------

OpenEnv & Hugging Face — How to show and use

SimLab is built for the OpenEnv ecosystem and can be served over HTTP and deployed to Hugging Face as a standardized agentic environment.

How SimLab uses OpenEnv

openenv-core is a required dependency (pyproject.toml).
lab_env/openenv_adapter.py wraps LabEnv in the OpenEnv Environment interface:
- Types: LabAction, LabObservation, LabState, LabEnvironment
- create_app(LabEnvironment, LabAction, LabObservation, ...) — FastAPI app with OpenEnv endpoints

Run the OpenEnv HTTP server

uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000

This exposes standard OpenEnv endpoints:

| Endpoint | Description |
| --- | --- |
| `POST /reset` | Reset environment, get initial observation |
| `POST /step` | Send action, get next observation & reward |
| `GET /state` | Current state snapshot |
| `GET /metadata` | Environment name, version, docs |
| WebSocket `/ws` | Persistent session (optional) |

The server supports up to 4 concurrent sessions (max_concurrent_envs=4).

Call the OpenEnv server (show usage)

From another process or machine, you can drive SimLab over HTTP:

# Reset (start new episode)
curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .

# Step (e.g. action 0 = setup preset 0)
curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .

# Get current state
curl -s http://localhost:8000/state | jq .

From Python (e.g. for demos or integration):

import requests

BASE = "http://localhost:8000"

# Reset
r = requests.post(f"{BASE}/reset", json={"seed": 42})
obs = r.json()  # observation with metadata (obs_vector, info, etc.)

# Step: setup preset 0, then run assay (action 12 for PCR)
requests.post(f"{BASE}/step", json={"action": 0})
r = requests.post(f"{BASE}/step", json={"action": 12})
print(r.json())  # observation, reward, done

# State
state = requests.get(f"{BASE}/state").json()
print(state["step_count"], state["best_result"])

Deploy to Hugging Face

To show SimLab on the Hugging Face Hub as an OpenEnv environment:

Option A — Hugging Face Space (Docker)
Create a Space with Docker as the SDK. Use a Dockerfile that installs SimLab and runs:
```
CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 7860
```
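Point the Space at your repo and make sure app_port in the README metadata matches the port uvicorn listens on (7860 here, which is also the default for Docker Spaces). The Space page lives at a URL such as https://huggingface.co/spaces/your-username/simlab-env; the running app, and therefore the OpenEnv endpoints, are typically served from the Space's direct URL (of the form https://your-username-simlab-env.hf.space).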
Option B — OpenEnv CLI (if you adopt the full OpenEnv layout)
The OpenEnv Packaging & Deploying guide uses openenv init, openenv build, and openenv push to deploy to the Hub. SimLab currently uses openenv-core and a custom adapter; to use openenv push, you would add the expected layout (e.g. openenv.yaml, server/ with Dockerfile) and wire the existing LabEnvironment + create_app into that structure.
Link your repo on the Hub
In your SimLab repo or any Hugging Face model/Space card, set the Repository and Documentation URLs to your GitHub repo and add a tag or short description such as: "OpenEnv-compatible lab automation environment; run with uvicorn lab_env.openenv_adapter:app and connect via POST /reset, POST /step."

References

OpenEnv documentation — framework overview and APIs
OpenEnv on Hugging Face — OpenEnv org and environments
Packaging & Deploying (OpenEnv) — build, validate, push to Hub

Environment API Reference

from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec

# Default: PCR experiment
env = LabEnv()
# Or any experiment from a spec:
# env = LabEnv(spec=my_experiment_spec)

obs, info = env.reset(seed=42)

# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
#   [0]    step_index (normalised)
#   [1]    elapsed_minutes (normalised)
#   [2]    remaining_budget (normalised)
#   [3..]  inventory (one per spec.inventory_items, normalised)
#   [...]  last_result one-hot (len(spec.result_labels))
#   [...]  has_setup, current_preset_idx (norm), best_result_score

# Actions (Discrete, from spec):
#   0 .. num_presets-1   setup_reaction(preset_index)
#   num_presets          run_assay
#   num_presets+1 ..     order_reagents (one per orderable_items)
#   ...                  wait, finish

obs, reward, terminated, truncated, info = env.step(0)    # setup preset 0
obs, reward, terminated, truncated, info = env.step(12)   # run assay (PCR)
obs, reward, terminated, truncated, info = env.step(17)   # finish (PCR)

# Custom protocol (any params; spec must have evaluate_custom_protocol)
obs, reward, term, trunc, info = env.run_assay_with_protocol({"temp": 57.5, "cycles": 32, "ratio": "conservative"})
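
The action comments above imply a fixed index layout, which this small helper reproduces. The helper is hypothetical, not part of lab_env. The 12 presets and 18 actions are stated in this README; the count of 3 orderable items is inferred from those two numbers (18 = 12 + 1 + 3 + 2), and the assumption that wait and finish are the last two single actions follows from finish being action 17.

```python
def action_layout(num_presets, num_orderable):
    """Map action names to indices following the ordering in the comments above."""
    run_assay = num_presets
    first_order = num_presets + 1
    wait = first_order + num_orderable
    finish = wait + 1
    return {
        "setup": range(0, num_presets),
        "run_assay": run_assay,
        "order": range(first_order, wait),
        "wait": wait,
        "finish": finish,
        "num_actions": finish + 1,
    }


# num_orderable=3 is inferred, not read from the spec.
pcr = action_layout(num_presets=12, num_orderable=3)
print(pcr["run_assay"], pcr["finish"], pcr["num_actions"])  # 12 17 18
```

For a custom spec, prefer reading the indices from the environment itself over hard-coding them, since the preset and orderable-item counts change the offsets.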

License

MIT