	---
	title: SimLab — Lab Automation RL Environment
	emoji: 🧪
	colorFrom: blue
	colorTo: green
	sdk: docker
	sdk_version: "4.0.0"
	app_port: 7860
	pinned: false
	---

	# SimLab — Lab Automation RL Environment

	A self-contained Gymnasium-style reinforcement learning environment that
	simulates any wet-lab experiment workflow. The experiment type is defined by
	an ExperimentSpec (protocol presets, inventory, rewards, outcome model). The
	default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom
	assays, or any protocol-discovery task under real-world constraints: limited
	time, budget, and finite reagent inventory.

	Built for the OpenEnv ecosystem so it can be wrapped as an HTTP-served,
	sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.

	Integrations: [OpenEnv](https://meta-pytorch.github.io/OpenEnv/) · [Hugging Face](https://huggingface.co/openenv)

	---

	## What the Environment Simulates

	Each episode represents a scientist at the bench trying to get a successful
	result. The environment:

	- Samples a hidden optimal protocol on every `reset()` — the agent never
	sees it directly.
	- Offers protocol presets (defined in the spec) the agent can choose from.
	- Lets the agent run assays that consume reagents and time, returning
	outcomes (e.g. success / partial / fail) from the spec’s outcome model.
	- Custom protocols: Specs with `evaluate_custom_protocol` (PCR, ELISA) allow
	arbitrary protocol parameters via `env.run_assay_with_protocol(protocol_dict)` — agents can generate and try any valid params, not just presets.
	- Allows ordering more reagents (costs money and time) and waiting.
	- Terminates when the agent calls finish, runs out of time/budget, or
	exhausts inventory with no way to reorder.

	Default (PCR): 12 presets (3 temps × 2 cycle counts × 2 reagent ratios);
	probabilistic success based on distance to hidden optimum. Other experiments
	use their own presets and outcome logic via a custom `ExperimentSpec`.
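
The preset grid and the distance-based outcome can be sketched as below. This is a minimal illustration, not the code in `pcr_experiment_spec()`: the temperatures, cycle counts, the `"aggressive"` ratio name, and the exponential-decay formula are all assumptions chosen to show the idea.

```python
import itertools
import math
import random

# Hypothetical grid values; the real PCR spec may use different ones.
TEMPS = [55.0, 60.0, 65.0]
CYCLES = [25, 35]
RATIOS = ["conservative", "aggressive"]


def build_presets():
    """Cartesian product of the three parameter axes: 3 x 2 x 2 = 12 presets."""
    return [
        {"temp": t, "cycles": c, "ratio": r}
        for t, c, r in itertools.product(TEMPS, CYCLES, RATIOS)
    ]


def success_probability(preset, hidden):
    """Assumed outcome model: probability decays with distance to the hidden optimum."""
    d_temp = abs(preset["temp"] - hidden["temp"]) / 10.0
    d_cycles = abs(preset["cycles"] - hidden["cycles"]) / 10.0
    d_ratio = 0.0 if preset["ratio"] == hidden["ratio"] else 0.5
    distance = math.sqrt(d_temp ** 2 + d_cycles ** 2 + d_ratio ** 2)
    return math.exp(-distance)


def sample_result(preset, hidden, rng):
    """Draw success / partial / fail; the non-success mass is split evenly (assumption)."""
    p = success_probability(preset, hidden)
    u = rng.random()
    if u < p:
        return "success"
    if u < p + (1.0 - p) * 0.5:
        return "partial"
    return "fail"


presets = build_presets()
hidden = {"temp": 60.0, "cycles": 35, "ratio": "conservative"}
result = sample_result(presets[0], hidden, random.Random(0))
```

A preset that matches the hidden optimum exactly has probability 1.0 under this toy model; every other preset has a lower chance, which is what makes exploration worthwhile.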

	### Reward structure (default PCR)

	The reward encodes real lab trade-offs (all configurable per spec):

	| Signal | Value |
	|---|---|
	| Immediate assay result: success | +15 |
	| Immediate assay result: partial | +5 |
	| Per-assay cost penalty | -3 |
	| Terminal bonus (best = success) | +60 |
	| Terminal bonus (best = partial) | +25 |
	| Terminal penalty (no success/partial) | -20 |
	| Time penalty | -0.25 per minute elapsed |

	A good agent learns to explore efficiently — try a few presets, read the
	signals from partial/success outcomes, and converge on the best protocol before
	finishing.
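
As a worked example, the table can be turned into a small return calculator. The constants come from the table; how the environment combines them is an assumption here (the signals are simply summed, a failed assay earns 0 before the cost penalty, and the time penalty is charged once on total elapsed minutes).

```python
# Values copied from the default PCR reward table.
ASSAY_REWARD = {"success": 15.0, "partial": 5.0, "fail": 0.0}  # "fail": 0.0 is assumed
ASSAY_COST = -3.0
TERMINAL_BONUS = {"success": 60.0, "partial": 25.0}
TERMINAL_PENALTY = -20.0
TIME_PENALTY_PER_MIN = -0.25

RANK = {"fail": 0, "partial": 1, "success": 2}


def episode_return(results, elapsed_minutes):
    """Sum per-assay rewards and costs, the terminal term, and the time penalty."""
    total = sum(ASSAY_REWARD[r] + ASSAY_COST for r in results)
    # The terminal term depends on the best outcome seen during the episode.
    best = max(results, key=RANK.__getitem__, default="fail")
    total += TERMINAL_BONUS.get(best, TERMINAL_PENALTY)
    total += TIME_PENALTY_PER_MIN * elapsed_minutes
    return total


# fail, partial, success over 60 minutes:
# (0 - 3) + (5 - 3) + (15 - 3) + 60 - 0.25 * 60 = 56
example = episode_return(["fail", "partial", "success"], 60)
```

Under these assumptions an extra assay pays for itself only if it is likely to lift the best result, since each run costs 3 points plus the time it consumes.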

	---

	## Architecture

	```
	simlab/
	├── pyproject.toml # Package metadata & dependencies
	├── README.md
	├── lab_env/
	│ ├── __init__.py
	│ ├── spec.py # ExperimentSpec, pcr_experiment_spec()
	│ ├── env.py # LabEnv (Gymnasium interface, any experiment)
	│ └── openenv_adapter.py # OpenEnv types, LabEnvironment, HTTP app
	├── agents/
	│ ├── __init__.py
	│ ├── naive_agent.py # Random-preset baseline
	│ ├── rl_agent.py # REINFORCE policy-gradient agent (PyTorch)
	│ ├── research_llm_agent.py # LLM researcher: presets + research
	│ └── research_generate_agent.py # Research → generate any protocol → run → learn from feedback
	├── knowledge/
	│ └── pcr_protocols.json # Fake “papers” for web_search tool (demo)
	├── demo/
	│ └── streamlit_app.py # Live research dashboard + 3-agent comparison
	└── scripts/
	  ├── run_naive_baseline.py # Evaluate the naive agent
	  ├── train_and_eval_agent.py # Train REINFORCE & compare both agents
	  ├── compare_all_agents.py # Benchmark Naive vs RL vs Research LLM
	  ├── run_research_generate_agent.py # Research → generate protocol → run → learn (any protocol)
	  └── demo_research_agent.py # Terminal demo of research agent
	```

	### Defining a new experiment

	Implement an `ExperimentSpec` in `lab_env/spec.py` (or your own module) with:

	- presets — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
	- inventory_items / orderable_items — what the lab tracks and can reorder.
	- initial_inventory, order_costs, result_labels.
	- sample_hidden_optimum(rng) — returns hidden optimal state (e.g. ideal temp/cycles).
	- sample_assay_result(hidden, preset_idx, presets, rng) — returns outcome label.
	- evaluate_custom_protocol(hidden, protocol_dict, rng) (optional) — score an arbitrary protocol dict so agents can run any params via `env.run_assay_with_protocol(protocol_dict)`.
	- protocol_param_schema (optional) — dict describing params for codegen/LLM (e.g. `{"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}`).

	Then use `LabEnv(spec=my_spec)` or pass `spec` into the OpenEnv `LabEnvironment(spec=my_spec)`.
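
The fields above can be illustrated with a toy experiment. The real `ExperimentSpec` constructor is not shown in this README, so the sketch uses a stand-in dataclass, `ToySpec`, that only mirrors the listed field names. The buffer-pH experiment, its thresholds, and its inventory are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class ToySpec:
    """Stand-in for ExperimentSpec; field names follow the list above."""
    presets: List[dict]
    inventory_items: List[str]
    orderable_items: List[str]
    initial_inventory: Dict[str, int]
    order_costs: Dict[str, float]
    result_labels: List[str]
    sample_hidden_optimum: Callable[[random.Random], dict]
    sample_assay_result: Callable[[dict, int, List[dict], random.Random], str]
    evaluate_custom_protocol: Optional[Callable[[dict, dict, random.Random], str]] = None
    protocol_param_schema: dict = field(default_factory=dict)


def sample_hidden_optimum(rng):
    """Pick the pH the agent has to discover."""
    return {"ph": rng.choice([6.5, 7.0, 7.5, 8.0])}


def evaluate_custom_protocol(hidden, protocol, rng):
    """Score any protocol dict by its gap to the hidden pH (deterministic toy rule)."""
    gap = abs(protocol["ph"] - hidden["ph"])
    if gap <= 0.25:
        return "success"
    if gap <= 0.75:
        return "partial"
    return "fail"


def sample_assay_result(hidden, preset_idx, presets, rng):
    """Presets are scored with the same rule as custom protocols."""
    return evaluate_custom_protocol(hidden, presets[preset_idx], rng)


buffer_spec = ToySpec(
    presets=[{"ph": ph} for ph in (6.5, 7.0, 7.5, 8.0)],
    inventory_items=["buffer", "sample"],
    orderable_items=["buffer"],
    initial_inventory={"buffer": 4, "sample": 6},
    order_costs={"buffer": 20.0},
    result_labels=["fail", "partial", "success"],
    sample_hidden_optimum=sample_hidden_optimum,
    sample_assay_result=sample_assay_result,
    evaluate_custom_protocol=evaluate_custom_protocol,
    protocol_param_schema={"ph": {"type": "number"}},
)
```

Routing `sample_assay_result` through `evaluate_custom_protocol` keeps presets and free-form protocols on one scoring rule, so an agent that generates parameters is judged the same way as one that picks presets.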

	### Agent design

	The REINFORCE agent decomposes the problem into a learned and a scripted
	part:

	- Learned: an MLP with two hidden layers (14 → 64 → 64 → 12) maps the
	observation to a distribution over the 12 protocol presets. Trained with
	REINFORCE + entropy bonus + running-mean baseline.
	- Scripted: the episode loop (setup → run assay → check result → order
	if needed → finish on success) is fixed so the agent focuses on the hard
	decision: which preset to try.

	This decomposition lets training converge in ~2000 episodes (a few seconds on
	CPU) while clearly beating the random-preset naive baseline.
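
The core update can be sketched without PyTorch. The version below is a deliberate simplification and not the code in `rl_agent.py`: it replaces the MLP with a table of 12 logits (so it ignores the observation), drops the entropy bonus, and keeps only the REINFORCE step with a running-mean baseline. The function names and the `reward_fn` hook are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_update(logits, action, advantage, lr=0.1):
    """One REINFORCE step: logits += lr * advantage * grad log pi(action).

    For a softmax policy the gradient w.r.t. logit i is 1[i == action] - p_i.
    """
    probs = softmax(logits)
    return [
        x + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


def train(reward_fn, num_presets=12, episodes=2000, lr=0.1, seed=0):
    """Sample a preset, observe a reward, update logits against a running-mean baseline."""
    rng = random.Random(seed)
    logits = [0.0] * num_presets
    baseline = 0.0
    for episode in range(1, episodes + 1):
        probs = softmax(logits)
        action = rng.choices(range(num_presets), weights=probs)[0]
        reward = reward_fn(action, rng)
        logits = reinforce_update(logits, action, reward - baseline, lr)
        baseline += (reward - baseline) / episode  # running mean of rewards so far
    return softmax(logits)
```

For example, `train(lambda action, rng: 1.0 if action == 5 else 0.0)` shifts probability mass toward preset 5. Subtracting the baseline does not change the expected gradient but reduces its variance, which is part of why a short training run can be enough.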

	The Research LLM agent adds a self-improving lab scientist: it researches
	protocols (via a `web_search` tool over a local knowledge base), hypothesizes
	new parameter combinations (mapped to presets), runs experiments in LabEnv, and
	updates internal knowledge from results.

	The Research & Generate agent (`research_generate_agent.py`) goes further: it
	researches (web_search), generates protocol parameters for any valid
	values (not limited to presets), runs them via `env.run_assay_with_protocol(protocol_dict)`,
	and learns from feedback — each run's (protocol, result, reward) is passed
	into the next trial so the agent improves over the episode. Works with any spec
	that has `evaluate_custom_protocol` (PCR, ELISA). Run it with:

	```bash
	export OPENAI_API_KEY=your_key
	python scripts/run_research_generate_agent.py --episodes 5 --verbose
	```

	Use `--workflow elisa-readout` for ELISA. Add `knowledge/{name}_protocols.json`
	for more experiment types so research has literature to search.
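
The feedback loop itself is independent of the LLM. The sketch below shows one way it could look, with the model call replaced by a stub that perturbs the best protocol seen so far, and `env.run_assay_with_protocol` replaced by a toy `evaluate` function. Both stand-ins, the temperature range, and the thresholds are assumptions for illustration only.

```python
import random


def evaluate(protocol, hidden):
    """Toy stand-in for env.run_assay_with_protocol: returns (result, reward)."""
    gap = abs(protocol["temp"] - hidden["temp"])
    if gap <= 1.0:
        return "success", 15.0
    if gap <= 3.0:
        return "partial", 5.0
    return "fail", 0.0


def propose(history, rng, low=50.0, high=70.0):
    """Stub for the LLM call: perturb the best protocol seen so far."""
    if not history:
        return {"temp": rng.uniform(low, high)}
    best_protocol, _, _ = max(history, key=lambda entry: entry[2])
    step = rng.uniform(-3.0, 3.0)
    return {"temp": min(high, max(low, best_protocol["temp"] + step))}


def run_episode(hidden, trials=10, seed=0):
    """Propose, run, record; every proposal sees the full history of earlier trials."""
    rng = random.Random(seed)
    history = []  # (protocol, result, reward) tuples fed back into each proposal
    for _ in range(trials):
        protocol = propose(history, rng)
        result, reward = evaluate(protocol, hidden)
        history.append((protocol, result, reward))
        if result == "success":
            break
    return history


history = run_episode({"temp": 60.0})
```

In the real agent the `history` list would be serialized into the prompt, so the model can reason about which parameters moved the result in the right direction.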

	### Training on different protocol sets

	Each protocol (PCR, ELISA, or a custom spec) has its own presets and outcome model. The RL agent can train on any of them so you get one policy per protocol set.

	- One agent per protocol: Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
	- Script: `scripts/train_per_protocol.py` trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. `checkpoints/pcr-amplification.pt`, `checkpoints/elisa-readout.pt`):

	```bash
	python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
	```

	- Using agents to create different protocol sets: You can define new protocol sets in two ways:
	  1. In code: Add a new `ExperimentSpec` in `lab_env/spec.py` (or your own module): define `presets`, `sample_hidden_optimum`, `sample_assay_result`, and optionally `evaluate_custom_protocol` + `protocol_param_schema`. Register it in `get_spec_for_workflow()` and run `train_per_protocol.py --workflows your-workflow-id`.
	  2. Generated presets: Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an `ExperimentSpec` and train an agent with `ReinforceAgent(spec=my_spec)` on `LabEnv(spec=my_spec)`. The Research & Generate agent already “creates” protocols at run time (arbitrary params); to train on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.

	---

	## Quick Start

	### Install

	```bash
	pip install -e .
	```

	Or just ensure `numpy`, `torch`, and `gymnasium` are installed.

	### Run the naive baseline

	```bash
	python scripts/run_naive_baseline.py --episodes 200
	```

	### Train the REINFORCE agent and compare

	```bash
	python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100
	```

	### Next.js UI + API server (general UI)

	Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):

	```bash
	# Terminal 1: Python API (agents + LabEnv)
	uvicorn server.app:app --host 0.0.0.0 --port 8000

	# Terminal 2: Next.js frontend (v0ap)
	cd v0ap && pnpm dev
	```

	Then open the workflow run page (e.g. `/workflows/pcr-amplification`). The UI shows Run with AI Agent, Run Research Agent (research → hypothesize → experiment → learn), and Run Naive Baseline. The timeline displays which agent was used and each step (Research, Hypothesis, Run Assay, Learn for the research agent). Set `OPENAI_API_KEY` if you use the Research agent.

	---

	## Hackathon / live demo — how to show the RL

	Pitch in one line: “We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”

	### Setup (do this before going on stage)

	1. Start both servers (two terminals):
	```bash
	# Terminal 1 — API (agents + LabEnv)
	uvicorn server.app:app --host 0.0.0.0 --port 8000

	# Terminal 2 — UI
	cd v0ap && pnpm dev
	```
	2. Open http://localhost:3000 (or the URL Next.js prints).
	3. Optional: set `OPENAI_API_KEY` if you want to demo Research / Research & Generate.

	### Demo flow A — “Watch the RL agent learn” (~2 min)

	1. Go to Training (`/training`).
	2. Say: “This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”
	3. Set episodes to 500 (slider) for a short run — training finishes in under a minute on a laptop.
	4. Click Start Training. Point at:
	  - Progress and “Episode X of 500”.
	  - Chart: reward and success rate climbing over episodes.
	5. When it finishes: “Here’s the comparison: REINFORCE vs random baseline.” Show the table (success rate, reward, time).

	### Demo flow B — “Compare agents in the lab” (~1–2 min)

	1. Go to PCR Amplification (`/workflows/pcr-amplification`).
	2. Say: “Each run is one scientist trying to get a successful experiment under time and budget.”
	3. Click Run Naive Baseline — timeline fills with random preset choices and results.
	4. Then click Run with AI Agent (uses the policy you trained in flow A, or a default). Point at the timeline: “The learned agent picks protocols more purposefully and often gets success sooner.”
	5. If you have an API key: click Research & Generate (any protocol) — “This one researches, proposes parameters, runs them, and learns from feedback.”

	### Tips

	- Keep training short on stage: 500 episodes is enough to show learning; 1000 if you have time.
	- If the UI is slow: Run a quick train in the background before the demo, then only show “Run with AI Agent” and the comparison table.
	- Backup: Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
	- Talking points: Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.

	### Demo script (optional)

	From repo root, run `./scripts/demo_hackathon.sh` for a short checklist and the option to start the API in that terminal. Or start both manually:

	```bash
	# Terminal 1
	uvicorn server.app:app --host 0.0.0.0 --port 8000

	# Terminal 2
	cd v0ap && pnpm dev
	# Open http://localhost:3000 → /training or /workflows/pcr-amplification
	```

	---

	### Research LLM agent (optional, Streamlit)

	Install demo dependencies (`openai`, `streamlit`) and set `OPENAI_API_KEY`:

	```bash
	pip install -e ".[demo]"
	export OPENAI_API_KEY=your_key
	streamlit run demo/streamlit_app.py
	```

	The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:

	```bash
	python scripts/compare_all_agents.py --eval-episodes 50
	```

	### Sample output (train & eval)

	```
	Metric          REINFORCE     Naive
	----------------------------------------------
	Avg reward           15.7       5.0
	Success rate        53.0%     43.0%
	Partial rate        19.0%     15.0%
	Avg time            62.8m     63.0m
	Avg cost             $0.0      $0.0
	Avg steps             7.0       7.0
	----------------------------------------------
	```

	---

	## OpenEnv & Hugging Face — How to show and use

	SimLab is built for the OpenEnv ecosystem and can be served over HTTP and deployed to Hugging Face as a standardized agentic environment.

	### How SimLab uses OpenEnv

	- `openenv-core` is a required dependency (`pyproject.toml`).
	- `lab_env/openenv_adapter.py` wraps `LabEnv` in the OpenEnv `Environment` interface:
	  - Types: `LabAction`, `LabObservation`, `LabState`, `LabEnvironment`
	  - `create_app(LabEnvironment, LabAction, LabObservation, ...)`: FastAPI app with OpenEnv endpoints

	### Run the OpenEnv HTTP server

	```bash
	uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
	```

	This exposes standard OpenEnv endpoints:

	| Endpoint | Description |
	|---|---|
	| `POST /reset` | Reset environment, get initial observation |
	| `POST /step` | Send action, get next observation & reward |
	| `GET /state` | Current state snapshot |
	| `GET /metadata` | Environment name, version, docs |
	| WebSocket `/ws` | Persistent session (optional) |

	Up to `max_concurrent_envs=4` sessions are supported.

	### Call the OpenEnv server (show usage)

	From another process or machine, you can drive SimLab over HTTP:

	```bash
	# Reset (start new episode)
	curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .

	# Step (e.g. action 0 = setup preset 0)
	curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .

	# Get current state
	curl -s http://localhost:8000/state | jq .
	```

	From Python (e.g. for demos or integration):

	```python
	import requests

	BASE = "http://localhost:8000"

	# Reset
	r = requests.post(f"{BASE}/reset", json={"seed": 42})
	obs = r.json() # observation with metadata (obs_vector, info, etc.)

	# Step: setup preset 0, then run assay (action 12 for PCR)
	requests.post(f"{BASE}/step", json={"action": 0})
	r = requests.post(f"{BASE}/step", json={"action": 12})
	print(r.json()) # observation, reward, done

	# State
	state = requests.get(f"{BASE}/state").json()
	print(state["step_count"], state["best_result"])
	```
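
The same calls can be wrapped in a small helper that plays a fixed action sequence and sums the rewards. This is a sketch under assumptions: the `reward` and `done` keys in the `/step` reply are inferred from the comment in the snippet above, and the `post` callable is a hypothetical transport hook so the helper can be tried without a running server.

```python
from typing import Callable, List


def run_http_episode(post: Callable[[str, dict], dict], actions: List[int], seed: int = 42) -> float:
    """Reset, then send actions until the server reports done; return the summed reward."""
    post("/reset", {"seed": seed})
    total = 0.0
    for action in actions:
        reply = post("/step", {"action": action})
        total += float(reply.get("reward") or 0.0)  # assumed key; missing or null counts as 0
        if reply.get("done"):  # assumed key
            break
    return total


def make_fake_post():
    """In-memory stand-in for the server: finish (action 17) ends the episode."""
    calls = []

    def post(path, payload):
        calls.append((path, payload))
        if path == "/reset":
            return {"observation": [], "reward": None, "done": False}
        return {"observation": [], "reward": 1.0, "done": payload["action"] == 17}

    return post, calls


fake_post, calls = make_fake_post()
total = run_http_episode(fake_post, [0, 12, 17, 0])
```

Against the real server, `post` could be `lambda path, payload: requests.post(BASE + path, json=payload, timeout=10).json()`. Passing the transport in keeps the episode logic testable offline.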

	### Deploy to Hugging Face

	To show SimLab on the Hugging Face Hub as an OpenEnv environment:

	1. Option A — Hugging Face Space (Docker)
	Create a Space with Docker as the SDK. Use a `Dockerfile` that installs SimLab and runs:
	```dockerfile
	CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 7860
	```
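	Point the Space to your repo and keep `app_port: 7860` in the README front matter so it matches the port uvicorn listens on. The Space page lives at a URL such as `https://huggingface.co/spaces/your-username/simlab-env`; the running app, and therefore the OpenEnv endpoint, is normally served from the Space's direct URL (for that example, `https://your-username-simlab-env.hf.space`).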

	2. Option B — OpenEnv CLI (if you adopt the full OpenEnv layout)
	The [OpenEnv Packaging & Deploying](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) guide uses `openenv init`, `openenv build`, and `openenv push` to deploy to the Hub. SimLab currently uses `openenv-core` and a custom adapter; to use `openenv push`, you would add the expected layout (e.g. `openenv.yaml`, `server/` with Dockerfile) and wire the existing `LabEnvironment` + `create_app` into that structure.

	3. Link your repo on the Hub
	In your SimLab repo or any Hugging Face model/Space card, set the Repository and Documentation URLs to your GitHub repo and add a tag or short description such as: "OpenEnv-compatible lab automation environment; run with `uvicorn lab_env.openenv_adapter:app` and connect via POST /reset, POST /step."

	### References

	- [OpenEnv documentation](https://meta-pytorch.github.io/OpenEnv/) — framework overview and APIs
	- [OpenEnv on Hugging Face](https://huggingface.co/openenv) — OpenEnv org and environments
	- [Packaging & Deploying (OpenEnv)](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) — build, validate, push to Hub

	---

	## Environment API Reference

	```python
	from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec

	# Default: PCR experiment
	env = LabEnv()
	# Or any experiment from a spec:
	# env = LabEnv(spec=my_experiment_spec)

	obs, info = env.reset(seed=42)

	# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
	#   [0]    step_index (normalised)
	#   [1]    elapsed_minutes (normalised)
	#   [2]    remaining_budget (normalised)
	#   [3..]  inventory (one per spec.inventory_items, normalised)
	#   [...]  last_result one-hot (len(spec.result_labels))
	#   [...]  has_setup, current_preset_idx (norm), best_result_score

	# Actions (Discrete, from spec):
	#   0 .. num_presets-1    setup_reaction(preset_index)
	#   num_presets           run_assay
	#   num_presets+1 ..      order_reagents (one per orderable_items)
	#   ...                   wait, finish

	obs, reward, terminated, truncated, info = env.step(0) # setup preset 0
	obs, reward, terminated, truncated, info = env.step(12) # run assay (PCR)
	obs, reward, terminated, truncated, info = env.step(17) # finish (PCR)

	# Custom protocol (any params; spec must have evaluate_custom_protocol)
	obs, reward, term, trunc, info = env.run_assay_with_protocol({"temp": 57.5, "cycles": 32, "ratio": "conservative"})
	```
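
The index arithmetic behind those action numbers can be made explicit. The helper below is a hypothetical sketch, not part of `lab_env`. It assumes the ordering shown in the comments above (presets, `run_assay`, one order action per orderable item, then `wait` and `finish`) and that PCR has three orderable items, which is inferred from the 18-action total. The reagent names are placeholders.

```python
def action_layout(num_presets, orderable_items):
    """Map human-readable action names to discrete indices."""
    layout = {f"setup_reaction({i})": i for i in range(num_presets)}
    layout["run_assay"] = num_presets
    for offset, item in enumerate(orderable_items, start=1):
        layout[f"order_reagents({item})"] = num_presets + offset
    layout["wait"] = num_presets + len(orderable_items) + 1
    layout["finish"] = num_presets + len(orderable_items) + 2
    return layout


def obs_dim(num_inventory_items, num_result_labels):
    """3 leading scalars + inventory + last_result one-hot + 3 trailing scalars."""
    return 3 + num_inventory_items + num_result_labels + 3


# Placeholder reagent names; only the count (3) matters for the indices.
pcr_actions = action_layout(12, ["reagent_a", "reagent_b", "reagent_c"])
```

With these assumptions `pcr_actions["run_assay"]` is 12 and `pcr_actions["finish"]` is 17, matching the `env.step(12)` and `env.step(17)` calls above. For the observation, any split where inventory items plus result labels total 8 gives the 14 dimensions quoted for PCR.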

	---

	## License

	MIT