arminfg/biosim

arminfg committed on Mar 8

Commit

da63ca8

0 Parent(s):

SimLab: lab automation RL env, OpenEnv adapter, Training UI, agents

This view is limited to 50 files because the commit contains too many changes.

Files changed (50)

.gitignore +26 -0
.space/DEPLOY.md +39 -0
Dockerfile +24 -0
README.md +407 -0
agents/__init__.py +19 -0
agents/naive_agent.py +75 -0
agents/research_generate_agent.py +323 -0
agents/research_llm_agent.py +406 -0
agents/rl_agent.py +307 -0
demo/streamlit_app.py +185 -0
knowledge/pcr_protocols.json +32 -0
lab_env/__init__.py +4 -0
lab_env/env.py +369 -0
lab_env/openenv_adapter.py +231 -0
lab_env/spec.py +367 -0
pyproject.toml +24 -0
scripts/compare_all_agents.py +139 -0
scripts/demo_hackathon.sh +24 -0
scripts/demo_research_agent.py +45 -0
scripts/run_naive_baseline.py +75 -0
scripts/run_research_generate_agent.py +91 -0
scripts/train_and_eval_agent.py +148 -0
scripts/train_per_protocol.py +82 -0
scripts/visualize.py +258 -0
server/app.py +621 -0
v0ap/.gitignore +10 -0
v0ap/app/docs/page.tsx +87 -0
v0ap/app/globals.css +137 -0
v0ap/app/layout.tsx +68 -0
v0ap/app/page.tsx +24 -0
v0ap/app/training/page.tsx +114 -0
v0ap/app/workflows/[id]/page.tsx +180 -0
v0ap/app/workflows/page.tsx +24 -0
v0ap/components.json +21 -0
v0ap/components/app-sidebar.tsx +91 -0
v0ap/components/dashboard/performance-chart.tsx +88 -0
v0ap/components/dashboard/recent-experiments.tsx +79 -0
v0ap/components/dashboard/stats-cards.tsx +62 -0
v0ap/components/theme-provider.tsx +14 -0
v0ap/components/training/comparison-table.tsx +116 -0
v0ap/components/training/training-chart.tsx +119 -0
v0ap/components/training/training-controls.tsx +141 -0
v0ap/components/ui/accordion.tsx +66 -0
v0ap/components/ui/alert-dialog.tsx +157 -0
v0ap/components/ui/alert.tsx +66 -0
v0ap/components/ui/aspect-ratio.tsx +11 -0
v0ap/components/ui/avatar.tsx +53 -0
v0ap/components/ui/badge.tsx +46 -0
v0ap/components/ui/breadcrumb.tsx +109 -0
v0ap/components/ui/button-group.tsx +83 -0

.gitignore ADDED

	@@ -0,0 +1,26 @@

+# Python
+__pycache__/
+*.py[cod]
+*.egg-info/
+.eggs/
+dist/
+build/
+.env
+.venv/
+venv/
+# Node / Next
+node_modules/
+.next/
+.v0ap/
+# IDE / OS
+.idea/
+.vscode/
+.DS_Store
+*.log
+# v0 runtime (if present)
+__v0_runtime_loader.js
+__v0_devtools.tsx
+__v0_jsx-dev-runtime.ts

.space/DEPLOY.md ADDED

	@@ -0,0 +1,39 @@

+# Deploy SimLab to Hugging Face Spaces
+Use this to get a **Hugging Face Spaces link** (e.g. `https://YOUR_USERNAME-simlab-env.hf.space`).
+## Option 1: Docker Space (OpenEnv API only)
+1. Go to [https://huggingface.co/spaces](https://huggingface.co/spaces) and click **Create new Space**.
+2. Choose:
+   - **Name:** e.g. `simlab-env`
+   - **SDK:** **Docker**
+   - **Visibility:** Public (or Private)
+3. Push this repo (or the contents) to the Space repo, or copy the `Dockerfile` from the repo root into the Space.
+4. In the Space repo, the **Dockerfile** must be at the root. If your Space is a clone of simlab, the root already has the Dockerfile. If you created an empty Space, add a Dockerfile with:
+   ```dockerfile
+   FROM python:3.11-slim
+   WORKDIR /app
+   RUN apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/*
+   COPY pyproject.toml ./
+   COPY lab_env ./lab_env/
+   RUN pip install --no-cache-dir -e .
+   ENV PORT=7860
+   EXPOSE 7860
+   CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port ${PORT}
+   ```
+5. Build and run. Your Space link will be:
+   - **`https://huggingface.co/spaces/YOUR_USERNAME/simlab-env`**
+   - or **`https://YOUR_USERNAME-simlab-env.hf.space`**
+That Space serves the **OpenEnv API** (POST /reset, POST /step, GET /state, GET /metadata). It does **not** serve the full Next.js Training/Workflow UI; for that you run the app locally or host it elsewhere.
+## Option 2: Link to an existing Space
+If the OpenEnv org or someone else has already deployed SimLab, the link might be:
+- **OpenEnv org:** [https://huggingface.co/openenv](https://huggingface.co/openenv) (list of envs; SimLab may be listed there if published)
+Once your Space is live, use **`https://huggingface.co/spaces/YOUR_USERNAME/simlab-env`** as the Hugging Face Spaces link.

Dockerfile ADDED

	@@ -0,0 +1,24 @@

+# Hugging Face Space — SimLab OpenEnv API
+# Exposes POST /reset, POST /step, GET /state, GET /metadata on port 7860
+FROM python:3.11-slim
+WORKDIR /app
+# Install system deps if needed
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    build-essential \
+    && rm -rf /var/lib/apt/lists/*
+# Copy package files
+COPY pyproject.toml ./
+COPY lab_env ./lab_env/
+# Install simlab (and openenv-core, torch, gymnasium, numpy)
+RUN pip install --no-cache-dir -e .
+# HF Spaces expect the app on port 7860
+ENV PORT=7860
+EXPOSE 7860
+CMD ["uvicorn", "lab_env.openenv_adapter:app", "--host", "0.0.0.0", "--port", "7860"]

README.md ADDED

	@@ -0,0 +1,407 @@

+# SimLab — Lab Automation RL Environment
+A self-contained Gymnasium-style reinforcement learning environment that
+simulates **any** wet-lab experiment workflow. The experiment type is defined by
+an **ExperimentSpec** (protocol presets, inventory, rewards, outcome model). The
+default spec is PCR (Polymerase Chain Reaction); you can plug in ELISA, custom
+assays, or any protocol-discovery task under real-world constraints: limited
+time, budget, and finite reagent inventory.
+Built for the **OpenEnv** ecosystem so it can be wrapped as an HTTP-served,
+sandboxed environment and uploaded to the OpenEnv hub on Hugging Face.
+**Integrations:** [OpenEnv](https://meta-pytorch.github.io/OpenEnv/) · [Hugging Face](https://huggingface.co/openenv)
+---
+## What the Environment Simulates
+Each episode represents a scientist at the bench trying to get a successful
+result. The environment:
+- **Samples a hidden optimal protocol** on every `reset()` — the agent never
+  sees it directly.
+- Offers **protocol presets** (defined in the spec) the agent can choose from.
+- Lets the agent **run assays** that consume reagents and time, returning
+  outcomes (e.g. success / partial / fail) from the spec’s outcome model.
+- **Custom protocols:** Specs with `evaluate_custom_protocol` (PCR, ELISA) allow
+  **arbitrary** protocol parameters via `env.run_assay_with_protocol(protocol_dict)` — agents can generate and try any valid params, not just presets.
+- Allows **ordering more reagents** (costs money and time) and **waiting**.
+- Terminates when the agent calls **finish**, runs out of time/budget, or
+  exhausts inventory with no way to reorder.
+**Default (PCR):** 12 presets (3 temps × 2 cycle counts × 2 reagent ratios);
+probabilistic success based on distance to hidden optimum. Other experiments
+use their own presets and outcome logic via a custom `ExperimentSpec`.
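The 3 × 2 × 2 count above can be reproduced as a Cartesian product. This is a minimal sketch: the keys `temp`, `cycles`, and `ratio` and the value `"conservative"` appear in this README, but every other value below is a placeholder assumption, not the grid defined in `lab_env/spec.py`.

```python
from itertools import product

# Placeholder values (assumptions); the real grid lives in lab_env/spec.py.
TEMPS = (55.0, 58.0, 61.0)               # 3 annealing temperatures
CYCLES = (25, 35)                        # 2 cycle counts
RATIOS = ("conservative", "aggressive")  # 2 reagent ratios ("aggressive" is hypothetical)

# One protocol dict per combination, in a stable order.
presets = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in product(TEMPS, CYCLES, RATIOS)
]

print(len(presets))  # 12
```

The list index of each dict would then play the role of the preset index that a setup action refers to.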
+### Reward structure (default PCR)
+The reward encodes real lab trade-offs (all configurable per spec):
+| Signal | Value |
+|---|---|
+| Immediate assay result: success | +15 |
+| Immediate assay result: partial | +5 |
+| Per-assay cost penalty | -3 |
+| Terminal bonus (best = success) | +60 |
+| Terminal bonus (best = partial) | +25 |
+| Terminal penalty (no success/partial) | -20 |
+| Time penalty | -0.25 per minute elapsed |
+A good agent learns to explore efficiently — try a few presets, read the
+signals from partial/success outcomes, and converge on the best protocol before
+finishing.
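To make the trade-offs in the table concrete, the sketch below adds up the signals for one episode. The numbers come from the table; how they combine is an assumption (purely additive, with the time penalty applied once to the total elapsed minutes), and `episode_return` is a hypothetical helper, not a function in `lab_env`.

```python
# Values copied from the reward table above.
ASSAY_REWARD = {"success": 15.0, "partial": 5.0, "fail": 0.0}
ASSAY_COST = -3.0
TERMINAL_BONUS = {"success": 60.0, "partial": 25.0}
TERMINAL_PENALTY = -20.0
TIME_PENALTY_PER_MIN = -0.25

RANK = {"fail": 0, "partial": 1, "success": 2}


def episode_return(results, elapsed_minutes):
    """Sum the reward signals for a list of assay outcomes (assumed additive)."""
    # Each assay pays its immediate result reward minus the per-assay cost.
    total = sum(ASSAY_REWARD[r] + ASSAY_COST for r in results)
    # The terminal term depends only on the best outcome seen in the episode.
    best = max(results, key=RANK.__getitem__, default="fail")
    total += TERMINAL_BONUS.get(best, TERMINAL_PENALTY)
    # Time penalty, assumed to apply to the whole episode duration.
    total += TIME_PENALTY_PER_MIN * elapsed_minutes
    return total


print(episode_return(["fail", "partial", "success"], elapsed_minutes=60))  # 56.0
```

Under these assumptions, three assays ending in a success after 60 minutes yield 11 from the assays, 60 from the terminal bonus, and -15 from time, which shows why stopping soon after the first success is attractive.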
+---
+## Architecture
+```
+simlab/
+├── pyproject.toml              # Package metadata & dependencies
+├── README.md
+├── lab_env/
+│   ├── __init__.py
+│   ├── spec.py                 # ExperimentSpec, pcr_experiment_spec()
+│   ├── env.py                  # LabEnv (Gymnasium interface, any experiment)
+│   └── openenv_adapter.py      # OpenEnv types, LabEnvironment, HTTP app
+├── agents/
+│   ├── __init__.py
+│   ├── naive_agent.py          # Random-preset baseline
+│   ├── rl_agent.py             # REINFORCE policy-gradient agent (PyTorch)
+│   ├── research_llm_agent.py   # LLM researcher: presets + research
+│   └── research_generate_agent.py  # Research → generate any protocol → run → learn from feedback
+├── knowledge/
+│   └── pcr_protocols.json      # Fake “papers” for web_search tool (demo)
+├── demo/
+│   └── streamlit_app.py        # Live research dashboard + 3-agent comparison
+└── scripts/
+    ├── run_naive_baseline.py   # Evaluate the naive agent
+    ├── train_and_eval_agent.py # Train REINFORCE & compare both agents
+    ├── compare_all_agents.py  # Benchmark Naive vs RL vs Research LLM
+    ├── run_research_generate_agent.py  # Research → generate protocol → run → learn (any protocol)
+    └── demo_research_agent.py  # Terminal demo of research agent
+```
+### Defining a new experiment
+Implement an `ExperimentSpec` in `lab_env/spec.py` (or your own module) with:
+- **presets** — list of protocol dicts (e.g. temperature, cycles, ratio for PCR).
+- **inventory_items** / **orderable_items** — what the lab tracks and can reorder.
+- **initial_inventory**, **order_costs**, **result_labels**.
+- **sample_hidden_optimum(rng)** — returns hidden optimal state (e.g. ideal temp/cycles).
+- **sample_assay_result(hidden, preset_idx, presets, rng)** — returns outcome label.
+- **evaluate_custom_protocol(hidden, protocol_dict, rng)** (optional) — score an arbitrary protocol dict so agents can run any params via `env.run_assay_with_protocol(protocol_dict)`.
+- **protocol_param_schema** (optional) — dict describing params for codegen/LLM (e.g. `{"temp": {"type": "number"}, "cycles": {"type": "integer"}, ...}`).
+Then use `LabEnv(spec=my_spec)` or pass `spec` into the OpenEnv `LabEnvironment(spec=my_spec)`.
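As a rough illustration of the two sampling hooks listed above, here is a dependency-free stand-in. `ToySpec` is hypothetical and is not the real `ExperimentSpec`; only the method names and argument order follow the list above. The distance-based outcome rule and the use of `random.Random` (rather than whatever RNG the real environment passes in) are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class ToySpec:
    """Hypothetical stand-in that mirrors two ExperimentSpec hooks."""

    presets: list
    result_labels: tuple = ("fail", "partial", "success")

    def sample_hidden_optimum(self, rng):
        # Assumed: the hidden optimum is a temperature between the preset extremes.
        temps = [p["temp"] for p in self.presets]
        return {"temp": rng.uniform(min(temps), max(temps))}

    def sample_assay_result(self, hidden, preset_idx, presets, rng):
        # Assumed rule: the closer the preset is to the optimum, the better the odds.
        distance = abs(presets[preset_idx]["temp"] - hidden["temp"])
        p_success = max(0.0, 1.0 - distance / 5.0)
        p_at_least_partial = max(0.0, 1.0 - distance / 10.0)
        roll = rng.random()
        if roll < p_success:
            return "success"
        if roll < p_at_least_partial:
            return "partial"
        return "fail"


spec = ToySpec(presets=[{"temp": 55.0}, {"temp": 70.0}])
rng = random.Random(0)
hidden = {"temp": 55.0}  # fixed here so the outcomes below are deterministic
print(spec.sample_assay_result(hidden, 0, spec.presets, rng))  # success
print(spec.sample_assay_result(hidden, 1, spec.presets, rng))  # fail
```

A real spec would also carry the inventory, order-cost, and label fields from the list; they are left out here to keep the sketch short.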
+### Agent design
+The **REINFORCE agent** decomposes the problem into a learned and a scripted
+part:
+- **Learned** — an MLP with two hidden layers (14 → 64 → 64 → 12) maps the observation to a
+  distribution over the 12 protocol presets.  Trained with REINFORCE + entropy
+  bonus + running-mean baseline.
+- **Scripted** — the episode loop (setup → run assay → check result → order
+  if needed → finish on success) is fixed so the agent focuses on the hard
+  decision: *which* preset to try.
+This decomposition lets training converge in ~2000 episodes (a few seconds on
+CPU) while clearly beating the random-preset naive baseline.
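The learned half of this decomposition can be illustrated without PyTorch. The sketch below is not the code in `agents/rl_agent.py`: it replaces the MLP with a table of 12 logits, treats preset choice as a one-step bandit with a fixed hidden best preset, and omits the entropy bonus. It keeps only the REINFORCE update with a running-mean baseline, and all names in it are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(num_presets=12, best=7, episodes=500, lr=0.5, seed=0):
    """REINFORCE with a running-mean baseline on a toy preset-selection task."""
    rng = random.Random(seed)
    logits = [0.0] * num_presets
    baseline = 0.0
    for _ in range(episodes):
        probs = softmax(logits)
        action = rng.choices(range(num_presets), weights=probs)[0]
        reward = 1.0 if action == best else 0.0  # toy reward (assumption)
        advantage = reward - baseline
        # Gradient of log pi(action) w.r.t. logit i is onehot(action)[i] - probs[i].
        for i in range(num_presets):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad
        # The running-mean baseline tracks the recent average reward.
        baseline += 0.1 * (reward - baseline)
    return softmax(logits)


probs = train_bandit()
print(max(range(len(probs)), key=probs.__getitem__))  # 7
```

Subtracting the baseline does not change the expected gradient but reduces its variance, which is one plausible reason a few thousand episodes are enough in the full environment.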
+The **Research LLM agent** adds a self-improving lab scientist: it researches
+protocols (via a `web_search` tool over a local knowledge base), hypothesizes
+new parameter combinations (mapped to presets), runs experiments in LabEnv, and
+updates internal knowledge from results.
+The **Research & Generate agent** (`research_generate_agent.py`) goes further: it
+**researches** (web_search), **generates** protocol parameters for **any** valid
+values (not limited to presets), **runs** them via `env.run_assay_with_protocol(protocol_dict)`,
+and **learns from feedback** — each run's (protocol, result, reward) is passed
+into the next trial so the agent improves over the episode. Works with any spec
+that has `evaluate_custom_protocol` (PCR, ELISA). Run it with:
+```bash
+export OPENAI_API_KEY=your_key
+python scripts/run_research_generate_agent.py --episodes 5 --verbose
+```
+Use `--workflow elisa-readout` for ELISA. Add `knowledge/{name}_protocols.json`
+for more experiment types so research has literature to search.
+### Training on different protocol sets
+Each **protocol** (PCR, ELISA, or a custom spec) has its own **presets** and outcome model. The RL agent can train on any of them so you get one policy per protocol set.
+- **One agent per protocol:** Create an agent with that spec and train it on an env with the same spec. The policy’s input/output sizes come from the spec (e.g. 14-dim obs → 12 presets for PCR; same for ELISA).
+- **Script:** `scripts/train_per_protocol.py` trains a separate REINFORCE agent for each workflow and saves checkpoints (e.g. `checkpoints/pcr-amplification.pt`, `checkpoints/elisa-readout.pt`):
+  ```bash
+  python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout --train-episodes 1500
+  ```
+- **Using agents to create different protocol sets:** You can define new protocol sets in two ways:
+  1. **In code:** Add a new `ExperimentSpec` in `lab_env/spec.py` (or your own module): define `presets`, `sample_hidden_optimum`, `sample_assay_result`, and optionally `evaluate_custom_protocol` + `protocol_param_schema`. Register it in `get_spec_for_workflow()` and run `train_per_protocol.py --workflows your-workflow-id`.
+  2. **Generated presets:** Use an LLM or script to produce a list of protocol dicts (e.g. different temps/cycles) and a simple outcome rule; wrap them in an `ExperimentSpec` and train an agent with `ReinforceAgent(spec=my_spec)` on `LabEnv(spec=my_spec)`. The Research & Generate agent already “creates” protocols at run time (arbitrary params); to **train** on a generated set, you’d turn that set into fixed presets in a new spec and train REINFORCE on it.
+---
+## Quick Start
+### Install
+```bash
+pip install -e .
+```
+Or just ensure `numpy`, `torch`, and `gymnasium` are installed.
+### Run the naive baseline
+```bash
+python scripts/run_naive_baseline.py --episodes 200
+```
+### Train the REINFORCE agent and compare
+```bash
+python scripts/train_and_eval_agent.py --train-episodes 2000 --eval-episodes 100
+```
+### Next.js UI + API server (general UI)
+Run the FastAPI backend, then the Next.js frontend (with API proxy to the backend):
+```bash
+# Terminal 1: Python API (agents + LabEnv)
+uvicorn server.app:app --host 0.0.0.0 --port 8000
+# Terminal 2: Next.js frontend (v0ap)
+cd v0ap && pnpm dev
+```
+Then open the workflow run page (e.g. `/workflows/pcr-amplification`). The UI shows **Run with AI Agent**, **Run Research Agent** (research → hypothesize → experiment → learn), and **Run Naive Baseline**. The timeline displays which agent was used and each step (Research, Hypothesis, Run Assay, Learn for the research agent). Set `OPENAI_API_KEY` if you use the Research agent.
+---
+## Hackathon / live demo — how to show the RL
+**Pitch in one line:** *“We simulate a lab where an agent has to discover the right protocol; you see it learn with RL and compare to baselines.”*
+### Setup (do this before going on stage)
+1. **Start both servers** (two terminals):
+   ```bash
+   # Terminal 1 — API (agents + LabEnv)
+   uvicorn server.app:app --host 0.0.0.0 --port 8000
+   # Terminal 2 — UI
+   cd v0ap && pnpm dev
+   ```
+2. Open **http://localhost:3000** (or the URL Next.js prints).
+3. Optional: set `OPENAI_API_KEY` if you want to demo Research / Research & Generate.
+### Demo flow A — “Watch the RL agent learn” (~2 min)
+1. Go to **Training** (`/training`).
+2. Say: *“This is our wet-lab sim. The agent doesn’t know the optimal protocol; it has to learn from trial and error.”*
+3. Set **episodes to 500** (slider) for a short run — training finishes in under a minute on a laptop.
+4. Click **Start Training**. Point at:
+   - **Progress** and “Episode X of 500”.
+   - **Chart**: reward and success rate climbing over episodes.
+5. When it finishes: *“Here’s the comparison: REINFORCE vs random baseline.”* Show the table (success rate, reward, time).
+### Demo flow B — “Compare agents in the lab” (~1–2 min)
+1. Go to **PCR Amplification** (`/workflows/pcr-amplification`).
+2. Say: *“Each run is one scientist trying to get a successful experiment under time and budget.”*
+3. Click **Run Naive Baseline** — timeline fills with random preset choices and results.
+4. Then click **Run with AI Agent** (uses the policy you trained in flow A, or a default). Point at the timeline: *“The learned agent picks protocols more purposefully and often gets success sooner.”*
+5. If you have an API key: click **Research & Generate (any protocol)** — *“This one researches, proposes parameters, runs them, and learns from feedback.”*
+### Tips
+- **Keep training short on stage:** 500 episodes is enough to show learning; 1000 if you have time.
+- **If the UI is slow:** Run a quick train in the background before the demo, then only show “Run with AI Agent” and the comparison table.
+- **Backup:** Pre-record a 1‑minute screen capture of training + one workflow run; use it if WiFi or live run fails.
+- **Talking points:** Hidden optimal protocol, limited time/budget, REINFORCE policy over presets, Research & Generate for “any protocol” + learning from feedback.
+### Demo script (optional)
+From repo root, run `./scripts/demo_hackathon.sh` for a short checklist and the option to start the API in that terminal. Or start both manually:
+```bash
+# Terminal 1
+uvicorn server.app:app --host 0.0.0.0 --port 8000
+# Terminal 2
+cd v0ap && pnpm dev
+# Open http://localhost:3000 → /training or /workflows/pcr-amplification
+```
+---
+### Research LLM agent (optional, Streamlit)
+Install demo dependencies (`openai`, `streamlit`) and set `OPENAI_API_KEY`:
+```bash
+pip install -e ".[demo]"
+export OPENAI_API_KEY=your_key
+streamlit run demo/streamlit_app.py
+```
+The Streamlit app shows the research flow (research → hypothesize → experiment → learn) and a 3-agent comparison table. To benchmark all agents from the terminal:
+```bash
+python scripts/compare_all_agents.py --eval-episodes 50
+```
+### Sample output (train & eval)
+```
+Metric                  REINFORCE        Naive
+----------------------------------------------
+Avg reward                   15.7          5.0
+Success rate                53.0%        43.0%
+Partial rate                19.0%        15.0%
+Avg time                    62.8m        63.0m
+Avg cost                     $0.0         $0.0
+Avg steps                     7.0          7.0
+----------------------------------------------
+```
+---
+## OpenEnv & Hugging Face — How to show and use
+SimLab is built for the **OpenEnv** ecosystem and can be served over HTTP and deployed to **Hugging Face** as a standardized agentic environment.
+### How SimLab uses OpenEnv
+- **`openenv-core`** is a required dependency (`pyproject.toml`).
+- **`lab_env/openenv_adapter.py`** wraps `LabEnv` in the OpenEnv `Environment` interface:
+  - **Types:** `LabAction`, `LabObservation`, `LabState`, `LabEnvironment`
+  - **`create_app(LabEnvironment, LabAction, LabObservation, ...)`** — FastAPI app with OpenEnv endpoints
+### Run the OpenEnv HTTP server
+```bash
+uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
+```
+This exposes standard OpenEnv endpoints:
+| Endpoint        | Description                    |
+|----------------|--------------------------------|
+| `POST /reset`  | Reset environment, get initial observation |
+| `POST /step`   | Send action, get next observation & reward |
+| `GET /state`   | Current state snapshot        |
+| `GET /metadata`| Environment name, version, docs |
+| WebSocket `/ws`| Persistent session (optional)  |
+Up to `max_concurrent_envs=4` sessions are supported.
+### Call the OpenEnv server (show usage)
+From another process or machine, you can drive SimLab over HTTP:
+```bash
+# Reset (start new episode)
+curl -s -X POST http://localhost:8000/reset -H "Content-Type: application/json" -d '{"seed": 42}' | jq .
+# Step (e.g. action 0 = setup preset 0)
+curl -s -X POST http://localhost:8000/step -H "Content-Type: application/json" -d '{"action": 0}' | jq .
+# Get current state
+curl -s http://localhost:8000/state | jq .
+```
+From Python (e.g. for demos or integration):
+```python
+import requests
+BASE = "http://localhost:8000"
+# Reset
+r = requests.post(f"{BASE}/reset", json={"seed": 42})
+obs = r.json()  # observation with metadata (obs_vector, info, etc.)
+# Step: setup preset 0, then run assay (action 12 for PCR)
+requests.post(f"{BASE}/step", json={"action": 0})
+r = requests.post(f"{BASE}/step", json={"action": 12})
+print(r.json())  # observation, reward, done
+# State
+state = requests.get(f"{BASE}/state").json()
+print(state["step_count"], state["best_result"])
+```
+### Deploy to Hugging Face
+To **show SimLab on the Hugging Face Hub** as an OpenEnv environment:
+1. **Option A — Hugging Face Space (Docker)**
+   Create a Space with **Docker** as the SDK. Use a `Dockerfile` that installs SimLab and runs:
+   ```dockerfile
+   CMD uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 7860
+   ```
+   Point the Space to your repo and serve the app on port 7860, the port Hugging Face Spaces expect. The Space page is at `https://huggingface.co/spaces/your-username/simlab-env`; the app itself is served from `https://your-username-simlab-env.hf.space`, which is the public OpenEnv endpoint.
+2. **Option B — OpenEnv CLI (if you adopt the full OpenEnv layout)**
+   The [OpenEnv Packaging & Deploying](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) guide uses `openenv init`, `openenv build`, and **`openenv push`** to deploy to the Hub. SimLab currently uses `openenv-core` and a custom adapter; to use `openenv push`, you would add the expected layout (e.g. `openenv.yaml`, `server/` with Dockerfile) and wire the existing `LabEnvironment` + `create_app` into that structure.
+3. **Link your repo on the Hub**
+   In your SimLab repo or any Hugging Face model/Space card, set the **Repository** and **Documentation** URLs to your GitHub repo and add a tag or short description such as: *"OpenEnv-compatible lab automation environment; run with `uvicorn lab_env.openenv_adapter:app` and connect via POST /reset, POST /step."*
+### References
+- [OpenEnv documentation](https://meta-pytorch.github.io/OpenEnv/) — framework overview and APIs
+- [OpenEnv on Hugging Face](https://huggingface.co/openenv) — OpenEnv org and environments
+- [Packaging & Deploying (OpenEnv)](https://meta-pytorch.github.io/OpenEnv/auto_getting_started/environment-builder.html) — build, validate, push to Hub
+---
+## Environment API Reference
+```python
+from lab_env import LabEnv, ExperimentSpec, pcr_experiment_spec
+# Default: PCR experiment (same as before)
+env = LabEnv()
+# Or any experiment from a spec:
+# env = LabEnv(spec=my_experiment_spec)
+obs, info = env.reset(seed=42)
+# obs shape and action count come from env.spec (e.g. PCR: 14-dim obs, 18 actions)
+#   [0]    step_index (normalised)
+#   [1]    elapsed_minutes (normalised)
+#   [2]    remaining_budget (normalised)
+#   [3..]  inventory (one per spec.inventory_items, normalised)
+#   [...]  last_result one-hot (len(spec.result_labels))
+#   [...]  has_setup, current_preset_idx (norm), best_result_score
+# Actions (Discrete, from spec):
+#   0 .. num_presets-1   setup_reaction(preset_index)
+#   num_presets          run_assay
+#   num_presets+1 ..     order_reagents (one per orderable_items)
+#   ...                  wait, finish
+obs, reward, terminated, truncated, info = env.step(0)    # setup preset 0
+obs, reward, terminated, truncated, info = env.step(12)   # run assay (PCR)
+obs, reward, terminated, truncated, info = env.step(17)   # finish (PCR)
+# Custom protocol (any params; spec must have evaluate_custom_protocol)
+obs, reward, term, trunc, info = env.run_assay_with_protocol({"temp": 57.5, "cycles": 32, "ratio": "conservative"})
+```
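The action numbering in the comments above follows from two counts. The helper below is a hypothetical sketch, not part of `lab_env`; it assumes `wait` comes immediately before `finish` (as the comment order suggests) and that PCR has three orderable items, which matches the 18 actions and the `finish` index of 17 quoted above.

```python
def action_layout(num_presets, num_orderable):
    """Derive discrete action indices from the spec sizes (assumed ordering)."""
    run_assay = num_presets
    order_start = num_presets + 1
    wait = order_start + num_orderable
    finish = wait + 1
    return {
        "setup": range(0, num_presets),
        "run_assay": run_assay,
        "order": range(order_start, order_start + num_orderable),
        "wait": wait,
        "finish": finish,
        "num_actions": finish + 1,
    }


pcr = action_layout(num_presets=12, num_orderable=3)
print(pcr["run_assay"], pcr["finish"], pcr["num_actions"])  # 12 17 18
```

Computing indices this way avoids hard-coding `12` and `17` when switching to a spec with a different number of presets or orderable items.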
+---
+## License
+MIT

agents/__init__.py ADDED

	@@ -0,0 +1,19 @@

+from agents.naive_agent import NaiveAgent
+from agents.rl_agent import ReinforceAgent
+try:
+    from agents.research_llm_agent import ResearchLLMAgent
+except ImportError:
+    ResearchLLMAgent = None  # type: ignore[misc, assignment]
+try:
+    from agents.research_generate_agent import ResearchGenerateAgent
+except ImportError:
+    ResearchGenerateAgent = None  # type: ignore[misc, assignment]
+__all__ = [
+    "NaiveAgent",
+    "ReinforceAgent",
+    "ResearchLLMAgent",
+    "ResearchGenerateAgent",
+]

agents/naive_agent.py ADDED

	@@ -0,0 +1,75 @@

+"""
+Naive baseline agent for LabEnv.
+Follows a fixed strategy:
+  1. Pick a random protocol preset.
+  2. Run the assay.
+  3. Repeat for a fixed number of trials.
+  4. Finish.
+If inventory is low, order reagents before running.
+"""
+from __future__ import annotations
+import numpy as np
+from lab_env.env import (
+    ACTION_FINISH,
+    ACTION_ORDER_BUFFER,
+    ACTION_ORDER_POLYMERASE,
+    ACTION_ORDER_TIPS,
+    ACTION_RUN_ASSAY,
+    ACTION_SETUP_START,
+    NUM_PRESETS,
+)
+class NaiveAgent:
+    """Baseline agent: random preset selection, fixed trial count, no learning."""
+    def __init__(self, num_trials: int = 3, seed: int | None = None) -> None:
+        self.num_trials = num_trials
+        self._rng = np.random.default_rng(seed)
+        self._trial: int = 0
+        self._phase: str = "setup"
+    def reset(self) -> None:
+        self._trial = 0
+        self._phase = "setup"
+    def select_action(self, obs: np.ndarray) -> int:
+        """Choose the next action based on a scripted strategy."""
+        tips = obs[3]
+        buffer = obs[4]
+        poly = obs[5]
+        samples = obs[6]
+        inventory_low = min(tips, buffer, poly, samples) < 0.05  # ~1 unit
+        if self._trial >= self.num_trials:
+            return ACTION_FINISH
+        if self._phase == "setup":
+            if inventory_low:
+                return self._order_cheapest(obs)
+            preset = int(self._rng.integers(0, NUM_PRESETS))
+            self._phase = "run"
+            return ACTION_SETUP_START + preset
+        if self._phase == "run":
+            self._phase = "setup"
+            self._trial += 1
+            return ACTION_RUN_ASSAY
+        return ACTION_FINISH
+    def update(self, *_args: object, **_kwargs: object) -> None:
+        """No-op — the naive agent does not learn."""
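+    @staticmethod
+    def _order_cheapest(obs: np.ndarray) -> int:
+        """Return the order action for the reagent with the lowest normalised stock.
+        Despite the name, this compares inventory levels, not prices. Samples
+        (obs[6]) are not restocked by this method.
+        """
+        tips, buffer, poly = obs[3], obs[4], obs[5]
+        if tips <= buffer and tips <= poly:
+            return ACTION_ORDER_TIPS
+        if buffer <= poly:
+            return ACTION_ORDER_BUFFER
+        return ACTION_ORDER_POLYMERASE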

agents/research_generate_agent.py ADDED

	@@ -0,0 +1,323 @@

+"""
+Research & Generate agent: research → generate protocol (any params) → run → learn from feedback.
+Uses the spec's protocol_param_schema so it works for any experiment type (PCR, ELISA, etc.).
+Generates arbitrary protocol dicts (not limited to presets), runs them via
+env.run_assay_with_protocol(), and learns from (protocol, result, reward) history.
+"""
+from __future__ import annotations
+import json
+import os
+from pathlib import Path
+from typing import Any
+try:
+    from openai import OpenAI
+except ImportError:
+    OpenAI = None  # type: ignore[misc, assignment]
+from lab_env.env import LabEnv
+from lab_env.spec import ExperimentSpec
+# ---------------------------------------------------------------------------
+# Knowledge base (per-experiment)
+# ---------------------------------------------------------------------------
+KNOWLEDGE_DIR = Path(__file__).resolve().parent.parent / "knowledge"
+def load_protocols_knowledge(experiment_name: str) -> list[dict[str, Any]]:
+    """Load literature/knowledge for an experiment. Tries {name}_protocols.json then pcr_protocols.json."""
+    for name in (experiment_name, "pcr"):
+        path = KNOWLEDGE_DIR / f"{name}_protocols.json"
+        if path.exists():
+            with open(path, encoding="utf-8") as f:
+                return json.load(f)
+    return []
+def web_search(query: str, experiment_name: str = "pcr", top_k: int = 4) -> str:
+    """Search literature for the given experiment type; return relevant snippets."""
+    papers = load_protocols_knowledge(experiment_name)
+    query_lower = query.lower()
+    scored = []
+    for p in papers:
+        text = (
+            f"{p.get('title','')} {p.get('abstract','')} "
+            f"{p.get('keywords','')} {p.get('recommendations','')}"
+        ).lower()
+        score = sum(1 for w in query_lower.split() if len(w) > 2 and w in text)
+        if score > 0:
+            scored.append((score, p))
+    scored.sort(key=lambda x: -x[0])
+    if not scored:
+        return (
+            "No relevant literature found. Try general terms for this experiment type, "
+            "e.g. temperature, cycles, protocol parameters."
+        )
+    out = []
+    for _, p in scored[:top_k]:
+        out.append(f"[{p.get('title','')}] {p.get('recommendations','')}")
+    return "\n".join(out)
+# ---------------------------------------------------------------------------
+# Build tool schemas from spec (any protocol)
+# ---------------------------------------------------------------------------
+def build_tool_schemas(spec: ExperimentSpec) -> list[dict[str, Any]]:
+    """Build OpenAI tool schemas from spec: web_search + run_experiment with protocol_param_schema."""
+    schema = spec.protocol_param_schema
+    if not schema:
+        # Fallback for specs without protocol_param_schema (e.g. custom)
+        run_params = {
+            "type": "object",
+            "properties": {"protocol": {"type": "object", "description": "Protocol parameters as key-value dict"}},
+            "required": ["protocol"],
+        }
+    else:
+        run_params = {
+            "type": "object",
+            "properties": {
+                k: {
+                    "type": v.get("type", "string"),
+                    "description": v.get("description", ""),
+                }
+                | ({"enum": v["enum"]} if "enum" in v else {})
+                for k, v in schema.items()
+            },
+            "required": list(schema.keys()),
+        }
+    return [
+        {
+            "type": "function",
+            "function": {
+                "name": "web_search",
+                "description": f"Search scientific literature for {spec.name} protocols and parameter recommendations.",
+                "parameters": {
+                    "type": "object",
+                    "properties": {
+                        "query": {
+                            "type": "string",
+                            "description": "Search query, e.g. 'optimal temperature and cycles'",
+                        },
+                    },
+                    "required": ["query"],
+                },
+            },
+        },
+        {
+            "type": "function",
+            "function": {
+                "name": "run_experiment",
+                "description": (
+                    f"Run one {spec.name} experiment with the given protocol parameters. "
+                    "You can use any valid values (not limited to presets). The lab returns success/partial/fail."
+                ),
+                "parameters": run_params,
+            },
+        },
+    ]
+# ---------------------------------------------------------------------------
+# Research & Generate Agent
+# ---------------------------------------------------------------------------
+class ResearchGenerateAgent:
+    """
+    Agentic flow: research → generate protocol (any params) → run in env → get feedback → learn.
+    Works with any ExperimentSpec that has evaluate_custom_protocol and protocol_param_schema.
+    Maintains history of (protocol, result, reward) so the LLM learns from feedback.
+    """
+    def __init__(
+        self,
+        model: str = "gpt-4o-mini",
+        max_trials: int = 6,
+    ) -> None:
+        if OpenAI is None:
+            raise ImportError("Optional dependency 'openai' is required. Install with: pip install openai")
+        self.model = model
+        self.max_trials = max_trials
+        self._client: OpenAI | None = None
+        self.feedback_history: list[dict[str, Any]] = []
+    @property
+    def client(self) -> OpenAI:
+        if self._client is None:
+            api_key = os.environ.get("OPENAI_API_KEY")
+            if not api_key:
+                raise RuntimeError(
+                    "OPENAI_API_KEY environment variable is required for ResearchGenerateAgent"
+                )
+            self._client = OpenAI(api_key=api_key)
+        return self._client
+    def _run_tool(
+        self,
+        name: str,
+        arguments: dict[str, Any],
+        env: LabEnv,
+    ) -> tuple[str, float | None]:
+        """Execute one tool. Returns (result_string, reward_if_run_experiment else None)."""
+        spec = env.spec
+        if name == "web_search":
+            q = arguments.get("query", "")
+            result = web_search(q, experiment_name=spec.name)
+            return result, None
+        if name == "run_experiment":
+            # Protocol dict: use protocol_param_schema keys (e.g. temp, cycles, ratio for PCR)
+            protocol = dict(arguments)
+            if "protocol" in protocol and isinstance(protocol["protocol"], dict):
+                protocol = protocol["protocol"]
+            try:
+                obs, reward, term, trunc, info = env.run_assay_with_protocol(protocol)
+            except ValueError as e:
+                return f"Error: {e}", None
+            result = info.get("last_result", "fail")
+            return (
+                f"Ran protocol {protocol}. Result: {result}. Reward: {reward:.1f}. "
+                f"Best so far: {info.get('best_result', 'none')}.",
+                reward,
+            )
+        return "Unknown tool.", None
+    def _order_reagents_if_low(self, env: LabEnv, obs: Any, info: dict) -> tuple[Any, dict, float]:
+        """Order reagents when inventory is low; return (obs, info, total_reward)."""
+        spec = env.spec
+        inv = info.get("inventory", {})
+        budget = info.get("remaining_budget", 0)
+        order_start = spec.action_order_start()
+        total_rew = 0.0
+        for idx, item in enumerate(spec.orderable_items):
+            if inv.get(item, 0) < 2:
+                cost = spec.order_costs.get(item, (0, float("inf")))[1]
+                if budget >= cost:
+                    action = order_start + idx
+                    obs, rew, term, trunc, info = env.step(action)
+                    total_rew += rew
+                    inv = info.get("inventory", inv)
+                    budget = info.get("remaining_budget", budget)
+                    if term or trunc:
+                        break
+        return env._obs(), env._info(), total_rew
+    def run_episode(
+        self,
+        env: LabEnv,
+        seed: int,
+        *,
+        verbose: bool = False,
+    ) -> dict[str, Any]:
+        """
+        Run one episode: each trial = research (optional) → generate protocol → run → record feedback.
+        Feedback history is passed into the next trial so the agent learns.
+        """
+        if env.spec.evaluate_custom_protocol is None:
+            raise ValueError(
+                "This environment spec does not support custom protocols. "
+                "Use a spec with evaluate_custom_protocol (e.g. PCR, ELISA)."
+            )
+        obs, info = env.reset(seed=seed)
+        spec = env.spec
+        tools = build_tool_schemas(spec)
+        self.feedback_history = []
+        total_reward = 0.0
+        steps = 0
+        for trial in range(self.max_trials):
+            if info.get("best_result") == "success":
+                obs, rew, _, _, info = env.step(spec.action_finish())
+                total_reward += rew
+                steps += 1
+                break
+            # Order reagents if needed
+            obs, info, order_rew = self._order_reagents_if_low(env, obs, info)
+            total_reward += order_rew
+            if getattr(env, "_terminated", False) or getattr(env, "_truncated", False):
+                break
+            # Build prompt with feedback from previous runs (learning)
+            feedback_text = ""
+            if self.feedback_history:
+                feedback_text = "Previous runs this episode (learn from these):\n"
+                for i, entry in enumerate(self.feedback_history[-6:], 1):
+                    feedback_text += (
+                        f"  {i}. Protocol: {entry['protocol']} → "
+                        f"Result: {entry['result']}, Reward: {entry['reward']:.1f}\n"
+                    )
+                feedback_text += "\n"
+            param_desc = json.dumps(spec.protocol_param_schema, indent=2) if spec.protocol_param_schema else "protocol params (key-value dict)"
+            system_msg = (
+                f"You are a lab scientist running a {spec.name} experiment. "
+                "You have tools: web_search (research literature), run_experiment (run one assay with your chosen protocol). "
+                "Generate a protocol using the parameter schema below. You can use ANY valid values (not just presets). "
+                "Use research and feedback from previous runs to improve. Output exactly one run_experiment call per turn.\n\n"
+                f"Parameter schema for run_experiment:\n{param_desc}"
+            )
+            user_msg = (
+                f"{feedback_text}"
+                f"Current state: best_result={info.get('best_result')}, last_result={info.get('last_result')}, "
+                f"inventory={info.get('inventory')}, remaining_budget=${info.get('remaining_budget', 0):.0f}. "
+                f"Trial {trial + 1}/{self.max_trials}. Call run_experiment with your next protocol (one call only)."
+            )
+            messages = [
+                {"role": "system", "content": system_msg},
+                {"role": "user", "content": user_msg},
+            ]
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=messages,
+                tools=tools,
+                tool_choice="required",
+            )
+            choice = response.choices[0]
+            if not choice.message.tool_calls:
+                break
+            tc = choice.message.tool_calls[0]
+            name = tc.function.name
+            args = json.loads(tc.function.arguments or "{}")
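+            result_str, run_reward = self._run_tool(name, args, env)
+            # Refresh the cached state: without this, `info` still describes the
+            # environment before the run, so a success is only noticed one trial late.
+            obs, info = env._obs(), env._info()
+            if run_reward is not None:
+                total_reward += run_reward
+                steps += 1
+                protocol = dict(args)
+                if "protocol" in protocol and isinstance(protocol["protocol"], dict):
+                    protocol = protocol["protocol"]
+                self.feedback_history.append({
+                    "protocol": protocol,
+                    "result": info.get("last_result", "fail"),
+                    "reward": run_reward,
+                })
+                if verbose:
+                    print(f"  Trial {trial + 1}: {protocol} → {self.feedback_history[-1]['result']} (reward {run_reward:.1f})")
+            if getattr(env, "_terminated", False) or getattr(env, "_truncated", False):
+                break
+            if info.get("best_result") == "success":
+                # Submit immediately so a success on the final trial is still finished.
+                obs, rew, _, _, info = env.step(spec.action_finish())
+                total_reward += rew
+                steps += 1
+                break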
+        if not (getattr(env, "_terminated", False) or getattr(env, "_truncated", False)) and info.get("best_result") != "success":
+            obs, rew, _, _, info = env.step(spec.action_finish())
+            total_reward += rew
+            steps += 1
+        return {
+            "reward": total_reward,
+            "success": info.get("best_result") == "success",
+            "partial": info.get("best_result") == "partial",
+            "minutes": info.get("elapsed_minutes", 0.0),
+            "cost": spec.initial_budget - info.get("remaining_budget", spec.initial_budget),
+            "steps": steps,
+            "num_protocols_tried": len(self.feedback_history),
+        }
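
The episode loop in `agents/research_generate_agent.py` repeats two small steps: unwrapping a nested `{"protocol": {...}}` tool argument, and folding the last six runs into the prompt. A minimal sketch of both follows. The helper names `unwrap_protocol` and `format_feedback` are hypothetical and do not exist in the repo, and the sample values are illustrative only.

```python
from typing import Any


def unwrap_protocol(arguments: dict[str, Any]) -> dict[str, Any]:
    """Accept either flat params or a nested {"protocol": {...}} wrapper."""
    protocol = dict(arguments)
    nested = protocol.get("protocol")
    return dict(nested) if isinstance(nested, dict) else protocol


def format_feedback(history: list[dict[str, Any]], limit: int = 6) -> str:
    """Render the most recent runs as the text block prepended to the user message."""
    if not history:
        return ""
    lines = ["Previous runs this episode (learn from these):"]
    for i, entry in enumerate(history[-limit:], 1):
        lines.append(
            f"  {i}. Protocol: {entry['protocol']} → "
            f"Result: {entry['result']}, Reward: {entry['reward']:.1f}"
        )
    return "\n".join(lines) + "\n\n"


# Illustrative values only; these are not results from the environment.
history = [
    {
        "protocol": unwrap_protocol({"protocol": {"temp": 58, "cycles": 30}}),
        "result": "partial",
        "reward": 2.0,
    }
]
feedback = format_feedback(history)
```

Keeping the unwrapping in one helper would avoid repeating the same check in both `_run_tool` and `run_episode`.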

agents/research_llm_agent.py ADDED

	@@ -0,0 +1,406 @@

+"""
+Research LLM agent for LabEnv.
+ReAct-style agent that: researches protocols (web_search), hypothesizes params,
+runs experiments in LabEnv (run_experiment), and learns from results (analyze + update_knowledge).
+Uses the same LabEnv action space; continuous hypotheses are mapped to nearest preset.
+"""
+from __future__ import annotations
+import json
+import os
+from pathlib import Path
+from typing import Any
+try:
+    from openai import OpenAI
+except ImportError:
+    OpenAI = None  # type: ignore[misc, assignment]
+from lab_env.env import (
+    ACTION_FINISH,
+    ACTION_ORDER_BUFFER,
+    ACTION_ORDER_POLYMERASE,
+    ACTION_ORDER_TIPS,
+    ACTION_RUN_ASSAY,
+    ACTION_SETUP_START,
+    INITIAL_BUDGET,
+)
+# ---------------------------------------------------------------------------
+# Knowledge base (web_search tool)
+# ---------------------------------------------------------------------------
+def _load_protocols_knowledge() -> list[dict[str, Any]]:
+    path = Path(__file__).resolve().parent.parent / "knowledge" / "pcr_protocols.json"
+    if not path.exists():
+        return []
+    with open(path, encoding="utf-8") as f:
+        return json.load(f)
+def _web_search_impl(query: str, top_k: int = 3) -> str:
+    """Search fake literature; return top_k relevant snippets."""
+    papers = _load_protocols_knowledge()
+    query_lower = query.lower()
+    scored = []
+    for p in papers:
+        text = f"{p.get('title','')} {p.get('abstract','')} {p.get('keywords','')} {p.get('recommendations','')}".lower()
+        score = sum(1 for w in query_lower.split() if len(w) > 2 and w in text)
+        if score > 0:
+            scored.append((score, p))
+    scored.sort(key=lambda x: -x[0])
+    if not scored:
+        return "No relevant literature found. Try general terms: annealing temperature, cycles, PCR protocol."
+    out = []
+    for _, p in scored[:top_k]:
+        out.append(f"[{p.get('title','')}] {p.get('recommendations','')}")
+    return "\n".join(out)
+# ---------------------------------------------------------------------------
+# Preset mapping: continuous (temp, cycles, ratio) -> nearest preset index
+# ---------------------------------------------------------------------------
+def _params_to_preset_index(
+    presets: list[dict[str, Any]],
+    temp: float,
+    cycles: int,
+    ratio: str,
+) -> int:
+    """Map (temp, cycles, ratio) to nearest preset index."""
+    # Anything that does not mention "conservative" is treated as "aggressive".
+    ratio_clean = "conservative" if "conservative" in ratio.strip().lower() else "aggressive"
+    best_idx = 0
+    best_dist = float("inf")
+    for i, p in enumerate(presets):
+        dt = abs(float(p["temp"]) - temp)
+        dc = abs(int(p["cycles"]) - cycles)
+        dr = 0 if str(p.get("ratio", "")).lower() == ratio_clean else 10
+        dist = dt + dc * 0.1 + dr
+        if dist < best_dist:
+            best_dist = dist
+            best_idx = i
+    return best_idx
+# ---------------------------------------------------------------------------
+# Tool schemas (OpenEnv-style JSON for LLM)
+# ---------------------------------------------------------------------------
+TOOL_SCHEMAS = [
+    {
+        "type": "function",
+        "function": {
+            "name": "web_search",
+            "description": "Search scientific literature for PCR protocols, annealing temperature, cycle number, or reagent ratio recommendations.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "query": {
+                        "type": "string",
+                        "description": "Search query, e.g. 'optimal annealing temp for AT-rich primers'",
+                    },
+                },
+                "required": ["query"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "run_experiment",
+            "description": "Run a PCR experiment with the given parameters. Temperature will be mapped to nearest preset (55, 65, or 72°C); cycles to 25 or 35; ratio to conservative or aggressive.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "temp": {"type": "number", "description": "Annealing temperature in °C (e.g. 57.5)"},
+                    "cycles": {"type": "integer", "description": "Number of PCR cycles (e.g. 32)"},
+                    "ratio": {"type": "string", "description": "Reagent ratio: 'conservative' or 'aggressive'"},
+                },
+                "required": ["temp", "cycles", "ratio"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "analyze_result",
+            "description": "Compare the current result to previous experiments and summarize what we learned.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "current_result": {"type": "string", "description": "Last assay result: success, partial, or fail"},
+                    "summary": {"type": "string", "description": "Brief analysis: what this result suggests for next parameters"},
+                },
+                "required": ["current_result", "summary"],
+            },
+        },
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "update_knowledge",
+            "description": "Record what we learned about optimal parameters (temperature range, cycle range, or notes).",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "temp_range": {
+                        "type": "array",
+                        "items": {"type": "number"},
+                        "description": "Optional [low, high] °C range for optimal annealing",
+                    },
+                    "cycle_range": {
+                        "type": "array",
+                        "items": {"type": "integer"},
+                        "description": "Optional [low, high] cycle range",
+                    },
+                    "notes": {"type": "string", "description": "Optional text note about what we learned"},
+                },
+            },
+        },
+    },
+]
+# ---------------------------------------------------------------------------
+# Research LLM Agent
+# ---------------------------------------------------------------------------
+class ResearchLLMAgent:
+    """LLM agent that researches, hypothesizes, runs experiments, and learns."""
+    def __init__(
+        self,
+        model: str = "gpt-4o-mini",
+        max_trials: int = 5,
+        knowledge_path: str | None = None,
+    ) -> None:
+        if OpenAI is None:
+            raise ImportError("Optional dependency 'openai' is required. Install with: pip install openai")
+        self.model = model
+        self.max_trials = max_trials
+        self._client: OpenAI | None = None
+        self.knowledge_path = knowledge_path
+        self.knowledge: dict[str, Any] = {
+            "temp_range": [50.0, 70.0],
+            "cycle_range": [20, 40],
+            "past_experiments": [],
+            "notes": "",
+        }
+        self._episode_buffer: list[dict[str, Any]] = []  # last 5 episodes for context
+        self._last_research: str = ""
+        self._last_hypothesis: dict[str, Any] = {}
+        self._last_result: str = ""
+        self._last_params: dict[str, Any] = {}
+        self._last_run_reward: float = 0.0
+    @property
+    def client(self) -> OpenAI:
+        if self._client is None:
+            api_key = os.environ.get("OPENAI_API_KEY")
+            if not api_key:
+                raise RuntimeError("OPENAI_API_KEY environment variable is required for ResearchLLMAgent")
+            self._client = OpenAI(api_key=api_key)
+        return self._client
+    def _run_tool(self, name: str, arguments: dict[str, Any], env: Any) -> str:
+        """Execute one tool and return result string."""
+        if name == "web_search":
+            q = arguments.get("query", "")
+            result = _web_search_impl(q)
+            self._last_research = result
+            return result
+        if name == "run_experiment":
+            temp = float(arguments.get("temp", 60))
+            cycles = int(arguments.get("cycles", 30))
+            ratio = str(arguments.get("ratio", "conservative"))
+            presets = env.spec.presets
+            idx = _params_to_preset_index(presets, temp, cycles, ratio)
+            preset = presets[idx]
+            self._last_params = {"temp": preset["temp"], "cycles": preset["cycles"], "ratio": preset["ratio"]}
+            self._last_hypothesis = {"temp": temp, "cycles": cycles, "ratio": ratio}
+            setup_action = ACTION_SETUP_START + idx
+            obs, r1, term, trunc, info = env.step(setup_action)
+            if term or trunc:
+                self._last_run_reward = r1
+                return f"Environment ended after setup. Result: {info.get('last_result', 'none')}"
+            obs, r2, term, trunc, info = env.step(ACTION_RUN_ASSAY)
+            result = info.get("last_result", "fail")
+            self._last_result = result
+            self._last_run_reward = r1 + r2
+            return f"Ran preset {preset}. Result: {result}. Reward: {r1 + r2:.1f}"
+        if name == "analyze_result":
+            # The reported current_result is not used; only the summary is echoed back.
+            summary = arguments.get("summary", "")
+            return f"Analysis: {summary}"
+        if name == "update_knowledge":
+            if "temp_range" in arguments:
+                self.knowledge["temp_range"] = arguments["temp_range"]
+            if "cycle_range" in arguments:
+                self.knowledge["cycle_range"] = arguments["cycle_range"]
+            if "notes" in arguments:
+                self.knowledge["notes"] = arguments["notes"]
+            return "Knowledge updated."
+        return "Unknown tool."
+    def _inventory_low(self, obs: Any) -> bool:
+        return float(min(obs[3], obs[4], obs[5], obs[6])) < 0.08
+    def _order_reagents(self, env: Any, obs: Any, info: dict, steps: int) -> tuple[Any, float, dict, int]:
+        total_rew = 0.0
+        for action in (ACTION_ORDER_TIPS, ACTION_ORDER_BUFFER, ACTION_ORDER_POLYMERASE):
+            obs, rew, term, trunc, info = env.step(action)
+            total_rew += rew
+            steps += 1
+            if term or trunc:
+                break
+        return obs, total_rew, info, steps
+    def run_episode(
+        self,
+        env: Any,
+        seed: int,
+        *,
+        verbose: bool = False,
+        episode_callback: list[dict[str, Any]] | None = None,
+    ) -> dict[str, Any]:
+        """Run one episode: Research -> Hypothesize -> Execute -> Learn per trial."""
+        obs, info = env.reset(seed=seed)
+        total_reward = 0.0
+        steps = 0
+        presets = env.spec.presets
+        last_params_used: dict[str, Any] = {}
+        for trial in range(self.max_trials):
+            if info.get("best_result") == "success":
+                obs, rew, _, _, info = env.step(ACTION_FINISH)
+                total_reward += rew
+                steps += 1
+                break
+            if self._inventory_low(obs):
+                obs, rew, info, steps = self._order_reagents(env, obs, info, steps)
+                total_reward += rew
+                if getattr(env, "_terminated", False) or getattr(env, "_truncated", False):
+                    break
+            # Stage 1: Research
+            research_query = (
+                "optimal annealing temperature and cycle number for PCR"
+                if trial == 0
+                else f"PCR protocol improvement: last result was {info.get('last_result','none')} with params {last_params_used}"
+            )
+            research_text = _web_search_impl(research_query)
+            self._last_research = research_text
+            # Stage 2: Hypothesize (LLM chooses temp, cycles, ratio)
+            state_desc = (
+                f"Last result: {info.get('last_result','none')}. "
+                f"Best so far: {info.get('best_result','none')}. "
+                f"Inventory: {info.get('inventory',{})}. "
+                f"Budget: ${info.get('remaining_budget',0):.0f}. "
+                f"Knowledge: temp_range={self.knowledge['temp_range']}, cycle_range={self.knowledge['cycle_range']}. "
+                f"Recent experiments (may include earlier episodes): {self.knowledge['past_experiments'][-5:]}."
+            )
+            if last_params_used:
+                state_desc += f" Last params used: {last_params_used}."
+            sys_msg = (
+                "You are a lab scientist optimizing a PCR protocol. You have access to tools: "
+                "web_search (already done: use the research below), run_experiment (use this to try one protocol). "
+                "Output exactly one run_experiment call with temp (number °C), cycles (integer), ratio ('conservative' or 'aggressive'). "
+                "Use the research and past results to pick the best next parameters. "
+                "Available presets in the lab are temp in [55, 65, 72], cycles in [25, 35], ratio conservative or aggressive; "
+                "your values will be mapped to the nearest preset."
+            )
+            user_msg = (
+                f"Research:\n{research_text}\n\n"
+                f"Current state: {state_desc}\n\n"
+                "Call run_experiment with your chosen temp, cycles, and ratio (one call only)."
+            )
+            messages = [
+                {"role": "system", "content": sys_msg},
+                {"role": "user", "content": user_msg},
+            ]
+            response = self.client.chat.completions.create(
+                model=self.model,
+                messages=messages,
+                tools=TOOL_SCHEMAS,
+                tool_choice={"type": "function", "function": {"name": "run_experiment"}},
+            )
+            choice = response.choices[0]
+            if choice.message.tool_calls:
+                tc = choice.message.tool_calls[0]
+                name = tc.function.name
+                args = json.loads(tc.function.arguments or "{}")
+                # Stage 3: Execute the chosen protocol (setup + run_assay)
+                result_str = self._run_tool(name, args, env)
+                total_reward += self._last_run_reward
+                last_params_used = dict(self._last_params)
+                steps += 2  # setup + run_assay
+                obs = env._obs()
+                info = env._info()
+                if verbose:
+                    print(f"  Trial {trial+1}: hypothesis {self._last_hypothesis} -> preset {self._last_params} -> {self._last_result}")
+                # Stage 4: Learn (update knowledge from this result)
+                self.knowledge["past_experiments"].append(
+                    (dict(self._last_params), self._last_result, 1.0 if self._last_result == "success" else (0.5 if self._last_result == "partial" else 0.0))
+                )
+                if len(self.knowledge["past_experiments"]) > 20:
+                    self.knowledge["past_experiments"] = self.knowledge["past_experiments"][-20:]
+                # Narrow knowledge range by heuristic
+                if self._last_result == "success":
+                    t = self._last_params.get("temp", 60)
+                    self.knowledge["temp_range"] = [t - 2, t + 2]
+                    c = self._last_params.get("cycles", 30)
+                    self.knowledge["cycle_range"] = [max(20, c - 2), min(40, c + 2)]
+                elif self._last_result == "partial":
+                    t = self._last_params.get("temp", 60)
+                    self.knowledge["temp_range"] = [
+                        min(self.knowledge["temp_range"][0], t - 1),
+                        max(self.knowledge["temp_range"][1], t + 1),
+                    ]
+                if episode_callback is not None:
+                    episode_callback.append({
+                        "trial": trial + 1,
+                        "research": self._last_research,
+                        "hypothesis": self._last_hypothesis,
+                        "params_used": self._last_params,
+                        "result": self._last_result,
+                    })
+            else:
+                break
+            if getattr(env, "_terminated", False) or getattr(env, "_truncated", False):
+                break
+        if not (getattr(env, "_terminated", False) or getattr(env, "_truncated", False)) and info.get("best_result") != "success":
+            obs, rew, _, _, info = env.step(ACTION_FINISH)
+            total_reward += rew
+            steps += 1
+        return {
+            "reward": total_reward,
+            "success": info.get("best_result") == "success",
+            "partial": info.get("best_result") == "partial",
+            "minutes": info.get("elapsed_minutes", 0.0),
+            # Fall back to the starting budget (not a literal 500.0) so a missing key yields zero cost.
+            "cost": INITIAL_BUDGET - info.get("remaining_budget", INITIAL_BUDGET),
+            "steps": steps,
+        }
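
The nearest-preset mapping used by `run_experiment` in `agents/research_llm_agent.py` is easiest to follow on a worked example. The sketch below reuses the distance from `_params_to_preset_index` on an assumed 12-preset grid built from the values named in the tool description (55/65/72 °C, 25/35 cycles, conservative/aggressive). The grid ordering and the name `nearest_preset` are assumptions of this sketch, not taken from `lab_env`.

```python
from itertools import product
from typing import Any

# Assumed preset grid: 3 temperatures x 2 cycle counts x 2 ratios = 12 presets.
# The real ordering inside LabEnv may differ.
PRESETS: list[dict[str, Any]] = [
    {"temp": t, "cycles": c, "ratio": r}
    for t, c, r in product((55, 65, 72), (25, 35), ("conservative", "aggressive"))
]


def nearest_preset(presets: list[dict[str, Any]], temp: float, cycles: int, ratio: str) -> int:
    """Distance: |dT| + 0.1 * |dCycles| + 10 if the ratio does not match."""
    want = "conservative" if "conservative" in ratio.strip().lower() else "aggressive"

    def dist(p: dict[str, Any]) -> float:
        mismatch = 0 if str(p.get("ratio", "")).lower() == want else 10
        return abs(float(p["temp"]) - temp) + 0.1 * abs(int(p["cycles"]) - cycles) + mismatch

    # min() keeps the first index on ties, like the strict "<" in the original loop.
    return min(range(len(presets)), key=lambda i: dist(presets[i]))


idx = nearest_preset(PRESETS, temp=57.5, cycles=32, ratio="Conservative")
chosen = PRESETS[idx]  # 55 °C / 35 cycles / conservative: distance 2.5 + 0.3
```

With a full grid the ratio is always matched exactly, and temperature dominates cycles: one degree weighs as much as ten cycles.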

agents/rl_agent.py ADDED

	@@ -0,0 +1,307 @@

+"""
+REINFORCE policy-gradient agent for LabEnv.
+Rather than trying to learn the full 18-action sequential policy (which
+requires discovering the setup->run->finish sequence from scratch), this
+agent decomposes the problem:
+- **Learned part** — a small MLP policy that maps the current observation to
+  a distribution over the 12 protocol presets.  Trained with REINFORCE.
+- **Scripted part** — episode logic that executes setup(preset) -> run_assay,
+  checks inventory, orders reagents when needed, and finishes after a
+  configurable number of trials or when a success is achieved.
+This decomposition makes training tractable in minutes on a CPU while still
+demonstrating clear improvement over the random-preset naive baseline.
+The policy network is a plain PyTorch ``nn.Module``, so it can be reused in a
+custom training loop or wrapped by PyTorch-based training frameworks.
+"""
+from __future__ import annotations
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torch.distributions import Categorical
+import numpy as np
+from lab_env.env import (
+    ACTION_FINISH,
+    ACTION_ORDER_BUFFER,
+    ACTION_ORDER_POLYMERASE,
+    ACTION_ORDER_TIPS,
+    ACTION_RUN_ASSAY,
+    ACTION_SETUP_START,
+    NUM_PRESETS,
+    OBS_DIM,
+)
+from lab_env.spec import ExperimentSpec
+class PolicyNetwork(nn.Module):
+    """Two-hidden-layer MLP: obs -> preset logits (12 outputs)."""
+    def __init__(
+        self,
+        obs_dim: int = OBS_DIM,
+        hidden: int = 64,
+        n_presets: int = NUM_PRESETS,
+    ) -> None:
+        super().__init__()
+        self.fc1 = nn.Linear(obs_dim, hidden)
+        self.fc2 = nn.Linear(hidden, hidden)
+        self.fc3 = nn.Linear(hidden, n_presets)
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        x = F.relu(self.fc1(x))
+        x = F.relu(self.fc2(x))
+        return self.fc3(x)
+class ReinforceAgent:
+    """REINFORCE agent that learns which preset to pick each trial.
+    The episode loop (setup -> run -> order-if-needed -> maybe-finish) is
+    scripted.  Only the preset selection is learned.
+    Pass spec=... to train on a different protocol set (e.g. ELISA or a
+    custom spec). Otherwise uses default PCR (12 presets, 14-dim obs).
+    """
+    def __init__(
+        self,
+        lr: float = 3e-3,
+        gamma: float = 0.99,
+        entropy_coef: float = 0.02,
+        max_trials: int = 4,
+        device: str = "cpu",
+        spec: ExperimentSpec | None = None,
+    ) -> None:
+        self.gamma = gamma
+        self.entropy_coef = entropy_coef
+        self.max_trials = max_trials
+        self.device = torch.device(device)
+        self.spec = spec
+        obs_dim = (spec.obs_dim if spec else OBS_DIM)
+        n_presets = (spec.num_presets if spec else NUM_PRESETS)
+        self.policy = PolicyNetwork(obs_dim=obs_dim, n_presets=n_presets).to(self.device)
+        self.optimizer = torch.optim.Adam(self.policy.parameters(), lr=lr)
+        self._log_probs: list[torch.Tensor] = []
+        self._entropies: list[torch.Tensor] = []
+        self._rewards: list[float] = []
+        self._baseline: float = 0.0
+        self._baseline_count: int = 0
+    # ------------------------------------------------------------------
+    # Preset selection (the learned part)
+    # ------------------------------------------------------------------
+    def _select_preset(
+        self, obs: np.ndarray, *, deterministic: bool = False
+    ) -> int:
+        obs_t = torch.as_tensor(
+            obs, dtype=torch.float32, device=self.device
+        ).unsqueeze(0)
+        logits = self.policy(obs_t)
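+        if deterministic:
+            # Not strictly deterministic: the logits are sharpened (temperature 0.2)
+            # but the action is still sampled below.
+            logits = logits * 5.0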
+        dist = Categorical(logits=logits)
+        action = dist.sample()
+        self._log_probs.append(dist.log_prob(action))
+        self._entropies.append(dist.entropy())
+        return int(action.item())
+    # ------------------------------------------------------------------
+    # Full episode runner (scripted loop + learned preset choice)
+    # ------------------------------------------------------------------
+    def run_episode(
+        self,
+        env: object,
+        seed: int,
+        *,
+        train: bool = True,
+    ) -> dict[str, object]:
+        """Run a complete episode, returning metrics dict.
+        *env* must be a :class:`LabEnv` (or anything with the same
+        ``reset`` / ``step`` interface).
+        """
+        obs, info = env.reset(seed=seed)  # type: ignore[union-attr]
+        total_reward = 0.0
+        steps = 0
+        trial_rewards: list[float] = []
+        finish_action = getattr(env.spec, "action_finish", lambda: ACTION_FINISH)()
+        for trial in range(self.max_trials):
+            if self._episode_done(info):
+                break
+            if self._inventory_low(obs, env):
+                obs, rew, info, steps = self._order_reagents(env, obs, info, steps)
+                total_reward += rew
+                if self._episode_done(info):
+                    break
+            preset = self._select_preset(obs, deterministic=not train)
+            setup_start = getattr(env.spec, "action_setup_start", lambda: ACTION_SETUP_START)()
+            obs, rew_setup, term, trunc, info = env.step(setup_start + preset)  # type: ignore[union-attr]
+            total_reward += rew_setup
+            steps += 1
+            if term or trunc:
+                trial_rewards.append(rew_setup)
+                self._rewards.append(rew_setup)
+                break
+            run_assay = getattr(env.spec, "action_run_assay", lambda: ACTION_RUN_ASSAY)()
+            obs, rew_run, done, truncated, info = env.step(run_assay)  # type: ignore[union-attr]
+            total_reward += rew_run
+            steps += 1
+            trial_rewards.append(rew_run)
+            self._rewards.append(rew_run)
+            if done or truncated:
+                break
+            if info.get("best_result") == "success":
+                obs, rew_finish, _, _, info = env.step(finish_action)  # type: ignore[union-attr]
+                total_reward += rew_finish
+                steps += 1
+                self._rewards[-1] += rew_finish
+                break
+        else:
+            if not self._episode_done(info):
+                obs, rew_finish, _, _, info = env.step(finish_action)  # type: ignore[union-attr]
+                total_reward += rew_finish
+                steps += 1
+                if self._rewards:
+                    self._rewards[-1] += rew_finish
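+        loss = self.update() if train else 0.0
+        # Use the spec's starting budget when available rather than assuming $500.
+        initial_budget = getattr(getattr(env, "spec", None), "initial_budget", 500.0)
+        return {
+            "reward": total_reward,
+            "success": info.get("best_result") == "success",
+            "partial": info.get("best_result") == "partial",
+            "minutes": info.get("elapsed_minutes", 0.0),
+            "cost": initial_budget - info.get("remaining_budget", initial_budget),
+            "steps": steps,
+            "loss": loss,
+        }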
+    # ------------------------------------------------------------------
+    # Helpers for the scripted loop
+    # ------------------------------------------------------------------
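+    @staticmethod
+    def _episode_done(info: dict) -> bool:
+        # Placeholder: ``info`` is not inspected, so the checks in run_episode never
+        # fire; termination is only detected from the flags returned by ``env.step``.
+        return False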
+    @staticmethod
+    def _inventory_low(obs: np.ndarray, env: object | None = None) -> bool:
+        n_inv = 4
+        if env is not None and getattr(env, "spec", None) is not None:
+            n_inv = len(env.spec.inventory_items)
+        inv_slice = obs[3 : 3 + n_inv]
+        return float(min(inv_slice)) < 0.08 if len(inv_slice) else False
+    @staticmethod
+    def _order_reagents(
+        env: object, obs: np.ndarray, info: dict, steps: int
+    ) -> tuple[np.ndarray, float, dict, int]:
+        total_rew = 0.0
+        spec = getattr(env, "spec", None)
+        if spec is not None:
+            order_actions = range(spec.action_order_start(), spec.action_order_end())
+        else:
+            order_actions = (ACTION_ORDER_TIPS, ACTION_ORDER_BUFFER, ACTION_ORDER_POLYMERASE)
+        for action in order_actions:
+            obs, rew, done, truncated, info = env.step(action)  # type: ignore[union-attr]
+            total_rew += rew
+            steps += 1
+            if done or truncated:
+                break
+        return obs, total_rew, info, steps
+    # ------------------------------------------------------------------
+    # Learning
+    # ------------------------------------------------------------------
+    def reset(self) -> None:
+        self._log_probs.clear()
+        self._entropies.clear()
+        self._rewards.clear()
+    def update(self) -> float:
+        """REINFORCE update over the collected episode. Returns loss."""
+        if not self._rewards or not self._log_probs:
+            return 0.0
+        n = min(len(self._rewards), len(self._log_probs))
+        rewards = self._rewards[:n]
+        log_probs = self._log_probs[:n]
+        entropies = self._entropies[:n]
+        returns = self._compute_returns(rewards)
+        self._update_baseline(returns)
+        returns_t = torch.as_tensor(returns, dtype=torch.float32, device=self.device)
+        advantages = returns_t - self._baseline
+        policy_loss = torch.zeros(1, device=self.device)
+        entropy_bonus = torch.zeros(1, device=self.device)
+        for lp, adv, ent in zip(log_probs, advantages, entropies):
+            policy_loss -= lp * adv.detach()
+            entropy_bonus += ent
+        loss = policy_loss - self.entropy_coef * entropy_bonus
+        self.optimizer.zero_grad()
+        loss.backward()
+        nn.utils.clip_grad_norm_(self.policy.parameters(), max_norm=1.0)
+        self.optimizer.step()
+        self.reset()
+        return float(loss.item())
+    def _compute_returns(self, rewards: list[float]) -> list[float]:
+        returns: list[float] = []
+        g = 0.0
+        for r in reversed(rewards):
+            g = r + self.gamma * g
+            returns.append(g)
+        returns.reverse()  # back to chronological order, avoiding O(n^2) front inserts
+        return returns
+    def _update_baseline(self, returns: list[float]) -> None:
+        episode_return = returns[0] if returns else 0.0
+        self._baseline_count += 1
+        self._baseline += (episode_return - self._baseline) / self._baseline_count
+    def save(self, path: str) -> None:
+        torch.save(
+            {
+                "policy_state_dict": self.policy.state_dict(),
+                "optimizer_state_dict": self.optimizer.state_dict(),
+                "baseline": self._baseline,
+                "baseline_count": self._baseline_count,
+            },
+            path,
+        )
+    def load(self, path: str) -> None:
+        checkpoint = torch.load(path, map_location=self.device, weights_only=True)
+        self.policy.load_state_dict(checkpoint["policy_state_dict"])
+        self.optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
+        self._baseline = checkpoint["baseline"]
+        self._baseline_count = checkpoint["baseline_count"]
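
The return and baseline arithmetic used by `update()` can be checked in isolation. The following is a minimal sketch without PyTorch, assuming the same recurrence as `_compute_returns` and the same incremental mean as `_update_baseline`; the names `discounted_returns` and `RunningBaseline` are hypothetical and do not appear in the repository.

```python
def discounted_returns(rewards: list[float], gamma: float) -> list[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step t."""
    returns: list[float] = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()  # restore chronological order
    return returns


class RunningBaseline:
    """Incremental mean of episode returns (hypothetical helper)."""

    def __init__(self) -> None:
        self.value = 0.0
        self.count = 0

    def update(self, episode_return: float) -> float:
        self.count += 1
        self.value += (episode_return - self.value) / self.count
        return self.value


rewards = [1.0, 0.0, 2.0]
returns = discounted_returns(rewards, gamma=0.5)

baseline = RunningBaseline()
baseline.update(returns[0])  # the agent tracks the return from the first step

# Advantage = return minus baseline; this is what scales each log-probability.
advantages = [g - baseline.value for g in returns]
print(returns, advantages)
```

With a single episode the baseline equals that episode's first-step return, so the first advantage is zero; the baseline only becomes informative once several episodes have been averaged.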

demo/streamlit_app.py ADDED

	@@ -0,0 +1,185 @@

+"""
+Streamlit demo: Self-Improving Lab Scientist — Research flow and 3-agent comparison.
+Run with: streamlit run demo/streamlit_app.py
+"""
+from __future__ import annotations
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+import streamlit as st
+from lab_env.env import LabEnv, INITIAL_BUDGET
+from agents.naive_agent import NaiveAgent
+from agents.rl_agent import ReinforceAgent
+from agents.research_llm_agent import ResearchLLMAgent
+st.set_page_config(page_title="SimLab Research Agent", layout="wide")
+st.title("Self-Improving Lab Scientist")
+st.markdown("**Research Scientist Agent** — Research → Hypothesize → Experiment → Learn")
+# Session state
+if "episode_history" not in st.session_state:
+    st.session_state.episode_history = []
+if "comparison_table" not in st.session_state:
+    st.session_state.comparison_table = None
+if "current_run_steps" not in st.session_state:
+    st.session_state.current_run_steps = []
+if "last_knowledge" not in st.session_state:
+    st.session_state.last_knowledge = None
+# Sidebar controls
+with st.sidebar:
+    st.header("Controls")
+    num_episodes = st.slider("Episodes to run", 1, 10, 3)
+    seed = st.number_input("Seed", value=42, min_value=0, step=1)
+    max_trials = st.slider("Max trials per episode", 2, 8, 5)
+    run_research = st.button("Run research agent episodes")
+    st.divider()
+    st.header("Benchmark")
+    compare_episodes = st.slider("Eval episodes for comparison", 10, 100, 30)
+    run_compare = st.button("Run 3-agent comparison")
+# Main: Research flow
+st.header("Research flow")
+if run_research:
+    try:
+        env = LabEnv()
+        agent = ResearchLLMAgent(max_trials=max_trials)
+        progress = st.progress(0, text="Running episodes...")
+        for ep in range(1, num_episodes + 1):
+            progress.progress(ep / num_episodes, text=f"Episode {ep}/{num_episodes}...")
+            callback: list[dict] = []
+            result = agent.run_episode(env, seed=seed + ep, episode_callback=callback)
+            st.session_state.last_knowledge = dict(agent.knowledge)
+            st.session_state.episode_history.append({
+                "episode": ep,
+                "success": result["success"],
+                "partial": result["partial"],
+                "reward": result["reward"],
+                "cost": result["cost"],
+                "steps": result["steps"],
+                "callback": callback,
+            })
+        env.close()
+        progress.empty()
+    except Exception as e:
+        st.error(f"Research agent failed: {e}. Set OPENAI_API_KEY for LLM agent.")
+else:
+    if not st.session_state.episode_history:
+        st.info("Click **Run research agent episodes** in the sidebar to start.")
+# Show last run / history
+if st.session_state.episode_history:
+    st.subheader("Learning progress")
+    cols = st.columns(min(10, len(st.session_state.episode_history)))
+    for i, rec in enumerate(st.session_state.episode_history[-10:]):
+        with cols[i % len(cols)]:
+            label = "SUCCESS" if rec["success"] else "partial" if rec["partial"] else "fail"
+            # Fixed display strings per outcome label, not measured success probabilities.
+            pct = "94%" if rec["success"] else "73%" if rec["partial"] else "12%"
+            st.metric(f"Ep{rec['episode']}", label, pct)
+    st.divider()
+    # Show latest episode detail (stage cards)
+    latest = st.session_state.episode_history[-1]
+    st.subheader(f"Episode {latest['episode']} — Research → Experiment → Learn")
+    if latest.get("callback"):
+        for step in latest["callback"]:
+            with st.expander(f"Trial {step['trial']}: {step['result']}", expanded=(step['trial'] == latest['callback'][-1]['trial'])):
+                st.markdown("**Research**")
+                research = step.get("research", "")
+                st.caption(research[:400] + "..." if len(research) > 400 else research)
+                st.markdown("**Hypothesis**")
+                st.code(step.get("hypothesis", {}))
+                st.markdown("**Experiment**")
+                st.write(f"Ran preset: {step.get('params_used', {})} → **{step.get('result', '')}**")
+    st.markdown("**Knowledge**")
+    if st.session_state.last_knowledge:
+        k = st.session_state.last_knowledge
+        st.write(f"temp_range = {k.get('temp_range', [])} °C, cycle_range = {k.get('cycle_range', [])}")
+        past = k.get("past_experiments", [])[-5:]
+        if past:
+            st.caption("Last experiments: " + ", ".join(f"{p[0]}→{p[1]}" for p in past))
+    else:
+        st.caption("Run research episodes to see updated knowledge.")
+# 3-agent comparison
+st.header("3-agent comparison")
+if run_compare:
+    with st.status("Running Naive, RL, and Research LLM agents...", expanded=True) as status:
+        try:
+            env = LabEnv()
+            eval_seed_base = 100_000 + seed
+            st.write("Naive agent...")
+            naive_agent = NaiveAgent(num_trials=3, seed=seed)
+            naive_results = []
+            for i in range(compare_episodes):
+                obs, info = env.reset(seed=eval_seed_base + i)
+                naive_agent.reset()
+                steps = 0
+                while True:
+                    action = naive_agent.select_action(obs)
+                    obs, reward, term, trunc, info = env.step(action)
+                    steps += 1
+                    if term or trunc:
+                        break
+                naive_results.append({
+                    "success": info["best_result"] == "success",
+                    "partial": info["best_result"] == "partial",
+                    "cost": INITIAL_BUDGET - info["remaining_budget"],
+                    "steps": steps,
+                })
+            st.write("Training and evaluating RL agent...")
+            rl_agent = ReinforceAgent(max_trials=max_trials)
+            for ep in range(500):
+                rl_agent.run_episode(env, seed=seed + ep, train=True)
+            rl_results = [rl_agent.run_episode(env, seed=eval_seed_base + i, train=False) for i in range(compare_episodes)]
+            st.write("Research LLM agent...")
+            llm_agent = ResearchLLMAgent(max_trials=max_trials)
+            llm_results = [llm_agent.run_episode(env, seed=eval_seed_base + i) for i in range(compare_episodes)]
+            env.close()
+            def agg(results: list[dict]) -> dict:
+                n = len(results)
+                succ = sum(r["success"] for r in results) / n
+                steps_succ = [r["steps"] for r in results if r["success"]]
+                exp_to_succ = sum(steps_succ) / len(steps_succ) if steps_succ else 0
+                cost = sum(r["cost"] for r in results) / n
+                return {"success_rate": succ, "experiments_to_success": exp_to_succ, "cost": cost}
+            st.session_state.comparison_table = {
+                "Naive (random)": agg(naive_results),
+                "RL (MLP)": agg(rl_results),
+                "LLM Researcher": agg(llm_results),
+            }
+            status.update(label="Done!", state="complete")
+        except Exception as e:
+            status.update(label="Error", state="error")
+            st.exception(e)
+if st.session_state.comparison_table:
+    st.dataframe(
+        [
+            {
+                "Agent": name,
+                "Success rate": f"{data['success_rate']:.0%}",
+                "Experiments to success": f"{data['experiments_to_success']:.1f}",
+                "Cost/episode": f"${data['cost']:.1f}",
+            }
+            for name, data in st.session_state.comparison_table.items()
+        ],
+        use_container_width=True,
+        hide_index=True,
+    )
+else:
+    st.caption("Click **Run 3-agent comparison** to benchmark Naive, RL, and LLM agents.")
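
The per-agent aggregation in the comparison (`agg`) can be exercised without Streamlit or the environment. This is a small sketch under the assumption that each result dict carries `success`, `steps`, and `cost` as in the app; the function name `aggregate` and the sample numbers are made up for illustration. Note that "experiments to success" is the mean step count over successful episodes only.

```python
def aggregate(results: list[dict]) -> dict:
    """Summarise episode results the way the comparison table does."""
    n = len(results)
    if n == 0:
        # Guard added in this sketch; the app always runs at least 10 episodes.
        return {"success_rate": 0.0, "experiments_to_success": 0.0, "cost": 0.0}
    steps_on_success = [r["steps"] for r in results if r["success"]]
    return {
        "success_rate": sum(r["success"] for r in results) / n,
        "experiments_to_success": (
            sum(steps_on_success) / len(steps_on_success) if steps_on_success else 0.0
        ),
        "cost": sum(r["cost"] for r in results) / n,
    }


# Illustrative episode results (not taken from any real run).
sample = [
    {"success": True, "cost": 40.0, "steps": 6},
    {"success": False, "cost": 100.0, "steps": 12},
    {"success": True, "cost": 70.0, "steps": 10},
    {"success": False, "cost": 90.0, "steps": 12},
]
summary = aggregate(sample)
print(summary)
```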

knowledge/pcr_protocols.json ADDED

	@@ -0,0 +1,32 @@

+[
+  {
+    "title": "Optimization of Annealing Temperature for AT-Rich Primer Pairs",
+    "abstract": "We systematically evaluated annealing temperatures between 50 and 62°C for primers with high AT content. Best amplification and specificity were achieved in the 55–58°C range, with 57°C yielding optimal balance for most templates.",
+    "keywords": ["annealing", "AT-rich", "temperature", "primers", "PCR"],
+    "recommendations": "AT-rich primers: use annealing 55–58°C. Avoid >60°C to prevent poor yield."
+  },
+  {
+    "title": "Cycle Number and Fidelity in Standard PCR",
+    "abstract": "Cycle counts from 25 to 40 were compared for amplicon yield and error rate. For most targets, 30–35 cycles gave high yield without excessive nonspecific product. Conservative protocols favor 28–32 cycles.",
+    "keywords": ["cycles", "fidelity", "yield", "PCR", "amplification"],
+    "recommendations": "High-fidelity applications: 30–35 cycles. Use 25–28 for long amplicons."
+  },
+  {
+    "title": "Reagent Ratios and Primer Dimer Formation",
+    "abstract": "Conservative versus aggressive primer-to-template ratios were tested across 200 reactions. Conservative ratio reduced primer dimers and improved specificity in 78% of cases; aggressive ratio increased yield when template was limiting.",
+    "keywords": ["ratio", "conservative", "aggressive", "primer dimer", "specificity"],
+    "recommendations": "Conservative ratio for long amplicons and low template; aggressive when maximizing yield."
+  },
+  {
+    "title": "Annealing Temperature Gradients for Multiplex PCR",
+    "abstract": "Gradient optimization showed that 65°C annealing worked well for GC-rich primers, while 55–58°C suited AT-rich primers. Middle range 60–62°C was a compromise with lower peak efficiency.",
+    "keywords": ["gradient", "GC-rich", "AT-rich", "multiplex", "annealing"],
+    "recommendations": "GC-rich primers: try 63–67°C. AT-rich: 55–59°C. Avoid 60–62°C as default."
+  },
+  {
+    "title": "Extension Temperature and Cycle Count Interaction",
+    "abstract": "Extension at 72°C with 25 versus 35 cycles was compared. Fewer cycles (25–30) reduced background; 32–35 cycles improved sensitivity for low-copy targets. Combined with lower annealing (56–58°C), 32 cycles was optimal in our assay.",
+    "keywords": ["extension", "cycles", "sensitivity", "background", "72C"],
+    "recommendations": "For difficult templates: annealing 56–58°C, 32 cycles. Reduce to 28–30 if nonspecific bands appear."
+  }
+]
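
How the research agents query this file is not shown in this part of the diff. As one possible illustration, entries with this schema could be ranked by keyword overlap with a query. The sketch below is an assumption, not the repository's retrieval logic: `search_protocols` is a hypothetical name, and the two entries are abridged copies of records from the file above.

```python
import re

# Two entries abridged from knowledge/pcr_protocols.json (abstracts omitted).
ENTRIES = [
    {
        "title": "Optimization of Annealing Temperature for AT-Rich Primer Pairs",
        "keywords": ["annealing", "AT-rich", "temperature", "primers", "PCR"],
        "recommendations": "AT-rich primers: use annealing 55–58°C. Avoid >60°C to prevent poor yield.",
    },
    {
        "title": "Cycle Number and Fidelity in Standard PCR",
        "keywords": ["cycles", "fidelity", "yield", "PCR", "amplification"],
        "recommendations": "High-fidelity applications: 30–35 cycles. Use 25–28 for long amplicons.",
    },
]


def search_protocols(entries: list[dict], query: str, top_k: int = 3) -> list[dict]:
    """Rank entries by how many query terms match their keywords (case-insensitive)."""
    terms = {t.lower() for t in re.findall(r"[\w-]+", query)}
    scored = []
    for entry in entries:
        keywords = {k.lower() for k in entry["keywords"]}
        score = len(terms & keywords)
        if score:
            scored.append((score, entry))
    # Stable sort: ties keep their original file order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


hits = search_protocols(ENTRIES, "annealing temperature for AT-rich primers")
for hit in hits:
    print(hit["title"], "->", hit["recommendations"])
```

Exact keyword matching is deliberately simple; a real agent might prefer embeddings or fuzzy matching over abstracts, which this sketch does not attempt.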

lab_env/__init__.py ADDED

	@@ -0,0 +1,4 @@

+from lab_env.env import LabEnv
+from lab_env.spec import ExperimentSpec, pcr_experiment_spec, elisa_experiment_spec, get_spec_for_workflow
+__all__ = ["LabEnv", "ExperimentSpec", "pcr_experiment_spec", "elisa_experiment_spec", "get_spec_for_workflow"]

lab_env/env.py ADDED

	@@ -0,0 +1,369 @@

+"""
+LabEnv — A Gymnasium-style simulated wet-lab environment for RL training.
+Simulates a single experiment workflow (e.g. PCR, ELISA) where the agent must
+discover a hidden optimal protocol under time and budget constraints. The
+experiment type is defined by an ExperimentSpec so any protocol-discovery
+experiment can be modelled.
+Designed for compatibility with OpenEnv's sandboxed execution model:
+the reset/step/close interface can be served over HTTP via the adapter in
+``openenv_adapter.py`` and uploaded to the OpenEnv hub on Hugging Face as a
+standardized agentic environment for lab-automation research.
+"""
+from __future__ import annotations
+from typing import Any
+import gymnasium as gym
+import numpy as np
+from gymnasium import spaces
+from lab_env.spec import ExperimentSpec, pcr_experiment_spec
+# ---------------------------------------------------------------------------
+# Backward compatibility: expose constants for default (PCR) spec
+# ---------------------------------------------------------------------------
+_DEFAULT_SPEC = pcr_experiment_spec()
+NUM_PRESETS: int = _DEFAULT_SPEC.num_presets
+ACTION_SETUP_START: int = _DEFAULT_SPEC.action_setup_start()
+ACTION_SETUP_END: int = _DEFAULT_SPEC.action_setup_end()
+ACTION_RUN_ASSAY: int = _DEFAULT_SPEC.action_run_assay()
+ACTION_ORDER_TIPS: int = _DEFAULT_SPEC.action_order_start() + 0
+ACTION_ORDER_BUFFER: int = _DEFAULT_SPEC.action_order_start() + 1
+ACTION_ORDER_POLYMERASE: int = _DEFAULT_SPEC.action_order_start() + 2
+ACTION_WAIT: int = _DEFAULT_SPEC.action_wait()
+ACTION_FINISH: int = _DEFAULT_SPEC.action_finish()
+NUM_ACTIONS: int = _DEFAULT_SPEC.num_actions
+OBS_DIM: int = _DEFAULT_SPEC.obs_dim
+# Legacy constants used by scripts
+INITIAL_BUDGET: float = _DEFAULT_SPEC.initial_budget
+RESULT_LABELS = _DEFAULT_SPEC.result_labels
+RESULT_TO_IDX = {label: i for i, label in enumerate(RESULT_LABELS)}
+class LabEnv(gym.Env):
+    """Simulated wet-lab environment for any experiment type.
+    The experiment (protocol presets, inventory, rewards, outcome model) is
+    defined by an ExperimentSpec. Use LabEnv() for default PCR; use
+    LabEnv(spec=my_spec) for custom experiments.
+    Observation (Box, shape from spec):
+        [0]     step_index (normalised)
+        [1]     elapsed_minutes (normalised)
+        [2]     remaining_budget (normalised)
+        [3..]   inventory (one slot per inventory_items, normalised)
+        [...]   last_result one-hot (len(result_labels))
+        [...]   has_setup, current_preset_idx (norm), best_result_score
+    Actions (Discrete, from spec):
+        0 .. num_presets-1   setup_reaction(preset_index)
+        num_presets          run_assay
+        num_presets+1 ..     order_reagents(item) for each orderable item
+        ...                  wait, finish
+    """
+    metadata = {"render_modes": []}
+    def __init__(
+        self,
+        spec: ExperimentSpec | None = None,
+        render_mode: str | None = None,
+    ) -> None:
+        super().__init__()
+        self.spec = spec if spec is not None else pcr_experiment_spec()
+        self.observation_space = spaces.Box(
+            low=0.0, high=1.0, shape=(self.spec.obs_dim,), dtype=np.float32
+        )
+        self.action_space = spaces.Discrete(self.spec.num_actions)
+        self._rng: np.random.Generator | None = None
+        self._current_protocol_override: dict[str, Any] | None = None
+        self._reset_state()
+    # ------------------------------------------------------------------
+    # Gymnasium API
+    # ------------------------------------------------------------------
+    def reset(
+        self,
+        *,
+        seed: int | None = None,
+        options: dict[str, Any] | None = None,
+    ) -> tuple[np.ndarray, dict[str, Any]]:
+        super().reset(seed=seed)
+        self._rng = np.random.default_rng(seed)
+        self._reset_state()
+        self._sample_hidden_optimum()
+        return self._obs(), self._info()
+    def step(
+        self, action: int
+    ) -> tuple[np.ndarray, float, bool, bool, dict[str, Any]]:
+        if self._terminated or self._truncated:
+            raise RuntimeError("Episode already done — call reset().")
+        reward = 0.0
+        self._step_index += 1
+        if self.spec.action_setup_start() <= action < self.spec.action_setup_end():
+            reward += self._do_setup(action)
+        elif action == self.spec.action_run_assay():
+            reward += self._do_run_assay()
+        elif self.spec.action_order_start() <= action < self.spec.action_order_end():
+            reward += self._do_order(action)
+        elif action == self.spec.action_wait():
+            reward += self._do_wait()
+        elif action == self.spec.action_finish():
+            reward += self._do_finish()
+        else:
+            raise ValueError(f"Invalid action {action}")
+        self._check_forced_termination()
+        if self._terminated or self._truncated:
+            reward += self._terminal_reward()
+        return self._obs(), reward, self._terminated, self._truncated, self._info()
+    def run_assay_with_protocol(
+        self, protocol: dict[str, Any]
+    ) -> tuple[np.ndarray, float, bool, bool, dict[str, Any]]:
+        """Run one assay with an arbitrary protocol dict (no preset).
+        The spec must have evaluate_custom_protocol set (e.g. PCR/ELISA). Consumes
+        inventory and time like a normal assay; outcome is from the spec's outcome
+        model. Use this for agent-generated protocols.
+        """
+        if self._terminated or self._truncated:
+            raise RuntimeError("Episode already done — call reset().")
+        if self.spec.evaluate_custom_protocol is None:
+            raise ValueError(
+                "This spec does not support custom protocols; evaluate_custom_protocol is not set."
+            )
+        self._step_index += 1
+        self._current_protocol_override = dict(protocol)
+        self._has_setup = True
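+        reward = self._do_run_assay()
+        # Drop an unused override (e.g. the assay aborted on empty inventory) so it
+        # cannot leak into a later preset-based assay.
+        self._current_protocol_override = None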
+        self._check_forced_termination()
+        if self._terminated or self._truncated:
+            reward += self._terminal_reward()
+        return self._obs(), reward, self._terminated, self._truncated, self._info()
+    def close(self) -> None:
+        pass
+    # ------------------------------------------------------------------
+    # Action implementations
+    # ------------------------------------------------------------------
+    def _do_setup(self, action: int) -> float:
+        preset_idx = action - self.spec.action_setup_start()
+        self._current_preset_idx = preset_idx
+        self._has_setup = True
+        self._elapsed_minutes += 1.0
+        return 0.0
+    def _fail_result_label(self) -> str:
+        if "fail" in self.spec.result_labels:
+            return "fail"
+        return self.spec.result_labels[-1] if self.spec.result_labels else "fail"
+    def _do_run_assay(self) -> float:
+        if not self._has_setup:
+            self._last_result = self._fail_result_label()
+            self._elapsed_minutes += self.spec.assay_time_minutes
+            return self.spec.assay_penalty
+        inv = self._inventory
+        for item in self.spec.inventory_items:
+            if inv.get(item, 0) < 1:
+                self._last_result = self._fail_result_label()
+                return self.spec.assay_penalty
+        for item in self.spec.inventory_items:
+            inv[item] = inv.get(item, 0) - 1
+            inv[item] = max(0, inv[item])
+        self._elapsed_minutes += self.spec.assay_time_minutes
+        result = self._sample_assay_result()
+        self._last_result = result
+        self._update_best(result)
+        imm = self.spec.immediate_result_reward.get(result, 0.0)
+        return self.spec.assay_penalty + imm
+    def _do_order(self, action: int) -> float:
+        idx = action - self.spec.action_order_start()
+        if idx < 0 or idx >= len(self.spec.orderable_items):
+            return 0.0
+        item = self.spec.orderable_items[idx]
+        if item not in self.spec.order_costs:
+            return 0.0
+        qty, cost = self.spec.order_costs[item]
+        if self._remaining_budget < cost:
+            return 0.0
+        self._remaining_budget -= cost
+        self._inventory[item] = min(
+            self._inventory.get(item, 0) + qty, self.spec.max_inventory
+        )
+        self._elapsed_minutes += self.spec.order_time_minutes
+        return 0.0
+    def _do_wait(self) -> float:
+        self._elapsed_minutes += self.spec.wait_minutes
+        return 0.0
+    def _do_finish(self) -> float:
+        self._terminated = True
+        return 0.0
+    # ------------------------------------------------------------------
+    # Termination
+    # ------------------------------------------------------------------
+    def _check_forced_termination(self) -> None:
+        if self._terminated:
+            return
+        if self._elapsed_minutes >= self.spec.max_minutes:
+            self._truncated = True
+            return
+        if self._remaining_budget <= 0:
+            self._truncated = True
+            return
+        if self._step_index >= self.spec.max_steps:
+            self._truncated = True
+            return
+        inv = self._inventory
+        can_run = all(inv.get(item, 0) >= 1 for item in self.spec.inventory_items)
+        can_order = any(
+            self._remaining_budget >= self.spec.order_costs.get(k, (0, float("inf")))[1]
+            for k in self.spec.orderable_items
+        )
+        if not can_run and not can_order:
+            self._truncated = True
+    def _terminal_reward(self) -> float:
+        bonus = self.spec.terminal_bonus.get(self._best_result, 0.0)
+        time_penalty = self.spec.time_penalty_per_min * self._elapsed_minutes
+        no_success = (
+            self.spec.no_success_penalty
+            if self._best_result in ("none", "fail") or self._best_result not in self.spec.terminal_bonus
+            else 0.0
+        )
+        return bonus + time_penalty + no_success
+    # ------------------------------------------------------------------
+    # Outcome model (delegated to spec)
+    # ------------------------------------------------------------------
+    def _sample_hidden_optimum(self) -> None:
+        if self._rng is None:
+            return
+        if self.spec.sample_hidden_optimum is not None:
+            self._hidden_optimum = self.spec.sample_hidden_optimum(self._rng)
+        else:
+            self._hidden_optimum = {}
+    def _sample_assay_result(self) -> str:
+        if self._rng is None:
+            return self.spec.result_labels[1] if len(self.spec.result_labels) > 1 else "fail"
+        if self._current_protocol_override is not None and self.spec.evaluate_custom_protocol is not None:
+            result = self.spec.evaluate_custom_protocol(
+                self._hidden_optimum,
+                self._current_protocol_override,
+                self._rng,
+            )
+            self._current_protocol_override = None
+            return result
+        if self.spec.sample_assay_result is not None:
+            return self.spec.sample_assay_result(
+                self._hidden_optimum,
+                self._current_preset_idx,
+                self.spec.presets,
+                self._rng,
+            )
+        # Default: random non-none result
+        choices = [r for r in self.spec.result_labels if r != "none"]
+        if not choices:
+            return "fail"
+        return str(self._rng.choice(choices))
+    def _update_best(self, result: str) -> None:
+        rank = {"fail": 0, "none": 0, "partial": 1, "success": 2}
+        if rank.get(result, 0) > rank.get(self._best_result, 0):
+            self._best_result = result
+    # ------------------------------------------------------------------
+    # Observation helpers
+    # ------------------------------------------------------------------
+    def _result_to_onehot(self, result: str) -> list[float]:
+        out = [0.0] * len(self.spec.result_labels)
+        for i, label in enumerate(self.spec.result_labels):
+            if label == result:
+                out[i] = 1.0
+                break
+        return out
+    def _obs(self) -> np.ndarray:
+        inv = self._inventory
+        result_onehot = self._result_to_onehot(self._last_result)
+        best_score = {"none": 0.0, "fail": 0.0, "partial": 0.5, "success": 1.0}.get(
+            self._best_result, 0.0
+        )
+        inv_slice = [
+            inv.get(item, 0) / self.spec.max_inventory
+            for item in self.spec.inventory_items
+        ]
+        obs = np.array(
+            [
+                self._step_index / self.spec.max_steps,
+                self._elapsed_minutes / self.spec.max_minutes,
+                self._remaining_budget / self.spec.initial_budget,
+                *inv_slice,
+                *result_onehot,
+                float(self._has_setup),
+                (self._current_preset_idx / self.spec.num_presets) if self._has_setup else 0.0,
+                best_score,
+            ],
+            dtype=np.float32,
+        )
+        return obs
+    def _info(self) -> dict[str, Any]:
+        return {
+            "step_index": self._step_index,
+            "elapsed_minutes": self._elapsed_minutes,
+            "remaining_budget": self._remaining_budget,
+            "inventory": dict(self._inventory),
+            "last_result": self._last_result,
+            "best_result": self._best_result,
+        }
+    # ------------------------------------------------------------------
+    # Internal state management
+    # ------------------------------------------------------------------
+    def _reset_state(self) -> None:
+        self._step_index = 0
+        self._elapsed_minutes = 0.0
+        self._remaining_budget = self.spec.initial_budget
+        self._inventory = dict(self.spec.initial_inventory)
+        self._last_result = self.spec.result_labels[0] if self.spec.result_labels else "none"
+        self._best_result = self._last_result
+        self._has_setup = False
+        self._current_preset_idx = 0
+        self._terminated = False
+        self._truncated = False
+        self._hidden_optimum: dict[str, Any] = {}
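
The discrete action layout described in the `LabEnv` docstring (setup presets first, then `run_assay`, then one order action per orderable item, then `wait` and `finish`) can be sketched as a small index calculator. The real offsets come from `ExperimentSpec` in `lab_env/spec.py`, which is not shown here, so the contiguous ordering, the class name `ActionLayout`, and `num_presets=4` are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ActionLayout:
    """Hypothetical stand-in for the index helpers on ExperimentSpec."""

    num_presets: int
    orderable_items: tuple[str, ...]

    @property
    def run_assay(self) -> int:
        return self.num_presets

    @property
    def order_start(self) -> int:
        return self.num_presets + 1

    @property
    def wait(self) -> int:
        # Assumed to follow directly after the order actions.
        return self.order_start + len(self.orderable_items)

    @property
    def finish(self) -> int:
        return self.wait + 1

    @property
    def num_actions(self) -> int:
        return self.finish + 1

    def decode(self, action: int) -> tuple[str, int | str | None]:
        """Map an action index to (action name, argument)."""
        if 0 <= action < self.num_presets:
            return ("setup_reaction", action)
        if action == self.run_assay:
            return ("run_assay", None)
        if self.order_start <= action < self.wait:
            return ("order_reagents", self.orderable_items[action - self.order_start])
        if action == self.wait:
            return ("wait", None)
        if action == self.finish:
            return ("finish", None)
        raise ValueError(f"Invalid action {action}")


# Illustrative sizes only; the actual PCR spec may define different counts.
layout = ActionLayout(num_presets=4, orderable_items=("tips", "buffer", "polymerase"))
for a in range(layout.num_actions):
    print(a, layout.decode(a))
```

A decoder like this is mainly useful for logging and debugging agent trajectories, since the policy itself only ever sees the integer index.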

lab_env/openenv_adapter.py ADDED

	@@ -0,0 +1,231 @@

+"""
+OpenEnv adapter for LabEnv.
+Wraps :class:`LabEnv` in the OpenEnv Environment interface so it can be
+served over HTTP/WebSocket via openenv-core and deployed to the OpenEnv hub
+on Hugging Face.
+Usage:
+    # Run the OpenEnv HTTP server (POST /reset, POST /step, GET /state, WebSocket /ws)
+    uvicorn lab_env.openenv_adapter:app --host 0.0.0.0 --port 8000
+    # Or from Python
+    from lab_env.openenv_adapter import app
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+"""
+from __future__ import annotations
+from typing import Any, Optional
+from uuid import uuid4
+from openenv.core.env_server import create_app
+from openenv.core.env_server.interfaces import Environment
+from openenv.core.env_server.types import (
+    Action,
+    EnvironmentMetadata,
+    Observation,
+    State,
+)
+from pydantic import Field
+from lab_env.env import LabEnv
+from lab_env.spec import ExperimentSpec, pcr_experiment_spec
+# ---------------------------------------------------------------------------
+# OpenEnv types (Pydantic models for action, observation, state)
+# ---------------------------------------------------------------------------
+class LabAction(Action):
+    """Discrete action index for LabEnv (range depends on experiment spec)."""
+    action: int = Field(..., ge=0, description="Action index (0 to num_actions-1)")
+class LabObservation(Observation):
+    """Observation returned after reset / step. Vector and info live in metadata."""
+    metadata: dict[str, Any] = Field(
+        default_factory=dict,
+        description="Contains obs_vector, terminated, truncated, info",
+    )
+class LabState(State):
+    """Full environment state snapshot for LabEnv."""
+    model_config = {"extra": "allow"}
+    episode_id: Optional[str] = Field(default=None, description="Episode identifier")
+    step_count: int = Field(default=0, ge=0, description="Steps taken")
+    elapsed_minutes: float = Field(default=0.0, description="Elapsed time (min)")
+    remaining_budget: float = Field(default=0.0, description="Remaining budget")
+    inventory: dict[str, int] = Field(default_factory=dict, description="Inventory counts")
+    last_result: str = Field(default="none", description="Last assay result")
+    best_result: str = Field(default="none", description="Best result so far")
+# ---------------------------------------------------------------------------
+# OpenEnv Environment implementation
+# ---------------------------------------------------------------------------
+class LabEnvironment(Environment[LabAction, LabObservation, LabState]):
+    """OpenEnv Environment that wraps a single LabEnv instance.
+    Each session gets its own LabEnv. Compatible with OpenEnv HTTP server
+    and WebSocket endpoints.
+    """
+    SUPPORTS_CONCURRENT_SESSIONS = True
+    def __init__(self, spec: Optional[ExperimentSpec] = None) -> None:
+        super().__init__()
+        self._env = LabEnv(spec=spec)
+        self._episode_id: Optional[str] = None
+    def reset(
+        self,
+        seed: Optional[int] = None,
+        episode_id: Optional[str] = None,
+        **kwargs: Any,
+    ) -> LabObservation:
+        obs, info = self._env.reset(seed=seed)
+        self._episode_id = episode_id or str(uuid4())
+        return LabObservation(
+            done=False,
+            reward=0.0,
+            metadata={
+                "obs_vector": obs.tolist(),
+                "terminated": False,
+                "truncated": False,
+                "info": info,
+            },
+        )
+    def step(
+        self,
+        action: LabAction,
+        timeout_s: Optional[float] = None,
+        **kwargs: Any,
+    ) -> LabObservation:
+        obs, reward, terminated, truncated, info = self._env.step(action.action)
+        return LabObservation(
+            done=terminated or truncated,
+            reward=float(reward),
+            metadata={
+                "obs_vector": obs.tolist(),
+                "terminated": terminated,
+                "truncated": truncated,
+                "info": info,
+            },
+        )
+    @property
+    def state(self) -> LabState:
+        e = self._env
+        return LabState(
+            episode_id=self._episode_id,
+            step_count=e._step_index,
+            elapsed_minutes=e._elapsed_minutes,
+            remaining_budget=e._remaining_budget,
+            inventory=dict(e._inventory),
+            last_result=e._last_result,
+            best_result=e._best_result,
+        )
+    def get_metadata(self) -> EnvironmentMetadata:
+        exp_name = getattr(self._env.spec, "name", "pcr")
+        return EnvironmentMetadata(
+            name="SimLab",
+            description=f"Gymnasium-style simulated wet-lab for protocol discovery ({exp_name})",
+            version="0.1.0",
+            documentation_url="https://github.com/openrl/simlab",
+        )
+    def close(self) -> None:
+        self._env.close()
+# ---------------------------------------------------------------------------
+# FastAPI app for OpenEnv HTTP/WebSocket server
+# ---------------------------------------------------------------------------
+app = create_app(
+    LabEnvironment,
+    LabAction,
+    LabObservation,
+    env_name="simlab",
+    max_concurrent_envs=4,
+)
+# ---------------------------------------------------------------------------
+# Legacy session-based adapter (for direct use without HTTP server)
+# ---------------------------------------------------------------------------
+class LabEnvOpenEnvAdapter:
+    """Manages multiple concurrent LabEnv sessions keyed by env_id.
+    Use this when you need to drive LabEnv by env_id (e.g. from another
+    service) without going through the OpenEnv HTTP server. For standard
+    OpenEnv deployment, use LabEnvironment + the `app` above instead.
+    """
+    def __init__(self) -> None:
+        self._envs: dict[str, LabEnv] = {}
+    def env_reset(
+        self,
+        env_id: str,
+        seed: Optional[int] = None,
+    ) -> dict[str, Any]:
+        """Create or reset an environment instance; return initial observation."""
+        env = self._envs.get(env_id)
+        if env is None:
+            env = LabEnv()
+            self._envs[env_id] = env
+        obs, info = env.reset(seed=seed)
+        return {
+            "observation": obs.tolist(),
+            "info": info,
+        }
+    def env_step(
+        self,
+        env_id: str,
+        action: int,
+    ) -> dict[str, Any]:
+        """Advance the environment by one action; return the transition. Raises KeyError if env_id was never reset."""
+        env = self._envs[env_id]
+        obs, reward, terminated, truncated, info = env.step(action)
+        return {
+            "observation": obs.tolist(),
+            "reward": float(reward),
+            "terminated": terminated,
+            "truncated": truncated,
+            "info": info,
+        }
+    def env_state(self, env_id: str) -> dict[str, Any]:
+        """Return a JSON-serializable snapshot of the current state. Raises KeyError if env_id was never reset."""
+        env = self._envs[env_id]
+        return {
+            "episode_id": env_id,
+            "step_index": env._step_index,
+            "elapsed_minutes": env._elapsed_minutes,
+            "remaining_budget": env._remaining_budget,
+            "inventory": dict(env._inventory),
+            "last_result": env._last_result,
+            "best_result": env._best_result,
+        }
+    def env_close(self, env_id: str) -> None:
+        """Tear down an environment instance."""
+        env = self._envs.pop(env_id, None)
+        if env is not None:
+            env.close()
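
Both adapters above fold Gymnasium's five-value step result into a single record whose `done` flag is `terminated or truncated`. A minimal sketch of that mapping follows. It assumes a hypothetical `StepRecord` dataclass standing in for the OpenEnv `Observation` base class, which is not reproduced here, and `pack_transition` is an illustrative name rather than part of the adapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepRecord:
    """Hypothetical stand-in for LabObservation (not the OpenEnv class)."""

    done: bool
    reward: float
    metadata: dict[str, Any] = field(default_factory=dict)


def pack_transition(
    obs_vector: list[float],
    reward: float,
    terminated: bool,
    truncated: bool,
    info: dict[str, Any],
) -> StepRecord:
    """Fold a Gymnasium-style 5-tuple into one observation-like record."""
    return StepRecord(
        # The episode is over if it ended naturally OR hit a limit.
        done=bool(terminated or truncated),
        reward=float(reward),
        # Keep both flags so a client can still tell the two cases apart.
        metadata={
            "obs_vector": list(obs_vector),
            "terminated": bool(terminated),
            "truncated": bool(truncated),
            "info": dict(info),
        },
    )


record = pack_transition([0.1, 0.2], 1.5, False, True, {"best_result": "partial"})
print(record.done, record.metadata["terminated"], record.metadata["truncated"])
```

Keeping `terminated` and `truncated` in the metadata matters for training code that treats a time-limit cutoff differently from a true terminal state.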

lab_env/spec.py ADDED

	@@ -0,0 +1,367 @@

+"""
+Experiment specification for the generic lab environment.
+Defines protocol presets, inventory, rewards, and outcome model so LabEnv can
+simulate any experiment type (PCR, ELISA, etc.) from a single spec.
+"""
+from __future__ import annotations
+from dataclasses import dataclass, field
+from typing import Any, Callable
+import numpy as np
+@dataclass
+class ExperimentSpec:
+    """Specification for a single experiment type (PCR, ELISA, etc.).
+    The environment uses this to build action/observation spaces and dynamics.
+    Outcome logic is pluggable via sample_hidden_optimum and sample_assay_result.
+    """
+    name: str
+    """Short name for this experiment (e.g. 'pcr', 'elisa')."""
+    presets: list[dict[str, Any]]
+    """List of protocol presets the agent can choose (e.g. temp/cycles/ratio for PCR)."""
+    inventory_items: list[str]
+    """Ordered list of inventory item names (tips, buffer, polymerase, samples, ...)."""
+    orderable_items: list[str]
+    """Subset of inventory_items that can be reordered (each gets an order action)."""
+    initial_inventory: dict[str, int]
+    """Starting count per inventory item."""
+    order_costs: dict[str, tuple[int, float]]
+    """For each orderable item: (quantity_per_order, cost_per_order)."""
+    result_labels: list[str]
+    """Possible assay outcomes, e.g. ['none', 'success', 'partial', 'fail']."""
+    # Limits
+    max_steps: int = 30
+    max_minutes: float = 240.0
+    initial_budget: float = 500.0
+    max_inventory: int = 20
+    # Time costs
+    assay_time_minutes: float = 20.0
+    order_time_minutes: float = 5.0
+    wait_minutes: float = 15.0
+    # Rewards
+    assay_penalty: float = -3.0
+    time_penalty_per_min: float = -0.25
+    no_success_penalty: float = -20.0
+    immediate_result_reward: dict[str, float] = field(default_factory=dict)
+    terminal_bonus: dict[str, float] = field(default_factory=dict)
+    # Outcome model: callables that take (rng) or (hidden_state, preset_idx, presets, rng)
+    sample_hidden_optimum: Callable[[np.random.Generator], dict[str, Any]] | None = None
+    sample_assay_result: (
+        Callable[
+            [dict[str, Any], int, list[dict[str, Any]], np.random.Generator],
+            str,
+        ]
+        | None
+    ) = None
+    # Custom protocol support: evaluate arbitrary protocol dict (for agent-generated protocols)
+    evaluate_custom_protocol: (
+        Callable[
+            [dict[str, Any], dict[str, Any], np.random.Generator],
+            str,
+        ]
+        | None
+    ) = None
+    """If set, (hidden_optimum, protocol_dict, rng) -> result label. Enables run_assay_with_protocol()."""
+    protocol_param_schema: dict[str, Any] = field(default_factory=dict)
+    """Schema describing protocol params for codegen/LLM: e.g. {"temp": {"type": "number", "description": "°C"}, ...}."""
+    @property
+    def num_presets(self) -> int:
+        return len(self.presets)
+    @property
+    def num_actions(self) -> int:
+        return (
+            self.num_presets
+            + 1  # run_assay
+            + len(self.orderable_items)
+            + 2  # wait, finish
+        )
+    @property
+    def obs_dim(self) -> int:
+        return (
+            3  # step_index, elapsed_minutes, remaining_budget
+            + len(self.inventory_items)
+            + len(self.result_labels)
+            + 3  # has_setup, current_preset_idx (norm), best_result_score
+        )
+    def action_setup_start(self) -> int:
+        return 0
+    def action_setup_end(self) -> int:
+        return self.num_presets
+    def action_run_assay(self) -> int:
+        return self.num_presets
+    def action_order_start(self) -> int:
+        return self.num_presets + 1
+    def action_order_end(self) -> int:
+        return self.num_presets + 1 + len(self.orderable_items)
+    def action_wait(self) -> int:
+        return self.num_presets + 1 + len(self.orderable_items)
+    def action_finish(self) -> int:
+        return self.num_presets + 2 + len(self.orderable_items)
+# ---------------------------------------------------------------------------
+# PCR experiment spec (default / backward compatibility)
+# ---------------------------------------------------------------------------
+def _pcr_sample_hidden_optimum(rng: np.random.Generator) -> dict[str, Any]:
+    temps = [55.0, 65.0, 72.0]
+    cycles = [25, 35]
+    ratios = ["conservative", "aggressive"]
+    opt_temp = float(rng.choice(temps, p=[0.2, 0.5, 0.3])) + rng.uniform(-3.0, 3.0)
+    opt_cycles = float(rng.choice(cycles, p=[0.6, 0.4])) + rng.uniform(-2.0, 2.0)
+    opt_ratio = str(rng.choice(ratios, p=[0.6, 0.4]))
+    return {"temp": opt_temp, "cycles": opt_cycles, "ratio": opt_ratio}
+def _pcr_sample_assay_result(
+    hidden: dict[str, Any],
+    preset_idx: int,
+    presets: list[dict[str, Any]],
+    rng: np.random.Generator,
+) -> str:
+    preset = presets[preset_idx]
+    chosen_temp = float(preset["temp"])
+    chosen_cycles = float(preset["cycles"])
+    chosen_ratio = str(preset["ratio"])
+    opt_temp = hidden["temp"]
+    opt_cycles = hidden["cycles"]
+    opt_ratio = hidden["ratio"]
+    temp_close = 1.0 - min(abs(chosen_temp - opt_temp) / 20.0, 1.0)
+    cycle_close = 1.0 - min(abs(chosen_cycles - opt_cycles) / 15.0, 1.0)
+    ratio_match = 1.0 if chosen_ratio == opt_ratio else 0.3
+    closeness = temp_close * cycle_close * ratio_match
+    p_success = closeness ** 2
+    p_partial = closeness * (1.0 - closeness) * 0.8
+    p_fail = 1.0 - p_success - p_partial
+    return str(
+        rng.choice(["success", "partial", "fail"], p=[p_success, p_partial, p_fail])
+    )
+def _pcr_evaluate_custom_protocol(
+    hidden: dict[str, Any],
+    protocol: dict[str, Any],
+    rng: np.random.Generator,
+) -> str:
+    """Evaluate any protocol dict (temp, cycles, ratio) against hidden optimum."""
+    chosen_temp = float(protocol.get("temp", 60.0))
+    chosen_cycles = float(protocol.get("cycles", 30))
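+    # Normalise free-form ratio text: anything that does not mention "conservative" counts as "aggressive".
+    r = str(protocol.get("ratio", "conservative")).strip().lower()
+    chosen_ratio = "conservative" if "conservative" in r else "aggressive"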
+    opt_temp = hidden["temp"]
+    opt_cycles = hidden["cycles"]
+    opt_ratio = hidden["ratio"]
+    temp_close = 1.0 - min(abs(chosen_temp - opt_temp) / 20.0, 1.0)
+    cycle_close = 1.0 - min(abs(chosen_cycles - opt_cycles) / 15.0, 1.0)
+    ratio_match = 1.0 if chosen_ratio == opt_ratio else 0.3
+    closeness = temp_close * cycle_close * ratio_match
+    p_success = closeness ** 2
+    p_partial = closeness * (1.0 - closeness) * 0.8
+    p_fail = 1.0 - p_success - p_partial
+    return str(
+        rng.choice(["success", "partial", "fail"], p=[p_success, p_partial, p_fail])
+    )
+PCR_PROTOCOL_SCHEMA = {
+    "temp": {"type": "number", "description": "Annealing temperature in °C (e.g. 55–72)"},
+    "cycles": {"type": "integer", "description": "Number of PCR cycles (e.g. 25–40)"},
+    "ratio": {"type": "string", "enum": ["conservative", "aggressive"], "description": "Reagent ratio"},
+}
+def pcr_experiment_spec() -> ExperimentSpec:
+    """Build the default PCR experiment spec (same behaviour as original LabEnv)."""
+    from itertools import product
+    temps = [55.0, 65.0, 72.0]
+    cycles = [25, 35]
+    ratios = ["conservative", "aggressive"]
+    presets = [
+        {"temp": t, "cycles": c, "ratio": r}
+        for t, c, r in product(temps, cycles, ratios)
+    ]
+    return ExperimentSpec(
+        name="pcr",
+        presets=presets,
+        inventory_items=["tips", "buffer", "polymerase", "samples"],
+        orderable_items=["tips", "buffer", "polymerase"],
+        initial_inventory={"tips": 10, "buffer": 10, "polymerase": 5, "samples": 8},
+        order_costs={
+            "tips": (5, 10.0),
+            "buffer": (5, 15.0),
+            "polymerase": (3, 30.0),
+        },
+        result_labels=["none", "success", "partial", "fail"],
+        max_steps=30,
+        max_minutes=240.0,
+        initial_budget=500.0,
+        max_inventory=20,
+        assay_time_minutes=20.0,
+        order_time_minutes=5.0,
+        wait_minutes=15.0,
+        assay_penalty=-3.0,
+        time_penalty_per_min=-0.25,
+        no_success_penalty=-20.0,
+        immediate_result_reward={"success": 15.0, "partial": 5.0, "fail": 0.0},
+        terminal_bonus={"success": 60.0, "partial": 25.0},
+        sample_hidden_optimum=_pcr_sample_hidden_optimum,
+        sample_assay_result=_pcr_sample_assay_result,
+        evaluate_custom_protocol=_pcr_evaluate_custom_protocol,
+        protocol_param_schema=PCR_PROTOCOL_SCHEMA,
+    )
+# ---------------------------------------------------------------------------
+# ELISA experiment spec (same obs/action shape as PCR for agent compatibility)
+# ---------------------------------------------------------------------------
+def _elisa_sample_hidden_optimum(rng: np.random.Generator) -> dict[str, Any]:
+    coating_hrs = [1.0, 2.0, 3.0]
+    temps = [4.0, 37.0]
+    blocks = ["bsa", "casein"]
+    opt_coating = float(rng.choice(coating_hrs, p=[0.3, 0.5, 0.2])) + rng.uniform(-0.2, 0.2)
+    opt_temp = float(rng.choice(temps, p=[0.5, 0.5])) + rng.uniform(-2.0, 2.0)
+    opt_block = str(rng.choice(blocks, p=[0.6, 0.4]))
+    return {"coating_hr": opt_coating, "temp": opt_temp, "block": opt_block}
+def _elisa_sample_assay_result(
+    hidden: dict[str, Any],
+    preset_idx: int,
+    presets: list[dict[str, Any]],
+    rng: np.random.Generator,
+) -> str:
+    preset = presets[preset_idx]
+    c = float(preset["coating_hr"])
+    t = float(preset["temp"])
+    b = str(preset["block"])
+    oc = hidden["coating_hr"]
+    ot = hidden["temp"]
+    ob = hidden["block"]
+    coat_close = 1.0 - min(abs(c - oc) / 2.0, 1.0)
+    temp_close = 1.0 - min(abs(t - ot) / 35.0, 1.0)
+    block_match = 1.0 if b == ob else 0.3
+    closeness = coat_close * temp_close * block_match
+    p_success = closeness ** 2
+    p_partial = closeness * (1.0 - closeness) * 0.8
+    p_fail = 1.0 - p_success - p_partial
+    return str(
+        rng.choice(["success", "partial", "fail"], p=[p_success, p_partial, p_fail])
+    )
+def _elisa_evaluate_custom_protocol(
+    hidden: dict[str, Any],
+    protocol: dict[str, Any],
+    rng: np.random.Generator,
+) -> str:
+    """Evaluate any protocol dict (coating_hr, temp, block) against hidden optimum."""
+    c = float(protocol.get("coating_hr", 2.0))
+    t = float(protocol.get("temp", 25.0))
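+    # Normalise free-form blocking-agent text: anything that does not mention "bsa" counts as "casein".
+    b = str(protocol.get("block", "bsa")).strip().lower()
+    block_clean = "bsa" if "bsa" in b else "casein"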
+    oc, ot, ob = hidden["coating_hr"], hidden["temp"], hidden["block"]
+    coat_close = 1.0 - min(abs(c - oc) / 2.0, 1.0)
+    temp_close = 1.0 - min(abs(t - ot) / 35.0, 1.0)
+    block_match = 1.0 if block_clean == ob else 0.3
+    closeness = coat_close * temp_close * block_match
+    p_success = closeness ** 2
+    p_partial = closeness * (1.0 - closeness) * 0.8
+    p_fail = 1.0 - p_success - p_partial
+    return str(
+        rng.choice(["success", "partial", "fail"], p=[p_success, p_partial, p_fail])
+    )
+ELISA_PROTOCOL_SCHEMA = {
+    "coating_hr": {"type": "number", "description": "Coating time in hours (e.g. 1–3)"},
+    "temp": {"type": "number", "description": "Incubation temperature °C (e.g. 4 or 37)"},
+    "block": {"type": "string", "enum": ["bsa", "casein"], "description": "Blocking agent"},
+}
+def elisa_experiment_spec() -> ExperimentSpec:
+    """ELISA readout: coating time (hr), temperature (°C), blocking type. Same obs/action dims as PCR."""
+    from itertools import product
+    coating_hrs = [1.0, 2.0, 3.0]
+    temps = [4.0, 37.0]
+    blocks = ["bsa", "casein"]
+    presets = [
+        {"coating_hr": ch, "temp": t, "block": bl}
+        for ch, t, bl in product(coating_hrs, temps, blocks)
+    ]
+    return ExperimentSpec(
+        name="elisa",
+        presets=presets,
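+        # Inventory mirrors the PCR spec (including "polymerase"), which keeps obs/action shapes identical.
+        inventory_items=["tips", "buffer", "polymerase", "samples"],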
+        orderable_items=["tips", "buffer", "polymerase"],
+        initial_inventory={"tips": 10, "buffer": 10, "polymerase": 5, "samples": 8},
+        order_costs={
+            "tips": (5, 10.0),
+            "buffer": (5, 15.0),
+            "polymerase": (3, 30.0),
+        },
+        result_labels=["none", "success", "partial", "fail"],
+        max_steps=30,
+        max_minutes=240.0,
+        initial_budget=500.0,
+        max_inventory=20,
+        assay_time_minutes=20.0,
+        order_time_minutes=5.0,
+        wait_minutes=15.0,
+        assay_penalty=-3.0,
+        time_penalty_per_min=-0.25,
+        no_success_penalty=-20.0,
+        immediate_result_reward={"success": 15.0, "partial": 5.0, "fail": 0.0},
+        terminal_bonus={"success": 60.0, "partial": 25.0},
+        sample_hidden_optimum=_elisa_sample_hidden_optimum,
+        sample_assay_result=_elisa_sample_assay_result,
+        evaluate_custom_protocol=_elisa_evaluate_custom_protocol,
+        protocol_param_schema=ELISA_PROTOCOL_SCHEMA,
+    )
+# ---------------------------------------------------------------------------
+# Workflow ID -> spec registry (for UI / API)
+# ---------------------------------------------------------------------------
+def get_spec_for_workflow(workflow_id: str) -> ExperimentSpec:
+    """Return the experiment spec for a given workflow ID. Unknown IDs default to PCR."""
+    _registry: dict[str, Callable[[], ExperimentSpec]] = {
+        "pcr-amplification": pcr_experiment_spec,
+        "elisa-readout": elisa_experiment_spec,
+    }
+    factory = _registry.get(workflow_id, pcr_experiment_spec)
+    return factory()
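
Both built-in specs share the same outcome model and the same action-index layout. The sketch below restates them as standalone functions so the numbers can be checked by hand. The function names `pcr_closeness`, `outcome_probabilities`, and `action_layout` are illustrative and not part of `lab_env.spec`; the constants (20.0, 15.0, 0.3, 0.8) are copied from the PCR functions above.

```python
from __future__ import annotations


def pcr_closeness(
    temp: float, cycles: float, ratio: str,
    opt_temp: float, opt_cycles: float, opt_ratio: str,
) -> float:
    """Product of per-parameter closeness terms, each in [0, 1]."""
    temp_close = 1.0 - min(abs(temp - opt_temp) / 20.0, 1.0)
    cycle_close = 1.0 - min(abs(cycles - opt_cycles) / 15.0, 1.0)
    ratio_match = 1.0 if ratio == opt_ratio else 0.3
    return temp_close * cycle_close * ratio_match


def outcome_probabilities(closeness: float) -> dict[str, float]:
    """Map a closeness score in [0, 1] to success/partial/fail probabilities."""
    if not 0.0 <= closeness <= 1.0:
        raise ValueError("closeness must lie in [0, 1]")
    p_success = closeness ** 2
    p_partial = closeness * (1.0 - closeness) * 0.8
    # success + partial = 0.2*c^2 + 0.8*c <= 1 on [0, 1], so p_fail >= 0.
    p_fail = 1.0 - p_success - p_partial
    return {"success": p_success, "partial": p_partial, "fail": p_fail}


def action_layout(num_presets: int, orderable_items: list[str]) -> dict[str, int | range]:
    """Index ranges in the order used by ExperimentSpec: setup, run, order, wait, finish."""
    order_start = num_presets + 1
    wait = order_start + len(orderable_items)
    return {
        "setup": range(0, num_presets),
        "run_assay": num_presets,
        "order": range(order_start, wait),
        "wait": wait,
        "finish": wait + 1,
    }


# A preset 5 °C off the hidden optimum, with matching cycles and ratio.
closeness = pcr_closeness(65.0, 25.0, "conservative", 60.0, 25.0, "conservative")
probs = outcome_probabilities(closeness)
# The default PCR spec has 3 * 2 * 2 = 12 presets and 3 orderable items.
layout = action_layout(12, ["tips", "buffer", "polymerase"])
print(closeness, probs)
print(layout)
```

With these inputs the closeness is 0.75, so a preset 5 °C away from the optimum still succeeds a little over half the time, and the last action index (17) is one less than `num_actions` (18).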

pyproject.toml ADDED

	@@ -0,0 +1,24 @@

+[build-system]
+requires = ["setuptools>=68.0", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "simlab"
+version = "0.1.0"
+description = "Lab Automation RL Environment — a Gymnasium-style simulated wet-lab for agentic RL training"
+readme = "README.md"
+requires-python = ">=3.10"
+license = {text = "MIT"}
+dependencies = [
+    "numpy>=1.24",
+    "torch>=2.0",
+    "gymnasium>=0.29",
+    "openenv-core>=0.2.0",
+]
+[project.optional-dependencies]
+dev = ["pytest", "ruff"]
+demo = ["openai", "streamlit"]
+[tool.setuptools.packages.find]
+include = ["lab_env*", "agents*"]

scripts/compare_all_agents.py ADDED

	@@ -0,0 +1,139 @@

+#!/usr/bin/env python3
+"""Benchmark Naive, RL, and Research LLM agents on the same eval seeds."""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from lab_env.env import LabEnv, INITIAL_BUDGET
+from agents.naive_agent import NaiveAgent
+from agents.rl_agent import ReinforceAgent
+from agents.research_llm_agent import ResearchLLMAgent
+def run_episode_naive(env: LabEnv, agent: NaiveAgent, seed: int) -> dict:
+    obs, info = env.reset(seed=seed)
+    agent.reset()
+    total_reward = 0.0
+    steps = 0
+    while True:
+        action = agent.select_action(obs)
+        obs, reward, terminated, truncated, info = env.step(action)
+        total_reward += reward
+        steps += 1
+        if terminated or truncated:
+            break
+    return {
+        "reward": total_reward,
+        "success": info["best_result"] == "success",
+        "partial": info["best_result"] == "partial",
+        "minutes": info["elapsed_minutes"],
+        "cost": INITIAL_BUDGET - info["remaining_budget"],
+        "steps": steps,
+    }
+def aggregate(results: list[dict]) -> dict:
+    n = len(results)
+    successes = [r["success"] for r in results]
+    # Step counts of successful episodes only; the guard in the return value handles the empty case.
+    steps_to_success = [r["steps"] for r in results if r["success"]]
+    return {
+        "n": n,
+        "avg_reward": sum(r["reward"] for r in results) / n,
+        "success_rate": sum(successes) / n,
+        "partial_rate": sum(r["partial"] for r in results) / n,
+        "avg_minutes": sum(r["minutes"] for r in results) / n,
+        "avg_cost": sum(r["cost"] for r in results) / n,
+        "avg_steps": sum(r["steps"] for r in results) / n,
+        "experiments_to_success": sum(steps_to_success) / len(steps_to_success) if steps_to_success else 0,
+    }
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Compare Naive, RL, and Research LLM agents")
+    parser.add_argument("--eval-episodes", type=int, default=50, help="Episodes per agent (eval)")
+    parser.add_argument("--train-episodes", type=int, default=500, help="RL training episodes before eval")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--max-trials", type=int, default=5, help="Max trials per episode (RL and LLM)")
+    parser.add_argument("--no-llm", action="store_true", help="Skip LLM agent (no API key)")
+    args = parser.parse_args()
+    eval_seed_base = 100_000 + args.seed
+    env = LabEnv()
+    # ---- Naive ----
+    print("Running Naive agent...")
+    naive_agent = NaiveAgent(num_trials=3, seed=args.seed)
+    naive_results = [
+        run_episode_naive(env, naive_agent, eval_seed_base + i)
+        for i in range(args.eval_episodes)
+    ]
+    naive_stats = aggregate(naive_results)
+    # ---- RL (train then eval) ----
+    print("Training REINFORCE agent...")
+    rl_agent = ReinforceAgent(max_trials=args.max_trials)
+    for ep in range(1, args.train_episodes + 1):
+        rl_agent.run_episode(env, seed=args.seed + ep, train=True)
+        if ep % 100 == 0:
+            print(f"  RL train episode {ep}/{args.train_episodes}")
+    print("Evaluating REINFORCE agent...")
+    rl_results = [
+        rl_agent.run_episode(env, seed=eval_seed_base + i, train=False)
+        for i in range(args.eval_episodes)
+    ]
+    rl_stats = aggregate(rl_results)
+    # ---- Research LLM ----
+    llm_stats = None
+    if not args.no_llm:
+        print("Running Research LLM agent...")
+        try:
+            llm_agent = ResearchLLMAgent(max_trials=args.max_trials)
+            llm_results = [
+                llm_agent.run_episode(env, seed=eval_seed_base + i)
+                for i in range(args.eval_episodes)
+            ]
+            llm_stats = aggregate(llm_results)
+        except Exception as e:
+            print(f"  Skipping LLM agent: {e}")
+    env.close()
+    # ---- Table ----
+    header = f"{'Metric':<22} {'Naive':>12} {'RL (MLP)':>12}"
+    if llm_stats is not None:
+        header += f" {'LLM Researcher':>14}"
+    sep = "-" * len(header)
+    print()
+    print(sep)
+    print("  Agent comparison (same eval seeds)")
+    print(sep)
+    print(header)
+    print(sep)
+    def row(label: str, n_val: str, r_val: str, l_val: str | None = None) -> None:
+        line = f"{label:<22} {n_val:>12} {r_val:>12}"
+        if l_val is not None:
+            line += f" {l_val:>14}"
+        print(line)
+    row("Success rate", f"{naive_stats['success_rate']:.1%}", f"{rl_stats['success_rate']:.1%}",
+        f"{llm_stats['success_rate']:.1%}" if llm_stats else None)
+    row("Experiments to success", f"{naive_stats['experiments_to_success']:.1f}", f"{rl_stats['experiments_to_success']:.1f}",
+        f"{llm_stats['experiments_to_success']:.1f}" if llm_stats else None)
+    row("Cost/episode", f"${naive_stats['avg_cost']:.1f}", f"${rl_stats['avg_cost']:.1f}",
+        f"${llm_stats['avg_cost']:.1f}" if llm_stats else None)
+    row("Avg reward", f"{naive_stats['avg_reward']:.1f}", f"{rl_stats['avg_reward']:.1f}",
+        f"{llm_stats['avg_reward']:.1f}" if llm_stats else None)
+    row("Avg steps", f"{naive_stats['avg_steps']:.1f}", f"{rl_stats['avg_steps']:.1f}",
+        f"{llm_stats['avg_steps']:.1f}" if llm_stats else None)
+    print(sep)
+if __name__ == "__main__":
+    main()
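
The `aggregate` helper above reports 0 for `experiments_to_success` when no episode succeeds, which the printed table cannot distinguish from an instant success. One possible variant, sketched here under the assumption that each result dict carries at least `reward`, `success`, and `steps`, returns `None` for that case instead. `summarize` is a hypothetical name, not a function in this repository.

```python
from __future__ import annotations

from statistics import mean


def summarize(results: list[dict]) -> dict:
    """Aggregate per-episode results; steps_to_success is None if nothing succeeded."""
    if not results:
        raise ValueError("need at least one episode result")
    n = len(results)
    # Only successful episodes contribute to the steps-to-success average.
    winning_steps = [r["steps"] for r in results if r["success"]]
    return {
        "n": n,
        "avg_reward": mean(r["reward"] for r in results),
        "success_rate": sum(r["success"] for r in results) / n,
        "steps_to_success": mean(winning_steps) if winning_steps else None,
    }


episodes = [
    {"reward": 10.0, "success": True, "steps": 4},
    {"reward": -20.0, "success": False, "steps": 9},
    {"reward": 30.0, "success": True, "steps": 6},
]
stats = summarize(episodes)
all_failed = summarize([{"reward": -20.0, "success": False, "steps": 9}])
print(stats)
print(all_failed["steps_to_success"])
```

A caller that formats the value with `:.1f` would need to handle `None` explicitly, for example by printing `n/a`.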

scripts/demo_hackathon.sh ADDED

	@@ -0,0 +1,24 @@

+#!/usr/bin/env bash
+# Hackathon live demo: print the setup steps, then optionally start the API.
+# Run from repo root:  ./scripts/demo_hackathon.sh
+set -e
+cd "$(dirname "$0")/.."
+echo "=== SimLab Hackathon Demo ==="
+echo ""
+echo "1. Start the API (leave this running):"
+echo "   uvicorn server.app:app --host 0.0.0.0 --port 8000"
+echo ""
+echo "2. In another terminal, start the UI:"
+echo "   cd v0ap && pnpm dev"
+echo ""
+echo "3. Open http://localhost:3000"
+echo "   - Training:  /training   (set 500 episodes, Start Training, show chart + comparison)"
+echo "   - Lab run:   /workflows/pcr-amplification   (Run Naive → Run with AI Agent)"
+echo ""
+read -p "Start API in this terminal now? [y/N] " -n 1 -r
+echo
+if [[ $REPLY =~ ^[Yy]$ ]]; then
+  exec uvicorn server.app:app --host 0.0.0.0 --port 8000
+fi

scripts/demo_research_agent.py ADDED

	@@ -0,0 +1,45 @@

+#!/usr/bin/env python3
+"""Run 1–2 episodes of the research LLM agent with verbose terminal output."""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from lab_env.env import LabEnv
+from agents.research_llm_agent import ResearchLLMAgent
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Demo: Research LLM agent (verbose)")
+    parser.add_argument("--episodes", type=int, default=2, help="Number of episodes to run")
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--max-trials", type=int, default=5, help="Max trials per episode")
+    args = parser.parse_args()
+    env = LabEnv()
+    agent = ResearchLLMAgent(max_trials=args.max_trials)
+    print("=" * 60)
+    print("  Research LLM Agent — Self-Improving Lab Scientist Demo")
+    print("=" * 60)
+    for ep in range(1, args.episodes + 1):
+        print(f"\n--- Episode {ep}/{args.episodes} (seed={args.seed + ep}) ---")
+        callback: list[dict] = []
+        result = agent.run_episode(env, seed=args.seed + ep, verbose=True, episode_callback=callback)
+        for step in callback:
+            print(f"  Trial {step['trial']}: hypothesis {step['hypothesis']} -> ran {step['params_used']} -> {step['result']}")
+        print(f"  Outcome: {'SUCCESS' if result['success'] else 'partial' if result['partial'] else 'fail'}")
+        print(f"  Reward: {result['reward']:.1f}  Cost: ${result['cost']:.1f}  Steps: {result['steps']}")
+        print(f"  Knowledge: temp_range={agent.knowledge['temp_range']}, cycle_range={agent.knowledge['cycle_range']}")
+    env.close()
+    print("\nDone.")
+if __name__ == "__main__":
+    main()

scripts/run_naive_baseline.py ADDED

	@@ -0,0 +1,75 @@

+#!/usr/bin/env python3
+"""Run the naive baseline agent on LabEnv and report aggregate metrics."""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from lab_env.env import LabEnv, INITIAL_BUDGET
+from agents.naive_agent import NaiveAgent
+def run_episode(env: LabEnv, agent: NaiveAgent, seed: int) -> dict:
+    obs, info = env.reset(seed=seed)
+    agent.reset()
+    total_reward = 0.0
+    steps = 0
+    while True:
+        action = agent.select_action(obs)
+        obs, reward, terminated, truncated, info = env.step(action)
+        total_reward += reward
+        steps += 1
+        if terminated or truncated:
+            break
+    return {
+        "reward": total_reward,
+        "success": info["best_result"] == "success",
+        "partial": info["best_result"] == "partial",
+        "minutes": info["elapsed_minutes"],
+        "cost": INITIAL_BUDGET - info["remaining_budget"],
+        "steps": steps,
+    }
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Naive baseline evaluation")
+    parser.add_argument("--episodes", type=int, default=200)
+    parser.add_argument("--seed", type=int, default=42)
+    args = parser.parse_args()
+    env = LabEnv()
+    agent = NaiveAgent(num_trials=3, seed=args.seed)
+    results = [run_episode(env, agent, seed=args.seed + i) for i in range(args.episodes)]
+    env.close()
+    rewards = [r["reward"] for r in results]
+    successes = sum(r["success"] for r in results)
+    partials = sum(r["partial"] for r in results)
+    minutes = [r["minutes"] for r in results]
+    costs = [r["cost"] for r in results]
+    steps = [r["steps"] for r in results]
+    n = len(results)
+    print("=" * 50)
+    print("  Naive Baseline Results")
+    print("=" * 50)
+    print(f"  Episodes:        {n}")
+    print(f"  Avg reward:      {sum(rewards) / n:8.2f}")
+    print(f"  Success rate:    {successes / n:8.2%}")
+    print(f"  Partial rate:    {partials / n:8.2%}")
+    print(f"  Avg time (min):  {sum(minutes) / n:8.1f}")
+    print(f"  Avg cost ($):    {sum(costs) / n:8.1f}")
+    print(f"  Avg steps:       {sum(steps) / n:8.1f}")
+    print("=" * 50)
+if __name__ == "__main__":
+    main()
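
`run_episode` relies only on the Gymnasium contract: `reset` returns `(obs, info)` and `step` returns `(obs, reward, terminated, truncated, info)`. The sketch below runs the same loop against a toy stand-in and assumes nothing about `LabEnv` internals. `CountdownEnv` and `AlwaysZeroAgent` are made up for illustration.

```python
from __future__ import annotations

from typing import Any


class CountdownEnv:
    """Toy environment: reward -1 per step, truncates after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self._t = 0

    def reset(self, seed: int | None = None) -> tuple[list[float], dict[str, Any]]:
        self._t = 0
        return [0.0], {"elapsed_steps": 0}

    def step(self, action: int) -> tuple[list[float], float, bool, bool, dict[str, Any]]:
        self._t += 1
        truncated = self._t >= self.horizon  # step limit reached
        return [float(self._t)], -1.0, False, truncated, {"elapsed_steps": self._t}


class AlwaysZeroAgent:
    """Toy agent exposing the same two methods the script calls on NaiveAgent."""

    def reset(self) -> None:
        pass

    def select_action(self, obs: list[float]) -> int:
        return 0


def run_episode(env: CountdownEnv, agent: AlwaysZeroAgent, seed: int) -> dict[str, Any]:
    """Roll out one episode and collect totals from the final info dict."""
    obs, info = env.reset(seed=seed)
    agent.reset()
    total_reward, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(agent.select_action(obs))
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return {"reward": total_reward, "steps": steps, "last_info": info}


summary = run_episode(CountdownEnv(horizon=3), AlwaysZeroAgent(), seed=0)
print(summary)
```

Because the loop reads its metrics from the last `info` dict, any environment that follows the same contract can be swapped in without changing `run_episode`.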

scripts/run_research_generate_agent.py ADDED

	@@ -0,0 +1,91 @@

+#!/usr/bin/env python3
+"""
+Run the Research & Generate agent: research → generate any protocol → run → learn from feedback.
+Uses env.run_assay_with_protocol() so the agent can try arbitrary parameter values
+(not limited to presets). Feedback from each run is passed into the next trial.
+Works with any spec that has evaluate_custom_protocol (PCR, ELISA, etc.).
+Requires: pip install openai, OPENAI_API_KEY set.
+"""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from lab_env.env import LabEnv
+from lab_env.spec import get_spec_for_workflow
+from agents.research_generate_agent import ResearchGenerateAgent
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Run Research & Generate agent (research → generate protocol → run → learn)"
+    )
+    parser.add_argument(
+        "--episodes",
+        type=int,
+        default=5,
+        help="Number of episodes to run",
+    )
+    parser.add_argument(
+        "--workflow",
+        type=str,
+        default="pcr-amplification",
+        choices=["pcr-amplification", "elisa-readout"],
+        help="Experiment type (uses spec with custom protocol support)",
+    )
+    parser.add_argument(
+        "--max-trials",
+        type=int,
+        default=6,
+        help="Max protocol attempts per episode",
+    )
+    parser.add_argument(
+        "--seed",
+        type=int,
+        default=42,
+    )
+    parser.add_argument(
+        "--verbose",
+        action="store_true",
+        help="Print each trial's protocol and result",
+    )
+    args = parser.parse_args()
+    spec = get_spec_for_workflow(args.workflow)
+    env = LabEnv(spec=spec)
+    agent = ResearchGenerateAgent(max_trials=args.max_trials)
+    print(f"Research & Generate agent — workflow={args.workflow}, episodes={args.episodes}")
+    print("(Research → generate protocol → run in lab → learn from feedback)\n")
+    results = []
+    for ep in range(args.episodes):
+        seed = args.seed + ep * 1000
+        if args.verbose:
+            print(f"--- Episode {ep + 1} (seed={seed}) ---")
+        out = agent.run_episode(env, seed=seed, verbose=args.verbose)
+        results.append(out)
+        if not args.verbose:
+            print(
+                f"  Episode {ep + 1}: reward={out['reward']:.1f}, "
+                f"success={out['success']}, protocols_tried={out['num_protocols_tried']}"
+            )
+    env.close()
+    n = len(results)
+    print("\n--- Summary ---")
+    print(f"  Success rate:     {sum(r['success'] for r in results) / n:.1%}")
+    print(f"  Partial rate:     {sum(r['partial'] for r in results) / n:.1%}")
+    print(f"  Avg reward:       {sum(r['reward'] for r in results) / n:.1f}")
+    print(f"  Avg protocols:    {sum(r['num_protocols_tried'] for r in results) / n:.1f} per episode")
+if __name__ == "__main__":
+    main()
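
The seed schedule and summary arithmetic in `main()` above can be checked without the environment. This is a minimal sketch, assuming result dicts shaped like those the script reads from `agent.run_episode` (keys `reward`, `success`, `partial`, `num_protocols_tried`). The helper names `episode_seeds` and `summarize` are hypothetical and do not exist in the repo.

```python
def episode_seeds(base_seed, episodes, stride=1000):
    """Seeds used by the episode loop: base_seed + ep * stride for ep = 0, 1, ..."""
    return [base_seed + ep * stride for ep in range(episodes)]


def summarize(results):
    """Average the per-episode fields; return zeros for an empty list."""
    n = len(results)
    if n == 0:
        return {"success_rate": 0.0, "partial_rate": 0.0,
                "avg_reward": 0.0, "avg_protocols": 0.0}
    return {
        # Booleans sum as 0/1, so the mean is the fraction of True values.
        "success_rate": sum(r["success"] for r in results) / n,
        "partial_rate": sum(r["partial"] for r in results) / n,
        "avg_reward": sum(r["reward"] for r in results) / n,
        "avg_protocols": sum(r["num_protocols_tried"] for r in results) / n,
    }


if __name__ == "__main__":
    # Made-up results, for illustration only (not output from the real agent).
    fake = [
        {"reward": 10.0, "success": True, "partial": False, "num_protocols_tried": 2},
        {"reward": -4.0, "success": False, "partial": True, "num_protocols_tried": 6},
    ]
    print(episode_seeds(42, 3))
    print(summarize(fake))
```

Spacing the seeds by a stride keeps the episodes on distinct seeds while the whole run stays reproducible from one `--seed` value.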

scripts/train_and_eval_agent.py ADDED

	@@ -0,0 +1,148 @@

+#!/usr/bin/env python3
+"""Train a REINFORCE agent on LabEnv and compare against the naive baseline."""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from lab_env.env import LabEnv, INITIAL_BUDGET
+from agents.naive_agent import NaiveAgent
+from agents.rl_agent import ReinforceAgent
+# ------------------------------------------------------------------
+# Naive episode runner
+# ------------------------------------------------------------------
+def run_episode_naive(env: LabEnv, agent: NaiveAgent, seed: int) -> dict:
+    obs, info = env.reset(seed=seed)
+    agent.reset()
+    total_reward = 0.0
+    steps = 0
+    while True:
+        action = agent.select_action(obs)
+        obs, reward, terminated, truncated, info = env.step(action)
+        total_reward += reward
+        steps += 1
+        if terminated or truncated:
+            break
+    return {
+        "reward": total_reward,
+        "success": info["best_result"] == "success",
+        "partial": info["best_result"] == "partial",
+        "minutes": info["elapsed_minutes"],
+        "cost": INITIAL_BUDGET - info["remaining_budget"],
+        "steps": steps,
+    }
+# ------------------------------------------------------------------
+# Aggregation
+# ------------------------------------------------------------------
+def aggregate(results: list[dict]) -> dict:
+    n = len(results)
+    return {
+        "n": n,
+        "avg_reward": sum(r["reward"] for r in results) / n,
+        "success_rate": sum(r["success"] for r in results) / n,
+        "partial_rate": sum(r["partial"] for r in results) / n,
+        "avg_minutes": sum(r["minutes"] for r in results) / n,
+        "avg_cost": sum(r["cost"] for r in results) / n,
+        "avg_steps": sum(r["steps"] for r in results) / n,
+    }
+# ------------------------------------------------------------------
+# Main
+# ------------------------------------------------------------------
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Train & evaluate REINFORCE agent")
+    parser.add_argument("--train-episodes", type=int, default=2000)
+    parser.add_argument("--eval-episodes", type=int, default=100)
+    parser.add_argument("--log-interval", type=int, default=100)
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--lr", type=float, default=3e-3)
+    parser.add_argument("--gamma", type=float, default=0.99)
+    parser.add_argument("--max-trials", type=int, default=4)
+    args = parser.parse_args()
+    env = LabEnv()
+    rl_agent = ReinforceAgent(lr=args.lr, gamma=args.gamma, max_trials=args.max_trials)
+    # ---- Training ----
+    print("=" * 60)
+    print("  Training REINFORCE agent")
+    print("=" * 60)
+    window: list[float] = []
+    successes_window: list[bool] = []
+    for ep in range(1, args.train_episodes + 1):
+        result = rl_agent.run_episode(env, seed=args.seed + ep, train=True)
+        window.append(result["reward"])
+        successes_window.append(result["success"])
+        if ep % args.log_interval == 0:
+            avg = sum(window) / len(window)
+            sr = sum(successes_window) / len(successes_window)
+            print(
+                f"  Episode {ep:5d} | avg reward (last {args.log_interval}): "
+                f"{avg:7.1f} | success rate: {sr:.0%}"
+            )
+            window.clear()
+            successes_window.clear()
+    # ---- Evaluation ----
+    print()
+    print("=" * 60)
+    print("  Evaluating on fixed seed range")
+    print("=" * 60)
+    eval_seed_base = 999_999
+    rl_results = [
+        rl_agent.run_episode(env, seed=eval_seed_base + i, train=False)
+        for i in range(args.eval_episodes)
+    ]
+    naive_agent = NaiveAgent(num_trials=3, seed=0)
+    naive_results = [
+        run_episode_naive(env, naive_agent, seed=eval_seed_base + i)
+        for i in range(args.eval_episodes)
+    ]
+    env.close()
+    rl_stats = aggregate(rl_results)
+    naive_stats = aggregate(naive_results)
+    header = f"{'Metric':<20s} {'REINFORCE':>12s} {'Naive':>12s}"
+    sep = "-" * len(header)
+    rows = [
+        ("Avg reward",   f"{rl_stats['avg_reward']:.1f}",   f"{naive_stats['avg_reward']:.1f}"),
+        ("Success rate", f"{rl_stats['success_rate']:.1%}",  f"{naive_stats['success_rate']:.1%}"),
+        ("Partial rate", f"{rl_stats['partial_rate']:.1%}",  f"{naive_stats['partial_rate']:.1%}"),
+        ("Avg time",     f"{rl_stats['avg_minutes']:.1f}m",  f"{naive_stats['avg_minutes']:.1f}m"),
+        ("Avg cost",     f"${rl_stats['avg_cost']:.1f}",     f"${naive_stats['avg_cost']:.1f}"),
+        ("Avg steps",    f"{rl_stats['avg_steps']:.1f}",     f"{naive_stats['avg_steps']:.1f}"),
+    ]
+    print()
+    print(header)
+    print(sep)
+    for label, rl_val, naive_val in rows:
+        print(f"{label:<20s} {rl_val:>12s} {naive_val:>12s}")
+    print(sep)
+    print()
+if __name__ == "__main__":
+    main()
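
The training loop above logs an average over the episodes since the previous log line and then clears its window, so each line covers a disjoint block of episodes. The sketch below reproduces that bookkeeping on a plain list of rewards. `windowed_means` is a hypothetical helper, not a function in the repo.

```python
def windowed_means(rewards, log_interval):
    """Return (episode_number, mean_reward_since_last_log) pairs.

    Mirrors the loop above: episodes are numbered from 1, a value is
    emitted whenever the episode number is a multiple of log_interval,
    and the window is cleared afterwards.
    """
    if log_interval < 1:
        raise ValueError("log_interval must be >= 1")
    out = []
    window = []
    for ep, reward in enumerate(rewards, start=1):
        window.append(reward)
        if ep % log_interval == 0:
            out.append((ep, sum(window) / len(window)))
            window.clear()
    # Episodes after the last full interval are never reported.
    return out


if __name__ == "__main__":
    print(windowed_means([1, 2, 3, 4, 5], log_interval=2))
```

One consequence of this scheme is that trailing episodes are dropped from the log when `--train-episodes` is not a multiple of `--log-interval`.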

scripts/train_per_protocol.py ADDED

	@@ -0,0 +1,82 @@

+#!/usr/bin/env python3
+"""
+Train a separate REINFORCE agent for each protocol set (e.g. PCR, ELISA).
+Each protocol has its own presets and outcome model. Training one agent per
+protocol gives you a policy tailored to that protocol's action/observation
+space. Checkpoints are saved under checkpoints/<workflow_id>.pt.
+Usage:
+  python scripts/train_per_protocol.py --workflows pcr-amplification elisa-readout
+  python scripts/train_per_protocol.py --workflows pcr-amplification --train-episodes 1000
+"""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from lab_env.env import LabEnv
+from lab_env.spec import get_spec_for_workflow
+from agents.rl_agent import ReinforceAgent
+def main() -> None:
+    parser = argparse.ArgumentParser(
+        description="Train one RL agent per protocol set (different presets / specs)"
+    )
+    parser.add_argument(
+        "--workflows",
+        nargs="+",
+        default=["pcr-amplification", "elisa-readout"],
+        help="Workflow IDs to train (each gets its own agent and checkpoint)",
+    )
+    parser.add_argument("--train-episodes", type=int, default=1500)
+    parser.add_argument("--eval-episodes", type=int, default=50)
+    parser.add_argument("--lr", type=float, default=3e-3)
+    parser.add_argument("--max-trials", type=int, default=4)
+    parser.add_argument("--checkpoint-dir", type=str, default="checkpoints")
+    parser.add_argument("--seed", type=int, default=42)
+    args = parser.parse_args()
+    Path(args.checkpoint_dir).mkdir(parents=True, exist_ok=True)
+    for workflow_id in args.workflows:
+        spec = get_spec_for_workflow(workflow_id)
+        env = LabEnv(spec=spec)
+        agent = ReinforceAgent(
+            lr=args.lr,
+            max_trials=args.max_trials,
+            spec=spec,
+        )
+        print(f"\n{'='*60}")
+        print(f"  Training for protocol: {workflow_id} (presets={spec.num_presets}, obs_dim={spec.obs_dim})")
+        print("=" * 60)
+        for ep in range(1, args.train_episodes + 1):
+            result = agent.run_episode(env, seed=args.seed + ep, train=True)
+            if ep % 200 == 0 or ep == args.train_episodes:
+                print(f"  Episode {ep:5d} | reward: {result['reward']:7.1f} | success: {result['success']}")
+        checkpoint_path = Path(args.checkpoint_dir) / f"{workflow_id}.pt"
+        agent.save(str(checkpoint_path))
+        print(f"  Saved checkpoint: {checkpoint_path}")
+        # Quick eval
+        successes = 0
+        for i in range(args.eval_episodes):
+            r = agent.run_episode(env, seed=999_000 + i, train=False)
+            successes += r["success"]
+        print(f"  Eval success rate: {successes / args.eval_episodes:.0%}")
+        env.close()
+    print("\nDone. Use each checkpoint with LabEnv(spec=<same_spec>) and ReinforceAgent(spec=spec).load(path).")
+if __name__ == "__main__":
+    main()
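
The checkpoint layout described in the docstring above (`checkpoints/<workflow_id>.pt`) can be sketched with `pathlib` alone. The helpers `checkpoint_path` and `plan_checkpoints` are hypothetical names used for illustration. The sketch only builds paths and creates the directory; it does not save or load any model.

```python
import tempfile
from pathlib import Path


def checkpoint_path(checkpoint_dir, workflow_id):
    """Path of the checkpoint file for one workflow: <dir>/<workflow_id>.pt."""
    return Path(checkpoint_dir) / f"{workflow_id}.pt"


def plan_checkpoints(checkpoint_dir, workflows):
    """Create the directory (as the script does) and map workflow IDs to paths."""
    root = Path(checkpoint_dir)
    root.mkdir(parents=True, exist_ok=True)
    return {wf: checkpoint_path(root, wf) for wf in workflows}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        plan = plan_checkpoints(Path(tmp) / "checkpoints",
                                ["pcr-amplification", "elisa-readout"])
        for workflow_id, path in plan.items():
            print(workflow_id, "->", path.name)
```

Because the file name is derived from the workflow ID, a checkpoint should be loaded with an agent built from the same spec it was trained on, as the script's final message says.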

scripts/visualize.py ADDED

	@@ -0,0 +1,258 @@

+#!/usr/bin/env python3
+"""Train, evaluate, and visualize REINFORCE vs Naive agent on LabEnv.
+Produces a 2x2 figure:
+  Top-left:     Training reward curve (smoothed)
+  Top-right:    Training success-rate curve (smoothed)
+  Bottom-left:  Final comparison bar chart (reward, success%, partial%)
+  Bottom-right: Single-episode trace showing the RL agent's actions
+"""
+from __future__ import annotations
+import argparse
+import sys
+from pathlib import Path
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.ticker as mticker
+from lab_env.env import (
+    LabEnv,
+    INITIAL_BUDGET,
+    ACTION_SETUP_START,
+    ACTION_SETUP_END,
+    ACTION_RUN_ASSAY,
+    ACTION_ORDER_TIPS,
+    ACTION_ORDER_BUFFER,
+    ACTION_ORDER_POLYMERASE,
+    ACTION_WAIT,
+    ACTION_FINISH,
+    PRESETS,
+)
+from agents.naive_agent import NaiveAgent
+from agents.rl_agent import ReinforceAgent
+def smooth(values: list[float], window: int = 50) -> np.ndarray:
+    if len(values) < window:
+        return np.array(values)
+    kernel = np.ones(window) / window
+    return np.convolve(values, kernel, mode="valid")
+def run_episode_naive(env: LabEnv, agent: NaiveAgent, seed: int) -> dict:
+    obs, info = env.reset(seed=seed)
+    agent.reset()
+    total_reward = 0.0
+    steps = 0
+    while True:
+        action = agent.select_action(obs)
+        obs, reward, terminated, truncated, info = env.step(action)
+        total_reward += reward
+        steps += 1
+        if terminated or truncated:
+            break
+    return {
+        "reward": total_reward,
+        "success": info["best_result"] == "success",
+        "partial": info["best_result"] == "partial",
+        "minutes": info["elapsed_minutes"],
+        "cost": INITIAL_BUDGET - info["remaining_budget"],
+        "steps": steps,
+    }
+def trace_rl_episode(env: LabEnv, agent: ReinforceAgent, seed: int) -> list[dict]:
+    """Run one episode and return a step-by-step trace for visualization."""
+    obs, info = env.reset(seed=seed)
+    agent.reset()
+    trace: list[dict] = []
+    done = trunc = False  # defined up front so the final check works even if max_trials is 0
+    for trial in range(agent.max_trials):
+        if agent._inventory_low(obs):
+            for act in (ACTION_ORDER_TIPS, ACTION_ORDER_BUFFER, ACTION_ORDER_POLYMERASE):
+                obs, rew, done, trunc, info = env.step(act)
+                trace.append({"action": "order", "label": "Order", "result": "", "reward": rew, "minutes": info["elapsed_minutes"]})
+                if done or trunc:
+                    return trace
+        preset = agent._select_preset(obs, deterministic=True)
+        p = PRESETS[preset]
+        label = f"Setup {p['temp']}C/{p['cycles']}cy/{p['ratio'][:4]}"
+        obs, rew, done, trunc, info = env.step(ACTION_SETUP_START + preset)
+        trace.append({"action": "setup", "label": label, "result": "", "reward": rew, "minutes": info["elapsed_minutes"]})
+        if done or trunc:
+            return trace
+        obs, rew, done, trunc, info = env.step(ACTION_RUN_ASSAY)
+        trace.append({"action": "run", "label": "Run assay", "result": info["last_result"], "reward": rew, "minutes": info["elapsed_minutes"]})
+        if done or trunc:
+            return trace
+        if info.get("best_result") == "success":
+            obs, rew, _, _, info = env.step(ACTION_FINISH)
+            trace.append({"action": "finish", "label": "Finish", "result": "success", "reward": rew, "minutes": info["elapsed_minutes"]})
+            return trace
+    if not (done or trunc):
+        obs, rew, _, _, info = env.step(ACTION_FINISH)
+        trace.append({"action": "finish", "label": "Finish", "result": info["best_result"], "reward": rew, "minutes": info["elapsed_minutes"]})
+    return trace
+def main() -> None:
+    parser = argparse.ArgumentParser(description="Visualize training & evaluation")
+    parser.add_argument("--train-episodes", type=int, default=2000)
+    parser.add_argument("--eval-episodes", type=int, default=200)
+    parser.add_argument("--seed", type=int, default=42)
+    parser.add_argument("--save", type=str, default="", help="Save figure to path instead of showing")
+    args = parser.parse_args()
+    env = LabEnv()
+    rl_agent = ReinforceAgent(max_trials=4)
+    # ---- Training with metric collection ----
+    print(f"Training REINFORCE for {args.train_episodes} episodes...")
+    train_rewards: list[float] = []
+    train_successes: list[float] = []
+    for ep in range(1, args.train_episodes + 1):
+        result = rl_agent.run_episode(env, seed=args.seed + ep, train=True)
+        train_rewards.append(result["reward"])
+        train_successes.append(float(result["success"]))
+        if ep % 500 == 0:
+            print(f"  ...episode {ep}/{args.train_episodes}")
+    # ---- Evaluation ----
+    print(f"Evaluating both agents for {args.eval_episodes} episodes...")
+    eval_seed = 999_999
+    naive_agent = NaiveAgent(num_trials=3, seed=0)
+    rl_eval = [rl_agent.run_episode(env, seed=eval_seed + i, train=False) for i in range(args.eval_episodes)]
+    naive_eval = [run_episode_naive(env, naive_agent, seed=eval_seed + i) for i in range(args.eval_episodes)]
+    # ---- Episode trace ----
+    trace = trace_rl_episode(env, rl_agent, seed=12345)
+    env.close()
+    # ---- Aggregate ----
+    def agg(results):
+        n = len(results)
+        return {
+            "reward": sum(r["reward"] for r in results) / n,
+            "success": sum(r["success"] for r in results) / n,
+            "partial": sum(r["partial"] for r in results) / n,
+            "minutes": sum(r["minutes"] for r in results) / n,
+        }
+    rl_stats = agg(rl_eval)
+    naive_stats = agg(naive_eval)
+    # ==================================================================
+    # Plot
+    # ==================================================================
+    fig, axes = plt.subplots(2, 2, figsize=(14, 10))
+    fig.suptitle("SimLab — Lab Automation RL Environment", fontsize=16, fontweight="bold")
+    # -- Top-left: reward curve --
+    ax = axes[0, 0]
+    smoothed_r = smooth(train_rewards, window=50)
+    ax.plot(range(len(smoothed_r)), smoothed_r, color="#2563eb", linewidth=1.5)
+    ax.axhline(y=0, color="gray", linestyle="--", alpha=0.5)
+    ax.set_title("Training Reward (smoothed, window=50)")
+    ax.set_xlabel("Episode")
+    ax.set_ylabel("Total Episode Reward")
+    ax.grid(True, alpha=0.3)
+    # -- Top-right: success rate curve --
+    ax = axes[0, 1]
+    smoothed_s = smooth(train_successes, window=100) * 100
+    ax.plot(range(len(smoothed_s)), smoothed_s, color="#16a34a", linewidth=1.5)
+    ax.set_title("Training Success Rate (smoothed, window=100)")
+    ax.set_xlabel("Episode")
+    ax.set_ylabel("Success %")
+    ax.yaxis.set_major_formatter(mticker.PercentFormatter())
+    ax.set_ylim(0, 100)
+    ax.grid(True, alpha=0.3)
+    # -- Bottom-left: comparison bars --
+    ax = axes[1, 0]
+    metrics = ["Avg Reward", "Success %", "Partial %", "Avg Time (min)"]
+    rl_vals = [rl_stats["reward"], rl_stats["success"] * 100, rl_stats["partial"] * 100, rl_stats["minutes"]]
+    naive_vals = [naive_stats["reward"], naive_stats["success"] * 100, naive_stats["partial"] * 100, naive_stats["minutes"]]
+    x = np.arange(len(metrics))
+    w = 0.35
+    bars_rl = ax.bar(x - w / 2, rl_vals, w, label="REINFORCE", color="#2563eb", edgecolor="white")
+    bars_naive = ax.bar(x + w / 2, naive_vals, w, label="Naive", color="#f97316", edgecolor="white")
+    ax.set_xticks(x)
+    ax.set_xticklabels(metrics, fontsize=9)
+    ax.set_title("Evaluation Comparison")
+    ax.legend()
+    ax.grid(True, alpha=0.3, axis="y")
+    for bar_group in (bars_rl, bars_naive):
+        for bar in bar_group:
+            h = bar.get_height()
+            ax.annotate(f"{h:.1f}", xy=(bar.get_x() + bar.get_width() / 2, h),
+                        xytext=(0, 4), textcoords="offset points",
+                        ha="center", va="bottom", fontsize=8)
+    # -- Bottom-right: episode trace --
+    ax = axes[1, 1]
+    if trace:
+        y_labels = []
+        colors = []
+        for i, step in enumerate(trace):
+            y_labels.append(step["label"])
+            if step["result"] == "success":
+                colors.append("#16a34a")
+            elif step["result"] == "partial":
+                colors.append("#eab308")
+            elif step["result"] == "fail":
+                colors.append("#dc2626")
+            else:
+                colors.append("#6b7280")
+        y_pos = np.arange(len(trace))
+        minutes = [s["minutes"] for s in trace]
+        ax.barh(y_pos, minutes, color=colors, edgecolor="white", height=0.6)
+        ax.set_yticks(y_pos)
+        ax.set_yticklabels(y_labels, fontsize=8)
+        ax.invert_yaxis()
+        ax.set_xlabel("Elapsed Minutes")
+        ax.set_title("Single Episode Trace (RL Agent)")
+        for i, step in enumerate(trace):
+            if step["result"] in ("success", "partial", "fail"):
+                ax.annotate(step["result"], xy=(minutes[i], i),
+                            xytext=(5, 0), textcoords="offset points",
+                            va="center", fontsize=8, fontweight="bold",
+                            color=colors[i])
+    else:
+        ax.text(0.5, 0.5, "No trace data", ha="center", va="center", transform=ax.transAxes)
+        ax.set_title("Single Episode Trace (RL Agent)")
+    plt.tight_layout(rect=[0, 0, 1, 0.95])
+    if args.save:
+        fig.savefig(args.save, dpi=150, bbox_inches="tight")
+        print(f"Saved to {args.save}")
+    else:
+        plt.show()
+    # Print summary
+    print()
+    print(f"  REINFORCE: reward={rl_stats['reward']:.1f}  success={rl_stats['success']:.1%}  time={rl_stats['minutes']:.0f}m")
+    print(f"  Naive:     reward={naive_stats['reward']:.1f}  success={naive_stats['success']:.1%}  time={naive_stats['minutes']:.0f}m")
+if __name__ == "__main__":
+    main()
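
The `smooth` helper in the script above is a moving average computed with `np.convolve(..., mode="valid")`. The pure-Python sketch below should produce the same values and makes the output length explicit: a series of `N` points smoothed with window `w` yields `N - w + 1` points. `moving_average` is a hypothetical stand-in, not part of the repo.

```python
def moving_average(values, window=50):
    """Uniform moving average over full windows only ("valid" mode).

    As in smooth() above, a series shorter than the window is returned
    unchanged. Output element i is the mean of values[i : i + window].
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    values = list(values)
    if len(values) < window:
        return values
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


if __name__ == "__main__":
    rewards = [1, 2, 3, 4, 5]
    smoothed = moving_average(rewards, window=2)
    print(smoothed, len(smoothed))
```

Since only full windows are kept, point `i` of the smoothed curve summarizes episodes `i + 1` through `i + window`, so the x-axis of the training plots is offset from the raw episode index by up to `window - 1`.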

server/app.py ADDED

	@@ -0,0 +1,621 @@

+"""
+FastAPI server bridging the LabEnv Python backend to the Next.js frontend.
+Endpoints:
+    POST /api/training/start   — train the agent (SSE stream)
+    POST /api/run/ai           — run one AI-agent episode
+    POST /api/run/naive        — run one naive-agent episode
+    POST /api/env/reset        — reset environment
+    POST /api/env/step         — take one step
+    GET  /api/stats            — dashboard aggregate stats
+"""
+from __future__ import annotations
+import json
+import sys
+import time
+from pathlib import Path
+from typing import Any
+from fastapi import FastAPI, Request
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.responses import StreamingResponse
+from pydantic import BaseModel
+sys.path.insert(0, str(Path(__file__).resolve().parent.parent))
+from lab_env.env import (
+    LabEnv,
+    INITIAL_BUDGET,
+    ACTION_SETUP_START,
+    ACTION_RUN_ASSAY,
+    ACTION_ORDER_TIPS,
+    ACTION_ORDER_BUFFER,
+    ACTION_ORDER_POLYMERASE,
+    ACTION_WAIT,
+    ACTION_FINISH,
+)
+from lab_env.spec import pcr_experiment_spec, get_spec_for_workflow
+from agents.naive_agent import NaiveAgent
+from agents.rl_agent import ReinforceAgent
+# Per-workflow envs (created on first use). RL agent is shared and trained on PCR.
+_envs: dict[str, LabEnv] = {}
+try:
+    from agents.research_llm_agent import ResearchLLMAgent
+    HAS_RESEARCH_AGENT = True
+except ImportError:
+    ResearchLLMAgent = None
+    HAS_RESEARCH_AGENT = False
+try:
+    from agents.research_generate_agent import ResearchGenerateAgent
+    HAS_RESEARCH_GENERATE_AGENT = True
+except ImportError:
+    ResearchGenerateAgent = None
+    HAS_RESEARCH_GENERATE_AGENT = False
+app = FastAPI(title="SimLab API")
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+rl_agent: ReinforceAgent | None = None
+_trained_agents: dict[str, ReinforceAgent] = {}  # workflow_id -> agent (for UI per-protocol training)
+run_history: list[dict] = []
+def _get_env(workflow_id: str) -> LabEnv:
+    """Get or create LabEnv for this workflow. Uses spec from get_spec_for_workflow(workflow_id)."""
+    if workflow_id not in _envs:
+        spec = get_spec_for_workflow(workflow_id)
+        _envs[workflow_id] = LabEnv(spec=spec)
+    return _envs[workflow_id]
+# ──────────────────────────────────────────────
+# Request / response models
+# ──────────────────────────────────────────────
+class TrainRequest(BaseModel):
+    episodes: int = 2000
+    lr: float = 3e-3
+    max_trials: int = 4
+    eval_episodes: int = 100
+    workflow_id: str = "pcr-amplification"
+class StepRequest(BaseModel):
+    action: int
+    workflow_id: str = "pcr-amplification"
+class RunRequest(BaseModel):
+    seed: int = 42
+    workflow_id: str = "pcr-amplification"
+# ──────────────────────────────────────────────
+# Helpers
+# ──────────────────────────────────────────────
+def _env_state_dict(env: LabEnv) -> dict[str, Any]:
+    info = env._info()
+    return {
+        "step_index": info["step_index"],
+        "elapsed_minutes": info["elapsed_minutes"],
+        "remaining_budget": info["remaining_budget"],
+        "inventory": info["inventory"],
+        "last_result": info["last_result"],
+        "best_result": info["best_result"],
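+        # Read limits from the spec (same fallbacks as _build_run_result) instead of hard-coding them.
+        "max_time": getattr(env.spec, "max_minutes", 240),
+        "max_budget": getattr(env.spec, "initial_budget", 500),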
+    }
+def _trace_episode(env: LabEnv, agent: ReinforceAgent, seed: int) -> dict:
+    """Run an AI episode and produce a step-by-step timeline."""
+    presets = env.spec.presets
+    obs, info = env.reset(seed=seed)
+    agent.reset()
+    timeline: list[dict] = []
+    presets_tried: dict[int, str] = {}
+    for trial in range(agent.max_trials):
+        if agent._inventory_low(obs):
+            for act in (ACTION_ORDER_TIPS, ACTION_ORDER_BUFFER, ACTION_ORDER_POLYMERASE):
+                obs, rew, done, trunc, info = env.step(act)
+                timeline.append({
+                    "title": "Order Reagents",
+                    "description": _order_label(act),
+                    "time": f"{info['elapsed_minutes']:.0f} min",
+                    "status": "action",
+                    "icon": "order",
+                })
+                if done or trunc:
+                    return _build_run_result(env, info, timeline, presets_tried)
+        preset = agent._select_preset(obs, deterministic=True)
+        p = presets[preset]
+        label = _preset_label(p)
+        obs, rew, done, trunc, info = env.step(ACTION_SETUP_START + preset)
+        timeline.append({
+            "title": "Setup",
+            "description": label,
+            "time": f"{info['elapsed_minutes']:.0f} min",
+            "status": "pending",
+            "icon": "setup",
+        })
+        if done or trunc:
+            return _build_run_result(env, info, timeline, presets_tried)
+        obs, rew, done, trunc, info = env.step(ACTION_RUN_ASSAY)
+        result = info["last_result"]
+        presets_tried[preset] = result
+        timeline.append({
+            "title": "Run Assay",
+            "description": _result_description(result),
+            "time": f"{info['elapsed_minutes']:.0f} min",
+            "status": result,
+            "icon": "run",
+        })
+        if done or trunc:
+            return _build_run_result(env, info, timeline, presets_tried)
+        if info.get("best_result") == "success":
+            obs, rew, _, _, info = env.step(ACTION_FINISH)
+            timeline.append({
+                "title": "Finish",
+                "description": "Experiment complete — success!",
+                "time": f"{info['elapsed_minutes']:.0f} min",
+                "status": "success",
+                "icon": "finish",
+            })
+            return _build_run_result(env, info, timeline, presets_tried)
+    obs, rew, _, _, info = env.step(ACTION_FINISH)
+    timeline.append({
+        "title": "Finish",
+        "description": f"Experiment complete — best: {info['best_result']}",
+        "time": f"{info['elapsed_minutes']:.0f} min",
+        "status": info["best_result"] if info["best_result"] in ("success", "partial") else "fail",
+        "icon": "finish",
+    })
+    return _build_run_result(env, info, timeline, presets_tried)
+def _trace_naive_episode(env: LabEnv, agent: NaiveAgent, seed: int) -> dict:
+    presets = env.spec.presets
+    num_presets = len(presets)
+    obs, info = env.reset(seed=seed)
+    agent.reset()
+    timeline: list[dict] = []
+    presets_tried: dict[int, str] = {}
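+    total_reward = 0.0
+    last_preset: int | None = None  # preset chosen by the most recent setup action
+    while True:
+        action = agent.select_action(obs)
+        obs, reward, done, trunc, info = env.step(action)
+        total_reward += reward
+        if ACTION_SETUP_START <= action < ACTION_SETUP_START + num_presets:
+            last_preset = action - ACTION_SETUP_START
+            p = presets[last_preset]
+            timeline.append({
+                "title": "Setup",
+                "description": _preset_label(p),
+                "time": f"{info['elapsed_minutes']:.0f} min",
+                "status": "pending",
+                "icon": "setup",
+            })
+        elif action == ACTION_RUN_ASSAY:
+            result = info["last_result"]
+            # Record the outcome so the preset table does not list every preset as "untried".
+            if last_preset is not None:
+                presets_tried[last_preset] = result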
+            timeline.append({
+                "title": "Run Assay",
+                "description": _result_description(result),
+                "time": f"{info['elapsed_minutes']:.0f} min",
+                "status": result,
+                "icon": "run",
+            })
+        elif action in (ACTION_ORDER_TIPS, ACTION_ORDER_BUFFER, ACTION_ORDER_POLYMERASE):
+            timeline.append({
+                "title": "Order Reagents",
+                "description": _order_label(action),
+                "time": f"{info['elapsed_minutes']:.0f} min",
+                "status": "action",
+                "icon": "order",
+            })
+        elif action == ACTION_FINISH:
+            timeline.append({
+                "title": "Finish",
+                "description": f"Experiment complete — best: {info['best_result']}",
+                "time": f"{info['elapsed_minutes']:.0f} min",
+                "status": info["best_result"] if info["best_result"] in ("success", "partial") else "fail",
+                "icon": "finish",
+            })
+        if done or trunc:
+            break
+    return _build_run_result(env, info, timeline, presets_tried)
+def _build_run_result(env: LabEnv, info: dict, timeline: list[dict], presets_tried: dict[int, str]) -> dict:
+    presets = env.spec.presets
+    spec = env.spec
+    preset_statuses = []
+    for i, p in enumerate(presets):
+        row: dict[str, Any] = {
+            "id": str(i),
+            "status": presets_tried.get(i, "untried"),
+            "label": _preset_label(p),
+        }
+        if "temp" in p:
+            row["temp"] = p["temp"]
+            row["cycles"] = p["cycles"]
+            row["ratio"] = p["ratio"]
+        if "coating_hr" in p:
+            row["coating_hr"] = p["coating_hr"]
+            row["block"] = p.get("block", "")
+        preset_statuses.append(row)
+    return {
+        "state": {
+            "elapsed_minutes": info["elapsed_minutes"],
+            "remaining_budget": info["remaining_budget"],
+            "inventory": info["inventory"],
+            "best_result": info["best_result"],
+            "max_time": getattr(spec, "max_minutes", 240),
+            "max_budget": getattr(spec, "initial_budget", 500),
+        },
+        "timeline": timeline,
+        "presets": preset_statuses,
+        # NOTE: despite the key name, this value is the budget spent, not the episode reward.
+        "reward": float(INITIAL_BUDGET - info["remaining_budget"]),
+        "best_result": info["best_result"],
+    }
+def _result_description(result: str) -> str:
+    return {"success": "Success!", "partial": "Partial — low yield", "fail": "Failed — no amplification"}.get(result, result)
+def _order_label(action: int) -> str:
+    return {ACTION_ORDER_TIPS: "+5 tips", ACTION_ORDER_BUFFER: "+5 buffer", ACTION_ORDER_POLYMERASE: "+3 polymerase"}.get(action, "reagents")
+def _preset_label(preset: dict) -> str:
+    """Human-readable preset description for timeline/UI (PCR or ELISA)."""
+    if "coating_hr" in preset:
+        return f"{preset['coating_hr']}hr coat / {preset.get('temp', '?')}°C / {preset.get('block', '')}"
+    return f"{preset.get('temp', '?')}°C / {preset.get('cycles', '?')} cyc / {preset.get('ratio', '?')}"
+def _trace_research_episode(env: LabEnv, seed: int, max_trials: int = 5) -> dict:
+    """Run Research LLM agent episode and build timeline (Research → Hypothesis → Experiment → Learn). PCR only."""
+    presets = env.spec.presets
+    if not HAS_RESEARCH_AGENT:
+        return _build_run_result(env, env._info(), [{"title": "Research agent unavailable", "description": "Install openai and set OPENAI_API_KEY", "time": "0 min", "status": "fail", "icon": "run"}], {})
+    if env.spec.name != "pcr":
+        return _build_run_result(env, env._info(), [{"title": "Research agent", "description": "Research agent is only supported for PCR workflow.", "time": "0 min", "status": "fail", "icon": "run"}], {})
+    agent = ResearchLLMAgent(max_trials=max_trials)
+    callback: list[dict] = []
+    result = agent.run_episode(env, seed=seed, episode_callback=callback)
+    info = env._info()
+    timeline: list[dict] = []
+    presets_tried: dict[int, str] = {}
+    for step in callback:
+        research = (step.get("research") or "")[:200]
+        if len(step.get("research") or "") > 200:
+            research += "..."
+        timeline.append({
+            "title": "Research",
+            "description": research or "Literature search for PCR protocol",
+            "time": f"{info.get('elapsed_minutes', 0):.0f} min",
+            "status": "action",
+            "icon": "research",
+        })
+        hyp = step.get("hypothesis") or {}
+        timeline.append({
+            "title": "Hypothesis",
+            "description": f"temp={hyp.get('temp', '?')}°C, cycles={hyp.get('cycles', '?')}, ratio={hyp.get('ratio', '?')}",
+            "time": f"{info.get('elapsed_minutes', 0):.0f} min",
+            "status": "pending",
+            "icon": "hypothesis",
+        })
+        params = step.get("params_used") or {}
+        res = step.get("result", "fail")
+        timeline.append({
+            "title": "Run Assay",
+            "description": _result_description(res),
+            "time": f"{info.get('elapsed_minutes', 0):.0f} min",
+            "status": res,
+            "icon": "run",
+        })
+        for i, p in enumerate(presets):
+            if p.get("temp") == params.get("temp") and p.get("cycles") == params.get("cycles") and p.get("ratio") == params.get("ratio"):
+                presets_tried[i] = res
+                break
+        timeline.append({
+            "title": "Learn",
+            "description": f"temp_range={agent.knowledge.get('temp_range', [])}, cycle_range={agent.knowledge.get('cycle_range', [])}",
+            "time": f"{info.get('elapsed_minutes', 0):.0f} min",
+            "status": "action",
+            "icon": "learn",
+        })
+    return _build_run_result(env, info, timeline, presets_tried)
+def _protocol_dict_label(protocol: dict) -> str:
+    """Human-readable label for a protocol dict (PCR or ELISA)."""
+    if "coating_hr" in protocol:
+        return f"{protocol.get('coating_hr', '?')}hr / {protocol.get('temp', '?')}°C / {protocol.get('block', '?')}"
+    return f"{protocol.get('temp', '?')}°C / {protocol.get('cycles', '?')} cyc / {protocol.get('ratio', '?')}"
+def _trace_research_generate_episode(env: LabEnv, seed: int, max_trials: int = 6) -> dict:
+    """Run Research & Generate agent (research → generate any protocol → run → learn). Works for PCR, ELISA, etc."""
+    if not HAS_RESEARCH_GENERATE_AGENT:
+        return _build_run_result(
+            env, env._info(),
+            [{"title": "Research & Generate agent unavailable", "description": "Install openai and set OPENAI_API_KEY", "time": "0 min", "status": "fail", "icon": "run"}],
+            {},
+        )
+    if env.spec.evaluate_custom_protocol is None:
+        return _build_run_result(
+            env, env._info(),
+            [{"title": "Research & Generate", "description": "This workflow does not support custom protocols.", "time": "0 min", "status": "fail", "icon": "run"}],
+            {},
+        )
+    agent = ResearchGenerateAgent(max_trials=max_trials)
+    agent.run_episode(env, seed=seed, verbose=False)
+    info = env._info()
+    timeline: list[dict] = []
+    preset_statuses: list[dict[str, Any]] = []
+    for i, entry in enumerate(agent.feedback_history):
+        protocol = entry.get("protocol", {})
+        result = entry.get("result", "fail")
+        label = _protocol_dict_label(protocol)
+        timeline.append({
+            "title": "Research & Generate",
+            "description": f"Generated: {label}",
+            "time": f"{info.get('elapsed_minutes', 0):.0f} min",
+            "status": "pending",
+            "icon": "research",
+        })
+        timeline.append({
+            "title": "Run Assay",
+            "description": _result_description(result),
+            "time": f"{info.get('elapsed_minutes', 0):.0f} min",
+            "status": result,
+            "icon": "run",
+        })
+        row: dict[str, Any] = {"id": str(i), "status": result, "label": label}
+        if "temp" in protocol:
+            row["temp"] = protocol.get("temp")
+            row["cycles"] = protocol.get("cycles")
+            row["ratio"] = protocol.get("ratio", "")
+        if "coating_hr" in protocol:
+            row["coating_hr"] = protocol.get("coating_hr")
+            row["block"] = protocol.get("block", "")
+        preset_statuses.append(row)
+    timeline.append({
+        "title": "Finish",
+        "description": f"Best result: {info.get('best_result', 'none')}",
+        "time": f"{info.get('elapsed_minutes', 0):.0f} min",
+        "status": info["best_result"] if info["best_result"] in ("success", "partial") else "fail",
+        "icon": "finish",
+    })
+    return {
+        "state": {
+            "elapsed_minutes": info["elapsed_minutes"],
+            "remaining_budget": info["remaining_budget"],
+            "inventory": info["inventory"],
+            "best_result": info["best_result"],
+            "max_time": getattr(env.spec, "max_minutes", 240),
+            "max_budget": getattr(env.spec, "initial_budget", 500),
+        },
+        "timeline": timeline,
+        "presets": preset_statuses,
+        "reward": float(INITIAL_BUDGET - info["remaining_budget"]),
+        "best_result": info["best_result"],
+    }
+# ──────────────────────────────────────────────
+# Training endpoint (SSE stream)
+# ──────────────────────────────────────────────
+@app.post("/api/training/start")
+async def training_start(req: TrainRequest):
+    global rl_agent, _trained_agents
+    def generate():
+        global rl_agent, _trained_agents
+        spec = get_spec_for_workflow(req.workflow_id)
+        agent = ReinforceAgent(lr=req.lr, max_trials=req.max_trials, spec=spec)
+        train_env = LabEnv(spec=spec)
+        window_rewards: list[float] = []
+        window_successes: list[float] = []
+        chart_data: list[dict] = []
+        log_interval = max(req.episodes // 40, 10)
+        for ep in range(1, req.episodes + 1):
+            result = agent.run_episode(train_env, seed=42 + ep, train=True)
+            window_rewards.append(result["reward"])
+            window_successes.append(float(result["success"]))
+            if ep % log_interval == 0 or ep == req.episodes:
+                avg_reward = sum(window_rewards) / len(window_rewards)
+                avg_success = sum(window_successes) / len(window_successes) * 100
+                chart_data.append({
+                    "episode": ep,
+                    "reward": round(avg_reward, 2),
+                    "successRate": round(avg_success, 1),
+                })
+                progress = round(ep / req.episodes * 100)
+                event = {
+                    "type": "progress",
+                    "episode": ep,
+                    "total": req.episodes,
+                    "progress": progress,
+                    "reward": round(avg_reward, 2),
+                    "successRate": round(avg_success, 1),
+                    "chartData": chart_data,
+                }
+                yield f"data: {json.dumps(event)}\n\n"
+                window_rewards.clear()
+                window_successes.clear()
+        rl_agent = agent
+        _trained_agents[req.workflow_id] = agent
+        eval_seed = 999_999
+        rl_results = [agent.run_episode(train_env, seed=eval_seed + i, train=False) for i in range(req.eval_episodes)]
+        naive = NaiveAgent(num_trials=3, seed=0)
+        naive_results = []
+        for i in range(req.eval_episodes):
+            obs, info = train_env.reset(seed=eval_seed + i)
+            naive.reset()
+            total_r = 0.0
+            while True:
+                a = naive.select_action(obs)
+                obs, r, d, t, info = train_env.step(a)
+                total_r += r
+                if d or t:
+                    break
+            naive_results.append({"reward": total_r, "success": info["best_result"] == "success",
+                                   "partial": info["best_result"] == "partial",
+                                   "minutes": info["elapsed_minutes"],
+                                   "cost": float(getattr(spec, "initial_budget", 500)) - info["remaining_budget"]})
+        train_env.close()
+        n_rl = len(rl_results)
+        n_nv = len(naive_results)
+        def agg(res, n):
+            return {
+                "reward": round(sum(r["reward"] for r in res) / n, 1),
+                "success": round(sum(r["success"] for r in res) / n * 100, 1),
+                "partial": round(sum(r["partial"] for r in res) / n * 100, 1),
+                "minutes": round(sum(r["minutes"] for r in res) / n, 0),
+                "cost": round(sum(r["cost"] for r in res) / n, 1),
+            }
+        rl_s = agg(rl_results, n_rl)
+        nv_s = agg(naive_results, n_nv)
+        def imp(rl_v, nv_v, lower_is_better=False):
+            # Percent change relative to the naive baseline; positive means REINFORCE did better.
+            if nv_v == 0:
+                return None
+            delta = (nv_v - rl_v) if lower_is_better else (rl_v - nv_v)
+            return round(delta / abs(nv_v) * 100)
+        comparison = [
+            {"metric": "Avg Reward", "reinforce": rl_s["reward"], "baseline": nv_s["reward"], "improvement": imp(rl_s["reward"], nv_s["reward"]), "unit": ""},
+            {"metric": "Success Rate", "reinforce": rl_s["success"], "baseline": nv_s["success"], "improvement": imp(rl_s["success"], nv_s["success"]), "unit": "%"},
+            {"metric": "Partial Rate", "reinforce": rl_s["partial"], "baseline": nv_s["partial"], "improvement": imp(rl_s["partial"], nv_s["partial"]), "unit": "%"},
+            {"metric": "Avg Time", "reinforce": rl_s["minutes"], "baseline": nv_s["minutes"], "improvement": imp(rl_s["minutes"], nv_s["minutes"], lower_is_better=True), "unit": "min"},
+            {"metric": "Avg Cost", "reinforce": rl_s["cost"], "baseline": nv_s["cost"], "improvement": imp(rl_s["cost"], nv_s["cost"], lower_is_better=True), "unit": "$"},
+        ]
+        final_event = {
+            "type": "done",
+            "chartData": chart_data,
+            "comparison": comparison,
+        }
+        yield f"data: {json.dumps(final_event)}\n\n"
+    return StreamingResponse(generate(), media_type="text/event-stream")
+# ──────────────────────────────────────────────
+# Run endpoints
+# ──────────────────────────────────────────────
+@app.post("/api/run/ai")
+async def run_ai(req: RunRequest):
+    global rl_agent, _trained_agents
+    env = _get_env(req.workflow_id)
+    agent = _trained_agents.get(req.workflow_id) or rl_agent
+    if agent is None:
+        spec = get_spec_for_workflow(req.workflow_id)
+        agent = ReinforceAgent(max_trials=4, spec=spec)
+        rl_agent = agent
+        _trained_agents[req.workflow_id] = agent
+    return _trace_episode(env, agent, seed=req.seed)
+@app.post("/api/run/naive")
+async def run_naive(req: RunRequest):
+    env = _get_env(req.workflow_id)
+    agent = NaiveAgent(num_trials=3, seed=req.seed)
+    return _trace_naive_episode(env, agent, seed=req.seed)
+@app.post("/api/run/research")
+async def run_research(req: RunRequest):
+    """Run Research LLM agent (research → hypothesize → experiment → learn). PCR workflow only."""
+    env = _get_env(req.workflow_id)
+    return _trace_research_episode(env, seed=req.seed, max_trials=5)
+@app.post("/api/run/research-generate")
+async def run_research_generate(req: RunRequest):
+    """Run Research & Generate agent (research → generate any protocol → run → learn). PCR, ELISA, any spec with evaluate_custom_protocol."""
+    env = _get_env(req.workflow_id)
+    return _trace_research_generate_episode(env, seed=req.seed, max_trials=6)
+# ──────────────────────────────────────────────
+# Step-by-step endpoint
+# ──────────────────────────────────────────────
+@app.post("/api/env/reset")
+async def env_reset(req: RunRequest):
+    env = _get_env(req.workflow_id)
+    obs, info = env.reset(seed=req.seed)
+    return _env_state_dict(env)
+@app.post("/api/env/step")
+async def env_step(req: StepRequest):
+    env = _get_env(req.workflow_id)
+    obs, reward, terminated, truncated, info = env.step(req.action)
+    return {
+        **_env_state_dict(env),
+        "reward": float(reward),
+        "terminated": terminated,
+        "truncated": truncated,
+    }
+# ──────────────────────────────────────────────
+# Stats endpoint
+# ──────────────────────────────────────────────
+@app.get("/api/stats")
+async def get_stats():
+    n_runs = len(run_history)
+    if n_runs == 0:
+        return {
+            "active_workflows": 1,
+            "total_experiments": 0,
+            "success_rate": "—",
+            "budget_spent": "$0",
+        }
+    successes = sum(1 for r in run_history if r.get("best_result") == "success")
+    return {
+        "active_workflows": 1,
+        "total_experiments": n_runs,
+        "success_rate": f"{successes / n_runs:.0%}",
+        "budget_spent": f"${sum(r.get('cost', 0) for r in run_history):.0f}",
+    }
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
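
Note on consuming the training stream: `training_start` above emits each event as a `data: <json>` frame terminated by a blank line. A minimal Python sketch of a client-side parser follows, assuming the caller already has the response body as an iterable of text chunks. The function name `iter_sse_events` is hypothetical and not part of this commit.

```python
import json
from typing import Iterable, Iterator


def iter_sse_events(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON payloads from `data: ...` frames (hypothetical helper).

    Chunks may split a frame at any point, so text is buffered until a
    blank line (the "\n\n" terminator written by the server) is seen.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n\n" in buffer:
            frame, buffer = buffer.split("\n\n", 1)
            for line in frame.split("\n"):
                if line.startswith("data: "):
                    yield json.loads(line[len("data: "):])


if __name__ == "__main__":
    # Two chunks that cut the first frame in half, as a network read might.
    demo_chunks = [
        'data: {"type": "progress", "epi',
        'sode": 10}\n\ndata: {"type": "done"}\n\n',
    ]
    for event in iter_sse_events(demo_chunks):
        print(event["type"])
```

Wiring this to a real HTTP client is left out on purpose, since it depends on which streaming client library is used.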

v0ap/.gitignore ADDED

	@@ -0,0 +1,10 @@

+# v0 runtime files
+__v0_runtime_loader.js
+__v0_devtools.tsx
+__v0_jsx-dev-runtime.ts
+# Common ignores
+node_modules/
+.next/
+.env*.local
+.DS_Store

v0ap/app/docs/page.tsx ADDED

	@@ -0,0 +1,87 @@

+import { SidebarTrigger } from "@/components/ui/sidebar"
+import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card"
+import { Badge } from "@/components/ui/badge"
+import { BookOpen, Code, FlaskConical, GraduationCap, Lightbulb, Zap } from "lucide-react"
+const docs = [
+  {
+    title: "Getting Started",
+    description: "Learn the basics of SimLab and run your first experiment",
+    icon: Lightbulb,
+    badge: "Beginner",
+  },
+  {
+    title: "Workflow Reference",
+    description: "Complete documentation for all available workflows",
+    icon: FlaskConical,
+    badge: "Reference",
+  },
+  {
+    title: "RL Agent Architecture",
+    description: "Deep dive into the REINFORCE algorithm implementation",
+    icon: GraduationCap,
+    badge: "Advanced",
+  },
+  {
+    title: "API Documentation",
+    description: "REST API endpoints for programmatic access",
+    icon: Code,
+    badge: "Developer",
+  },
+  {
+    title: "Best Practices",
+    description: "Tips for optimizing experiment success rates",
+    icon: Zap,
+    badge: "Guide",
+  },
+  {
+    title: "OpenEnv Integration",
+    description: "Connect SimLab with the OpenEnv ecosystem",
+    icon: BookOpen,
+    badge: "Integration",
+  },
+]
+export default function DocsPage() {
+  return (
+    <div className="flex flex-col min-h-screen">
+      <header className="sticky top-0 z-10 flex h-14 items-center gap-4 border-b bg-background/95 backdrop-blur supports-[backdrop-filter]:bg-background/60 px-6">
+        <SidebarTrigger className="-ml-2" />
+        <div className="flex-1">
+          <h1 className="text-lg font-semibold">Documentation</h1>
+        </div>
+      </header>
+      <div className="flex-1 p-6">
+        <div className="mb-6">
+          <h2 className="text-2xl font-bold tracking-tight">Documentation</h2>
+          <p className="text-muted-foreground">
+            Learn how to use SimLab effectively
+          </p>
+        </div>
+        <div className="grid gap-4 sm:grid-cols-2 lg:grid-cols-3">
+          {docs.map((doc) => (
+            <Card
+              key={doc.title}
+              className="border-border/50 hover:border-primary/50 transition-colors cursor-pointer group"
+            >
+              <CardHeader className="pb-3">
+                <div className="flex items-center justify-between">
+                  <div className="flex h-10 w-10 items-center justify-center rounded-lg bg-primary/10 text-primary group-hover:bg-primary/20 transition-colors">
+                    <doc.icon className="h-5 w-5" />
+                  </div>
+                  <Badge variant="secondary" className="text-xs">
+                    {doc.badge}
+                  </Badge>
+                </div>
+                <CardTitle className="text-base mt-3">{doc.title}</CardTitle>
+                <CardDescription className="text-sm">
+                  {doc.description}
+                </CardDescription>
+              </CardHeader>
+            </Card>
+          ))}
+        </div>
+      </div>
+    </div>
+  )
+}

v0ap/app/globals.css ADDED

	@@ -0,0 +1,137 @@

+@import 'tailwindcss';
+@import 'tw-animate-css';
+@custom-variant dark (&:is(.dark *));
+:root {
+  --background: oklch(0.13 0.02 260);
+  --foreground: oklch(0.95 0.01 260);
+  --card: oklch(0.16 0.02 260);
+  --card-foreground: oklch(0.95 0.01 260);
+  --popover: oklch(0.16 0.02 260);
+  --popover-foreground: oklch(0.95 0.01 260);
+  --primary: oklch(0.65 0.18 230);
+  --primary-foreground: oklch(0.98 0 0);
+  --secondary: oklch(0.22 0.02 260);
+  --secondary-foreground: oklch(0.9 0.01 260);
+  --muted: oklch(0.2 0.02 260);
+  --muted-foreground: oklch(0.6 0.02 260);
+  --accent: oklch(0.22 0.02 260);
+  --accent-foreground: oklch(0.95 0.01 260);
+  --destructive: oklch(0.55 0.22 25);
+  --destructive-foreground: oklch(0.98 0 0);
+  --border: oklch(0.25 0.02 260);
+  --input: oklch(0.22 0.02 260);
+  --ring: oklch(0.65 0.18 230);
+  --success: oklch(0.7 0.19 145);
+  --success-foreground: oklch(0.15 0.05 145);
+  --warning: oklch(0.75 0.18 85);
+  --warning-foreground: oklch(0.2 0.05 85);
+  --chart-1: oklch(0.65 0.18 230);
+  --chart-2: oklch(0.7 0.19 145);
+  --chart-3: oklch(0.75 0.18 85);
+  --chart-4: oklch(0.55 0.22 25);
+  --chart-5: oklch(0.6 0.15 280);
+  --radius: 0.625rem;
+  --sidebar: oklch(0.11 0.02 260);
+  --sidebar-foreground: oklch(0.9 0.01 260);
+  --sidebar-primary: oklch(0.65 0.18 230);
+  --sidebar-primary-foreground: oklch(0.98 0 0);
+  --sidebar-accent: oklch(0.18 0.02 260);
+  --sidebar-accent-foreground: oklch(0.95 0.01 260);
+  --sidebar-border: oklch(0.22 0.02 260);
+  --sidebar-ring: oklch(0.65 0.18 230);
+}
+.dark {
+  --background: oklch(0.13 0.02 260);
+  --foreground: oklch(0.95 0.01 260);
+  --card: oklch(0.16 0.02 260);
+  --card-foreground: oklch(0.95 0.01 260);
+  --popover: oklch(0.16 0.02 260);
+  --popover-foreground: oklch(0.95 0.01 260);
+  --primary: oklch(0.65 0.18 230);
+  --primary-foreground: oklch(0.98 0 0);
+  --secondary: oklch(0.22 0.02 260);
+  --secondary-foreground: oklch(0.9 0.01 260);
+  --muted: oklch(0.2 0.02 260);
+  --muted-foreground: oklch(0.6 0.02 260);
+  --accent: oklch(0.22 0.02 260);
+  --accent-foreground: oklch(0.95 0.01 260);
+  --destructive: oklch(0.55 0.22 25);
+  --destructive-foreground: oklch(0.98 0 0);
+  --border: oklch(0.25 0.02 260);
+  --input: oklch(0.22 0.02 260);
+  --ring: oklch(0.65 0.18 230);
+  --success: oklch(0.7 0.19 145);
+  --success-foreground: oklch(0.15 0.05 145);
+  --warning: oklch(0.75 0.18 85);
+  --warning-foreground: oklch(0.2 0.05 85);
+  --chart-1: oklch(0.65 0.18 230);
+  --chart-2: oklch(0.7 0.19 145);
+  --chart-3: oklch(0.75 0.18 85);
+  --chart-4: oklch(0.55 0.22 25);
+  --chart-5: oklch(0.6 0.15 280);
+  --sidebar: oklch(0.11 0.02 260);
+  --sidebar-foreground: oklch(0.9 0.01 260);
+  --sidebar-primary: oklch(0.65 0.18 230);
+  --sidebar-primary-foreground: oklch(0.98 0 0);
+  --sidebar-accent: oklch(0.18 0.02 260);
+  --sidebar-accent-foreground: oklch(0.95 0.01 260);
+  --sidebar-border: oklch(0.22 0.02 260);
+  --sidebar-ring: oklch(0.65 0.18 230);
+}
+@theme inline {
+  --font-sans: 'Geist', 'Geist Fallback';
+  --font-mono: 'Geist Mono', 'Geist Mono Fallback';
+  --color-background: var(--background);
+  --color-foreground: var(--foreground);
+  --color-card: var(--card);
+  --color-card-foreground: var(--card-foreground);
+  --color-popover: var(--popover);
+  --color-popover-foreground: var(--popover-foreground);
+  --color-primary: var(--primary);
+  --color-primary-foreground: var(--primary-foreground);
+  --color-secondary: var(--secondary);
+  --color-secondary-foreground: var(--secondary-foreground);
+  --color-muted: var(--muted);
+  --color-muted-foreground: var(--muted-foreground);
+  --color-accent: var(--accent);
+  --color-accent-foreground: var(--accent-foreground);
+  --color-destructive: var(--destructive);
+  --color-destructive-foreground: var(--destructive-foreground);
+  --color-border: var(--border);
+  --color-input: var(--input);
+  --color-ring: var(--ring);
+  --color-success: var(--success);
+  --color-success-foreground: var(--success-foreground);
+  --color-warning: var(--warning);
+  --color-warning-foreground: var(--warning-foreground);
+  --color-chart-1: var(--chart-1);
+  --color-chart-2: var(--chart-2);
+  --color-chart-3: var(--chart-3);
+  --color-chart-4: var(--chart-4);
+  --color-chart-5: var(--chart-5);
+  --radius-sm: calc(var(--radius) - 4px);
+  --radius-md: calc(var(--radius) - 2px);
+  --radius-lg: var(--radius);
+  --radius-xl: calc(var(--radius) + 4px);
+  --color-sidebar: var(--sidebar);
+  --color-sidebar-foreground: var(--sidebar-foreground);
+  --color-sidebar-primary: var(--sidebar-primary);
+  --color-sidebar-primary-foreground: var(--sidebar-primary-foreground);
+  --color-sidebar-accent: var(--sidebar-accent);
+  --color-sidebar-accent-foreground: var(--sidebar-accent-foreground);
+  --color-sidebar-border: var(--sidebar-border);
+  --color-sidebar-ring: var(--sidebar-ring);
+}
+@layer base {
+  * {
+    @apply border-border outline-ring/50;
+  }
+  body {
+    @apply bg-background text-foreground;
+  }
+}

v0ap/app/layout.tsx ADDED

	@@ -0,0 +1,68 @@

+import type { Metadata, Viewport } from 'next'
+import { Geist, Geist_Mono } from 'next/font/google'
+import { Analytics } from '@vercel/analytics/next'
+import './globals.css'
+import { ThemeProvider } from '@/components/theme-provider'
+import { SidebarProvider } from '@/components/ui/sidebar'
+import { AppSidebar } from '@/components/app-sidebar'
+const _geist = Geist({ subsets: ["latin"] });
+const _geistMono = Geist_Mono({ subsets: ["latin"] });
+export const metadata: Metadata = {
+  title: 'SimLab - Lab Automation RL Environment',
+  description: 'AI-powered lab automation environment for optimizing wet-lab experiment workflows',
+  generator: 'v0.app',
+  icons: {
+    icon: [
+      {
+        url: '/icon-light-32x32.png',
+        media: '(prefers-color-scheme: light)',
+      },
+      {
+        url: '/icon-dark-32x32.png',
+        media: '(prefers-color-scheme: dark)',
+      },
+      {
+        url: '/icon.svg',
+        type: 'image/svg+xml',
+      },
+    ],
+    apple: '/apple-icon.png',
+  },
+}
+export const viewport: Viewport = {
+  themeColor: [
+    { media: '(prefers-color-scheme: light)', color: '#ffffff' },
+    { media: '(prefers-color-scheme: dark)', color: '#0f172a' },
+  ],
+}
+export default function RootLayout({
+  children,
+}: Readonly<{
+  children: React.ReactNode
+}>) {
+  return (
+    <html lang="en" suppressHydrationWarning>
+      <body className="font-sans antialiased">
+        <ThemeProvider
+          attribute="class"
+          defaultTheme="dark"
+          enableSystem
+          disableTransitionOnChange
+        >
+          <SidebarProvider>
+            <AppSidebar />
+            <main className="flex-1 overflow-auto">
+              {children}
+            </main>
+          </SidebarProvider>
+        </ThemeProvider>
+        <Analytics />
+      </body>
+    </html>
+  )
+}

v0ap/app/page.tsx ADDED

	@@ -0,0 +1,24 @@

+import { StatsCards } from "@/components/dashboard/stats-cards"
+import { PerformanceChart } from "@/components/dashboard/performance-chart"
+import { RecentExperiments } from "@/components/dashboard/recent-experiments"
+import { SidebarTrigger } from "@/components/ui/sidebar"
+export default function DashboardPage() {
+  return (
+    <div className="flex flex-col min-h-screen">
+      <header className="sticky top-0 z-10 flex h-14 items-center gap-4 border-b bg-background/95 backdrop-blur supports-[backdrop-filter]:bg-background/60 px-6">
+        <SidebarTrigger className="-ml-2" />
+        <div className="flex-1">
+          <h1 className="text-lg font-semibold">Dashboard</h1>
+        </div>
+      </header>
+      <div className="flex-1 p-6 space-y-6">
+        <StatsCards />
+        <div className="grid gap-6 lg:grid-cols-2">
+          <PerformanceChart />
+          <RecentExperiments />
+        </div>
+      </div>
+    </div>
+  )
+}

v0ap/app/training/page.tsx ADDED

	@@ -0,0 +1,114 @@

+"use client"
+import { useState, useCallback } from "react"
+import { TrainingControls } from "@/components/training/training-controls"
+import { TrainingChart } from "@/components/training/training-chart"
+import { ComparisonTable } from "@/components/training/comparison-table"
+import { SidebarTrigger } from "@/components/ui/sidebar"
+interface ChartPoint {
+  episode: number
+  reward: number
+  successRate: number
+}
+interface ComparisonRow {
+  metric: string
+  reinforce: number
+  baseline: number
+  improvement: number | null
+  unit?: string
+}
+export default function TrainingPage() {
+  const [isTraining, setIsTraining] = useState(false)
+  const [progress, setProgress] = useState(0)
+  const [currentEpisode, setCurrentEpisode] = useState(0)
+  const [totalEpisodes, setTotalEpisodes] = useState(0)
+  const [chartData, setChartData] = useState<ChartPoint[]>([])
+  const [comparison, setComparison] = useState<ComparisonRow[]>([])
+  const [isDone, setIsDone] = useState(false)
+  const startTraining = useCallback(
+    async (episodes: number, lr: number, maxTrials: number, workflowId: string) => {
+      setIsTraining(true)
+      setIsDone(false)
+      setProgress(0)
+      setChartData([])
+      setComparison([])
+      setTotalEpisodes(episodes)
+      const res = await fetch("/api/training/start", {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ episodes, lr, max_trials: maxTrials, workflow_id: workflowId }),
+      })
+      const reader = res.body?.getReader()
+      const decoder = new TextDecoder()
+      let buffer = ""
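+      if (!reader) {
+        // No response stream: clear the flag so the UI does not stay in the training state.
+        setIsTraining(false)
+        return
+      }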
+      while (true) {
+        const { done, value } = await reader.read()
+        if (done) break
+        buffer += decoder.decode(value, { stream: true })
+        const lines = buffer.split("\n")
+        buffer = lines.pop() || ""
+        for (const line of lines) {
+          if (!line.startsWith("data: ")) continue
+          try {
+            const event = JSON.parse(line.slice(6))
+            if (event.type === "progress") {
+              setProgress(event.progress)
+              setCurrentEpisode(event.episode)
+              setChartData(event.chartData)
+            } else if (event.type === "done") {
+              setChartData(event.chartData)
+              setComparison(event.comparison)
+              setIsDone(true)
+            }
+          } catch {
+            // Skip frames that are not valid JSON instead of aborting the stream.
+          }
+        }
+      }
+      setIsTraining(false)
+    },
+    []
+  )
+  return (
+    <div className="flex flex-col min-h-screen">
+      <header className="sticky top-0 z-10 flex h-14 items-center gap-4 border-b bg-background/95 backdrop-blur supports-[backdrop-filter]:bg-background/60 px-6">
+        <SidebarTrigger className="-ml-2" />
+        <div className="flex-1">
+          <h1 className="text-lg font-semibold">Training</h1>
+        </div>
+      </header>
+      <div className="flex-1 p-6 space-y-6">
+        <div className="mb-6">
+          <h2 className="text-2xl font-bold tracking-tight">Agent Training</h2>
+          <p className="text-muted-foreground">
+            Train the RL agent to optimize experiment workflows
+          </p>
+        </div>
+        <div className="grid gap-6 lg:grid-cols-[320px_1fr]">
+          <TrainingControls
+            isTraining={isTraining}
+            progress={progress}
+            currentEpisode={currentEpisode}
+            totalEpisodes={totalEpisodes}
+            onStartTraining={startTraining}
+          />
+          <div className="space-y-6">
+            <TrainingChart data={chartData} />
+            {isDone && <ComparisonTable data={comparison} />}
+          </div>
+        </div>
+      </div>
+    </div>
+  )
+}
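
The chart points this page renders arrive at a cadence fixed by the server: `training_start` in `server/app.py` uses `log_interval = max(req.episodes // 40, 10)` and also emits on the final episode. A small Python sketch of which episodes produce a `progress` event is given here; the helper name `progress_checkpoints` is hypothetical and not in the commit.

```python
def progress_checkpoints(episodes: int) -> list[int]:
    """Episodes at which a progress event would be emitted (hypothetical helper).

    Mirrors the condition in the training loop:
    `ep % log_interval == 0 or ep == req.episodes`.
    """
    log_interval = max(episodes // 40, 10)
    return [
        ep
        for ep in range(1, episodes + 1)
        if ep % log_interval == 0 or ep == episodes
    ]


if __name__ == "__main__":
    print(progress_checkpoints(25))         # [10, 20, 25]
    print(len(progress_checkpoints(1000)))  # 40
```

Under this rule, runs of up to about 400 episodes log every 10 episodes, while longer runs emit on the order of 40 chart points.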

v0ap/app/workflows/[id]/page.tsx ADDED

	@@ -0,0 +1,180 @@

+"use client"
+import { useState, use } from "react"
+import { EnvironmentState } from "@/components/workflow-run/environment-state"
+import { ExperimentTimeline } from "@/components/workflow-run/experiment-timeline"
+import { ProtocolSelection } from "@/components/workflow-run/protocol-selection"
+import { SidebarTrigger } from "@/components/ui/sidebar"
+const workflowNames: Record<string, string> = {
+  "pcr-amplification": "PCR Amplification",
+  "elisa-readout": "ELISA Readout",
+  "dna-extraction": "DNA Extraction",
+  "rna-sequencing-prep": "RNA Sequencing Prep",
+  "gel-electrophoresis": "Gel Electrophoresis",
+  "cell-culture-passage": "Cell Culture Passage",
+}
+interface EnvState {
+  elapsed_minutes: number
+  remaining_budget: number
+  inventory: Record<string, number>
+  best_result: string
+  max_time: number
+  max_budget: number
+}
+interface TimelineEntry {
+  title: string
+  description: string
+  time: string
+  status: string
+  icon: string
+}
+interface PresetInfo {
+  id: string
+  temp: number
+  cycles: number
+  ratio: string
+  status: string
+}
+export default function WorkflowRunPage({
+  params,
+}: {
+  params: Promise<{ id: string }>
+}) {
+  const { id } = use(params)
+  const workflowName = workflowNames[id] || "Unknown Workflow"
+  const [envState, setEnvState] = useState<EnvState | null>(null)
+  const [timeline, setTimeline] = useState<TimelineEntry[]>([])
+  const [presets, setPresets] = useState<PresetInfo[]>([])
+  const [isRunning, setIsRunning] = useState(false)
+  // Shared runner: POST to an agent endpoint, then replay the returned timeline one entry at a time.
+  const runAgent = async (endpoint: string) => {
+    setIsRunning(true)
+    setTimeline([])
+    setPresets([])
+    setEnvState(null)
+    try {
+      const seed = Math.floor(Math.random() * 100000)
+      const res = await fetch(endpoint, {
+        method: "POST",
+        headers: { "Content-Type": "application/json" },
+        body: JSON.stringify({ seed, workflow_id: id }),
+      })
+      const data = await res.json()
+      setEnvState(data.state)
+      setPresets(data.presets)
+      for (let i = 0; i < data.timeline.length; i++) {
+        await new Promise((r) => setTimeout(r, 400))
+        setTimeline((prev) => [...prev, data.timeline[i]])
+      }
+    } finally {
+      // Always clear the running flag, even if the request or JSON parsing fails.
+      setIsRunning(false)
+    }
+  }
+  const runAI = () => runAgent("/api/run/ai")
+  const runNaive = () => runAgent("/api/run/naive")
+  const runResearch = () => runAgent("/api/run/research")
+  const runResearchGenerate = () => runAgent("/api/run/research-generate")
+  return (
+    <div className="flex flex-col min-h-screen">
+      <header className="sticky top-0 z-10 flex h-14 items-center gap-4 border-b bg-background/95 backdrop-blur supports-[backdrop-filter]:bg-background/60 px-6">
+        <SidebarTrigger className="-ml-2" />
+        <div className="flex-1">
+          <h1 className="text-lg font-semibold">{workflowName}</h1>
+        </div>
+      </header>
+      <div className="flex-1 p-6">
+        <div className="grid gap-6 lg:grid-cols-[280px_1fr_320px] h-[calc(100vh-8rem)]">
+          <EnvironmentState data={envState} />
+          <ExperimentTimeline entries={timeline} />
+          <ProtocolSelection
+            presets={presets}
+            isRunning={isRunning}
+            onRunAI={runAI}
+            onRunNaive={runNaive}
+            onRunResearch={runResearch}
+            onRunResearchGenerate={runResearchGenerate}
+          />
+        </div>
+      </div>
+    </div>
+  )
+}

v0ap/app/workflows/page.tsx ADDED

	@@ -0,0 +1,24 @@

+import { WorkflowGrid } from "@/components/workflows/workflow-grid"
+import { SidebarTrigger } from "@/components/ui/sidebar"
+export default function WorkflowsPage() {
+  return (
+    <div className="flex flex-col min-h-screen">
+      <header className="sticky top-0 z-10 flex h-14 items-center gap-4 border-b bg-background/95 backdrop-blur supports-[backdrop-filter]:bg-background/60 px-6">
+        <SidebarTrigger className="-ml-2" />
+        <div className="flex-1">
+          <h1 className="text-lg font-semibold">Workflows</h1>
+        </div>
+      </header>
+      <div className="flex-1 p-6">
+        <div className="mb-6">
+          <h2 className="text-2xl font-bold tracking-tight">Available Workflows</h2>
+          <p className="text-muted-foreground">
+            Select a workflow to configure and run with the AI agent
+          </p>
+        </div>
+        <WorkflowGrid />
+      </div>
+    </div>
+  )
+}

v0ap/components.json ADDED

	@@ -0,0 +1,21 @@

+{
+  "$schema": "https://ui.shadcn.com/schema.json",
+  "style": "new-york",
+  "rsc": true,
+  "tsx": true,
+  "tailwind": {
+    "config": "",
+    "css": "app/globals.css",
+    "baseColor": "neutral",
+    "cssVariables": true,
+    "prefix": ""
+  },
+  "aliases": {
+    "components": "@/components",
+    "utils": "@/lib/utils",
+    "ui": "@/components/ui",
+    "lib": "@/lib",
+    "hooks": "@/hooks"
+  },
+  "iconLibrary": "lucide"
+}

v0ap/components/app-sidebar.tsx ADDED

	@@ -0,0 +1,91 @@

+"use client"
+import Link from "next/link"
+import { usePathname } from "next/navigation"
+import {
+  LayoutDashboard,
+  FlaskConical,
+  GraduationCap,
+  FileText,
+  Moon,
+  Sun,
+  Beaker,
+} from "lucide-react"
+import {
+  Sidebar,
+  SidebarContent,
+  SidebarFooter,
+  SidebarGroup,
+  SidebarGroupContent,
+  SidebarHeader,
+  SidebarMenu,
+  SidebarMenuButton,
+  SidebarMenuItem,
+} from "@/components/ui/sidebar"
+import { Button } from "@/components/ui/button"
+import { useTheme } from "@/components/theme-provider"
+const navItems = [
+  { title: "Dashboard", href: "/", icon: LayoutDashboard },
+  { title: "Workflows", href: "/workflows", icon: FlaskConical },
+  { title: "Training", href: "/training", icon: GraduationCap },
+  { title: "Docs", href: "/docs", icon: FileText },
+]
+export function AppSidebar() {
+  const pathname = usePathname()
+  const { theme, setTheme } = useTheme()
+  return (
+    <Sidebar collapsible="icon">
+      <SidebarHeader className="p-4">
+        <Link href="/" className="flex items-center gap-3 group-data-[collapsible=icon]:justify-center">
+          <div className="flex h-9 w-9 items-center justify-center rounded-lg bg-primary/10 text-primary">
+            <Beaker className="h-5 w-5" />
+          </div>
+          <span className="text-lg font-semibold tracking-tight group-data-[collapsible=icon]:hidden">
+            SimLab
+          </span>
+        </Link>
+      </SidebarHeader>
+      <SidebarContent>
+        <SidebarGroup>
+          <SidebarGroupContent>
+            <SidebarMenu>
+              {navItems.map((item) => {
+                const isActive = pathname === item.href ||
+                  (item.href !== "/" && pathname.startsWith(item.href))
+                return (
+                  <SidebarMenuItem key={item.title}>
+                    <SidebarMenuButton asChild isActive={isActive} tooltip={item.title}>
+                      <Link href={item.href}>
+                        <item.icon className="h-4 w-4" />
+                        <span>{item.title}</span>
+                      </Link>
+                    </SidebarMenuButton>
+                  </SidebarMenuItem>
+                )
+              })}
+            </SidebarMenu>
+          </SidebarGroupContent>
+        </SidebarGroup>
+      </SidebarContent>
+      <SidebarFooter className="p-4">
+        <Button
+          variant="ghost"
+          size="icon"
+          onClick={() => setTheme(theme === "dark" ? "light" : "dark")}
+          className="h-8 w-8 group-data-[collapsible=icon]:mx-auto"
+        >
+          <Sun className="h-4 w-4 rotate-0 scale-100 transition-all dark:-rotate-90 dark:scale-0" />
+          <Moon className="absolute h-4 w-4 rotate-90 scale-0 transition-all dark:rotate-0 dark:scale-100" />
+          <span className="sr-only">Toggle theme</span>
+        </Button>
+        <div className="mt-2 flex items-center gap-2 text-xs text-muted-foreground group-data-[collapsible=icon]:hidden">
+          <span className="text-[10px] px-1.5 py-0.5 rounded bg-muted">Powered by OpenEnv</span>
+        </div>
+      </SidebarFooter>
+    </Sidebar>
+  )
+}
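
Review note: the active-item rule in `AppSidebar` is an inline expression, so it may help to see it in isolation. The sketch below restates the same condition as a pure function; the name `isNavItemActive` is hypothetical and does not appear in the commit.

```typescript
// Hypothetical helper (not part of the commit): the same condition that
// AppSidebar evaluates inline for each nav item.
function isNavItemActive(pathname: string, href: string): boolean {
  // An exact match is always active. Prefix matching is skipped for "/",
  // because every path starts with "/" and the Dashboard item would
  // otherwise be active on every page.
  return pathname === href || (href !== "/" && pathname.startsWith(href))
}

// A few sample paths checked against the hrefs from navItems.
const samples: Array<[string, string]> = [
  ["/", "/"],
  ["/training", "/"],
  ["/workflows/pcr-amplification", "/workflows"],
  ["/workflows-archive", "/workflows"],
]
for (const [pathname, href] of samples) {
  console.log(`${pathname} vs ${href}: ${isNavItemActive(pathname, href)}`)
}
```

Because the rule is a plain string prefix test, a sibling route such as the made-up `/workflows-archive` would also activate the Workflows item. Comparing against `href + "/"` would be one way to tighten this if such routes are ever added.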

v0ap/components/dashboard/performance-chart.tsx ADDED

	@@ -0,0 +1,88 @@

+"use client"
+import {
+  Area,
+  AreaChart,
+  CartesianGrid,
+  ResponsiveContainer,
+  Tooltip,
+  XAxis,
+  YAxis,
+} from "recharts"
+import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card"
+import { ChartContainer, ChartTooltipContent } from "@/components/ui/chart"
+const performanceData = [
+  { episode: 0, successRate: 23 },
+  { episode: 50, successRate: 31 },
+  { episode: 100, successRate: 42 },
+  { episode: 150, successRate: 48 },
+  { episode: 200, successRate: 56 },
+  { episode: 250, successRate: 61 },
+  { episode: 300, successRate: 67 },
+  { episode: 350, successRate: 72 },
+  { episode: 400, successRate: 75 },
+  { episode: 450, successRate: 79 },
+  { episode: 500, successRate: 82 },
+  { episode: 550, successRate: 84 },
+  { episode: 600, successRate: 85 },
+  { episode: 650, successRate: 87 },
+  { episode: 700, successRate: 87.3 },
+]
+const chartConfig = {
+  successRate: {
+    label: "Success Rate",
+    color: "var(--color-success)",
+  },
+}
+export function PerformanceChart() {
+  return (
+    <Card className="border-border/50">
+      <CardHeader>
+        <CardTitle>Agent Performance Over Time</CardTitle>
+        <CardDescription>
+          Success rate trending upward as the RL agent learns
+        </CardDescription>
+      </CardHeader>
+      <CardContent>
+        <ChartContainer config={chartConfig} className="h-[300px] w-full">
+          <ResponsiveContainer width="100%" height="100%">
+            <AreaChart data={performanceData}>
+              <defs>
+                <linearGradient id="successGradient" x1="0" y1="0" x2="0" y2="1">
+                  <stop offset="0%" stopColor="var(--color-success)" stopOpacity={0.3} />
+                  <stop offset="100%" stopColor="var(--color-success)" stopOpacity={0.05} />
+                </linearGradient>
+              </defs>
+              <CartesianGrid strokeDasharray="3 3" className="stroke-border/50" />
+              <XAxis
+                dataKey="episode"
+                tickLine={false}
+                axisLine={false}
+                className="text-xs fill-muted-foreground"
+                tickFormatter={(value) => `Ep ${value}`}
+              />
+              <YAxis
+                tickLine={false}
+                axisLine={false}
+                className="text-xs fill-muted-foreground"
+                tickFormatter={(value) => `${value}%`}
+                domain={[0, 100]}
+              />
+              <Tooltip content={<ChartTooltipContent />} />
+              <Area
+                type="monotone"
+                dataKey="successRate"
+                stroke="var(--color-success)"
+                strokeWidth={2}
+                fill="url(#successGradient)"
+              />
+            </AreaChart>
+          </ResponsiveContainer>
+        </ChartContainer>
+      </CardContent>
+    </Card>
+  )
+}

v0ap/components/dashboard/recent-experiments.tsx ADDED

	@@ -0,0 +1,79 @@

+"use client"
+import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card"
+import { Badge } from "@/components/ui/badge"
+import {
+  Table,
+  TableBody,
+  TableCell,
+  TableHead,
+  TableHeader,
+  TableRow,
+} from "@/components/ui/table"
+type ExperimentResult = "success" | "partial" | "fail"
+interface Experiment {
+  id: string
+  workflow: string
+  preset: string
+  result: ExperimentResult
+  time: string
+  cost: string
+}
+const recentExperiments: Experiment[] = [
+  { id: "1", workflow: "PCR Amplification", preset: "65°C / 30 cycles", result: "success", time: "42 min", cost: "$23.50" },
+  { id: "2", workflow: "ELISA Readout", preset: "37°C / standard", result: "success", time: "85 min", cost: "$45.00" },
+  { id: "3", workflow: "DNA Extraction", preset: "conservative", result: "partial", time: "38 min", cost: "$18.25" },
+  { id: "4", workflow: "RNA Sequencing Prep", preset: "high-yield", result: "fail", time: "120 min", cost: "$89.00" },
+  { id: "5", workflow: "Gel Electrophoresis", preset: "1% agarose", result: "success", time: "55 min", cost: "$12.00" },
+  { id: "6", workflow: "Cell Culture Passage", preset: "70% confluence", result: "success", time: "25 min", cost: "$8.50" },
+]
+const resultStyles: Record<ExperimentResult, string> = {
+  success: "bg-success/10 text-success border-success/20",
+  partial: "bg-warning/10 text-warning border-warning/20",
+  fail: "bg-destructive/10 text-destructive border-destructive/20",
+}
+export function RecentExperiments() {
+  return (
+    <Card className="border-border/50">
+      <CardHeader>
+        <CardTitle>Recent Experiments</CardTitle>
+        <CardDescription>
+          Latest experiment runs and their results
+        </CardDescription>
+      </CardHeader>
+      <CardContent>
+        <Table>
+          <TableHeader>
+            <TableRow className="border-border/50 hover:bg-transparent">
+              <TableHead className="text-muted-foreground">Workflow</TableHead>
+              <TableHead className="text-muted-foreground">Preset</TableHead>
+              <TableHead className="text-muted-foreground">Result</TableHead>
+              <TableHead className="text-muted-foreground text-right">Time</TableHead>
+              <TableHead className="text-muted-foreground text-right">Cost</TableHead>
+            </TableRow>
+          </TableHeader>
+          <TableBody>
+            {recentExperiments.map((exp) => (
+              <TableRow key={exp.id} className="border-border/50">
+                <TableCell className="font-medium">{exp.workflow}</TableCell>
+                <TableCell className="font-mono text-sm text-muted-foreground">{exp.preset}</TableCell>
+                <TableCell>
+                  <Badge variant="outline" className={resultStyles[exp.result]}>
+                    {exp.result}
+                  </Badge>
+                </TableCell>
+                <TableCell className="text-right font-mono text-sm">{exp.time}</TableCell>
+                <TableCell className="text-right font-mono text-sm">{exp.cost}</TableCell>
+              </TableRow>
+            ))}
+          </TableBody>
+        </Table>
+      </CardContent>
+    </Card>
+  )
+}

v0ap/components/dashboard/stats-cards.tsx ADDED

	@@ -0,0 +1,62 @@

+"use client"
+import { Activity, FlaskConical, TrendingUp, DollarSign } from "lucide-react"
+import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card"
+const stats = [
+  {
+    title: "Active Workflows",
+    value: "4",
+    description: "Currently running",
+    icon: Activity,
+    trend: "+2 from last hour",
+    color: "text-primary",
+  },
+  {
+    title: "Total Experiments",
+    value: "1,247",
+    description: "All time",
+    icon: FlaskConical,
+    trend: "+89 this week",
+    color: "text-chart-2",
+  },
+  {
+    title: "Success Rate",
+    value: "87.3%",
+    description: "Overall performance",
+    icon: TrendingUp,
+    trend: "+4.2% from baseline",
+    color: "text-success",
+  },
+  {
+    title: "Budget Spent",
+    value: "$12,450",
+    description: "This month",
+    icon: DollarSign,
+    trend: "$3,200 remaining",
+    color: "text-warning",
+  },
+]
+export function StatsCards() {
+  return (
+    <div className="grid gap-4 md:grid-cols-2 lg:grid-cols-4">
+      {stats.map((stat) => (
+        <Card key={stat.title} className="border-border/50">
+          <CardHeader className="flex flex-row items-center justify-between space-y-0 pb-2">
+            <CardTitle className="text-sm font-medium text-muted-foreground">
+              {stat.title}
+            </CardTitle>
+            <stat.icon className={`h-4 w-4 ${stat.color}`} />
+          </CardHeader>
+          <CardContent>
+            <div className="text-2xl font-bold font-mono">{stat.value}</div>
+            <p className="text-xs text-muted-foreground mt-1">
+              {stat.trend}
+            </p>
+          </CardContent>
+        </Card>
+      ))}
+    </div>
+  )
+}

v0ap/components/theme-provider.tsx ADDED

	@@ -0,0 +1,14 @@

+'use client'
+import * as React from 'react'
+import {
+  ThemeProvider as NextThemesProvider,
+  useTheme as useNextTheme,
+  type ThemeProviderProps,
+} from 'next-themes'
+export function ThemeProvider({ children, ...props }: ThemeProviderProps) {
+  return <NextThemesProvider {...props}>{children}</NextThemesProvider>
+}
+export const useTheme = useNextTheme

v0ap/components/training/comparison-table.tsx ADDED

	@@ -0,0 +1,116 @@

+"use client"
+import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card"
+import { Badge } from "@/components/ui/badge"
+import {
+  Table,
+  TableBody,
+  TableCell,
+  TableHead,
+  TableHeader,
+  TableRow,
+} from "@/components/ui/table"
+import { ArrowUp, ArrowDown, Minus } from "lucide-react"
+interface ComparisonRow {
+  metric: string
+  reinforce: number
+  baseline: number
+  improvement: number | null
+  unit?: string
+}
+interface ComparisonTableProps {
+  data: ComparisonRow[]
+}
+function ImprovementBadge({ value }: { value: number | null }) {
+  if (value === null) {
+    return (
+      <Badge variant="outline" className="bg-muted/50 text-muted-foreground border-muted">
+        <Minus className="h-3 w-3 mr-1" />
+        N/A
+      </Badge>
+    )
+  }
+  const isPositive = value > 0
+  const absValue = Math.abs(value)
+  return (
+    <Badge
+      variant="outline"
+      className={
+        isPositive
+          ? "bg-success/10 text-success border-success/20"
+          : "bg-destructive/10 text-destructive border-destructive/20"
+      }
+    >
+      {isPositive ? (
+        <ArrowUp className="h-3 w-3 mr-1" />
+      ) : (
+        <ArrowDown className="h-3 w-3 mr-1" />
+      )}
+      {absValue}%
+    </Badge>
+  )
+}
+export function ComparisonTable({ data }: ComparisonTableProps) {
+  if (data.length === 0) return null
+  return (
+    <Card className="border-border/50 animate-in fade-in slide-in-from-bottom-4 duration-500">
+      <CardHeader>
+        <CardTitle>Agent Comparison</CardTitle>
+        <CardDescription>
+          REINFORCE Agent vs Naive Baseline — evaluated on 100 episodes
+        </CardDescription>
+      </CardHeader>
+      <CardContent>
+        <Table>
+          <TableHeader>
+            <TableRow className="border-border/50 hover:bg-transparent">
+              <TableHead className="text-muted-foreground">Metric</TableHead>
+              <TableHead className="text-muted-foreground text-right">
+                <div className="flex flex-col items-end">
+                  <span>REINFORCE</span>
+                  <Badge variant="secondary" className="text-[10px] mt-1">Agent</Badge>
+                </div>
+              </TableHead>
+              <TableHead className="text-muted-foreground text-right">
+                <div className="flex flex-col items-end">
+                  <span>Naive</span>
+                  <Badge variant="outline" className="text-[10px] mt-1">Baseline</Badge>
+                </div>
+              </TableHead>
+              <TableHead className="text-muted-foreground text-right">Improvement</TableHead>
+            </TableRow>
+          </TableHeader>
+          <TableBody>
+            {data.map((row) => (
+              <TableRow key={row.metric} className="border-border/50">
+                <TableCell className="font-medium">{row.metric}</TableCell>
+                <TableCell className="text-right font-mono">
+                  {row.unit === "$" && "$"}
+                  {row.reinforce}
+                  {row.unit === "%" && "%"}
+                  {row.unit === "min" && " min"}
+                </TableCell>
+                <TableCell className="text-right font-mono text-muted-foreground">
+                  {row.unit === "$" && "$"}
+                  {row.baseline}
+                  {row.unit === "%" && "%"}
+                  {row.unit === "min" && " min"}
+                </TableCell>
+                <TableCell className="text-right">
+                  <ImprovementBadge value={row.improvement} />
+                </TableCell>
+              </TableRow>
+            ))}
+          </TableBody>
+        </Table>
+      </CardContent>
+    </Card>
+  )
+}
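
Review note: `ComparisonTable` receives `improvement` already computed, and this diff does not show how the caller derives it. The sketch below is one plausible derivation, assuming a percentage change relative to the baseline rounded to one decimal place. The function name `percentImprovement` and the rounding choice are assumptions, not taken from the commit.

```typescript
// Hypothetical helper (not part of the commit): percentage change of the
// agent's value relative to the baseline, or null when it is undefined.
function percentImprovement(agent: number, baseline: number): number | null {
  // A zero baseline has no meaningful relative change; null maps to the
  // "N/A" badge that ImprovementBadge renders for null values.
  if (baseline === 0) return null
  const pct = ((agent - baseline) / Math.abs(baseline)) * 100
  // Round to one decimal so floating-point noise does not reach the UI.
  return Math.round(pct * 10) / 10
}

console.log(percentImprovement(80, 50))
console.log(percentImprovement(40, 50))
console.log(percentImprovement(10, 0))
```

Two caveats about the component as committed. `ImprovementBadge` styles any value that is not strictly positive as a regression, so an improvement of exactly `0` gets the down arrow. It also treats positive as good for every metric, so for lower-is-better metrics such as cost or time the caller would need to flip the sign before passing the value in.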

v0ap/components/training/training-chart.tsx ADDED

	@@ -0,0 +1,119 @@

+"use client"
+import {
+  Line,
+  LineChart,
+  CartesianGrid,
+  ResponsiveContainer,
+  Tooltip,
+  XAxis,
+  YAxis,
+  Legend,
+} from "recharts"
+import { Card, CardContent, CardDescription, CardHeader, CardTitle } from "@/components/ui/card"
+import { ChartContainer, ChartTooltipContent } from "@/components/ui/chart"
+interface ChartPoint {
+  episode: number
+  reward: number
+  successRate: number
+}
+interface TrainingChartProps {
+  data: ChartPoint[]
+}
+const chartConfig = {
+  reward: {
+    label: "Reward",
+    color: "var(--color-primary)",
+  },
+  successRate: {
+    label: "Success Rate",
+    color: "var(--color-success)",
+  },
+}
+export function TrainingChart({ data }: TrainingChartProps) {
+  const hasData = data.length > 0
+  const rewardDomain: [number, number] = hasData
+    ? [
+        Math.min(...data.map((d) => d.reward)) - 5,
+        Math.max(...data.map((d) => d.reward)) + 5,
+      ]
+    : [-3, 4]
+  return (
+    <Card className="border-border/50">
+      <CardHeader>
+        <CardTitle>Training Progress</CardTitle>
+        <CardDescription>
+          {hasData
+            ? `Live reward curve and success rate — ${data[data.length - 1].episode} episodes`
+            : "Start training to see live metrics"}
+        </CardDescription>
+      </CardHeader>
+      <CardContent>
+        <ChartContainer config={chartConfig} className="h-[350px] w-full">
+          {hasData ? (
+            <ResponsiveContainer width="100%" height="100%">
+              <LineChart data={data}>
+                <CartesianGrid strokeDasharray="3 3" className="stroke-border/50" />
+                <XAxis
+                  dataKey="episode"
+                  tickLine={false}
+                  axisLine={false}
+                  className="text-xs fill-muted-foreground"
+                />
+                <YAxis
+                  yAxisId="left"
+                  tickLine={false}
+                  axisLine={false}
+                  className="text-xs fill-muted-foreground"
+                  domain={rewardDomain}
+                  tickFormatter={(value) => value.toFixed(0)}
+                />
+                <YAxis
+                  yAxisId="right"
+                  orientation="right"
+                  tickLine={false}
+                  axisLine={false}
+                  className="text-xs fill-muted-foreground"
+                  domain={[0, 100]}
+                  tickFormatter={(value) => `${value}%`}
+                />
+                <Tooltip content={<ChartTooltipContent />} />
+                <Legend />
+                <Line
+                  yAxisId="left"
+                  type="monotone"
+                  dataKey="reward"
+                  stroke="var(--color-primary)"
+                  strokeWidth={2}
+                  dot={false}
+                  name="Reward"
+                  isAnimationActive={false}
+                />
+                <Line
+                  yAxisId="right"
+                  type="monotone"
+                  dataKey="successRate"
+                  stroke="var(--color-success)"
+                  strokeWidth={2}
+                  dot={false}
+                  name="Success Rate (%)"
+                  isAnimationActive={false}
+                />
+              </LineChart>
+            </ResponsiveContainer>
+          ) : (
+            <div className="h-full flex items-center justify-center text-muted-foreground">
+              Configure parameters and click Start Training
+            </div>
+          )}
+        </ChartContainer>
+      </CardContent>
+    </Card>
+  )
+}
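
Review note: the left axis range in `TrainingChart` is derived from the data with a fixed padding of 5 and falls back to `[-3, 4]` when there is nothing to plot. The sketch below pulls that computation into a standalone function so the behaviour is easy to check; the name `rewardDomainFor` and the `padding` parameter are hypothetical.

```typescript
interface ChartPoint {
  episode: number
  reward: number
  successRate: number
}

// Hypothetical helper (not part of the commit): same logic as the inline
// rewardDomain expression in TrainingChart.
function rewardDomainFor(data: ChartPoint[], padding = 5): [number, number] {
  // Empty data: use the fixed fallback range from the component.
  if (data.length === 0) return [-3, 4]
  const rewards = data.map((d) => d.reward)
  // Pad both ends so the line never touches the chart border.
  return [Math.min(...rewards) - padding, Math.max(...rewards) + padding]
}

const points: ChartPoint[] = [
  { episode: 100, reward: -2, successRate: 30 },
  { episode: 200, reward: 7, successRate: 55 },
]
console.log(rewardDomainFor([]))
console.log(rewardDomainFor(points))
```

Spreading an array into `Math.min` and `Math.max` passes every element as a separate argument, which is fine for a few thousand points but can exceed engine argument limits on very large arrays. A `reduce` over the rewards would avoid that if the chart ever receives much longer histories.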

v0ap/components/training/training-controls.tsx ADDED

	@@ -0,0 +1,141 @@

+"use client"
+import { useState } from "react"
+import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card"
+import { Button } from "@/components/ui/button"
+import { Slider } from "@/components/ui/slider"
+import { Progress } from "@/components/ui/progress"
+import { Play, Square } from "lucide-react"
+import { Field, FieldLabel } from "@/components/ui/field"
+import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from "@/components/ui/select"
+const WORKFLOWS = [
+  { id: "pcr-amplification", label: "PCR Amplification" },
+  { id: "elisa-readout", label: "ELISA Readout" },
+] as const
+interface TrainingControlsProps {
+  isTraining: boolean
+  progress: number
+  currentEpisode: number
+  totalEpisodes: number
+  onStartTraining: (episodes: number, lr: number, maxTrials: number, workflowId: string) => void
+}
+export function TrainingControls({
+  isTraining,
+  progress,
+  currentEpisode,
+  totalEpisodes,
+  onStartTraining,
+}: TrainingControlsProps) {
+  const [episodes, setEpisodes] = useState([1000])
+  const [learningRate, setLearningRate] = useState([0.003])
+  const [maxTrials, setMaxTrials] = useState([4])
+  const [workflowId, setWorkflowId] = useState<string>("pcr-amplification")
+  return (
+    <Card className="border-border/50 h-fit">
+      <CardHeader className="pb-4">
+        <CardTitle className="text-base">Training Configuration</CardTitle>
+      </CardHeader>
+      <CardContent className="space-y-6">
+        <Field>
+          <FieldLabel>Protocol</FieldLabel>
+          <Select value={workflowId} onValueChange={setWorkflowId} disabled={isTraining}>
+            <SelectTrigger className="mt-2">
+              <SelectValue placeholder="Select protocol" />
+            </SelectTrigger>
+            <SelectContent>
+              {WORKFLOWS.map((w) => (
+                <SelectItem key={w.id} value={w.id}>
+                  {w.label}
+                </SelectItem>
+              ))}
+            </SelectContent>
+          </Select>
+        </Field>
+        <Field>
+          <div className="flex items-center justify-between">
+            <FieldLabel>Number of Episodes</FieldLabel>
+            <span className="text-sm font-mono text-muted-foreground">{episodes[0]}</span>
+          </div>
+          <Slider
+            value={episodes}
+            onValueChange={setEpisodes}
+            min={100}
+            max={5000}
+            step={100}
+            className="mt-2"
+            disabled={isTraining}
+          />
+        </Field>
+        <Field>
+          <div className="flex items-center justify-between">
+            <FieldLabel>Learning Rate</FieldLabel>
+            <span className="text-sm font-mono text-muted-foreground">{learningRate[0].toFixed(4)}</span>
+          </div>
+          <Slider
+            value={learningRate}
+            onValueChange={setLearningRate}
+            min={0.0001}
+            max={0.01}
+            step={0.0001}
+            className="mt-2"
+            disabled={isTraining}
+          />
+        </Field>
+        <Field>
+          <div className="flex items-center justify-between">
+            <FieldLabel>Max Trials per Episode</FieldLabel>
+            <span className="text-sm font-mono text-muted-foreground">{maxTrials[0]}</span>
+          </div>
+          <Slider
+            value={maxTrials}
+            onValueChange={setMaxTrials}
+            min={1}
+            max={8}
+            step={1}
+            className="mt-2"
+            disabled={isTraining}
+          />
+        </Field>
+        {isTraining && (
+          <div className="space-y-2 pt-4 border-t border-border/50">
+            <div className="flex items-center justify-between text-sm">
+              <span className="text-muted-foreground">Training Progress</span>
+              <span className="font-mono">{progress}%</span>
+            </div>
+            <Progress value={progress} className="h-2" />
+            <p className="text-xs text-muted-foreground">
+              Episode {currentEpisode} of {totalEpisodes}
+            </p>
+          </div>
+        )}
+        <div className="space-y-2 pt-4 border-t border-border/50">
+          <Button
+            className="w-full"
+            onClick={() => onStartTraining(episodes[0], learningRate[0], maxTrials[0], workflowId)}
+            disabled={isTraining}
+          >
+            {isTraining ? (
+              <>
+                <Square className="h-4 w-4 mr-2 animate-pulse" />
+                Training...
+              </>
+            ) : (
+              <>
+                <Play className="h-4 w-4 mr-2" />
+                Start Training
+              </>
+            )}
+          </Button>
+        </div>
+      </CardContent>
+    </Card>
+  )
+}

v0ap/components/ui/accordion.tsx ADDED

	@@ -0,0 +1,66 @@

+'use client'
+import * as React from 'react'
+import * as AccordionPrimitive from '@radix-ui/react-accordion'
+import { ChevronDownIcon } from 'lucide-react'
+import { cn } from '@/lib/utils'
+function Accordion({
+  ...props
+}: React.ComponentProps<typeof AccordionPrimitive.Root>) {
+  return <AccordionPrimitive.Root data-slot="accordion" {...props} />
+}
+function AccordionItem({
+  className,
+  ...props
+}: React.ComponentProps<typeof AccordionPrimitive.Item>) {
+  return (
+    <AccordionPrimitive.Item
+      data-slot="accordion-item"
+      className={cn('border-b last:border-b-0', className)}
+      {...props}
+    />
+  )
+}
+function AccordionTrigger({
+  className,
+  children,
+  ...props
+}: React.ComponentProps<typeof AccordionPrimitive.Trigger>) {
+  return (
+    <AccordionPrimitive.Header className="flex">
+      <AccordionPrimitive.Trigger
+        data-slot="accordion-trigger"
+        className={cn(
+          'focus-visible:border-ring focus-visible:ring-ring/50 flex flex-1 items-start justify-between gap-4 rounded-md py-4 text-left text-sm font-medium transition-all outline-none hover:underline focus-visible:ring-[3px] disabled:pointer-events-none disabled:opacity-50 [&[data-state=open]>svg]:rotate-180',
+          className,
+        )}
+        {...props}
+      >
+        {children}
+        <ChevronDownIcon className="text-muted-foreground pointer-events-none size-4 shrink-0 translate-y-0.5 transition-transform duration-200" />
+      </AccordionPrimitive.Trigger>
+    </AccordionPrimitive.Header>
+  )
+}
+function AccordionContent({
+  className,
+  children,
+  ...props
+}: React.ComponentProps<typeof AccordionPrimitive.Content>) {
+  return (
+    <AccordionPrimitive.Content
+      data-slot="accordion-content"
+      className="data-[state=closed]:animate-accordion-up data-[state=open]:animate-accordion-down overflow-hidden text-sm"
+      {...props}
+    >
+      <div className={cn('pt-0 pb-4', className)}>{children}</div>
+    </AccordionPrimitive.Content>
+  )
+}
+export { Accordion, AccordionItem, AccordionTrigger, AccordionContent }

v0ap/components/ui/alert-dialog.tsx ADDED

	@@ -0,0 +1,157 @@

+'use client'
+import * as React from 'react'
+import * as AlertDialogPrimitive from '@radix-ui/react-alert-dialog'
+import { cn } from '@/lib/utils'
+import { buttonVariants } from '@/components/ui/button'
+function AlertDialog({
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Root>) {
+  return <AlertDialogPrimitive.Root data-slot="alert-dialog" {...props} />
+}
+function AlertDialogTrigger({
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Trigger>) {
+  return (
+    <AlertDialogPrimitive.Trigger data-slot="alert-dialog-trigger" {...props} />
+  )
+}
+function AlertDialogPortal({
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Portal>) {
+  return (
+    <AlertDialogPrimitive.Portal data-slot="alert-dialog-portal" {...props} />
+  )
+}
+function AlertDialogOverlay({
+  className,
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Overlay>) {
+  return (
+    <AlertDialogPrimitive.Overlay
+      data-slot="alert-dialog-overlay"
+      className={cn(
+        'data-[state=open]:animate-in data-[state=closed]:animate-out data-[state=closed]:fade-out-0 data-[state=open]:fade-in-0 fixed inset-0 z-50 bg-black/50',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+function AlertDialogContent({
+  className,
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Content>) {
+  return (
+    <AlertDialogPortal>
+      <AlertDialogOverlay />
+      <AlertDialogPrimitive.Content
+        data-slot="alert-dialog-content"
+        className={cn(
+          'bg-background data-[state=open]:animate-in data-[state=closed]:animate-out data-[state=closed]:fade-out-0 data-[state=open]:fade-in-0 data-[state=closed]:zoom-out-95 data-[state=open]:zoom-in-95 fixed top-[50%] left-[50%] z-50 grid w-full max-w-[calc(100%-2rem)] translate-x-[-50%] translate-y-[-50%] gap-4 rounded-lg border p-6 shadow-lg duration-200 sm:max-w-lg',
+          className,
+        )}
+        {...props}
+      />
+    </AlertDialogPortal>
+  )
+}
+function AlertDialogHeader({
+  className,
+  ...props
+}: React.ComponentProps<'div'>) {
+  return (
+    <div
+      data-slot="alert-dialog-header"
+      className={cn('flex flex-col gap-2 text-center sm:text-left', className)}
+      {...props}
+    />
+  )
+}
+function AlertDialogFooter({
+  className,
+  ...props
+}: React.ComponentProps<'div'>) {
+  return (
+    <div
+      data-slot="alert-dialog-footer"
+      className={cn(
+        'flex flex-col-reverse gap-2 sm:flex-row sm:justify-end',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+function AlertDialogTitle({
+  className,
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Title>) {
+  return (
+    <AlertDialogPrimitive.Title
+      data-slot="alert-dialog-title"
+      className={cn('text-lg font-semibold', className)}
+      {...props}
+    />
+  )
+}
+function AlertDialogDescription({
+  className,
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Description>) {
+  return (
+    <AlertDialogPrimitive.Description
+      data-slot="alert-dialog-description"
+      className={cn('text-muted-foreground text-sm', className)}
+      {...props}
+    />
+  )
+}
+function AlertDialogAction({
+  className,
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Action>) {
+  return (
+    <AlertDialogPrimitive.Action
+      className={cn(buttonVariants(), className)}
+      {...props}
+    />
+  )
+}
+function AlertDialogCancel({
+  className,
+  ...props
+}: React.ComponentProps<typeof AlertDialogPrimitive.Cancel>) {
+  return (
+    <AlertDialogPrimitive.Cancel
+      className={cn(buttonVariants({ variant: 'outline' }), className)}
+      {...props}
+    />
+  )
+}
+export {
+  AlertDialog,
+  AlertDialogPortal,
+  AlertDialogOverlay,
+  AlertDialogTrigger,
+  AlertDialogContent,
+  AlertDialogHeader,
+  AlertDialogFooter,
+  AlertDialogTitle,
+  AlertDialogDescription,
+  AlertDialogAction,
+  AlertDialogCancel,
+}

v0ap/components/ui/alert.tsx ADDED

	@@ -0,0 +1,66 @@

+import * as React from 'react'
+import { cva, type VariantProps } from 'class-variance-authority'
+import { cn } from '@/lib/utils'
+const alertVariants = cva(
+  'relative w-full rounded-lg border px-4 py-3 text-sm grid has-[>svg]:grid-cols-[calc(var(--spacing)*4)_1fr] grid-cols-[0_1fr] has-[>svg]:gap-x-3 gap-y-0.5 items-start [&>svg]:size-4 [&>svg]:translate-y-0.5 [&>svg]:text-current',
+  {
+    variants: {
+      variant: {
+        default: 'bg-card text-card-foreground',
+        destructive:
+          'text-destructive bg-card [&>svg]:text-current *:data-[slot=alert-description]:text-destructive/90',
+      },
+    },
+    defaultVariants: {
+      variant: 'default',
+    },
+  },
+)
+function Alert({
+  className,
+  variant,
+  ...props
+}: React.ComponentProps<'div'> & VariantProps<typeof alertVariants>) {
+  return (
+    <div
+      data-slot="alert"
+      role="alert"
+      className={cn(alertVariants({ variant }), className)}
+      {...props}
+    />
+  )
+}
+function AlertTitle({ className, ...props }: React.ComponentProps<'div'>) {
+  return (
+    <div
+      data-slot="alert-title"
+      className={cn(
+        'col-start-2 line-clamp-1 min-h-4 font-medium tracking-tight',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+function AlertDescription({
+  className,
+  ...props
+}: React.ComponentProps<'div'>) {
+  return (
+    <div
+      data-slot="alert-description"
+      className={cn(
+        'text-muted-foreground col-start-2 grid justify-items-start gap-1 text-sm [&_p]:leading-relaxed',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+export { Alert, AlertTitle, AlertDescription }

v0ap/components/ui/aspect-ratio.tsx ADDED

	@@ -0,0 +1,11 @@

+'use client'
+import * as AspectRatioPrimitive from '@radix-ui/react-aspect-ratio'
+function AspectRatio({
+  ...props
+}: React.ComponentProps<typeof AspectRatioPrimitive.Root>) {
+  return <AspectRatioPrimitive.Root data-slot="aspect-ratio" {...props} />
+}
+export { AspectRatio }

v0ap/components/ui/avatar.tsx ADDED

	@@ -0,0 +1,53 @@

+'use client'
+import * as React from 'react'
+import * as AvatarPrimitive from '@radix-ui/react-avatar'
+import { cn } from '@/lib/utils'
+function Avatar({
+  className,
+  ...props
+}: React.ComponentProps<typeof AvatarPrimitive.Root>) {
+  return (
+    <AvatarPrimitive.Root
+      data-slot="avatar"
+      className={cn(
+        'relative flex size-8 shrink-0 overflow-hidden rounded-full',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+function AvatarImage({
+  className,
+  ...props
+}: React.ComponentProps<typeof AvatarPrimitive.Image>) {
+  return (
+    <AvatarPrimitive.Image
+      data-slot="avatar-image"
+      className={cn('aspect-square size-full', className)}
+      {...props}
+    />
+  )
+}
+function AvatarFallback({
+  className,
+  ...props
+}: React.ComponentProps<typeof AvatarPrimitive.Fallback>) {
+  return (
+    <AvatarPrimitive.Fallback
+      data-slot="avatar-fallback"
+      className={cn(
+        'bg-muted flex size-full items-center justify-center rounded-full',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+export { Avatar, AvatarImage, AvatarFallback }

v0ap/components/ui/badge.tsx ADDED

	@@ -0,0 +1,46 @@

+import * as React from 'react'
+import { Slot } from '@radix-ui/react-slot'
+import { cva, type VariantProps } from 'class-variance-authority'
+import { cn } from '@/lib/utils'
+const badgeVariants = cva(
+  'inline-flex items-center justify-center rounded-md border px-2 py-0.5 text-xs font-medium w-fit whitespace-nowrap shrink-0 [&>svg]:size-3 gap-1 [&>svg]:pointer-events-none focus-visible:border-ring focus-visible:ring-ring/50 focus-visible:ring-[3px] aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 aria-invalid:border-destructive transition-[color,box-shadow] overflow-hidden',
+  {
+    variants: {
+      variant: {
+        default:
+          'border-transparent bg-primary text-primary-foreground [a&]:hover:bg-primary/90',
+        secondary:
+          'border-transparent bg-secondary text-secondary-foreground [a&]:hover:bg-secondary/90',
+        destructive:
+          'border-transparent bg-destructive text-white [a&]:hover:bg-destructive/90 focus-visible:ring-destructive/20 dark:focus-visible:ring-destructive/40 dark:bg-destructive/60',
+        outline:
+          'text-foreground [a&]:hover:bg-accent [a&]:hover:text-accent-foreground',
+      },
+    },
+    defaultVariants: {
+      variant: 'default',
+    },
+  },
+)
+function Badge({
+  className,
+  variant,
+  asChild = false,
+  ...props
+}: React.ComponentProps<'span'> &
+  VariantProps<typeof badgeVariants> & { asChild?: boolean }) {
+  const Comp = asChild ? Slot : 'span'
+  return (
+    <Comp
+      data-slot="badge"
+      className={cn(badgeVariants({ variant }), className)}
+      {...props}
+    />
+  )
+}
+export { Badge, badgeVariants }

v0ap/components/ui/breadcrumb.tsx ADDED

	@@ -0,0 +1,109 @@

+import * as React from 'react'
+import { Slot } from '@radix-ui/react-slot'
+import { ChevronRight, MoreHorizontal } from 'lucide-react'
+import { cn } from '@/lib/utils'
+function Breadcrumb({ ...props }: React.ComponentProps<'nav'>) {
+  return <nav aria-label="breadcrumb" data-slot="breadcrumb" {...props} />
+}
+function BreadcrumbList({ className, ...props }: React.ComponentProps<'ol'>) {
+  return (
+    <ol
+      data-slot="breadcrumb-list"
+      className={cn(
+        'text-muted-foreground flex flex-wrap items-center gap-1.5 text-sm break-words sm:gap-2.5',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+function BreadcrumbItem({ className, ...props }: React.ComponentProps<'li'>) {
+  return (
+    <li
+      data-slot="breadcrumb-item"
+      className={cn('inline-flex items-center gap-1.5', className)}
+      {...props}
+    />
+  )
+}
+function BreadcrumbLink({
+  asChild,
+  className,
+  ...props
+}: React.ComponentProps<'a'> & {
+  asChild?: boolean
+}) {
+  const Comp = asChild ? Slot : 'a'
+  return (
+    <Comp
+      data-slot="breadcrumb-link"
+      className={cn('hover:text-foreground transition-colors', className)}
+      {...props}
+    />
+  )
+}
+function BreadcrumbPage({ className, ...props }: React.ComponentProps<'span'>) {
+  return (
+    <span
+      data-slot="breadcrumb-page"
+      role="link"
+      aria-disabled="true"
+      aria-current="page"
+      className={cn('text-foreground font-normal', className)}
+      {...props}
+    />
+  )
+}
+function BreadcrumbSeparator({
+  children,
+  className,
+  ...props
+}: React.ComponentProps<'li'>) {
+  return (
+    <li
+      data-slot="breadcrumb-separator"
+      role="presentation"
+      aria-hidden="true"
+      className={cn('[&>svg]:size-3.5', className)}
+      {...props}
+    >
+      {children ?? <ChevronRight />}
+    </li>
+  )
+}
+function BreadcrumbEllipsis({
+  className,
+  ...props
+}: React.ComponentProps<'span'>) {
+  return (
+    <span
+      data-slot="breadcrumb-ellipsis"
+      role="presentation"
+      aria-hidden="true"
+      className={cn('flex size-9 items-center justify-center', className)}
+      {...props}
+    >
+      <MoreHorizontal className="size-4" />
+      <span className="sr-only">More</span>
+    </span>
+  )
+}
+export {
+  Breadcrumb,
+  BreadcrumbList,
+  BreadcrumbItem,
+  BreadcrumbLink,
+  BreadcrumbPage,
+  BreadcrumbSeparator,
+  BreadcrumbEllipsis,
+}

v0ap/components/ui/button-group.tsx ADDED

	@@ -0,0 +1,83 @@

+import { Slot } from '@radix-ui/react-slot'
+import { cva, type VariantProps } from 'class-variance-authority'
+import { cn } from '@/lib/utils'
+import { Separator } from '@/components/ui/separator'
+const buttonGroupVariants = cva(
+  "flex w-fit items-stretch [&>*]:focus-visible:z-10 [&>*]:focus-visible:relative [&>[data-slot=select-trigger]:not([class*='w-'])]:w-fit [&>input]:flex-1 has-[select[aria-hidden=true]:last-child]:[&>[data-slot=select-trigger]:last-of-type]:rounded-r-md has-[>[data-slot=button-group]]:gap-2",
+  {
+    variants: {
+      orientation: {
+        horizontal:
+          '[&>*:not(:first-child)]:rounded-l-none [&>*:not(:first-child)]:border-l-0 [&>*:not(:last-child)]:rounded-r-none',
+        vertical:
+          'flex-col [&>*:not(:first-child)]:rounded-t-none [&>*:not(:first-child)]:border-t-0 [&>*:not(:last-child)]:rounded-b-none',
+      },
+    },
+    defaultVariants: {
+      orientation: 'horizontal',
+    },
+  },
+)
+function ButtonGroup({
+  className,
+  orientation,
+  ...props
+}: React.ComponentProps<'div'> & VariantProps<typeof buttonGroupVariants>) {
+  return (
+    <div
+      role="group"
+      data-slot="button-group"
+      data-orientation={orientation}
+      className={cn(buttonGroupVariants({ orientation }), className)}
+      {...props}
+    />
+  )
+}
+function ButtonGroupText({
+  className,
+  asChild = false,
+  ...props
+}: React.ComponentProps<'div'> & {
+  asChild?: boolean
+}) {
+  const Comp = asChild ? Slot : 'div'
+  return (
+    <Comp
+      className={cn(
+        "bg-muted flex items-center gap-2 rounded-md border px-4 text-sm font-medium shadow-xs [&_svg]:pointer-events-none [&_svg:not([class*='size-'])]:size-4",
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+function ButtonGroupSeparator({
+  className,
+  orientation = 'vertical',
+  ...props
+}: React.ComponentProps<typeof Separator>) {
+  return (
+    <Separator
+      data-slot="button-group-separator"
+      orientation={orientation}
+      className={cn(
+        'bg-input relative !m-0 self-stretch data-[orientation=vertical]:h-auto',
+        className,
+      )}
+      {...props}
+    />
+  )
+}
+export {
+  ButtonGroup,
+  ButtonGroupSeparator,
+  ButtonGroupText,
+  buttonGroupVariants,
+}