Spaces: CreativeEngineer / fusion-design-lab

CreativeEngineer Claude Opus 4.6 committed 27 days ago

Commit c647aa0 (1 parent: ebd0ff3)

feat: polish notebook and README for hackathon submission

- Add apt-get build deps for constellaration on Colab
- Replace FusionLabClient with requests for robust HF Space demo
- Update README with HF Space and notebook links, mark deployment complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (2)

README.md +28 -27
training/notebooks/fusion_design_lab_training.ipynb +55 -45

README.md CHANGED

@@ -1,37 +1,37 @@
 # Fusion Design Lab
-Fusion Design Lab is an environment-first OpenEnv hackathon project for the `P1` stellarator benchmark.
-The repo is organized around one clear submission thesis:
-- an official `P1` task with `constellaration` as the verifier of record
-- a narrow, reproducible action space
-- real verifier feedback
-- explicit constraints and feasibility semantics
-- a reward function that is iteratively improved through observed behavior
-The environment is the product. A trained policy is still required as evidence that agents can learn and use the environment rather than only manual or scripted play.
-A trained model is required for this repo's submission story. A public Colab notebook artifact is also required by the hackathon, and that notebook should include a trained-policy demonstration rather than stay purely eval-first.
-## Current Status
-This repository is the clean hackathon workspace. The live docs now split cleanly by role:
-- planning and execution: `docs/FUSION_DESIGN_LAB_PLAN_V2.md`
-- technical contract: `docs/P1_ENV_CONTRACT_V1.md`
-- blocker and sweep evidence: `docs/P1_PARAMETERIZATION_DEEPDIVE.md`
-Implementation status:
-- `P1` is locked as the benchmark task
-- docs are aligned to fresh `P1` wiring in this repo
-- shared models, baselines, and server/client entry points now reflect the locked `P1` contract
-- the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
-- the repaired 4-knob low-dimensional family is now wired into the runtime path
-- the first measured sweep note, tracked low-fidelity fixtures, and an initial low-fidelity manual playtest note now exist
-- the first tiny low-fi PPO smoke artifact and paired high-fidelity fixture checks now exist
-- a one-trajectory submit-side manual trace has now been recorded
 ## Execution Status
@@ -56,7 +56,8 @@ Implementation status:
 - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
 - [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
 - [x] Refresh the heuristic baseline for the real verifier path
-- [ ] Deploy the real environment to HF Space
 ## Known Gaps
@@ -71,7 +72,7 @@ Implementation status:
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
 - The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
-- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). It produced a valid trajectory artifact and exposed a repeated-action local failure, which is the right outcome for a smoke run.
 Current mode:
@@ -137,8 +138,8 @@ uv sync --extra notebooks
 - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
 - [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
 - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
-- [ ] Deploy the environment to HF Space.
-- [ ] Add the Colab notebook under `training/notebooks`.
 These are implementation steps, not another planning phase.

 # Fusion Design Lab
+Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
+**Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
+**Training Notebook**: [Colab (GRPO + Unsloth)](training/notebooks/fusion_design_lab_training.ipynb)
+## What It Does
+An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:
+| Constraint | Bound |
+|---|---|
+| `aspect_ratio` | ≤ 4.0 |
+| `average_triangularity` | ≤ -0.5 |
+| `edge_iota_over_nfp` | ≥ 0.3 |
+The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the physics verifier — low-fidelity (~0.6s) for the RL inner loop, high-fidelity (~4s) for terminal submit. Each episode has a budget of **6 evaluations** across **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit).
+## Architecture
+- **Environment server** (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
+- **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
+- **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
+- **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
+- **Training** (`training/`): GRPO notebook (Unsloth + TRL) and PPO smoke test
+## Current Status
+- `P1` is locked as the benchmark task with `constellaration` as verifier of record
+- The repaired 4-knob low-dimensional boundary family is wired into the runtime path
+- Environment deployed to HF Spaces and verified (health, reset, step all operational)
+- GRPO training notebook created with Unsloth + TRL integration
+- Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
 ## Execution Status
 - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
 - [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
 - [x] Refresh the heuristic baseline for the real verifier path
+- [x] Deploy the real environment to HF Space
+- [x] Add the Colab training notebook under `training/notebooks`
 ## Known Gaps
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
 - The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
+- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
 Current mode:
 - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
 - [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
 - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
+- [x] Deploy the environment to HF Space.
+- [x] Add the Colab notebook under `training/notebooks`.
 These are implementation steps, not another planning phase.
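
Review note on the new README text: the action count and the constraint table can be checked with a few lines of Python. This is a minimal sketch, not the repo's implementation. The knob, direction, and magnitude labels are hypothetical placeholders (the README gives only the counts), the `run` intent name is taken from the old README wording, and the real environment may apply tolerances that this strict comparison ignores.

```python
from itertools import product

# Hypothetical labels: the README states 4 parameters x 2 directions x 3 magnitudes,
# but does not name them.
PARAMETERS = ["knob_0", "knob_1", "knob_2", "knob_3"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

# Bounds copied from the README constraint table.
CONSTRAINTS = {
    "aspect_ratio": ("<=", 4.0),
    "average_triangularity": ("<=", -0.5),
    "edge_iota_over_nfp": (">=", 0.3),
}


def enumerate_actions():
    """Build the discrete action list: every knob move plus the two special actions."""
    actions = [
        {"intent": "run", "parameter": p, "direction": d, "magnitude": m}
        for p, d, m in product(PARAMETERS, DIRECTIONS, MAGNITUDES)
    ]
    actions.append({"intent": "restore_best"})
    actions.append({"intent": "submit"})
    return actions


def is_feasible(metrics):
    """Return True only if every hard constraint holds (strict bounds, no tolerance)."""
    for name, (op, bound) in CONSTRAINTS.items():
        value = metrics[name]
        satisfied = value <= bound if op == "<=" else value >= bound
        if not satisfied:
            return False
    return True


ACTIONS = enumerate_actions()
print(len(ACTIONS))  # 4 * 2 * 3 + 2 = 26
```

With a budget of 6 evaluations per episode, an agent picks at most 6 entries from this list before the episode ends.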

training/notebooks/fusion_design_lab_training.ipynb CHANGED

@@ -37,10 +37,12 @@
    "outputs": [],
    "source": [
     "%%capture\n",
     "!pip install unsloth vllm\n",
     "!pip install --no-deps trl\n",
-    "!pip install constellaration openenv-core[core] pydantic fastapi uvicorn\n",
-    "!pip install matplotlib"
    ]
   },
   {
@@ -85,11 +87,7 @@
    "cell_type": "markdown",
    "id": "8edb47106e1a46a883d545849b8ab81b",
    "metadata": {},
-   "source": [
-    "## 3. Setup Stellarator Environment\n",
-    "\n",
-    "We install the environment package directly from the HF Space repository so training runs locally (no network latency). The same environment is deployed at the HF Space URL above."
-   ]
   },
   {
    "cell_type": "code",
@@ -99,7 +97,9 @@
    "outputs": [],
    "source": [
     "%%capture\n",
-    "!pip install git+https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab"
    ]
   },
   {
@@ -579,11 +579,7 @@
    "cell_type": "markdown",
    "id": "cb1e1581032b452c9409d6c6813c49d1",
    "metadata": {},
-   "source": [
-    "## 10. Connect to Deployed HF Space\n",
-    "\n",
-    "Demonstrate connecting to the live environment on Hugging Face Spaces."
-   ]
   },
   {
    "cell_type": "code",
@@ -592,43 +588,57 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "from fusion_lab.client import FusionLabClient\n",
-    "from fusion_lab.models import StellaratorAction\n",
     "\n",
     "HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
     "\n",
-    "with FusionLabClient(base_url=HF_SPACE_URL).sync() as client:\n",
-    "    obs = client.reset()\n",
-    "    print(f\"Connected to HF Space: {HF_SPACE_URL}\")\n",
-    "    print(\"Initial observation:\")\n",
-    "    print(f\"  max_elongation: {obs.observation.max_elongation:.4f}\")\n",
-    "    print(f\"  aspect_ratio: {obs.observation.aspect_ratio:.4f}\")\n",
-    "    print(f\"  p1_score: {obs.observation.p1_score:.4f}\")\n",
-    "    print(f\"  constraints_satisfied: {obs.observation.constraints_satisfied}\")\n",
-    "    print(f\"  budget_remaining: {obs.observation.budget_remaining}\")\n",
-    "\n",
-    "    # Run one action from the trained model\n",
-    "    prompt = build_prompt(obs.observation)\n",
-    "    inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
-    "    outputs = model.generate(\n",
-    "        **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, temperature=0.7, do_sample=True\n",
-    "    )\n",
-    "    completion = tokenizer.decode(\n",
-    "        outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n",
     "    )\n",
-    "    actions = parse_action_plan(completion)\n",
-    "\n",
-    "    print(f\"\\nModel generated {len(actions)} actions:\")\n",
-    "    for i, action in enumerate(actions[:BUDGET]):\n",
-    "        result = client.step(action)\n",
-    "        print(\n",
-    "            f\"  Step {i + 1}: {action.intent} {action.parameter or ''} {action.direction or ''} {action.magnitude or ''} → reward={result.reward:.3f}\"\n",
-    "        )\n",
-    "        if result.done:\n",
-    "            print(f\"  Episode done. Final score: {result.observation.p1_score:.4f}\")\n",
-    "            break\n",
     "\n",
-    "print(\"\\nDone! Environment is live and accessible for training and evaluation.\")"
    ]
   }
  ],

    "outputs": [],
    "source": [
     "%%capture\n",
+    "# Build deps for constellaration (booz-xform compiles from source)\n",
+    "!apt-get update -qq && apt-get install -y -qq cmake ninja-build g++ gfortran libnetcdf-dev libnetcdff-dev > /dev/null\n",
+    "\n",
     "!pip install unsloth vllm\n",
     "!pip install --no-deps trl\n",
+    "!pip install matplotlib requests"
    ]
   },
   {
    "cell_type": "markdown",
    "id": "8edb47106e1a46a883d545849b8ab81b",
    "metadata": {},
+   "source": "## 3. Setup Stellarator Environment\n\nInstall the environment package directly from the repository so training runs locally (no network latency per step). The same environment is deployed at the HF Space URL above."
   },
   {
    "cell_type": "code",
    "outputs": [],
    "source": [
     "%%capture\n",
+    "# Install the fusion-design-lab environment (includes constellaration physics engine)\n",
+    "# This takes ~3 minutes due to booz-xform compilation\n",
+    "!pip install \"fusion-design-lab @ git+https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab\""
    ]
   },
   {
    "cell_type": "markdown",
    "id": "cb1e1581032b452c9409d6c6813c49d1",
    "metadata": {},
+   "source": "## 10. Connect to Deployed HF Space\n\nDemonstrate connecting to the live environment on Hugging Face Spaces and running the trained model against it."
   },
   {
    "cell_type": "code",
    "metadata": {},
    "outputs": [],
    "source": [
+    "import requests\n",
+    "\n",
+    "from fusion_lab.models import StellaratorObservation\n",
     "\n",
     "HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
     "\n",
+    "# Check health\n",
+    "health = requests.get(f\"{HF_SPACE_URL}/health\").json()\n",
+    "print(f\"HF Space status: {health['status']}\")\n",
+    "\n",
+    "# Get task description\n",
+    "task = requests.get(f\"{HF_SPACE_URL}/task\").json()\n",
+    "print(f\"\\nTask: {task['description']}\")\n",
+    "print(f\"Constraints: {task['constraints']}\")\n",
+    "print(f\"Budget: {task['budget']}\")\n",
+    "\n",
+    "# Reset an episode on the remote environment\n",
+    "resp = requests.post(f\"{HF_SPACE_URL}/reset\", json={\"seed\": 42}).json()\n",
+    "obs_data = resp[\"observation\"]\n",
+    "print(f\"\\nRemote reset — max_elongation: {obs_data['max_elongation']:.4f}\")\n",
+    "print(f\"  aspect_ratio: {obs_data['aspect_ratio']:.4f}\")\n",
+    "print(f\"  constraints_satisfied: {obs_data['constraints_satisfied']}\")\n",
+    "print(f\"  budget_remaining: {obs_data['budget_remaining']}\")\n",
+    "\n",
+    "# Generate an action plan from the trained model\n",
+    "remote_obs = StellaratorObservation.model_validate(obs_data)\n",
+    "prompt = build_prompt(remote_obs)\n",
+    "inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
+    "outputs = model.generate(\n",
+    "    **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, temperature=0.7, do_sample=True\n",
+    ")\n",
+    "completion = tokenizer.decode(outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True)\n",
+    "actions = parse_action_plan(completion)\n",
+    "\n",
+    "print(f\"\\nTrained model generated {len(actions)} actions for remote env:\")\n",
+    "for i, action in enumerate(actions[:BUDGET]):\n",
+    "    action_payload = action.model_dump(exclude_none=True)\n",
+    "    step_resp = requests.post(f\"{HF_SPACE_URL}/step\", json={\"action\": action_payload}).json()\n",
+    "    r = step_resp.get(\"reward\", 0)\n",
+    "    done = step_resp.get(\"done\", False)\n",
+    "    step_obs = step_resp[\"observation\"]\n",
+    "    print(\n",
+    "        f\"  Step {i + 1}: {action.intent} {action.parameter or ''} \"\n",
+    "        f\"{action.direction or ''} {action.magnitude or ''} \"\n",
+    "        f\"→ reward={r:.3f}, score={step_obs['p1_score']:.4f}\"\n",
     "    )\n",
+    "    if done:\n",
+    "        print(f\"  Episode done. Final score: {step_obs['p1_score']:.4f}\")\n",
+    "        break\n",
     "\n",
+    "print(\"\\nEnvironment is live and accessible for training and evaluation.\")"
    ]
   }
  ],
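
Review note on the new demo cell: the `requests` calls have no timeout and do not check the HTTP status, so a paused or cold Space could hang the cell or surface as a confusing JSON decode error. A possible hardening is sketched below. The helper names and the 30-second default are assumptions rather than anything in this commit. The `{"action": ...}` envelope and the dropping of `None` fields mirror what the cell already does with `model_dump(exclude_none=True)`.

```python
from typing import Any

# Assumed default; the notebook does not set a timeout.
DEFAULT_TIMEOUT_S = 30.0


def build_step_payload(action: dict[str, Any]) -> dict[str, Any]:
    """Wrap an action dict in the {"action": ...} envelope, dropping None fields."""
    return {"action": {k: v for k, v in action.items() if v is not None}}


def post_json(session: Any, url: str, payload: dict[str, Any],
              timeout: float = DEFAULT_TIMEOUT_S) -> dict[str, Any]:
    """POST JSON and return the decoded body, failing loudly on HTTP errors.

    `session` is any object with a requests.Session-style `post` method.
    """
    resp = session.post(url, json=payload, timeout=timeout)
    resp.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return resp.json()
```

In the notebook, `session` would be a `requests.Session()` and the URL would be `f"{HF_SPACE_URL}/step"`, with `action.model_dump()` passed to `build_step_payload`.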