Commit c647aa0
1 Parent(s): ebd0ff3
feat: polish notebook and README for hackathon submission
- Add apt-get build deps for constellaration on Colab
- Replace FusionLabClient with requests for robust HF Space demo
- Update README with HF Space and notebook links, mark deployment complete
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- README.md +28 -27
- training/notebooks/fusion_design_lab_training.ipynb +55 -45
README.md
CHANGED
|
@@ -1,37 +1,37 @@
|
|
| 1 |
# Fusion Design Lab
|
| 2 |
|
| 3 |
-
Fusion Design Lab is an environment-first OpenEnv hackathon project for the `P1` stellarator benchmark.
|
| 4 |
|
| 5 |
-
|
|
|
|
| 6 |
|
| 7 |
-
|
| 8 |
-
- a narrow, reproducible action space
|
| 9 |
-
- real verifier feedback
|
| 10 |
-
- explicit constraints and feasibility semantics
|
| 11 |
-
- a reward function that is iteratively improved through observed behavior
|
| 12 |
|
| 13 |
-
|
| 14 |
|
| 15 |
-
|
| 16 |
|
| 17 |
-
|
| 18 |
|
| 19 |
-
|
| 20 |
|
| 21 |
-
-
|
| 22 |
-
-
|
| 23 |
-
-
|
| 24 |
|
| 25 |
-
|
| 26 |
|
| 27 |
-
- `P1` is locked as the benchmark task
|
| 28 |
-
-
|
| 29 |
-
-
|
| 30 |
-
-
|
| 31 |
-
-
|
| 32 |
-
- the first measured sweep note, tracked low-fidelity fixtures, and an initial low-fidelity manual playtest note now exist
|
| 33 |
-
- the first tiny low-fi PPO smoke artifact and paired high-fidelity fixture checks now exist
|
| 34 |
-
- a one-trajectory submit-side manual trace has now been recorded
|
| 35 |
|
| 36 |
## Execution Status
|
| 37 |
|
|
@@ -56,7 +56,8 @@ Implementation status:
|
|
| 56 |
- [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
|
| 57 |
- [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
|
| 58 |
- [x] Refresh the heuristic baseline for the real verifier path
|
| 59 |
-
- [
|
|
|
|
| 60 |
|
| 61 |
## Known Gaps
|
| 62 |
|
|
@@ -71,7 +72,7 @@ Implementation status:
|
|
| 71 |
- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
|
| 72 |
- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
|
| 73 |
- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
|
| 74 |
-
- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md).
|
| 75 |
|
| 76 |
Current mode:
|
| 77 |
|
|
@@ -137,8 +138,8 @@ uv sync --extra notebooks
|
|
| 137 |
- [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
|
| 138 |
- [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
|
| 139 |
- [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
|
| 140 |
-
- [
|
| 141 |
-
- [
|
| 142 |
|
| 143 |
These are implementation steps, not another planning phase.
|
| 144 |
|
|
|
|
| 1 |
# Fusion Design Lab
|
| 2 |
|
| 3 |
+
Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
|
| 4 |
|
| 5 |
+
**Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
|
| 6 |
+
**Training Notebook**: [Colab (GRPO + Unsloth)](training/notebooks/fusion_design_lab_training.ipynb)
|
| 7 |
|
| 8 |
+
## What It Does
|
| 9 |
|
| 10 |
+
An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:
|
| 11 |
|
| 12 |
+
| Constraint | Bound |
|
| 13 |
+
|---|---|
|
| 14 |
+
| `aspect_ratio` | ≤ 4.0 |
|
| 15 |
+
| `average_triangularity` | ≤ -0.5 |
|
| 16 |
+
| `edge_iota_over_nfp` | ≥ 0.3 |
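The bounds in this table can be read as a simple feasibility predicate. The sketch below is illustrative only: the names `BOUNDS`, `violations`, and `is_feasible` and the plain-dict input are assumptions rather than the project's API, and the real verifier may apply tolerances that are not shown here. Only the metric names and bounds come from the table.

```python
# Illustrative sketch; function names and the dict layout are assumptions.
# Metric names and bounds are copied from the constraint table above.
BOUNDS = {
    "aspect_ratio": ("<=", 4.0),
    "average_triangularity": ("<=", -0.5),
    "edge_iota_over_nfp": (">=", 0.3),
}


def violations(metrics: dict) -> list:
    """Return the names of the constraints that the given metrics violate."""
    failed = []
    for name, (op, bound) in BOUNDS.items():
        value = metrics[name]
        ok = value <= bound if op == "<=" else value >= bound
        if not ok:
            failed.append(name)
    return failed


def is_feasible(metrics: dict) -> bool:
    """A design is feasible only when all three hard constraints hold."""
    return not violations(metrics)
```

Returning the list of violated constraints, rather than a single boolean, makes it easier to see which bound a candidate design is missing.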
|
| 17 |
|
| 18 |
+
The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the physics verifier: low-fidelity (~0.6s) for the RL inner loop, high-fidelity (~4s) for terminal submit. Each episode has a budget of **6 evaluations**, with each step chosen from **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + `restore_best` + `submit`).
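The action count can be checked by enumerating the space. This is a minimal sketch: the parameter, direction, and magnitude labels and the `run` intent are placeholders (assumptions), and only the counts and the `restore_best` / `submit` actions come from the README text.

```python
from itertools import product

# Placeholder labels (assumptions); the README only fixes the counts 4, 2, 3.
PARAMETERS = ["knob_1", "knob_2", "knob_3", "knob_4"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]


def build_action_space() -> list:
    """Enumerate every parameter move plus the two special actions."""
    moves = [
        {"intent": "run", "parameter": p, "direction": d, "magnitude": m}
        for p, d, m in product(PARAMETERS, DIRECTIONS, MAGNITUDES)
    ]
    return moves + [{"intent": "restore_best"}, {"intent": "submit"}]


ACTIONS = build_action_space()
print(len(ACTIONS))  # 4 * 2 * 3 + 2 = 26
```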
|
| 19 |
|
| 20 |
+
## Architecture
|
| 21 |
|
| 22 |
+
- **Environment server** (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
|
| 23 |
+
- **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
|
| 24 |
+
- **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
|
| 25 |
+
- **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
|
| 26 |
+
- **Training** (`training/`): GRPO notebook (Unsloth + TRL) and PPO smoke test
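To make the role of the typed models concrete, here is a hedged sketch of an observation schema. It uses a standard-library dataclass in place of Pydantic so that it runs without dependencies. The class name `ObservationSketch`, the `from_payload` helper, and the field types are assumptions; the field names are the ones the notebook reads from the environment's responses.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ObservationSketch:
    # Field names follow the keys read in the notebook; the types are assumed.
    max_elongation: float
    aspect_ratio: float
    constraints_satisfied: bool
    budget_remaining: int
    p1_score: float

    @classmethod
    def from_payload(cls, payload: dict) -> "ObservationSketch":
        """Build an observation from a decoded JSON dict, ignoring extra keys."""
        names = [f.name for f in fields(cls)]
        missing = [n for n in names if n not in payload]
        if missing:
            raise ValueError(f"missing observation fields: {missing}")
        return cls(**{n: payload[n] for n in names})
```

Failing loudly on missing fields surfaces schema drift between client and server early, instead of as a `KeyError` deep in a training loop.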
|
| 27 |
|
| 28 |
+
## Current Status
|
| 29 |
|
| 30 |
+
- `P1` is locked as the benchmark task with `constellaration` as verifier of record
|
| 31 |
+
- The repaired 4-knob low-dimensional boundary family is wired into the runtime path
|
| 32 |
+
- Environment deployed to HF Spaces and verified (health, reset, step all operational)
|
| 33 |
+
- GRPO training notebook created with Unsloth + TRL integration
|
| 34 |
+
- Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
|
| 35 |
|
| 36 |
## Execution Status
|
| 37 |
|
|
|
|
| 56 |
- [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
|
| 57 |
- [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
|
| 58 |
- [x] Refresh the heuristic baseline for the real verifier path
|
| 59 |
+
- [x] Deploy the real environment to HF Space
|
| 60 |
+
- [x] Add the Colab training notebook under `training/notebooks`
|
| 61 |
|
| 62 |
## Known Gaps
|
| 63 |
|
|
|
|
| 72 |
- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
|
| 73 |
- The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
|
| 74 |
- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
|
| 75 |
+
- The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
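The submit-versus-exhaustion asymmetry noted in the first gap above can be expressed as a small terminal-reward rule. This is a hypothetical sketch, not the project's reward function: the name `terminal_reward`, the `0.5` discount, and the zero reward for infeasible designs are all assumptions chosen for illustration.

```python
# Hypothetical reward shaping; the discount value and names are assumptions.
EXHAUSTION_DISCOUNT = 0.5


def terminal_reward(p1_score: float, feasible: bool, submitted: bool) -> float:
    """Pay the full score on explicit submit, a discounted score on budget exhaustion."""
    base = p1_score if feasible else 0.0
    return base if submitted else EXHAUSTION_DISCOUNT * base
```

Under these assumptions the asymmetry only holds for feasible designs with a positive score; an infeasible design earns zero either way, so any retuning would need to keep a separate incentive to submit in that case.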
|
| 76 |
|
| 77 |
Current mode:
|
| 78 |
|
|
|
|
| 138 |
- [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
|
| 139 |
- [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
|
| 140 |
- [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
|
| 141 |
+
- [x] Deploy the environment to HF Space.
|
| 142 |
+
- [x] Add the Colab notebook under `training/notebooks`.
|
| 143 |
|
| 144 |
These are implementation steps, not another planning phase.
|
| 145 |
|
training/notebooks/fusion_design_lab_training.ipynb
CHANGED
|
@@ -37,10 +37,12 @@
|
|
| 37 |
"outputs": [],
|
| 38 |
"source": [
|
| 39 |
"%%capture\n",
|
| 40 |
"!pip install unsloth vllm\n",
|
| 41 |
"!pip install --no-deps trl\n",
|
| 42 |
-
"!pip install
|
| 43 |
-
"!pip install matplotlib"
|
| 44 |
]
|
| 45 |
},
|
| 46 |
{
|
|
@@ -85,11 +87,7 @@
|
|
| 85 |
"cell_type": "markdown",
|
| 86 |
"id": "8edb47106e1a46a883d545849b8ab81b",
|
| 87 |
"metadata": {},
|
| 88 |
-
"source":
|
| 89 |
-
"## 3. Setup Stellarator Environment\n",
|
| 90 |
-
"\n",
|
| 91 |
-
"We install the environment package directly from the HF Space repository so training runs locally (no network latency). The same environment is deployed at the HF Space URL above."
|
| 92 |
-
]
|
| 93 |
},
|
| 94 |
{
|
| 95 |
"cell_type": "code",
|
|
@@ -99,7 +97,9 @@
|
|
| 99 |
"outputs": [],
|
| 100 |
"source": [
|
| 101 |
"%%capture\n",
|
| 102 |
-
"
|
| 103 |
]
|
| 104 |
},
|
| 105 |
{
|
|
@@ -579,11 +579,7 @@
|
|
| 579 |
"cell_type": "markdown",
|
| 580 |
"id": "cb1e1581032b452c9409d6c6813c49d1",
|
| 581 |
"metadata": {},
|
| 582 |
-
"source":
|
| 583 |
-
"## 10. Connect to Deployed HF Space\n",
|
| 584 |
-
"\n",
|
| 585 |
-
"Demonstrate connecting to the live environment on Hugging Face Spaces."
|
| 586 |
-
]
|
| 587 |
},
|
| 588 |
{
|
| 589 |
"cell_type": "code",
|
|
@@ -592,43 +588,57 @@
|
|
| 592 |
"metadata": {},
|
| 593 |
"outputs": [],
|
| 594 |
"source": [
|
| 595 |
-
"
|
| 596 |
-
"
|
|
|
|
| 597 |
"\n",
|
| 598 |
"HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
|
| 599 |
"\n",
|
| 600 |
-
"
|
| 601 |
-
"
|
| 602 |
-
"
|
| 603 |
-
"
|
| 604 |
-
"
|
| 605 |
-
"
|
| 606 |
-
"
|
| 607 |
-
"
|
| 608 |
-
"
|
| 609 |
-
"\n",
|
| 610 |
-
"
|
| 611 |
-
"
|
| 612 |
-
"
|
| 613 |
-
"
|
| 614 |
-
"
|
| 615 |
-
"
|
| 616 |
-
"
|
| 617 |
-
"
|
| 618 |
" )\n",
|
| 619 |
-
"
|
| 620 |
-
"\n",
|
| 621 |
-
"
|
| 622 |
-
" for i, action in enumerate(actions[:BUDGET]):\n",
|
| 623 |
-
" result = client.step(action)\n",
|
| 624 |
-
" print(\n",
|
| 625 |
-
" f\" Step {i + 1}: {action.intent} {action.parameter or ''} {action.direction or ''} {action.magnitude or ''} → reward={result.reward:.3f}\"\n",
|
| 626 |
-
" )\n",
|
| 627 |
-
" if result.done:\n",
|
| 628 |
-
" print(f\" Episode done. Final score: {result.observation.p1_score:.4f}\")\n",
|
| 629 |
-
" break\n",
|
| 630 |
"\n",
|
| 631 |
-
"print(\"\\
|
| 632 |
]
|
| 633 |
}
|
| 634 |
],
|
|
|
|
| 37 |
"outputs": [],
|
| 38 |
"source": [
|
| 39 |
"%%capture\n",
|
| 40 |
+
"# Build deps for constellaration (booz-xform compiles from source)\n",
|
| 41 |
+
"!apt-get update -qq && apt-get install -y -qq cmake ninja-build g++ gfortran libnetcdf-dev libnetcdff-dev > /dev/null\n",
|
| 42 |
+
"\n",
|
| 43 |
"!pip install unsloth vllm\n",
|
| 44 |
"!pip install --no-deps trl\n",
|
| 45 |
+
"!pip install matplotlib requests"
|
|
|
|
| 46 |
]
|
| 47 |
},
|
| 48 |
{
|
|
|
|
| 87 |
"cell_type": "markdown",
|
| 88 |
"id": "8edb47106e1a46a883d545849b8ab81b",
|
| 89 |
"metadata": {},
|
| 90 |
+
"source": "## 3. Setup Stellarator Environment\n\nInstall the environment package directly from the repository so training runs locally (no network latency per step). The same environment is deployed at the HF Space URL above."
|
| 91 |
},
|
| 92 |
{
|
| 93 |
"cell_type": "code",
|
|
|
|
| 97 |
"outputs": [],
|
| 98 |
"source": [
|
| 99 |
"%%capture\n",
|
| 100 |
+
"# Install the fusion-design-lab environment (includes constellaration physics engine)\n",
|
| 101 |
+
"# This takes ~3 minutes due to booz-xform compilation\n",
|
| 102 |
+
"!pip install \"fusion-design-lab @ git+https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab\""
|
| 103 |
]
|
| 104 |
},
|
| 105 |
{
|
|
|
|
| 579 |
"cell_type": "markdown",
|
| 580 |
"id": "cb1e1581032b452c9409d6c6813c49d1",
|
| 581 |
"metadata": {},
|
| 582 |
+
"source": "## 10. Connect to Deployed HF Space\n\nDemonstrate connecting to the live environment on Hugging Face Spaces and running the trained model against it."
|
| 583 |
},
|
| 584 |
{
|
| 585 |
"cell_type": "code",
|
|
|
|
| 588 |
"metadata": {},
|
| 589 |
"outputs": [],
|
| 590 |
"source": [
|
| 591 |
+
"import requests\n",
|
| 592 |
+
"\n",
|
| 593 |
+
"from fusion_lab.models import StellaratorObservation\n",
|
| 594 |
"\n",
|
| 595 |
"HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
|
| 596 |
"\n",
|
| 597 |
+
"# Check health\n",
|
| 598 |
+
"health = requests.get(f\"{HF_SPACE_URL}/health\").json()\n",
|
| 599 |
+
"print(f\"HF Space status: {health['status']}\")\n",
|
| 600 |
+
"\n",
|
| 601 |
+
"# Get task description\n",
|
| 602 |
+
"task = requests.get(f\"{HF_SPACE_URL}/task\").json()\n",
|
| 603 |
+
"print(f\"\\nTask: {task['description']}\")\n",
|
| 604 |
+
"print(f\"Constraints: {task['constraints']}\")\n",
|
| 605 |
+
"print(f\"Budget: {task['budget']}\")\n",
|
| 606 |
+
"\n",
|
| 607 |
+
"# Reset an episode on the remote environment\n",
|
| 608 |
+
"resp = requests.post(f\"{HF_SPACE_URL}/reset\", json={\"seed\": 42}).json()\n",
|
| 609 |
+
"obs_data = resp[\"observation\"]\n",
|
| 610 |
+
"print(f\"\\nRemote reset — max_elongation: {obs_data['max_elongation']:.4f}\")\n",
|
| 611 |
+
"print(f\" aspect_ratio: {obs_data['aspect_ratio']:.4f}\")\n",
|
| 612 |
+
"print(f\" constraints_satisfied: {obs_data['constraints_satisfied']}\")\n",
|
| 613 |
+
"print(f\" budget_remaining: {obs_data['budget_remaining']}\")\n",
|
| 614 |
+
"\n",
|
| 615 |
+
"# Generate an action plan from the trained model\n",
|
| 616 |
+
"remote_obs = StellaratorObservation.model_validate(obs_data)\n",
|
| 617 |
+
"prompt = build_prompt(remote_obs)\n",
|
| 618 |
+
"inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
|
| 619 |
+
"outputs = model.generate(\n",
|
| 620 |
+
" **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, temperature=0.7, do_sample=True\n",
|
| 621 |
+
")\n",
|
| 622 |
+
"completion = tokenizer.decode(outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True)\n",
|
| 623 |
+
"actions = parse_action_plan(completion)\n",
|
| 624 |
+
"\n",
|
| 625 |
+
"print(f\"\\nTrained model generated {len(actions)} actions for remote env:\")\n",
|
| 626 |
+
"for i, action in enumerate(actions[:BUDGET]):\n",
|
| 627 |
+
" action_payload = action.model_dump(exclude_none=True)\n",
|
| 628 |
+
" step_resp = requests.post(f\"{HF_SPACE_URL}/step\", json={\"action\": action_payload}).json()\n",
|
| 629 |
+
" r = step_resp.get(\"reward\", 0)\n",
|
| 630 |
+
" done = step_resp.get(\"done\", False)\n",
|
| 631 |
+
" step_obs = step_resp[\"observation\"]\n",
|
| 632 |
+
" print(\n",
|
| 633 |
+
" f\" Step {i + 1}: {action.intent} {action.parameter or ''} \"\n",
|
| 634 |
+
" f\"{action.direction or ''} {action.magnitude or ''} \"\n",
|
| 635 |
+
" f\"→ reward={r:.3f}, score={step_obs['p1_score']:.4f}\"\n",
|
| 636 |
" )\n",
|
| 637 |
+
" if done:\n",
|
| 638 |
+
" print(f\" Episode done. Final score: {step_obs['p1_score']:.4f}\")\n",
|
| 639 |
+
" break\n",
|
| 640 |
"\n",
|
| 641 |
+
"print(\"\\nEnvironment is live and accessible for training and evaluation.\")"
|
| 642 |
]
|
| 643 |
}
|
| 644 |
],
|
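The demo cell above calls the Space without timeouts or status checks, so a sleeping or failing Space could hang the notebook or surface as a confusing `KeyError`. The sketch below shows one way to harden such calls. It deliberately uses the standard library's `urllib` instead of `requests` so that it has no third-party dependencies. The helper names `post_json` and `step_payload` and the 30-second default are assumptions, while the `{"action": ...}` wrapper mirrors the `/step` call in the notebook.

```python
import json
import urllib.request


def step_payload(action: dict) -> dict:
    """Wrap an action the way the notebook's /step call does, dropping None values."""
    return {"action": {k: v for k, v in action.items() if v is not None}}


def post_json(url, payload, timeout=30.0, opener=urllib.request.urlopen):
    """POST a JSON body and return the decoded JSON response.

    `opener` is injectable so the function can be exercised without a network.
    The default `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx responses.
    """
    body = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with opener(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `post_json(f"{HF_SPACE_URL}/step", step_payload(action_dict))` would then either return a decoded response or raise promptly, which tends to be easier to debug in a hosted notebook.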