CreativeEngineer Claude Opus 4.6 committed on
Commit c647aa0 · 1 Parent(s): ebd0ff3

feat: polish notebook and README for hackathon submission


- Add apt-get build deps for constellaration on Colab
- Replace FusionLabClient with requests for robust HF Space demo
- Update README with HF Space and notebook links, mark deployment complete

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

README.md CHANGED
@@ -1,37 +1,37 @@
1
  # Fusion Design Lab
2
 
3
- Fusion Design Lab is an environment-first OpenEnv hackathon project for the `P1` stellarator benchmark.
4
 
5
- The repo is organized around one clear submission thesis:
 
6
 
7
- - an official `P1` task with `constellaration` as the verifier of record
8
- - a narrow, reproducible action space
9
- - real verifier feedback
10
- - explicit constraints and feasibility semantics
11
- - a reward function that is iteratively improved through observed behavior
12
 
13
- The environment is the product. A trained policy is still required as evidence that agents can learn and use the environment rather than only manual or scripted play.
14
 
15
- A trained model is required for this repo's submission story. A public Colab notebook artifact is also required by the hackathon, and that notebook should include a trained-policy demonstration rather than stay purely eval-first.
 
 
 
 
16
 
17
- ## Current Status
18
 
19
- This repository is the clean hackathon workspace. The live docs now split cleanly by role:
20
 
21
- - planning and execution: `docs/FUSION_DESIGN_LAB_PLAN_V2.md`
22
- - technical contract: `docs/P1_ENV_CONTRACT_V1.md`
23
- - blocker and sweep evidence: `docs/P1_PARAMETERIZATION_DEEPDIVE.md`
 
 
24
 
25
- Implementation status:
26
 
27
- - `P1` is locked as the benchmark task
28
- - docs are aligned to fresh `P1` wiring in this repo
29
- - shared models, baselines, and server/client entry points now reflect the locked `P1` contract
30
- - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
31
- - the repaired 4-knob low-dimensional family is now wired into the runtime path
32
- - the first measured sweep note, tracked low-fidelity fixtures, and an initial low-fidelity manual playtest note now exist
33
- - the first tiny low-fi PPO smoke artifact and paired high-fidelity fixture checks now exist
34
- - a one-trajectory submit-side manual trace has now been recorded
35
 
36
  ## Execution Status
37
 
@@ -56,7 +56,8 @@ Implementation status:
56
  - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
57
  - [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
58
  - [x] Refresh the heuristic baseline for the real verifier path
59
- - [ ] Deploy the real environment to HF Space
 
60
 
61
  ## Known Gaps
62
 
@@ -71,7 +72,7 @@ Implementation status:
71
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
72
  - The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
73
  - The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
74
- - The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). It produced a valid trajectory artifact and exposed a repeated-action local failure, which is the right outcome for a smoke run.
75
 
76
  Current mode:
77
 
@@ -137,8 +138,8 @@ uv sync --extra notebooks
137
  - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
138
  - [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
139
  - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
140
- - [ ] Deploy the environment to HF Space.
141
- - [ ] Add the Colab notebook under `training/notebooks`.
142
 
143
  These are implementation steps, not another planning phase.
144
 
 
1
  # Fusion Design Lab
2
 
3
+ Fusion Design Lab is an environment-first [OpenEnv](https://openenv.dev) hackathon project for the `P1` stellarator benchmark.
4
 
5
+ **Live Environment**: [HF Space](https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab)
6
+ **Training Notebook**: [Colab (GRPO + Unsloth)](training/notebooks/fusion_design_lab_training.ipynb)
7
 
8
+ ## What It Does
 
 
 
 
9
 
10
+ An RL environment where agents optimize stellarator fusion reactor designs by adjusting 4 geometric knobs of a low-dimensional boundary family, aiming to **minimize max elongation** while satisfying 3 hard physics constraints:
11
 
12
+ | Constraint | Bound |
13
+ |---|---|
14
+ | `aspect_ratio` | ≤ 4.0 |
15
+ | `average_triangularity` | ≤ -0.5 |
16
+ | `edge_iota_over_nfp` | ≥ 0.3 |
17
 
18
+ The environment uses [`constellaration`](https://pypi.org/project/constellaration/) as the physics verifier — low-fidelity (~0.6s) for the RL inner loop, high-fidelity (~4s) for terminal submit. Each episode has a budget of **6 evaluations** across **26 discrete actions** (4 parameters × 2 directions × 3 magnitudes + restore_best + submit).
19
 
20
+ ## Architecture
21
 
22
+ - **Environment server** (`server/`): FastAPI app with `/reset`, `/step`, `/health`, `/task` endpoints
23
+ - **Physics engine** (`server/physics.py`): `constellaration` VMEC-backed boundary evaluation
24
+ - **Models** (`fusion_lab/models.py`): Pydantic schemas for actions, observations, state
25
+ - **Client** (`fusion_lab/client.py`): Typed OpenEnv client for remote interaction
26
+ - **Training** (`training/`): GRPO notebook (Unsloth + TRL) and PPO smoke test
27
 
28
+ ## Current Status
29
 
30
+ - `P1` is locked as the benchmark task with `constellaration` as verifier of record
31
+ - The repaired 4-knob low-dimensional boundary family is wired into the runtime path
32
+ - Environment deployed to HF Spaces and verified (health, reset, step all operational)
33
+ - GRPO training notebook created with Unsloth + TRL integration
34
+ - Low-fidelity PPO smoke artifacts and paired high-fidelity fixture checks exist
 
 
 
35
 
36
  ## Execution Status
37
 
 
56
  - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
57
  - [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
58
  - [x] Refresh the heuristic baseline for the real verifier path
59
+ - [x] Deploy the real environment to HF Space
60
+ - [x] Add the Colab training notebook under `training/notebooks`
61
 
62
  ## Known Gaps
63
 
 
72
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
73
  - The refreshed real-verifier heuristic now follows the measured feasible sequence instead of the older threshold-only policy: on a fresh `uv run python baselines/compare.py 5` rerun, it finished with `5/5` feasible high-fidelity finals, mean final `P1` score `0.291951`, and `5/5` wins over random.
74
  - The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now reset-seed confirmation and one presentation-ready comparison trace backed by the paired high-fidelity evidence.
75
+ - The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). The repaired smoke trainer now finds a real positive repair signal on the easy seed, but it still does not generalize across all frozen seeds, which is the right diagnostic boundary for this stage.
76
 
77
  Current mode:
78
 
 
138
  - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
139
  - [ ] Save one presentation-ready comparison trace from the refreshed heuristic baseline.
140
  - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
141
+ - [x] Deploy the environment to HF Space.
142
+ - [x] Add the Colab notebook under `training/notebooks`.
143
 
144
  These are implementation steps, not another planning phase.
145
 
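Reviewer note: the new README states the action count as 4 parameters × 2 directions × 3 magnitudes + `restore_best` + `submit` = 26, and lists three hard constraints. The sketch below checks that arithmetic and encodes the constraint table as a predicate. The parameter, direction, and magnitude labels are placeholders (the README gives only the counts), and the dict layout is an assumption modeled on the `intent`/`parameter`/`direction`/`magnitude` fields the notebook prints; it is not the repository's `StellaratorAction` schema.

```python
from itertools import product

# Placeholder labels (assumptions): the README specifies only 4 x 2 x 3.
PARAMETERS = ("knob_a", "knob_b", "knob_c", "knob_d")
DIRECTIONS = ("increase", "decrease")
MAGNITUDES = ("small", "medium", "large")


def enumerate_actions():
    """Build the discrete action list: every knob move plus two special actions."""
    actions = [
        {"intent": "run", "parameter": p, "direction": d, "magnitude": m}
        for p, d, m in product(PARAMETERS, DIRECTIONS, MAGNITUDES)
    ]
    actions.append({"intent": "restore_best"})
    actions.append({"intent": "submit"})
    return actions


def is_feasible(metrics):
    """Apply the three bounds from the README constraint table."""
    return (
        metrics["aspect_ratio"] <= 4.0
        and metrics["average_triangularity"] <= -0.5
        and metrics["edge_iota_over_nfp"] >= 0.3
    )


if __name__ == "__main__":
    print(len(enumerate_actions()))
    print(is_feasible({
        "aspect_ratio": 3.8,
        "average_triangularity": -0.6,
        "edge_iota_over_nfp": 0.31,
    }))
```

With a budget of 6 evaluations per episode, an agent samples only a small part of this space, which is consistent with the README's emphasis on a narrow action space.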
training/notebooks/fusion_design_lab_training.ipynb CHANGED
@@ -37,10 +37,12 @@
37
  "outputs": [],
38
  "source": [
39
  "%%capture\n",
 
 
 
40
  "!pip install unsloth vllm\n",
41
  "!pip install --no-deps trl\n",
42
- "!pip install constellaration openenv-core[core] pydantic fastapi uvicorn\n",
43
- "!pip install matplotlib"
44
  ]
45
  },
46
  {
@@ -85,11 +87,7 @@
85
  "cell_type": "markdown",
86
  "id": "8edb47106e1a46a883d545849b8ab81b",
87
  "metadata": {},
88
- "source": [
89
- "## 3. Setup Stellarator Environment\n",
90
- "\n",
91
- "We install the environment package directly from the HF Space repository so training runs locally (no network latency). The same environment is deployed at the HF Space URL above."
92
- ]
93
  },
94
  {
95
  "cell_type": "code",
@@ -99,7 +97,9 @@
99
  "outputs": [],
100
  "source": [
101
  "%%capture\n",
102
- "!pip install git+https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab"
 
 
103
  ]
104
  },
105
  {
@@ -579,11 +579,7 @@
579
  "cell_type": "markdown",
580
  "id": "cb1e1581032b452c9409d6c6813c49d1",
581
  "metadata": {},
582
- "source": [
583
- "## 10. Connect to Deployed HF Space\n",
584
- "\n",
585
- "Demonstrate connecting to the live environment on Hugging Face Spaces."
586
- ]
587
  },
588
  {
589
  "cell_type": "code",
@@ -592,43 +588,57 @@
592
  "metadata": {},
593
  "outputs": [],
594
  "source": [
595
- "from fusion_lab.client import FusionLabClient\n",
596
- "from fusion_lab.models import StellaratorAction\n",
 
597
  "\n",
598
  "HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
599
  "\n",
600
- "with FusionLabClient(base_url=HF_SPACE_URL).sync() as client:\n",
601
- " obs = client.reset()\n",
602
- " print(f\"Connected to HF Space: {HF_SPACE_URL}\")\n",
603
- " print(\"Initial observation:\")\n",
604
- " print(f\" max_elongation: {obs.observation.max_elongation:.4f}\")\n",
605
- " print(f\" aspect_ratio: {obs.observation.aspect_ratio:.4f}\")\n",
606
- " print(f\" p1_score: {obs.observation.p1_score:.4f}\")\n",
607
- " print(f\" constraints_satisfied: {obs.observation.constraints_satisfied}\")\n",
608
- " print(f\" budget_remaining: {obs.observation.budget_remaining}\")\n",
609
- "\n",
610
- " # Run one action from the trained model\n",
611
- " prompt = build_prompt(obs.observation)\n",
612
- " inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
613
- " outputs = model.generate(\n",
614
- " **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, temperature=0.7, do_sample=True\n",
615
- " )\n",
616
- " completion = tokenizer.decode(\n",
617
- " outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True\n",
618
  " )\n",
619
- " actions = parse_action_plan(completion)\n",
620
- "\n",
621
- " print(f\"\\nModel generated {len(actions)} actions:\")\n",
622
- " for i, action in enumerate(actions[:BUDGET]):\n",
623
- " result = client.step(action)\n",
624
- " print(\n",
625
- " f\" Step {i + 1}: {action.intent} {action.parameter or ''} {action.direction or ''} {action.magnitude or ''} → reward={result.reward:.3f}\"\n",
626
- " )\n",
627
- " if result.done:\n",
628
- " print(f\" Episode done. Final score: {result.observation.p1_score:.4f}\")\n",
629
- " break\n",
630
  "\n",
631
- "print(\"\\nDone! Environment is live and accessible for training and evaluation.\")"
632
  ]
633
  }
634
  ],
 
37
  "outputs": [],
38
  "source": [
39
  "%%capture\n",
40
+ "# Build deps for constellaration (booz-xform compiles from source)\n",
41
+ "!apt-get update -qq && apt-get install -y -qq cmake ninja-build g++ gfortran libnetcdf-dev libnetcdff-dev > /dev/null\n",
42
+ "\n",
43
  "!pip install unsloth vllm\n",
44
  "!pip install --no-deps trl\n",
45
+ "!pip install matplotlib requests"
 
46
  ]
47
  },
48
  {
 
87
  "cell_type": "markdown",
88
  "id": "8edb47106e1a46a883d545849b8ab81b",
89
  "metadata": {},
90
+ "source": "## 3. Setup Stellarator Environment\n\nInstall the environment package directly from the repository so training runs locally (no network latency per step). The same environment is deployed at the HF Space URL above."
 
 
 
 
91
  },
92
  {
93
  "cell_type": "code",
 
97
  "outputs": [],
98
  "source": [
99
  "%%capture\n",
100
+ "# Install the fusion-design-lab environment (includes constellaration physics engine)\n",
101
+ "# This takes ~3 minutes due to booz-xform compilation\n",
102
+ "!pip install \"fusion-design-lab @ git+https://huggingface.co/spaces/CreativeEngineer/fusion-design-lab\""
103
  ]
104
  },
105
  {
 
579
  "cell_type": "markdown",
580
  "id": "cb1e1581032b452c9409d6c6813c49d1",
581
  "metadata": {},
582
+ "source": "## 10. Connect to Deployed HF Space\n\nDemonstrate connecting to the live environment on Hugging Face Spaces and running the trained model against it."
 
 
 
 
583
  },
584
  {
585
  "cell_type": "code",
 
588
  "metadata": {},
589
  "outputs": [],
590
  "source": [
591
+ "import requests\n",
592
+ "\n",
593
+ "from fusion_lab.models import StellaratorObservation\n",
594
  "\n",
595
  "HF_SPACE_URL = \"https://creativeengineer-fusion-design-lab.hf.space\"\n",
596
  "\n",
597
+ "# Check health\n",
598
+ "health = requests.get(f\"{HF_SPACE_URL}/health\").json()\n",
599
+ "print(f\"HF Space status: {health['status']}\")\n",
600
+ "\n",
601
+ "# Get task description\n",
602
+ "task = requests.get(f\"{HF_SPACE_URL}/task\").json()\n",
603
+ "print(f\"\\nTask: {task['description']}\")\n",
604
+ "print(f\"Constraints: {task['constraints']}\")\n",
605
+ "print(f\"Budget: {task['budget']}\")\n",
606
+ "\n",
607
+ "# Reset an episode on the remote environment\n",
608
+ "resp = requests.post(f\"{HF_SPACE_URL}/reset\", json={\"seed\": 42}).json()\n",
609
+ "obs_data = resp[\"observation\"]\n",
610
+ "print(f\"\\nRemote reset — max_elongation: {obs_data['max_elongation']:.4f}\")\n",
611
+ "print(f\" aspect_ratio: {obs_data['aspect_ratio']:.4f}\")\n",
612
+ "print(f\" constraints_satisfied: {obs_data['constraints_satisfied']}\")\n",
613
+ "print(f\" budget_remaining: {obs_data['budget_remaining']}\")\n",
614
+ "\n",
615
+ "# Generate an action plan from the trained model\n",
616
+ "remote_obs = StellaratorObservation.model_validate(obs_data)\n",
617
+ "prompt = build_prompt(remote_obs)\n",
618
+ "inputs = tokenizer(prompt, return_tensors=\"pt\").to(model.device)\n",
619
+ "outputs = model.generate(\n",
620
+ " **inputs, max_new_tokens=MAX_COMPLETION_LENGTH, temperature=0.7, do_sample=True\n",
621
+ ")\n",
622
+ "completion = tokenizer.decode(outputs[0][inputs[\"input_ids\"].shape[1] :], skip_special_tokens=True)\n",
623
+ "actions = parse_action_plan(completion)\n",
624
+ "\n",
625
+ "print(f\"\\nTrained model generated {len(actions)} actions for remote env:\")\n",
626
+ "for i, action in enumerate(actions[:BUDGET]):\n",
627
+ " action_payload = action.model_dump(exclude_none=True)\n",
628
+ " step_resp = requests.post(f\"{HF_SPACE_URL}/step\", json={\"action\": action_payload}).json()\n",
629
+ " r = step_resp.get(\"reward\", 0)\n",
630
+ " done = step_resp.get(\"done\", False)\n",
631
+ " step_obs = step_resp[\"observation\"]\n",
632
+ " print(\n",
633
+ " f\" Step {i + 1}: {action.intent} {action.parameter or ''} \"\n",
634
+ " f\"{action.direction or ''} {action.magnitude or ''} \"\n",
635
+ " f\"→ reward={r:.3f}, score={step_obs['p1_score']:.4f}\"\n",
636
  " )\n",
637
+ " if done:\n",
638
+ " print(f\" Episode done. Final score: {step_obs['p1_score']:.4f}\")\n",
639
+ " break\n",
640
  "\n",
641
+ "print(\"\\nEnvironment is live and accessible for training and evaluation.\")"
642
  ]
643
  }
644
  ],
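Reviewer note: the new demo cell posts each action to `/step`, caps the plan at `BUDGET`, and stops at the first `done`. The sketch below isolates that loop so it can be exercised without the live Space. The `post` callable and the fake transport are hypothetical helpers introduced here for illustration; the response keys (`reward`, `done`, `observation.p1_score`) are taken from the cell above, not from a verified API contract.

```python
def run_plan(post, actions, budget):
    """Send at most `budget` actions to /step, stopping at the first done=True.

    `post(path, payload)` is a hypothetical transport that returns the decoded
    JSON response as a dict.
    """
    trace = []
    for action in actions[:budget]:
        resp = post("/step", {"action": action})
        reward = resp.get("reward", 0.0)
        score = resp["observation"]["p1_score"]
        trace.append((action, reward, score))
        if resp.get("done", False):
            break
    return trace


def make_fake_post(done_after):
    """Offline stand-in for the remote environment (illustration only)."""
    calls = {"n": 0}

    def post(path, payload):
        calls["n"] += 1
        return {
            "reward": 0.1 * calls["n"],
            "done": calls["n"] >= done_after,
            "observation": {"p1_score": 0.0},
        }

    return post


if __name__ == "__main__":
    plan = [{"intent": "run"}] * 5 + [{"intent": "submit"}]
    for action, reward, score in run_plan(make_fake_post(done_after=3), plan, budget=6):
        print(action["intent"], round(reward, 3), score)
```

Against the real Space, `post` could wrap `requests.post(url, json=payload, timeout=...)` followed by `raise_for_status()`, so a cold-starting or failing Space raises a clear error instead of a JSON decode or key error. The timeout value to use is not something the commit specifies.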