Fusion Design Lab — Plan V2
Hackathon: OpenEnv Hackathon, March 7-8, 2026
Track: Statement 3.1 (World Modeling — Professional Tasks)
Role: Planning and execution SSOT for this repo
Updated: March 8, 2026
1. Submission Thesis
Fusion Design Lab is not only a "trained model for fusion" submission.
It is a clear, reproducible environment for one constrained scientific design task:
- official `P1` benchmark semantics
- narrow, human-playable action space
- real verifier feedback from `constellaration`
- explicit constraints and failure semantics
- reward logic that can be explained and iterated
The environment is the product. A trained policy is required as supporting evidence because it demonstrates that the environment is learnable in practice, not merely manually playable.
2. Current State
Completed:
- `P1` is locked as the single benchmark task
- the repaired 4-knob low-dimensional runtime is live in code
- the official `constellaration` verifier path is wired
- the live environment is now unified onto one low-fidelity reward and verifier surface
- `submit` remains an explicit terminal action on that same live contract
- explicit VMEC failure semantics are implemented
- the Northflank smoke workflow is committed
- the Northflank smoke test passed on the team H100
- baseline comparison has been rerun on the real verifier path
- a coarse measured sweep note now exists
- the first tracked low-fidelity fixtures now exist
- an initial low-fidelity manual playtest note now exists
- paired high-fidelity fixture checks for those tracked fixtures now exist
- one submit-side manual playtest trace exists
- the repository GRPO notebook is checked in and aligned to the shared `fusion_lab/llm_agent.py` helper contract
- model-driven fixed-seed low-fidelity `monitor`/`evaluate` tooling exists for LLM baselines
Still open:
- decision on whether the reset-seed pool should change, based on the paired checks
- HF Space deployment evidence
- public Colab mirror or notebook submission link, if the submission surface still requires it
- before/after trained-policy evidence on the current unified low-fidelity workflow
- demo and README polish after the artifacts are real
Current caution:
- do not present repaired-family ranges, deltas, or budget choices as settled defaults until the measured sweep is recorded
- do not narrate low-fidelity rollout metrics as final submission truth
- the standard notebook and `training/llm_rollout.py` paths should stay on the same live low-fidelity contract as the environment, including explicit `submit`
- reserve higher-fidelity validation for paired fixture checks, offline validation scripts, and final evidence
3. Locked Decisions
These decisions are fixed unless a hard blocker appears:
- benchmark task: `P1`
- submission framing: Statement 3.1
- verifier of record: `constellaration.problems.GeometricalProblem`
- repo strategy: fresh wiring in this repo
- reuse policy: do not port the old `ai-sci-feasible-designs` harness
- scope rule: one stable task only
Execution rule:
- do not reopen strategy unless a real blocker appears
- convert decisions into code, fixtures, traces, baselines, or deployment work
4. Non-Negotiables
- Keep scope to one stable task.
- Keep claims conservative and evidence-backed.
- Do not let training-first work outrun environment stability.
- Do not rely on reward curves alone; keep trajectory evidence.
- Do not use reward complexity to hide a blocked action family.
- Do not polish repo or video before the environment and baselines are real.
Practical fail-fast rule:
- allow a tiny low-fidelity PPO smoke run before full submit-side validation
- use it only to surface obvious learnability bugs, reward exploits, or action-space problems
- stop after a few readable trajectories or one clear failure mode
- run paired high-fidelity fixture checks and one real submit-side trace immediately after the smoke run
- do not use low-fidelity training alone as proof that the terminal `submit` contract is trustworthy
- keep any checkpoint high-fidelity evaluation sparse enough that it does not replace the low-fidelity inner loop
5. Document Roles
Use the docs like this:
- this file defines planning order, status, gates, and fallback rules
- `P1_ENV_CONTRACT_V1.md` defines the live technical contract
- `P1_PARAMETERIZATION_DEEPDIVE.md` keeps blocker evidence, sweep evidence, and supporting rationale
- archived legacy planning docs live under `archive/` and are not active SSOT surfaces
6. Artifact Plan
Visible artifacts:
- HF Space environment
- Repository training notebook
- Public Colab mirror or submission notebook link if required
- 1-minute demo video
- Public repo and README
Compute surfaces:
- Northflank is the main compute workspace for verifier-heavy work
- HF Space is the hosted environment surface
- the public notebook artifact should show trained-policy behavior against the live environment and can be mirrored to Colab if the submission form still requires it
- trained-policy work should iterate on the same live low-fidelity environment contract that will be demoed publicly
Evidence order:
- measured sweep note
- fixture checks
- manual playtest log
- tiny low-fi PPO smoke trace
- shared-helper notebook alignment
- model-driven low-fi LLM evaluation tooling
- reward iteration note
- stable local and remote episodes
- random and heuristic baselines
- before/after trained-policy evidence
- demo and repo polish
7. Environment Summary
The environment contract must stay narrow and legible:
- one repaired low-dimensional boundary family derived from a rotating-ellipse seed
- discrete `run | submit | restore_best` interaction
- one low-fidelity verifier surface for all live environment actions
- readable observation surface with explicit fidelity labeling
- Reward V2 keeps the verifier-native Reward V1 core and adds small best-so-far / anti-stagnation shaping for the low-fi repair loop
The live technical details belong in `P1_ENV_CONTRACT_V1.md`, not here.
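The narrow contract above can be sketched in a few lines of Python. This is an illustrative stub, not the real `fusion_lab` code: the three action names, the explicit fidelity label, the terminal `submit`, and the Reward V2 idea (verifier-native core plus best-so-far / anti-stagnation shaping) come from this plan, while the class name, the quadratic scoring stub, and the shaping constants are hypothetical.

```python
from dataclasses import dataclass

ACTIONS = ("run", "submit", "restore_best")  # narrow discrete action space


@dataclass
class P1EnvSketch:
    """Illustrative stand-in for the live low-fidelity environment."""
    best_score: float = float("-inf")
    stagnant_steps: int = 0
    done: bool = False

    def _low_fi_score(self, knobs):
        # Stub for the low-fidelity verifier surface; the real environment
        # calls the constellaration verifier path here instead.
        return -sum(k * k for k in knobs)

    def step(self, action, knobs=(0.0, 0.0, 0.0, 0.0)):
        assert action in ACTIONS
        score = self._low_fi_score(knobs)
        # Reward V2 idea: verifier-native core plus small best-so-far /
        # anti-stagnation shaping (constants here are made up).
        improved = score > self.best_score
        shaping = 0.1 if improved else -0.01 * self.stagnant_steps
        self.stagnant_steps = 0 if improved else self.stagnant_steps + 1
        self.best_score = max(self.best_score, score)
        if action == "submit":
            self.done = True  # submit stays an explicit terminal action
        obs = {"score": score, "best": self.best_score,
               "fidelity": "low", "done": self.done}  # explicit fidelity label
        return obs, score + shaping
```

The point of the sketch is the shape of the contract, not the numbers: three actions, one low-fidelity score path for everything live, a readable observation with its fidelity labeled, and `submit` as the only terminal transition.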
8. Execution Order
- Run a tiny low-fidelity PPO smoke pass and stop after a few trajectories once it reveals either readable behavior or one clear failure mode.
- Pair the tracked low-fidelity fixtures with higher-fidelity validation checks immediately after the PPO smoke pass.
- Decide whether the reset pool should change based on the measured sweep plus those paired checks.
- Run at least one submit-side manual trace, then expand to 5 to 10 episodes and record the first real confusion point, exploit, or reward pathology.
- Save one fixed-seed untrained baseline with the unified live `training/llm_rollout.py evaluate` workflow.
- Run one short H100 GRPO pass with the repository notebook on that same unified low-fidelity workflow.
- Re-run the same seeds after training and save one before/after artifact.
- Adjust reward or penalties only if playtesting exposes a concrete problem.
- Refresh the heuristic baseline using the repaired-family evidence.
- Prove a stable local episode path.
- Deploy the same task contract to HF Space and prove one clean remote episode.
- Publish or mirror the notebook artifact only after the live before/after path is real.
- Record the demo around environment clarity, reward iteration, and baseline evidence.
- Polish the public repo only after the artifacts above exist.
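Several steps above hinge on the same mechanic: evaluate on fixed seeds before training, re-run the identical seeds after, and keep one paired artifact. A minimal sketch of that idea follows; the function names and the toy four-knob episode are hypothetical and are not the real `training/llm_rollout.py` interface.

```python
import random


def evaluate_policy(policy, seeds):
    """Score one toy episode per fixed seed so that an untrained baseline
    and a trained run are compared on identical seeds."""
    results = {}
    for seed in seeds:
        rng = random.Random(seed)  # fixed seed -> reproducible episode
        knobs = [rng.uniform(-1.0, 1.0) for _ in range(4)]  # toy 4-knob state
        results[seed] = policy(knobs)
    return results


def before_after_artifact(baseline, trained):
    """Pair per-seed scores into one before/after comparison record."""
    return {
        seed: {"before": baseline[seed],
               "after": trained[seed],
               "delta": trained[seed] - baseline[seed]}
        for seed in baseline
    }
```

Because both runs share the seed list, any per-seed delta reflects the policy change rather than episode variance, which is what makes the before/after artifact readable as evidence.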
9. Success Gates
Gate 1: measured sweep exists
- repaired-family ranges, deltas, and reset seeds are justified by recorded evidence
Gate 2: tiny PPO smoke is sane
- a small low-fidelity policy can improve or at least reveal a concrete failure mode quickly
- trajectories are readable enough to debug
- the smoke run stops at that diagnostic threshold instead of turning into a broader training phase
- current status: passed as a plumbing/debugging gate, with the first exposed failure mode recorded in `P1_PPO_SMOKE_NOTE.md`
Gate 3: fixture checks pass
- good, boundary, and bad references behave as expected
- the paired high-fidelity checks happen immediately after the PPO smoke run, not as optional later work
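Gate 3's paired checks can be sketched as a small assertion helper. The fixture labels (good, boundary, bad) come from the gate itself; the numeric scores, the field names, and the helper are hypothetical stand-ins for the real tracked fixtures and verifier call.

```python
# Tracked fixture labels from Gate 3; the numeric low-fi scores are made up.
FIXTURES = {
    "good":     {"low_fi": 0.9,  "expect_feasible": True},
    "boundary": {"low_fi": 0.5,  "expect_feasible": True},
    "bad":      {"low_fi": -0.2, "expect_feasible": False},
}


def check_paired_fixture(name, high_fi_score, high_fi_feasible):
    """Fail loudly when the high-fidelity result disagrees with the tracked
    fixture's expected label, and report the low-fi vs high-fi score gap."""
    expected = FIXTURES[name]["expect_feasible"]
    assert high_fi_feasible == expected, (
        f"{name}: high-fidelity feasibility {high_fi_feasible} "
        f"contradicts expected label {expected}"
    )
    return high_fi_score - FIXTURES[name]["low_fi"]
```

Recording the returned gap, instead of only pass/fail, keeps the low-fi vs high-fi disagreement visible rather than hidden behind a green check.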
Gate 4: manual playtest passes
- a human can read the observation
- a human can choose a plausible next action
- a human can explain the reward change
Gate 5: local episode is stable
- one clean trajectory is reproducible enough for demo use
Gate 6: baseline story is credible
- heuristic behavior is at least interpretable and preferable to random on the repaired task
Gate 7: remote surface is real
- HF Space preserves the same task contract as local
Gate 8: submission artifacts exist
- the public notebook artifact, demo, and README all reflect the actual environment rather than a hypothetical future one
Gate 9: trained-policy evidence is real
- one fixed-seed untrained baseline exists
- one short low-fidelity training pass exists on the same workflow
- the repo can show a before/after comparison on the same seeds using the live environment contract, including `submit`
10. Fallback Rules
If training evidence is weak:
- keep claims conservative about policy quality
- still ship a trained-policy demonstration and document its limitations plainly
- do not skip the paired higher-fidelity validation artifacts
- do not split the notebook back onto a different submit contract than the live environment
If HF Space deployment is delayed:
- keep local and Northflank evidence first
- document the deployment blocker plainly
- do not invent remote claims without a real run
If reward behavior is confusing:
- fix observation clarity, step magnitudes, seed choice, or terminal semantics before adding reward complexity
If the repaired family is too hard:
- adjust ranges, deltas, or seeds from measured evidence
- do not expand into a broad Fourier action space just to rescue the hackathon scope
If the repaired family is too easy:
- prefer fixture and seed adjustments before broadening the action schema
11. Immediate Next Actions
- Record the measured sweep and choose provisional defaults from evidence.
- Check in tracked fixtures.
- Record the first manual playtest log.
- Run a tiny low-fidelity PPO smoke pass and save a few trajectories.
- Pair the tracked fixtures with higher-fidelity validation checks.
- Record one submit-side manual trace.
- Refresh the heuristic baseline from that playtest evidence.
- Save one fixed-seed untrained baseline with `training/llm_rollout.py evaluate`.
- Run one short H100 GRPO pass with `training/notebooks/fusion_design_lab_training.ipynb`.
- Re-run the same seeds and save a before/after artifact.
- Verify one clean HF Space episode with the same contract.