
Fusion Design Lab TODO

This is the execution tracker for the hackathon repo.

Use this file for day-of build progress. Use the linked docs for rationale, contract truth, and submission framing:

Archived legacy references:

Priority source:

Current State

  • P1 strategy is locked
  • shared models reflect the repaired low-dimensional P1 contract
  • environment loop reflects the repaired low-dimensional P1 contract
  • API/task surface reflects P1
  • baselines reflect the P1 contract
  • repo docs call out the low-fi/high-fi constellaration split honestly
  • post-terminal guard in step()
  • constellaration verifier wiring
  • verify the current 3-knob family against the real low-fidelity verifier
  • repair the low-dimensional parameterization so triangularity is controllable
  • split boundary building from boundary evaluation
  • update the action schema from 3 knobs to the repaired low-dimensional family
  • add explicit VMEC failure semantics
  • label low-fi vs high-fi truth in the observation/task surface
  • separate high-fi submit scoring/reporting from low-fi rollout score state
  • tracked P1 fixtures
  • manual playtest log
  • settle the non-submit terminal reward policy
  • baseline comparison has been re-run on the constellaration branch state
  • tiny low-fi PPO smoke run exists. Note: training/ppo_smoke.py now runs a diagnostic-only low-fidelity PPO smoke pass; the first artifact is summarized in docs/P1_PPO_SMOKE_NOTE.md
  • refresh the heuristic baseline for the real verifier path. Note: the refreshed heuristic now uses the measured rotational_transform -> triangularity_scale -> elongation -> submit path; a fresh uv run python baselines/compare.py 5 rerun finished at 5/5 feasible high-fidelity finals and 5/5 wins over random
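One recurring contract item above is the post-terminal guard in step(). A minimal sketch of what that guard means in a gym-style loop (the class name, budget handling, reward values, and observation shape here are illustrative assumptions, not the actual server/environment.py API):

```python
class FusionEnv:
    """Illustrative gym-style environment with a post-terminal guard.

    FusionEnv, the budget field, and the reward values are hypothetical;
    the real environment lives in server/environment.py.
    """

    def __init__(self, budget: int = 10):
        self.budget = budget
        self.reset()

    def reset(self):
        self._terminated = False
        self._steps = 0
        return {"steps_left": self.budget}

    def step(self, action):
        # Post-terminal guard: refuse further transitions after the episode
        # has ended instead of silently returning stale state.
        if self._terminated:
            raise RuntimeError("step() called after termination; call reset() first")
        self._steps += 1
        terminated = action == "submit" or self._steps >= self.budget
        self._terminated = terminated
        reward = 1.0 if action == "submit" else 0.0
        obs = {"steps_left": self.budget - self._steps}
        return obs, reward, terminated, {}
```

The guard turns a silent correctness bug (agents stepping a dead episode) into an immediate, debuggable failure.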

Execution Graph

```mermaid
flowchart TD
    A["Northflank Smoke Test"] --> E["Fixture Checks"]
    B["P1 Contract Lock"] --> D["P1 Models + Environment"]
    C["constellaration Physics Wiring"] --> D
    D --> P["Parameterization Repair"]
    P --> F["Tiny PPO Smoke"]
    F --> E
    E --> G["Submit-side Manual Playtest"]
    G --> H["Reward V2"]
    H --> I["Baselines"]
    I --> J["HF Space Deploy"]
    J --> K["Colab Notebook"]
    K --> L["Demo + README"]
```

Hour 0-2

Fresh Wiring

Validation and Reward

  • Run a small measured sweep on the repaired low-dimensional family. Goal: choose useful parameter ranges, step deltas, and reset seeds from the repaired action family instead of guessing them from prose. Related: P1 Environment Contract

  • Clarify or split fidelity-dependent best-state observation fields. Goal: replace ambiguous mixed best-state reporting with explicit low-fidelity and high-fidelity best-state fields before fixture evidence or baseline comparisons. Related: P1 Environment Contract

  • Add 1-2 tracked P1 fixtures. Files: server/data/p1/README.md, P1 Pivot Record. Note: paired high-fidelity submit checks are now written into each tracked fixture and summarized in baselines/fixture_high_fidelity_pairs.json

  • Run fixture sanity checks. Goal: confirm paired low-fi/high-fi verifier outputs, objective direction, and reward ordering. Related: Plan V2, Next 12 Hours Checklist

  • Run a tiny low-fi PPO smoke pass. Goal: fail quickly on learnability, reward exploits, and action-space problems before investing in longer training. Note: treat this as a smoke test, not as proof that the terminal submit contract is already validated; stop after a few readable trajectories or one clear failure mode, and run paired high-fidelity fixture checks immediately after this smoke pass. Status: first smoke artifact exists; rerun this step only if a follow-up reward or observation change needs re-checking. High-fidelity VMEC-backed submit should stay out of the normal RL inner loop

  • Manual-playtest 5-10 episodes. Goal: start with one submit-side trace, then expand the initial low-fidelity playtest note into 5-10 episodes and surface at least one pathology or ambiguity. Related: Plan V2, Deliverables Map

  • Update reward from V0 to V1 after playtesting exposed a real repair-path pathology. Goal: keep a short exploit -> fix -> behavior improvement story. Related: AGENTS.md, Plan V2

  • Update reward from V1 to V2 after the verifier-native shaping exposed short-horizon gaps. Goal: add bounded new-best, near-feasible, and anti-stagnation terms without breaking the verifier-native reward story. Related: AGENTS.md, P1 Environment Contract

  • Write down why Reward V0 did not survive unchanged. Goal: document the concrete pathology, namely that pure Δ official_feasibility hid useful non-dominant repairs because official feasibility is a max over normalized constraint violations. Related: README.md, Plan V2

  • Decide the non-submit terminal reward policy. Goal: budget exhaustion now yields a smaller end-of-episode reward than submit, so non-submitting agents still get terminal feedback without outranking explicit submit behavior. Files: server/environment.py, README.md
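The reward evolution above (V0 → V1 → V2) can be sketched as bounded shaping terms layered on the verifier-native delta. Everything here is an assumption for illustration: the function name, the weights, the metrics dict, and the stagnation threshold are hypothetical, not the repo's actual reward code.

```python
def reward_v2(prev, curr, steps_since_best, *,
              new_best_bonus=0.2, near_feasible_bonus=0.1,
              stagnation_penalty=0.05, feasible_eps=0.05):
    """Illustrative Reward V2 sketch.

    prev/curr are hypothetical dicts carrying 'feasibility' (the official
    max over normalized constraint violations, lower is better) and the
    episode's 'best_feasibility' so far.
    """
    # Base term: improvement in official feasibility. Pure delta alone
    # (Reward V0) hid useful non-dominant repairs, since the official
    # number is a max over normalized constraint violations.
    r = prev["feasibility"] - curr["feasibility"]
    # Bounded new-best term: fires only when the episode best improves.
    if curr["feasibility"] < prev["best_feasibility"]:
        r += new_best_bonus
    # Bounded near-feasible term: reward crossing into the feasible band.
    if curr["feasibility"] < feasible_eps <= prev["feasibility"]:
        r += near_feasible_bonus
    # Anti-stagnation term: small penalty when no new best for a while.
    if steps_since_best > 5:
        r -= stagnation_penalty
    return r
```

Each shaping term is bounded by a fixed constant, so the verifier-native delta remains the dominant signal and the reward story survives the V1 → V2 transition intact.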

Baselines

  • Implement the random baseline. Files: baselines/random_agent.py, baselines/compare.py

  • Implement the heuristic baseline. Files: baselines/heuristic_agent.py, baselines/compare.py

  • Run the baseline comparison on the current constellaration branch state. Files: baselines/compare.py

  • Refresh the heuristic baseline after the constellaration rerun. Goal: the old synthetic-path heuristic no longer gives a useful anchor on the real verifier path; redesign it after manual playtesting

  • Save one comparison trace that is presentation-ready. Goal: show at least one stable trajectory and one heuristic-vs-random comparison
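A minimal harness in the spirit of the heuristic-vs-random comparison above might look like this. The policy interface, the env_factory stub, and the win criterion are assumptions for illustration; the real baselines/compare.py drives the actual environment and verifier.

```python
import random


def run_episode(env, policy, budget=10):
    """Roll one episode and return the best (lowest) feasibility seen."""
    state = env.reset()
    best = state["feasibility"]
    for _ in range(budget):
        state = env.step(policy(state))
        best = min(best, state["feasibility"])
    return best


def compare(env_factory, heuristic, n_episodes=5, seed=0):
    """Count heuristic wins over a random policy across paired episodes."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_episodes):
        random_policy = lambda s: rng.uniform(-1.0, 1.0)
        h_best = run_episode(env_factory(), heuristic)
        r_best = run_episode(env_factory(), random_policy)
        if h_best < r_best:  # lower feasibility is better
            wins += 1
    return wins, n_episodes
```

Creating a fresh environment per episode via env_factory keeps the paired rollouts independent, and seeding the random policy makes the comparison trace reproducible for the presentation artifact.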

Submission Surfaces

Guardrails

  • Do not reopen P1 + rotating-ellipse strategy without a real blocker
  • Do not pretend the current 3-knob family is sufficient for P1 after the verified triangularity blocker
  • Do not guess repaired-family ranges, deltas, or budget changes without measurement
  • Do not port the old ai-sci-feasible-designs harness
  • Do not let notebook or demo work outrun environment evidence
  • Do not let tiny low-fi smoke training replace paired high-fidelity checks or submit-side manual playtesting
  • Do not move high-fidelity VMEC-backed submit into the normal RL inner loop
  • Do not describe low-fidelity run metrics as equivalent to high-fidelity submit results
  • Do not compare high-fidelity submit scores against low-fidelity best/initial score state in the final story
  • Do not describe the current baseline reset state as feasible or near-feasible
  • Do not force a new reward-version story until the previous reward version shows a real pathology. Note: completed by recording the concrete Reward V0 pathology before Reward V1, then recording the concrete short-horizon Reward V1 gaps before Reward V2