Commit 8bf0155
Parent(s): 567ff67

feat: add replay playtest and tighten fail-fast validation

Files changed (7):
- README.md +8 -8
- TODO.md +5 -3
- baselines/measured_sweep.py +45 -12
- baselines/replay_playtest.py +207 -0
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +6 -2
- docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md +6 -5
- training/notebooks/northflank_smoke.py +13 -18
README.md CHANGED

@@ -30,7 +30,7 @@ Implementation status:
 - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
 - the repaired 4-knob low-dimensional family is now wired into the runtime path
 - the first measured sweep note, tracked low-fidelity fixtures, and an initial low-fidelity manual playtest note now exist
-- the next runtime work is a tiny low-fi PPO smoke run
+- the next runtime work is a tiny low-fi PPO smoke run as a diagnostic-only check, followed immediately by paired high-fidelity fixture checks and one real submit-side manual trace

 ## Execution Status

@@ -52,7 +52,7 @@ Implementation status:
 - [x] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
 - [x] Separate high-fidelity submit scoring/reporting from low-fidelity rollout score state
 - [x] Add tracked `P1` fixtures under `server/data/p1/`
-- [ ] Run a tiny low-fi PPO smoke run, then
+- [ ] Run a tiny low-fi PPO smoke run as a diagnostic-only check, then complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
 - [ ] Refresh the heuristic baseline for the real verifier path
 - [ ] Deploy the real environment to HF Space

@@ -67,12 +67,12 @@ Implementation status:
 - Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
-- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is a tiny low-fi PPO smoke run
+- The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is a tiny low-fi PPO smoke run used only to surface obvious learnability bugs, followed immediately by high-fidelity fixture pairing and one real `submit` trace.

 Current mode:

 - strategic task choice is already locked
-- the next work is a tiny low-fi PPO smoke run
+- the next work is a tiny low-fi PPO smoke run as a smoke test only, then paired high-fidelity fixture checks, one submit-side manual trace, heuristic refresh, smoke validation, and deployment
 - new planning text should only appear when a real blocker forces a decision change

 ## Planned Repository Layout

@@ -117,7 +117,7 @@ uv sync --extra notebooks
 - Recommended compute workspace: Northflank Jupyter Notebook with PyTorch on the team H100
 - OpenEnv deployment target: Hugging Face Spaces
 - Minimal submission notebook target: Colab
-- Required notebook artifact: one public Colab notebook
+- Required notebook artifact: one public Colab notebook that demonstrates trained-policy behavior against the environment
 - Verifier of record: `constellaration.problems.GeometricalProblem`
 - Environment style: fresh wiring in this repo, not a port of the old `ai-sci-feasible-designs` harness
 - Northflank containers are ephemeral, so persistent storage should be attached before relying on saved models, caches, or fixture data

@@ -126,10 +126,10 @@ uv sync --extra notebooks

 ## Immediate Next Steps

-- [ ]
-- [ ]
+- [ ] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
+- [ ] Pair the tracked low-fidelity fixtures with high-fidelity submit spot checks immediately after the PPO smoke run.
 - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
-- [ ] Run at least one submit-side manual trace
+- [ ] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
 - [ ] Refresh the heuristic baseline using measured sweep and playtest evidence, then save one comparison trace.
 - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
 - [ ] Deploy the environment to HF Space.
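The README note above says budget exhaustion must pay a smaller terminal reward than an explicit `submit`. That asymmetry can be sketched as a simple property check; everything below is illustrative (`SUBMIT_BONUS`, `EXHAUSTION_BONUS`, and `terminal_reward` are stand-ins, not the environment's actual reward code):

```python
# Hypothetical constants; the real terminal rewards live in the
# environment code and are not reproduced here.
SUBMIT_BONUS = 1.0        # terminal bonus for a deliberate submit
EXHAUSTION_BONUS = 0.25   # smaller bonus when the step budget runs out


def terminal_reward(best_score: float, *, submitted: bool) -> float:
    # Explicit submit earns the larger bonus; budget exhaustion the
    # smaller one, so deliberate submission always dominates.
    return best_score + (SUBMIT_BONUS if submitted else EXHAUSTION_BONUS)


# For any final score, the agent should prefer submitting on purpose.
for score in (0.0, 0.3, 0.9):
    assert terminal_reward(score, submitted=True) > terminal_reward(score, submitted=False)
```

Any tuning that flips this inequality would teach agents to idle until the budget expires instead of committing to a design.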
TODO.md CHANGED

@@ -54,9 +54,9 @@ flowchart TD
 B["P1 Contract Lock"] --> D["P1 Models + Environment"]
 C["constellaration Physics Wiring"] --> D
 D --> P["Parameterization Repair"]
-P -->
-
-
+P --> F["Tiny PPO Smoke"]
+F --> E["Fixture Checks"]
+E --> G["Submit-side Manual Playtest"]
 G --> H["Reward V1"]
 H --> I["Baselines"]
 I --> J["HF Space Deploy"]

@@ -196,6 +196,8 @@ flowchart TD
 fail quickly on learnability, reward exploits, and action-space problems before investing in longer training
 Note:
 treat this as a smoke test, not as proof that the terminal `submit` contract is already validated
+stop after a few readable trajectories or one clear failure mode
+paired high-fidelity fixture checks must happen immediately after this smoke pass

 - [ ] Manual-playtest 5-10 episodes
 Goal:
baselines/measured_sweep.py CHANGED

@@ -4,7 +4,11 @@ Validates ranges, crash zones, feasibility regions, and identifies
 candidate reset seeds for the repaired low-dimensional family.

 Usage:
-
+    # Broad evenly spaced grid (default 3 points per parameter)
+    uv run python baselines/measured_sweep.py --grid-points 5
+
+    # Targeted sweep around the known feasible zone
+    uv run python baselines/measured_sweep.py --targeted
 """

 from __future__ import annotations

@@ -29,15 +33,29 @@ def linspace_inclusive(low: float, high: float, n: int) -> list[float]:
     return [round(float(v), 4) for v in np.linspace(low, high, n)]


+TARGETED_VALUES: dict[str, list[float]] = {
+    "aspect_ratio": [3.4, 3.6, 3.8],
+    "elongation": [1.2, 1.4, 1.6],
+    "rotational_transform": [1.50, 1.55, 1.60, 1.65, 1.70, 1.75, 1.80],
+    "triangularity_scale": [0.55, 0.58, 0.60, 0.62, 0.65],
+}
+
+
 def parse_args() -> argparse.Namespace:
     parser = argparse.ArgumentParser(
         description="Run a measured low-fidelity sweep over the repaired 4-knob family."
     )
-    parser.
+    mode = parser.add_mutually_exclusive_group()
+    mode.add_argument(
         "--grid-points",
         type=int,
         default=3,
-        help="Number of evenly spaced points per parameter range.",
+        help="Number of evenly spaced points per parameter range (default: 3).",
+    )
+    mode.add_argument(
+        "--targeted",
+        action="store_true",
+        help="Use the pre-defined targeted value set around the known feasible zone.",
     )
     parser.add_argument(
         "--output-dir",

@@ -48,12 +66,15 @@ def parse_args() -> argparse.Namespace:
     return parser.parse_args()


-def run_sweep(*, grid_points: int) -> list[dict]:
-    if
-
-
-
+def run_sweep(*, grid_points: int, targeted: bool = False) -> tuple[list[dict], float]:
+    if targeted:
+        grids = TARGETED_VALUES
+    else:
+        if grid_points < 2:
+            raise ValueError("--grid-points must be at least 2.")
+        grids = {
+            name: linspace_inclusive(lo, hi, grid_points) for name, (lo, hi) in SWEEP_RANGES.items()
+        }

     configs = list(
         product(

@@ -108,7 +129,8 @@ def run_sweep(*, grid_points: int) -> list[dict]:
         f"{rate:.1f} eval/s"
     )

-
+    total_elapsed = time.monotonic() - t0
+    return results, total_elapsed


 def analyze(results: list[dict]) -> dict:

@@ -196,17 +218,28 @@ def analyze(results: list[dict]) -> dict:

 def main() -> None:
     args = parse_args()
-    results = run_sweep(
+    results, elapsed_s = run_sweep(
+        grid_points=args.grid_points,
+        targeted=args.targeted,
+    )

     out_dir = args.output_dir
     out_dir.mkdir(exist_ok=True)
+    mode_label = "targeted" if args.targeted else f"grid{args.grid_points}"
     timestamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
     out_path = out_dir / f"measured_sweep_{timestamp}.json"

     analysis = analyze(results)

+    metadata = {
+        "mode": mode_label,
+        "timestamp": timestamp,
+        "elapsed_seconds": round(elapsed_s, 1),
+        "seconds_per_eval": round(elapsed_s / max(len(results), 1), 2),
+    }
+
     with open(out_path, "w") as f:
-        json.dump({"analysis": analysis, "results": results}, f, indent=2)
+        json.dump({"metadata": metadata, "analysis": analysis, "results": results}, f, indent=2)
     print(f"\nResults saved to {out_path}")
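The sweep's new CLI split between an evenly spaced grid and a fixed targeted value set is a standard argparse pattern: a mutually exclusive group, then `itertools.product` to expand the chosen grids into configs. A minimal self-contained sketch, with small illustrative ranges standing in for the repo's `SWEEP_RANGES` and `TARGETED_VALUES`:

```python
import argparse
from itertools import product

# Illustrative stand-ins for the repo's SWEEP_RANGES / TARGETED_VALUES.
RANGES: dict[str, tuple[float, float]] = {
    "aspect_ratio": (3.0, 4.0),
    "elongation": (1.0, 2.0),
}
TARGETED: dict[str, list[float]] = {
    "aspect_ratio": [3.4, 3.6],
    "elongation": [1.2, 1.4, 1.6],
}


def linspace_inclusive(lo: float, hi: float, n: int) -> list[float]:
    # Evenly spaced points including both endpoints (like np.linspace).
    step = (hi - lo) / (n - 1)
    return [round(lo + i * step, 4) for i in range(n)]


def build_grids(grid_points: int, targeted: bool) -> dict[str, list[float]]:
    if targeted:
        return TARGETED
    if grid_points < 2:
        raise ValueError("--grid-points must be at least 2.")
    return {name: linspace_inclusive(lo, hi, grid_points) for name, (lo, hi) in RANGES.items()}


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Sweep mode sketch.")
    # Passing both flags at once is rejected as a CLI error.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--grid-points", type=int, default=3)
    mode.add_argument("--targeted", action="store_true")
    return parser.parse_args(argv)


args = parse_args(["--grid-points", "3"])
grids = build_grids(args.grid_points, args.targeted)
# Cross product over per-parameter value lists, one dict per config.
configs = [dict(zip(grids, combo)) for combo in product(*grids.values())]
print(len(configs))  # 3 points per parameter over 2 parameters -> 9 configs
```

The targeted dict skips the validity check entirely because its value lists are curated by hand, mirroring the `if targeted:` early branch in the diff above.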
baselines/replay_playtest.py ADDED

@@ -0,0 +1,207 @@
+"""Fixed-action replay playtest for reward branch coverage.
+
+Runs 5 scripted episodes against StellaratorEnvironment directly.
+Each episode targets specific untested reward branches.
+
+Episodes:
+1. Seed 0 — repair + feasible-side objective shaping + budget exhaustion
+2. Seed 1 — repair from different seed (ar=3.4, rt=1.6)
+3. Seed 2 — boundary clamping (ar=3.8 = upper bound)
+4. Seed 0 — push rt into crash zone + restore_best
+5. Seed 0 — repair + objective move + explicit submit
+"""
+
+from __future__ import annotations
+
+import json
+import sys
+from dataclasses import asdict, dataclass
+from typing import Sequence
+
+from fusion_lab.models import StellaratorAction, StellaratorObservation
+from server.environment import StellaratorEnvironment
+
+
+@dataclass(frozen=True)
+class StepRecord:
+    step: int
+    intent: str
+    action_label: str
+    score: float
+    feasibility: float
+    constraints_satisfied: bool
+    evaluation_fidelity: str
+    evaluation_failed: bool
+    max_elongation: float
+    reward: float
+    budget_remaining: int
+    done: bool
+
+
+def _action_label(action: StellaratorAction) -> str:
+    if action.intent != "run":
+        return action.intent
+    return f"{action.parameter} {action.direction} {action.magnitude}"
+
+
+def _record(obs: StellaratorObservation, step: int, action: StellaratorAction) -> StepRecord:
+    return StepRecord(
+        step=step,
+        intent=action.intent,
+        action_label=_action_label(action),
+        score=obs.p1_score,
+        feasibility=obs.p1_feasibility,
+        constraints_satisfied=obs.constraints_satisfied,
+        evaluation_fidelity=obs.evaluation_fidelity,
+        evaluation_failed=obs.evaluation_failed,
+        max_elongation=obs.max_elongation,
+        reward=obs.reward or 0.0,
+        budget_remaining=obs.budget_remaining,
+        done=obs.done,
+    )
+
+
+def _run_episode(
+    env: StellaratorEnvironment,
+    seed: int,
+    actions: Sequence[StellaratorAction],
+    label: str,
+) -> list[StepRecord]:
+    obs = env.reset(seed=seed)
+    print(f"\n{'=' * 72}")
+    print(f"Episode: {label}")
+    print(f"Seed: {seed}")
+    print(
+        f"  reset score={obs.p1_score:.6f} feasibility={obs.p1_feasibility:.6f} "
+        f"constraints={'yes' if obs.constraints_satisfied else 'no'} "
+        f"elongation={obs.max_elongation:.4f} budget={obs.budget_remaining}"
+    )
+
+    records: list[StepRecord] = []
+    for i, action in enumerate(actions, start=1):
+        if obs.done:
+            print(f"  (episode ended before step {i})")
+            break
+        obs = env.step(action)
+        rec = _record(obs, i, action)
+        records.append(rec)
+
+        status = (
+            "FAIL" if rec.evaluation_failed else ("OK" if rec.constraints_satisfied else "viol")
+        )
+        print(
+            f"  step {i:2d} {rec.action_label:<42s} "
+            f"reward={rec.reward:+8.4f} score={rec.score:.6f} "
+            f"feas={rec.feasibility:.6f} elong={rec.max_elongation:.4f} "
+            f"status={status} budget={rec.budget_remaining} "
+            f"{'DONE' if rec.done else ''}"
+        )
+
+    total_reward = sum(r.reward for r in records)
+    print(f"  total_reward={total_reward:+.4f}")
+    return records
+
+
+def _run(action: str, param: str, direction: str, magnitude: str) -> StellaratorAction:
+    return StellaratorAction(
+        intent="run",
+        parameter=param,
+        direction=direction,
+        magnitude=magnitude,
+    )
+
+
+def _submit() -> StellaratorAction:
+    return StellaratorAction(intent="submit")
+
+
+def _restore() -> StellaratorAction:
+    return StellaratorAction(intent="restore_best")
+
+
+# ── Episode definitions ──────────────────────────────────────────────────
+
+EPISODE_1 = (
+    "seed0_repair_objective_exhaustion",
+    0,
+    [
+        _run("run", "triangularity_scale", "increase", "medium"),  # cross feasibility
+        _run("run", "elongation", "decrease", "small"),  # feasible-side shaping
+        _run("run", "elongation", "decrease", "small"),  # more shaping
+        _run("run", "elongation", "decrease", "small"),  # more shaping
+        _run("run", "elongation", "decrease", "small"),  # more shaping
+        _run("run", "elongation", "decrease", "small"),  # budget=0 → done bonus
+    ],
+)
+
+EPISODE_2 = (
+    "seed1_repair_different_seed",
+    1,
+    [
+        _run(
+            "run", "triangularity_scale", "increase", "medium"
+        ),  # cross feasibility from ar=3.4,rt=1.6
+        _run("run", "elongation", "decrease", "small"),  # feasible-side shaping
+        _run("run", "elongation", "decrease", "small"),  # more shaping
+        _run("run", "triangularity_scale", "increase", "small"),  # push tri further
+        _run("run", "elongation", "decrease", "small"),  # more shaping
+        _run("run", "elongation", "decrease", "small"),  # budget exhaustion
+    ],
+)
+
+EPISODE_3 = (
+    "seed2_boundary_clamping",
+    2,
+    [
+        _run("run", "aspect_ratio", "increase", "large"),  # ar=3.8 + 0.2 → clamped at 3.8
+        _run("run", "triangularity_scale", "increase", "medium"),  # repair toward feasibility
+        _run("run", "triangularity_scale", "increase", "medium"),  # push further
+        _run("run", "elongation", "decrease", "small"),  # shaping if feasible
+        _run("run", "aspect_ratio", "decrease", "large"),  # move ar down
+        _run("run", "elongation", "decrease", "small"),  # budget exhaustion
+    ],
+)
+
+EPISODE_4 = (
+    "seed0_crash_recovery_restore",
+    0,
+    [
+        _run("run", "triangularity_scale", "increase", "medium"),  # cross feasibility first
+        _run("run", "rotational_transform", "increase", "large"),  # rt 1.5→1.7
+        _run("run", "rotational_transform", "increase", "large"),  # rt 1.7→1.9 (crash zone)
+        _restore(),  # recover best state
+        _run("run", "elongation", "decrease", "small"),  # continue from best
+        _run("run", "elongation", "decrease", "small"),  # budget exhaustion
+    ],
+)
+
+EPISODE_5 = (
+    "seed0_repair_objective_submit",
+    0,
+    [
+        _run("run", "triangularity_scale", "increase", "medium"),  # cross feasibility
+        _run("run", "elongation", "decrease", "small"),  # feasible-side objective
+        _submit(),  # explicit high-fidelity submit
+    ],
+)
+
+ALL_EPISODES = [EPISODE_1, EPISODE_2, EPISODE_3, EPISODE_4, EPISODE_5]
+
+
+def main(output_json: str | None = None) -> None:
+    env = StellaratorEnvironment()
+    all_results: dict[str, list[dict[str, object]]] = {}
+
+    for label, seed, actions in ALL_EPISODES:
+        records = _run_episode(env, seed, actions, label)
+        all_results[label] = [asdict(r) for r in records]
+
+    if output_json:
+        with open(output_json, "w") as f:
+            json.dump(all_results, f, indent=2)
+        print(f"\nResults written to {output_json}")
+
+
+if __name__ == "__main__":
+    out = sys.argv[1] if len(sys.argv) > 1 else None
+    main(output_json=out)
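The replay script's core pattern (reset, step through a fixed action list, stop early when the episode ends, serialize records with `asdict`) can be exercised against a stub environment without the real `StellaratorEnvironment` or verifier. All names below are illustrative stand-ins:

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class StubObs:
    score: float
    budget_remaining: int
    done: bool


class StubEnv:
    """Deterministic stand-in: each step costs one unit of a 3-step budget."""

    def reset(self, seed: int) -> StubObs:
        self._budget = 3
        self._score = float(seed)
        return StubObs(self._score, self._budget, done=False)

    def step(self, action: str) -> StubObs:
        self._budget -= 1
        self._score += 0.1
        return StubObs(round(self._score, 2), self._budget, done=self._budget == 0)


def replay(env: StubEnv, seed: int, actions: list[str]) -> list[dict]:
    obs = env.reset(seed=seed)
    records = []
    for i, action in enumerate(actions, start=1):
        if obs.done:
            break  # scripted episode ended early, like the real playtest loop
        obs = env.step(action)
        records.append({"step": i, "action": action, **asdict(obs)})
    return records


# Four scripted actions against a three-step budget: the last is skipped.
records = replay(StubEnv(), seed=0, actions=["run", "run", "run", "run"])
print(json.dumps(records, indent=2))
```

The point of the fixed action script is determinism: the same action list against the same seed should hit the same reward branches on every run, which is what makes the episode log comparable across commits.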
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED

@@ -80,6 +80,8 @@ Practical fail-fast rule:

 - allow a tiny low-fidelity PPO smoke run before full submit-side validation
 - use it only to surface obvious learnability bugs, reward exploits, or action-space problems
+- stop after a few readable trajectories or one clear failure mode
+- run paired high-fidelity fixture checks and one real submit-side trace immediately after the smoke run
 - do not use low-fidelity training alone as proof that the terminal `submit` contract is trustworthy

 ## 5. Document Roles

@@ -133,8 +135,8 @@ The live technical details belong in [`P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V1.md).

 ## 8. Execution Order

-- [ ] Run a tiny low-fidelity PPO smoke pass and
-- [ ] Pair the tracked low-fidelity fixtures with high-fidelity submit checks.
+- [ ] Run a tiny low-fidelity PPO smoke pass and stop after a few trajectories once it reveals either readable behavior or one clear failure mode.
+- [ ] Pair the tracked low-fidelity fixtures with high-fidelity submit checks immediately after the PPO smoke pass.
 - [ ] Decide whether the reset pool should change based on the measured sweep plus those paired checks.
 - [ ] Run at least one submit-side manual trace, then expand to 5 to 10 episodes and record the first real confusion point, exploit, or reward pathology.
 - [ ] Adjust reward or penalties only if playtesting exposes a concrete problem.

@@ -155,10 +157,12 @@ Gate 2: tiny PPO smoke is sane

 - a small low-fidelity policy can improve or at least reveal a concrete failure mode quickly
 - trajectories are readable enough to debug
+- the smoke run stops at that diagnostic threshold instead of turning into a broader training phase

 Gate 3: fixture checks pass

 - good, boundary, and bad references behave as expected
+- the paired high-fidelity checks happen immediately after the PPO smoke run, not as optional later work

 Gate 4: manual playtest passes
docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED

@@ -14,8 +14,9 @@ Use these docs instead:
 Current execution priority remains:

 1. measured sweep
-2.
-3.
-4.
-5.
-6.
+2. tiny PPO smoke pass as a diagnostic-only check
+3. tracked fixtures with paired high-fidelity submit checks
+4. one submit-side manual trace, then broader manual playtest
+5. heuristic baseline refresh
+6. HF Space proof
+7. notebook, demo, and repo polish
training/notebooks/northflank_smoke.py CHANGED

@@ -7,10 +7,8 @@ from datetime import UTC, datetime
 from importlib.metadata import version
 from pathlib import Path

-from
-
-from server.environment import BASELINE_PARAMS, N_FIELD_PERIODS
-from server.physics import EvaluationMetrics, evaluate_params
+from server.contract import N_FIELD_PERIODS, SMOKE_TEST_PARAMS
+from server.physics import EvaluationMetrics, build_boundary_from_params, evaluate_boundary


 DEFAULT_OUTPUT_DIR = Path("training/notebooks/artifacts")

@@ -23,15 +21,15 @@ class SmokeArtifact:
     boundary_type: str
     n_field_periods: int
     params: dict[str, float]
-    metrics: dict[str, float | bool]
+    metrics: dict[str, str | float | bool]


 def parse_args() -> argparse.Namespace:
     parser = argparse.ArgumentParser(
         description=(
             "Run the Fusion Design Lab Northflank smoke check: generate one "
-            "rotating-ellipse boundary, run one
-            "and write a JSON artifact."
+            "rotating-ellipse-derived low-dimensional boundary, run one "
+            "low-fidelity verifier call, and write a JSON artifact."
         )
     )
     parser.add_argument(

@@ -47,23 +45,17 @@ def parse_args() -> argparse.Namespace:


 def build_artifact() -> SmokeArtifact:
-    boundary =
-
-        elongation=BASELINE_PARAMS.elongation,
-        rotational_transform=BASELINE_PARAMS.rotational_transform,
-        n_field_periods=N_FIELD_PERIODS,
-    )
-    metrics = evaluate_params(
-        BASELINE_PARAMS,
+    boundary = build_boundary_from_params(
+        SMOKE_TEST_PARAMS,
         n_field_periods=N_FIELD_PERIODS,
-        fidelity="low",
     )
+    metrics = evaluate_boundary(boundary, fidelity="low")
     return SmokeArtifact(
         created_at_utc=datetime.now(UTC).isoformat(),
         constellaration_version=version("constellaration"),
        boundary_type=type(boundary).__name__,
         n_field_periods=N_FIELD_PERIODS,
-        params=
+        params=SMOKE_TEST_PARAMS.model_dump(),
         metrics=_metrics_payload(metrics),
     )

@@ -76,8 +68,11 @@ def write_artifact(output_dir: Path, artifact: SmokeArtifact) -> Path:
     return output_path


-def _metrics_payload(metrics: EvaluationMetrics) -> dict[str, float | bool]:
+def _metrics_payload(metrics: EvaluationMetrics) -> dict[str, str | float | bool]:
     return {
+        "evaluation_fidelity": metrics.evaluation_fidelity,
+        "evaluation_failed": metrics.evaluation_failed,
+        "failure_reason": metrics.failure_reason,
         "max_elongation": metrics.max_elongation,
         "aspect_ratio": metrics.aspect_ratio,
         "average_triangularity": metrics.average_triangularity,
|