CreativeEngineer committed on
Commit
c3a24db
·
1 Parent(s): a02ffad

feat: add high-fidelity validation evidence
README.md CHANGED
@@ -30,7 +30,8 @@ Implementation status:
  - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
  - the repaired 4-knob low-dimensional family is now wired into the runtime path
  - the first measured sweep note, tracked low-fidelity fixtures, and an initial low-fidelity manual playtest note now exist
- - the next runtime work is a tiny low-fi PPO smoke run as a diagnostic-only check, followed immediately by paired high-fidelity fixture checks and one real submit-side manual trace

  ## Execution Status

@@ -52,7 +53,8 @@ Implementation status:
  - [x] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
  - [x] Separate high-fidelity submit scoring/reporting from low-fidelity rollout score state
  - [x] Add tracked `P1` fixtures under `server/data/p1/`
- - [ ] Run a tiny low-fi PPO smoke run as a diagnostic-only check, then complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
  - [ ] Refresh the heuristic baseline for the real verifier path
  - [ ] Deploy the real environment to HF Space

@@ -60,7 +62,7 @@ Implementation status:

  - Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
  - The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
- - The tracked fixtures in `server/data/p1/` are currently low-fidelity-calibrated. Do not narrate them as fully paired low-fi/high-fi references until the submit-side spot checks land.
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
  - High-fidelity VMEC-backed `submit` is too expensive to serve as the normal RL inner loop. Keep training rollouts on low-fidelity `run`, then use high-fidelity calls for paired fixtures, submit-side traces, sparse checkpoint evaluation, and final evidence.
  - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
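The failure semantics described in that last note (a failed evaluation still consumes budget, surfaces a failure flag, and applies a reward penalty) can be sketched roughly as below. `FAILURE_PENALTY`, `StepOutcome`, and `apply_evaluation` are illustrative names, not the runtime's API, and the penalty magnitude is assumed.

```python
from dataclasses import dataclass

FAILURE_PENALTY = -0.1  # assumed magnitude, not taken from the runtime


@dataclass
class StepOutcome:
    reward: float
    budget_remaining: int
    evaluation_failed: bool


def apply_evaluation(budget_remaining: int, evaluation_failed: bool, raw_reward: float) -> StepOutcome:
    """Charge budget on every evaluation; penalize failures instead of hiding them."""
    budget_remaining -= 1  # failed evaluations cost budget too
    if evaluation_failed:
        # failure stays visible in the observation and carries a penalty
        return StepOutcome(FAILURE_PENALTY, budget_remaining, True)
    return StepOutcome(raw_reward, budget_remaining, False)
```

The key point of this shape is that failures are never free or silent, so an agent cannot farm failed VMEC calls without paying for them.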
@@ -68,12 +70,13 @@ Implementation status:
  - Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
  - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
- - The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is a tiny low-fi PPO smoke run used only to surface obvious learnability bugs, followed immediately by high-fidelity fixture pairing and one real `submit` trace.

  Current mode:

  - strategic task choice is already locked
- - the next work is a tiny low-fi PPO smoke run as a smoke test only, then paired high-fidelity fixture checks, one submit-side manual trace, heuristic refresh, smoke validation, and deployment
  - new planning text should only appear when a real blocker forces a decision change

  ## Planned Repository Layout
@@ -127,10 +130,10 @@ uv sync --extra notebooks

  ## Immediate Next Steps

- - [ ] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
- - [ ] Pair the tracked low-fidelity fixtures with high-fidelity submit spot checks immediately after the PPO smoke run.
  - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
- - [ ] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
  - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
  - [ ] Refresh the heuristic baseline using measured sweep and playtest evidence, then save one comparison trace.
  - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
 
  - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
  - the repaired 4-knob low-dimensional family is now wired into the runtime path
  - the first measured sweep note, tracked low-fidelity fixtures, and an initial low-fidelity manual playtest note now exist
+ - the first tiny low-fi PPO smoke artifact and paired high-fidelity fixture checks now exist
+ - a one-trajectory submit-side manual trace has now been recorded

  ## Execution Status

 
  - [x] Label low-fi `run` truth vs high-fi `submit` truth in observations and task docs
  - [x] Separate high-fidelity submit scoring/reporting from low-fidelity rollout score state
  - [x] Add tracked `P1` fixtures under `server/data/p1/`
+ - [x] Run a tiny low-fi PPO smoke run as a diagnostic-only check and save one trajectory artifact
+ - [x] Complete paired high-fidelity fixture checks and at least one real submit-side manual trace before any broader training push
  - [ ] Refresh the heuristic baseline for the real verifier path
  - [ ] Deploy the real environment to HF Space

 

  - Historical blocker note: the old 3-knob family was structurally blocked on P1 triangularity with the real verifier path. A sampled low-fidelity sweep kept `average_triangularity` at roughly `+0.004975` and `p1_feasibility` at roughly `1.00995`, with zero feasible samples. That blocker motivated the repaired 4-knob runtime that is now live.
  - The repaired family now has a first coarse measured sweep note in [docs/P1_MEASURED_SWEEP_NOTE.md](docs/P1_MEASURED_SWEEP_NOTE.md), but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks.
+ - The paired low-fi/high-fi fixture snapshots are now written into each fixture JSON and summarized in `baselines/fixture_high_fidelity_pairs.json`.
  - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
  - High-fidelity VMEC-backed `submit` is too expensive to serve as the normal RL inner loop. Keep training rollouts on low-fidelity `run`, then use high-fidelity calls for paired fixtures, submit-side traces, sparse checkpoint evaluation, and final evidence.
  - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.

  - Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
  - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
  - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
+ - The first low-fidelity manual playtest note is in [docs/P1_MANUAL_PLAYTEST_LOG.md](docs/P1_MANUAL_PLAYTEST_LOG.md). The next fail-fast step is now baseline refresh and reset-seed confirmation backed by the paired high-fidelity evidence.
+ - The first tiny PPO smoke note is in [docs/P1_PPO_SMOKE_NOTE.md](docs/P1_PPO_SMOKE_NOTE.md). It produced a valid trajectory artifact and exposed a repeated-action local failure, which is the right outcome for a smoke run.

  Current mode:

  - strategic task choice is already locked
+ - the next work is heuristic refresh, reset-seed confirmation, and deployment
  - new planning text should only appear when a real blocker forces a decision change

  ## Planned Repository Layout
 

  ## Immediate Next Steps

+ - [x] Run a tiny low-fidelity PPO smoke run and stop after a few readable trajectories or one clear failure mode.
+ - [x] Pair the tracked low-fidelity fixtures with high-fidelity submit spot checks immediately after the PPO smoke run.
+ - [x] Run at least one submit-side manual trace before any broader training push, then record the first real reward pathology, if any.
  - [ ] Decide whether any reset seed should move based on the measured sweep plus those paired checks.
  - [ ] Keep any checkpoint high-fidelity evaluation sparse enough that the low-fidelity inner loop stays fast.
  - [ ] Refresh the heuristic baseline using measured sweep and playtest evidence, then save one comparison trace.
  - [ ] Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
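The "keep checkpoint high-fidelity evaluation sparse" item above amounts to a simple cadence rule. A minimal sketch, with `HIGH_FI_EVERY_N` as an assumed knob rather than a value from the repository:

```python
# Evaluate with the expensive high-fidelity (VMEC-backed) verifier only on
# every N-th checkpoint, so the low-fidelity inner loop stays fast.
HIGH_FI_EVERY_N = 10  # assumed cadence, tune against real submit cost


def should_run_high_fidelity(checkpoint_index: int, every_n: int = HIGH_FI_EVERY_N) -> bool:
    """True only for the sparse subset of checkpoints that pay the VMEC cost."""
    return checkpoint_index % every_n == 0


# e.g. over 25 checkpoints, only indices 0, 10, and 20 trigger high-fidelity evaluation
sparse = [i for i in range(25) if should_run_high_fidelity(i)]
```

Any real schedule could also be wall-clock or budget based; the point is only that high-fidelity calls are a small, predictable fraction of checkpoints.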
TODO.md CHANGED
@@ -43,7 +43,9 @@ Priority source:
  - [x] manual playtest log
  - [x] settle the non-submit terminal reward policy
  - [x] baseline comparison has been re-run on the `constellaration` branch state
- - [ ] tiny low-fi PPO smoke run exists
  - [ ] refresh the heuristic baseline for the real verifier path

  ## Execution Graph
@@ -182,22 +184,24 @@ flowchart TD
  [server/data/p1/README.md](server/data/p1/README.md),
  [P1 Pivot Record](docs/archive/PIVOT_P1_ROTATING_ELLIPSE.md)
  Note:
- first tracked fixtures are low-fidelity-calibrated; add paired high-fidelity submit checks next

- - [ ] Run fixture sanity checks
  Goal:
  confirm paired low-fi/high-fi verifier outputs, objective direction, and reward ordering
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)

- - [ ] Run a tiny low-fi PPO smoke pass
  Goal:
  fail quickly on learnability, reward exploits, and action-space problems before investing in longer training
  Note:
  treat this as a smoke test, not as proof that the terminal `submit` contract is already validated
  stop after a few readable trajectories or one clear failure mode
  paired high-fidelity fixture checks must happen immediately after this smoke pass
  high-fidelity VMEC-backed `submit` should stay out of the normal RL inner loop

  - [ ] Manual-playtest 5-10 episodes
 
  - [x] manual playtest log
  - [x] settle the non-submit terminal reward policy
  - [x] baseline comparison has been re-run on the `constellaration` branch state
+ - [x] tiny low-fi PPO smoke run exists
+ Note:
+ `training/ppo_smoke.py` now runs a diagnostic-only low-fidelity PPO smoke pass and the first artifact is summarized in `docs/P1_PPO_SMOKE_NOTE.md`
  - [ ] refresh the heuristic baseline for the real verifier path

  ## Execution Graph
 
  [server/data/p1/README.md](server/data/p1/README.md),
  [P1 Pivot Record](docs/archive/PIVOT_P1_ROTATING_ELLIPSE.md)
  Note:
+ paired high-fidelity submit checks are now written into each tracked fixture and summarized in `baselines/fixture_high_fidelity_pairs.json`

+ - [x] Run fixture sanity checks
  Goal:
  confirm paired low-fi/high-fi verifier outputs, objective direction, and reward ordering
  Related:
  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
  [Next 12 Hours Checklist](docs/archive/FUSION_NEXT_12_HOURS_CHECKLIST.md)

+ - [x] Run a tiny low-fi PPO smoke pass
  Goal:
  fail quickly on learnability, reward exploits, and action-space problems before investing in longer training
  Note:
  treat this as a smoke test, not as proof that the terminal `submit` contract is already validated
  stop after a few readable trajectories or one clear failure mode
  paired high-fidelity fixture checks must happen immediately after this smoke pass
+ Status:
+ first smoke artifact exists; next use of this step should only happen if a follow-up reward or observation change needs re-checking
  high-fidelity VMEC-backed `submit` should stay out of the normal RL inner loop

  - [ ] Manual-playtest 5-10 episodes
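The "reward ordering" part of the fixture sanity check above has a compact formulation: if low-fidelity scores are used to rank candidates during training, their ordering should agree with the high-fidelity ordering on the tracked fixtures. A sketch with illustrative names, not the repository's API:

```python
def rankings_agree(pairs: list[tuple[float, float]]) -> bool:
    """pairs is [(low_fi_score, high_fi_score), ...].

    Returns True when sorting the fixtures by low-fidelity score produces the
    same order as sorting by high-fidelity score, i.e. the cheap metric is a
    safe ranking proxy for the expensive one on this set.
    """
    by_low = sorted(range(len(pairs)), key=lambda i: pairs[i][0])
    by_high = sorted(range(len(pairs)), key=lambda i: pairs[i][1])
    return by_low == by_high
```

On the three tracked fixtures in this commit, the score pairs (0.0/0.0, 0.0/0.0, ~0.2917/~0.2920) would pass this check, which matches the `"ranking_compatibility": "match"` fields in the summary JSON.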
baselines/fixture_high_fidelity_pairs.json ADDED
@@ -0,0 +1,144 @@
+ {
+   "timestamp_utc": "2026-03-08T07:07:29.939307+00:00",
+   "n_field_periods": 3,
+   "fixture_count": 3,
+   "pass_count": 3,
+   "fail_count": 0,
+   "results": [
+     {
+       "name": "bad_low_iota",
+       "file": "server/data/p1/bad_low_iota.json",
+       "status": "pass",
+       "low_fidelity": {
+         "evaluation_failed": false,
+         "constraints_satisfied": false,
+         "p1_score": 0.0,
+         "p1_feasibility": 0.575134593927855,
+         "max_elongation": 5.983792904658967,
+         "aspect_ratio": 2.802311169335037,
+         "average_triangularity": -0.5512332332730122,
+         "edge_iota_over_nfp": 0.12745962182164347,
+         "vacuum_well": -1.0099648537211192,
+         "evaluation_fidelity": "low",
+         "failure_reason": ""
+       },
+       "high_fidelity": {
+         "evaluation_failed": false,
+         "constraints_satisfied": false,
+         "p1_score": 0.0,
+         "p1_feasibility": 0.5763570514697449,
+         "max_elongation": 5.9831792818066525,
+         "aspect_ratio": 2.802311169335037,
+         "average_triangularity": -0.5512332332730122,
+         "edge_iota_over_nfp": 0.12709288455907652,
+         "vacuum_well": -1.0111716777365585,
+         "evaluation_fidelity": "high",
+         "failure_reason": ""
+       },
+       "comparison": {
+         "low_high_feasibility_match": true,
+         "feasibility_delta": 0.0012224575418898764,
+         "score_delta": 0.0,
+         "ranking_compatibility": "match",
+         "low_fidelity_stored_p1_score": 0.0,
+         "low_fidelity_stored_p1_feasibility": 0.575134593927855,
+         "low_fidelity_snapshot": {
+           "missing_fields": [],
+           "drift_fields": {},
+           "mismatches": [],
+           "max_abs_drift": 0.0
+         }
+       }
+     },
+     {
+       "name": "boundary_default_reset",
+       "file": "server/data/p1/boundary_default_reset.json",
+       "status": "pass",
+       "low_fidelity": {
+         "evaluation_failed": false,
+         "constraints_satisfied": false,
+         "p1_score": 0.0,
+         "p1_feasibility": 0.0506528382250242,
+         "max_elongation": 6.13677115978351,
+         "aspect_ratio": 3.31313049868072,
+         "average_triangularity": -0.4746735808874879,
+         "edge_iota_over_nfp": 0.2906263991807532,
+         "vacuum_well": -0.7537878932672235,
+         "evaluation_fidelity": "low",
+         "failure_reason": ""
+       },
+       "high_fidelity": {
+         "evaluation_failed": false,
+         "constraints_satisfied": false,
+         "p1_score": 0.0,
+         "p1_feasibility": 0.0506528382250242,
+         "max_elongation": 6.134177903677296,
+         "aspect_ratio": 3.31313049868072,
+         "average_triangularity": -0.4746735808874879,
+         "edge_iota_over_nfp": 0.28971623977263294,
+         "vacuum_well": -0.7554909069955263,
+         "evaluation_fidelity": "high",
+         "failure_reason": ""
+       },
+       "comparison": {
+         "low_high_feasibility_match": true,
+         "feasibility_delta": 0.0,
+         "score_delta": 0.0,
+         "ranking_compatibility": "match",
+         "low_fidelity_stored_p1_score": 0.0,
+         "low_fidelity_stored_p1_feasibility": 0.0506528382250242,
+         "low_fidelity_snapshot": {
+           "missing_fields": [],
+           "drift_fields": {},
+           "mismatches": [],
+           "max_abs_drift": 0.0
+         }
+       }
+     },
+     {
+       "name": "lowfi_feasible_local",
+       "file": "server/data/p1/lowfi_feasible_local.json",
+       "status": "pass",
+       "low_fidelity": {
+         "evaluation_failed": false,
+         "constraints_satisfied": true,
+         "p1_score": 0.29165951078327634,
+         "p1_feasibility": 0.0,
+         "max_elongation": 7.375064402950513,
+         "aspect_ratio": 3.2870514531062405,
+         "average_triangularity": -0.5002923204919443,
+         "edge_iota_over_nfp": 0.30046082924426193,
+         "vacuum_well": -0.7949586699110935,
+         "evaluation_fidelity": "low",
+         "failure_reason": ""
+       },
+       "high_fidelity": {
+         "evaluation_failed": false,
+         "constraints_satisfied": true,
+         "p1_score": 0.2920325118884466,
+         "p1_feasibility": 0.0,
+         "max_elongation": 7.37170739300398,
+         "aspect_ratio": 3.2870514531062405,
+         "average_triangularity": -0.5002923204919443,
+         "edge_iota_over_nfp": 0.300057398146058,
+         "vacuum_well": -0.7963320784471227,
+         "evaluation_fidelity": "high",
+         "failure_reason": ""
+       },
+       "comparison": {
+         "low_high_feasibility_match": true,
+         "feasibility_delta": 0.0,
+         "score_delta": 0.0003730011051702453,
+         "ranking_compatibility": "match",
+         "low_fidelity_stored_p1_score": 0.29165951078327634,
+         "low_fidelity_stored_p1_feasibility": 0.0,
+         "low_fidelity_snapshot": {
+           "missing_fields": [],
+           "drift_fields": {},
+           "mismatches": [],
+           "max_abs_drift": 0.0
+         }
+       }
+     }
+   ]
+ }
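A consumer of a summary shaped like the JSON above can sanity-check it before trusting the counts. The helper below is illustrative (not part of the repository); the real file this commit writes lives at `baselines/fixture_high_fidelity_pairs.json`.

```python
def summary_is_consistent(summary: dict) -> bool:
    """Counts must add up and every result needs a pass/fail verdict."""
    results = summary.get("results", [])
    counts_ok = (
        summary.get("fixture_count") == len(results)
        and summary.get("pass_count", 0) + summary.get("fail_count", 0) == len(results)
    )
    statuses_ok = all(r.get("status") in {"pass", "fail"} for r in results)
    return counts_ok and statuses_ok


# minimal inline example mirroring the file's shape
example = {
    "fixture_count": 1,
    "pass_count": 1,
    "fail_count": 0,
    "results": [{"name": "bad_low_iota", "status": "pass"}],
}
```

In practice one would `json.load` the tracked file and run the same check before quoting its pass/fail numbers in a report.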
baselines/high_fidelity_validation.py ADDED
@@ -0,0 +1,488 @@
+ """Validation utilities for high-fidelity fixture pairing and submit-side traces."""
+
+ from __future__ import annotations
+
+ import argparse
+ import json
+ from dataclasses import asdict, dataclass
+ from datetime import UTC, datetime
+ from pathlib import Path
+ from pprint import pformat
+ from time import perf_counter
+ from typing import Any
+
+ from fusion_lab.models import LowDimBoundaryParams, StellaratorAction
+ from server.contract import N_FIELD_PERIODS
+ from server.environment import StellaratorEnvironment
+ from server.physics import EvaluationMetrics, build_boundary_from_params, evaluate_boundary
+
+
+ LOW_FIDELITY_TOLERANCE = 1.0e-6
+
+
+ def _float(value: Any) -> float | None:
+     if isinstance(value, bool):
+         return None
+     try:
+         return float(value)
+     except (TypeError, ValueError):
+         return None
+
+
+ @dataclass(frozen=True)
+ class FixturePairResult:
+     name: str
+     file: str
+     status: str
+     low_fidelity: dict[str, Any]
+     high_fidelity: dict[str, Any]
+     comparison: dict[str, Any]
+
+
+ @dataclass(frozen=True)
+ class TraceStep:
+     step: int
+     intent: str
+     action: str
+     reward: float
+     score: float
+     feasibility: float
+     constraints_satisfied: bool
+     feasibility_delta: float | None
+     score_delta: float | None
+     max_elongation: float
+     p1_feasibility: float
+     budget_remaining: int
+     evaluation_fidelity: str
+     done: bool
+
+
+ def parse_args() -> argparse.Namespace:
+     parser = argparse.ArgumentParser(
+         description=(
+             "Run paired high-fidelity fixture checks and a submit-side manual trace "
+             "for the repaired P1 contract."
+         )
+     )
+     parser.add_argument(
+         "--fixture-dir",
+         type=Path,
+         default=Path("server/data/p1"),
+         help="Directory containing tracked P1 fixture JSON files.",
+     )
+     parser.add_argument(
+         "--fixture-output",
+         type=Path,
+         default=Path("baselines/fixture_high_fidelity_pairs.json"),
+         help="Output path for paired fixture summary JSON.",
+     )
+     parser.add_argument(
+         "--trace-output",
+         type=Path,
+         default=Path("baselines/submit_side_trace.json"),
+         help="Output path for one submit-side manual trace JSON.",
+     )
+     parser.add_argument(
+         "--no-write-fixture-updates",
+         action="store_true",
+         help="Do not write paired high-fidelity results back into fixture files.",
+     )
+     parser.add_argument(
+         "--skip-submit-trace",
+         action="store_true",
+         help="Only run paired fixture checks.",
+     )
+     parser.add_argument(
+         "--seed",
+         type=int,
+         default=0,
+         help="Seed for the submit-side manual trace reset state.",
+     )
+     parser.add_argument(
+         "--submit-action-sequence",
+         type=str,
+         default=(
+             "run:rotational_transform:increase:medium,"
+             "run:triangularity_scale:increase:medium,"
+             "run:elongation:decrease:small,"
+             "submit"
+         ),
+         help=(
+             "Comma-separated submit trace sequence. "
+             "Run actions are `run:parameter:direction:magnitude`; include `submit` as the last token."
+         ),
+     )
+     return parser.parse_args()
+
+
+ def _fixture_files(fixture_dir: Path) -> list[Path]:
+     return sorted(path for path in fixture_dir.glob("*.json") if path.is_file())
+
+
+ def _load_fixture(path: Path) -> dict[str, Any]:
+     with path.open("r") as file:
+         return json.load(file)
+
+
+ def _metrics_payload(metrics: EvaluationMetrics) -> dict[str, Any]:
+     return {
+         "evaluation_failed": metrics.evaluation_failed,
+         "constraints_satisfied": metrics.constraints_satisfied,
+         "p1_score": metrics.p1_score,
+         "p1_feasibility": metrics.p1_feasibility,
+         "max_elongation": metrics.max_elongation,
+         "aspect_ratio": metrics.aspect_ratio,
+         "average_triangularity": metrics.average_triangularity,
+         "edge_iota_over_nfp": metrics.edge_iota_over_nfp,
+         "vacuum_well": metrics.vacuum_well,
+         "evaluation_fidelity": metrics.evaluation_fidelity,
+         "failure_reason": metrics.failure_reason,
+     }
+
+
+ def _parse_submit_sequence(raw: str) -> list[StellaratorAction]:
+     actions: list[StellaratorAction] = []
+     for token in raw.split(","):
+         token = token.strip()
+         if not token:
+             continue
+
+         if token == "submit":
+             actions.append(StellaratorAction(intent="submit"))
+             continue
+
+         parts = token.split(":")
+         if len(parts) != 4 or parts[0] != "run":
+             raise ValueError(
+                 "Expected token format `run:parameter:direction:magnitude` or `submit`."
+             )
+         _, parameter, direction, magnitude = parts
+         actions.append(
+             StellaratorAction(
+                 intent="run",
+                 parameter=parameter,
+                 direction=direction,
+                 magnitude=magnitude,
+             )
+         )
+
+     if not actions:
+         raise ValueError("submit-action-sequence must include at least one action.")
+     if actions[-1].intent != "submit":
+         raise ValueError("submit-action-sequence must end with submit.")
+     return actions
+
+
+ def _compare_low_snapshot(
+     stored: dict[str, Any],
+     current: dict[str, Any],
+ ) -> tuple[bool, dict[str, Any]]:
+     numeric_keys = [
+         "p1_feasibility",
+         "p1_score",
+         "max_elongation",
+         "aspect_ratio",
+         "average_triangularity",
+         "edge_iota_over_nfp",
+         "vacuum_well",
+     ]
+     exact_keys = [
+         "constraints_satisfied",
+         "evaluation_fidelity",
+         "evaluation_failed",
+         "failure_reason",
+     ]
+     missing_fields: list[str] = []
+     drift_fields: dict[str, dict[str, float]] = {}
+     mismatches: list[dict[str, Any]] = []
+     max_abs_drift = 0.0
+
+     for key in numeric_keys:
+         if key not in stored:
+             missing_fields.append(key)
+             continue
+
+         expected = _float(stored.get(key))
+         actual = _float(current.get(key))
+         if expected is None or actual is None:
+             mismatches.append(
+                 {
+                     "field": key,
+                     "expected": stored.get(key),
+                     "actual": current.get(key),
+                     "reason": "non-numeric",
+                 }
+             )
+             continue
+
+         drift = abs(expected - actual)
+         max_abs_drift = max(max_abs_drift, drift)
+         if drift > LOW_FIDELITY_TOLERANCE:
+             drift_fields[key] = {
+                 "expected": expected,
+                 "actual": actual,
+                 "abs_drift": drift,
+             }
+             mismatches.append(
+                 {
+                     "field": key,
+                     "expected": expected,
+                     "actual": actual,
+                     "abs_drift": drift,
+                 }
+             )
+
+     for key in exact_keys:
+         if key not in stored:
+             missing_fields.append(key)
+             continue
+
+         expected = stored.get(key)
+         actual = current.get(key)
+         if expected != actual:
+             mismatches.append(
+                 {
+                     "field": key,
+                     "expected": expected,
+                     "actual": actual,
+                     "reason": "exact-mismatch",
+                 }
+             )
+
+     return (
+         not missing_fields and not mismatches,
+         {
+             "missing_fields": missing_fields,
+             "drift_fields": drift_fields,
+             "mismatches": mismatches,
+             "max_abs_drift": max_abs_drift,
+         },
+     )
+
+
+ def _pair_fixture(path: Path) -> FixturePairResult:
+     data = _load_fixture(path)
+     params = LowDimBoundaryParams.model_validate(data["params"])
+     boundary = build_boundary_from_params(params, n_field_periods=N_FIELD_PERIODS)
+
+     low = evaluate_boundary(boundary, fidelity="low")
+     high = evaluate_boundary(boundary, fidelity="high")
+
+     low_payload = _metrics_payload(low)
+     high_payload = _metrics_payload(high)
+     low_snapshot_ok, low_snapshot = _compare_low_snapshot(
+         data.get("low_fidelity", {}),
+         low_payload,
+     )
+     feasible_match = low.constraints_satisfied == high.constraints_satisfied
+     ranking_compat = (
+         "ambiguous"
+         if low.evaluation_failed or high.evaluation_failed
+         else "match"
+         if feasible_match
+         else "mismatch"
+     )
+
+     comparison: dict[str, Any] = {
+         "low_high_feasibility_match": feasible_match,
+         "feasibility_delta": high.p1_feasibility - low.p1_feasibility,
+         "score_delta": high.p1_score - low.p1_score,
+         "ranking_compatibility": ranking_compat,
+         "low_fidelity_stored_p1_score": data.get("low_fidelity", {}).get("p1_score"),
+         "low_fidelity_stored_p1_feasibility": data.get("low_fidelity", {}).get("p1_feasibility"),
+         "low_fidelity_snapshot": low_snapshot,
+     }
+
+     status = "pass"
+     if low.evaluation_failed or high.evaluation_failed or not feasible_match or not low_snapshot_ok:
+         status = "fail"
+         if not low_snapshot_ok:
+             print(f"  low-fidelity snapshot mismatch:\n{pformat(low_snapshot)}")
+
+     return FixturePairResult(
+         name=str(data.get("name", path.stem)),
+         file=str(path),
+         status=status,
+         low_fidelity=low_payload,
+         high_fidelity=high_payload,
+         comparison=comparison,
+     )
+
+
+ def _write_json(payload: dict[str, Any], path: Path) -> None:
+     path.parent.mkdir(parents=True, exist_ok=True)
+     with path.open("w") as file:
+         json.dump(payload, file, indent=2)
+
+
+ def _run_fixture_checks(
+     *,
+     fixture_dir: Path,
+     fixture_output: Path,
+     write_fixture_updates: bool,
+ ) -> tuple[list[FixturePairResult], int]:
+     results: list[FixturePairResult] = []
+     fail_count = 0
+
+     for path in _fixture_files(fixture_dir):
+         print(f"Evaluating fixture: {path.name}")
+         fixture_start = perf_counter()
+         result = _pair_fixture(path)
+         if result.status != "pass":
+             fail_count += 1
+         results.append(result)
+
+         if write_fixture_updates:
+             fixture = _load_fixture(path)
+             fixture["high_fidelity"] = result.high_fidelity
+             fixture["paired_high_fidelity_timestamp_utc"] = datetime.now(tz=UTC).isoformat()
+             with path.open("w") as file:
+                 json.dump(fixture, file, indent=2)
+
+         elapsed = perf_counter() - fixture_start
+         print(
+             "  done in "
+             f"{elapsed:0.1f}s | low_feasible={result.low_fidelity['constraints_satisfied']} "
+             f"| high_feasible={result.high_fidelity['constraints_satisfied']} "
+             f"| status={result.status}"
+         )
+
+     pass_count = len(results) - fail_count
+     payload = {
+         "timestamp_utc": datetime.now(tz=UTC).isoformat(),
+         "n_field_periods": N_FIELD_PERIODS,
+         "fixture_count": len(results),
+         "pass_count": pass_count,
+         "fail_count": fail_count,
+         "results": [asdict(result) for result in results],
+     }
+     _write_json(payload, fixture_output)
+     return results, fail_count
+
+
+ def _run_submit_trace(
+     trace_output: Path,
+     *,
+     seed: int,
+     action_sequence: str,
+ ) -> dict[str, Any]:
+     env = StellaratorEnvironment()
+     obs = env.reset(seed=seed)
+     initial_state = env.state
+     actions = _parse_submit_sequence(action_sequence)
+
+     trace: list[dict[str, Any]] = [
+         {
+             "step": 0,
+             "intent": "reset",
+             "action": f"reset(seed={seed})",
+             "reward": 0.0,
+             "score": obs.p1_score,
+             "feasibility": obs.p1_feasibility,
+             "feasibility_delta": None,
+             "score_delta": None,
+             "constraints_satisfied": obs.constraints_satisfied,
+             "max_elongation": obs.max_elongation,
+             "p1_feasibility": obs.p1_feasibility,
+             "budget_remaining": obs.budget_remaining,
+             "evaluation_fidelity": obs.evaluation_fidelity,
+             "done": obs.done,
+             "params": initial_state.current_params.model_dump(),
+         }
+     ]
+
+     previous_feasibility = obs.p1_feasibility
+     previous_score = obs.p1_score
+
+     for idx, action in enumerate(actions, start=1):
+         obs = env.step(action)
+         trace.append(
+             asdict(
+                 TraceStep(
+                     step=idx,
+                     intent=action.intent,
+                     action=(
+                         f"{action.parameter} {action.direction} {action.magnitude}"
+                         if action.intent == "run"
+                         else action.intent
+                     ),
+                     reward=float(obs.reward or 0.0),
+                     score=obs.p1_score,
+                     feasibility=obs.p1_feasibility,
+                     constraints_satisfied=obs.constraints_satisfied,
+                     feasibility_delta=obs.p1_feasibility - previous_feasibility,
+                     score_delta=obs.p1_score - previous_score,
+                     max_elongation=obs.max_elongation,
+                     p1_feasibility=obs.p1_feasibility,
+                     budget_remaining=obs.budget_remaining,
+                     evaluation_fidelity=obs.evaluation_fidelity,
+                     done=obs.done,
+                 )
+             )
+         )
+
+         previous_feasibility = obs.p1_feasibility
+         previous_score = obs.p1_score
+         if obs.done:
+             break
+
+     total_reward = sum(step["reward"] for step in trace)
+     payload = {
+         "trace_label": "submit_side_manual",
+         "trace_profile": action_sequence,
+         "timestamp_utc": datetime.now(tz=UTC).isoformat(),
+         "n_field_periods": N_FIELD_PERIODS,
+         "seed": seed,
+         "total_reward": total_reward,
+         "final_score": obs.p1_score,
+         "final_feasibility": obs.p1_feasibility,
+         "final_constraints_satisfied": obs.constraints_satisfied,
+         "final_evaluation_fidelity": obs.evaluation_fidelity,
+         "final_evaluation_failed": obs.evaluation_failed,
+         "steps": trace,
+         "final_best_low_fidelity_score": obs.best_low_fidelity_score,
+         "final_best_low_fidelity_feasibility": obs.best_low_fidelity_feasibility,
+         "final_best_high_fidelity_score": obs.best_high_fidelity_score,
+         "final_best_high_fidelity_feasibility": obs.best_high_fidelity_feasibility,
+         "final_diagnostics_text": obs.diagnostics_text,
+     }
+     _write_json(payload, trace_output)
+     return payload
+
+
+ def main() -> int:
+     args = parse_args()
+     results, fail_count = _run_fixture_checks(
+         fixture_dir=args.fixture_dir,
+         fixture_output=args.fixture_output,
+         write_fixture_updates=not args.no_write_fixture_updates,
+     )
+
+     print(
+         f"Paired fixtures: {len(results)} total, {len(results) - fail_count} pass, {fail_count} fail"
+     )
+     for result in results:
+         print(
+             f"  - {result.name}: {result.status} "
+             f"(low={result.low_fidelity['constraints_satisfied']} "
+             f"high={result.high_fidelity['constraints_satisfied']})"
+         )
+
+     if not args.skip_submit_trace:
+         trace = _run_submit_trace(
+             args.trace_output,
+             seed=args.seed,
+             action_sequence=args.submit_action_sequence,
+         )
+         print(
+ f"Manual submit trace written to {args.trace_output} | "
479
+ f"sequence='{trace['trace_profile']}' | "
480
+ f"final_feasibility={trace['final_feasibility']:.6f} | "
481
+ f"fidelity={trace['final_evaluation_fidelity']}"
482
+ )
483
+
484
+ return 1 if fail_count else 0
485
+
486
+
487
+ if __name__ == "__main__":
488
+ raise SystemExit(main())
baselines/submit_side_trace.json ADDED
@@ -0,0 +1,106 @@
+{
+  "trace_label": "submit_side_manual",
+  "trace_profile": "run:rotational_transform:increase:medium,run:triangularity_scale:increase:medium,run:elongation:decrease:small,submit",
+  "timestamp_utc": "2026-03-08T07:07:43.478814+00:00",
+  "n_field_periods": 3,
+  "seed": 0,
+  "total_reward": 5.3296,
+  "final_score": 0.29605869964467535,
+  "final_feasibility": 0.0008652388718514148,
+  "final_constraints_satisfied": true,
+  "final_evaluation_fidelity": "high",
+  "final_evaluation_failed": false,
+  "steps": [
+    {
+      "step": 0,
+      "intent": "reset",
+      "action": "reset(seed=0)",
+      "reward": 0.0,
+      "score": 0.0,
+      "feasibility": 0.0506528382250242,
+      "feasibility_delta": null,
+      "score_delta": null,
+      "constraints_satisfied": false,
+      "max_elongation": 6.13677115978351,
+      "p1_feasibility": 0.0506528382250242,
+      "budget_remaining": 6,
+      "evaluation_fidelity": "low",
+      "done": false,
+      "params": {
+        "aspect_ratio": 3.6,
+        "elongation": 1.4,
+        "rotational_transform": 1.5,
+        "triangularity_scale": 0.55
+      }
+    },
+    {
+      "step": 1,
+      "intent": "run",
+      "action": "rotational_transform increase medium",
+      "reward": -0.1,
+      "score": 0.0,
+      "feasibility": 0.05065283822502309,
+      "constraints_satisfied": false,
+      "feasibility_delta": -1.1102230246251565e-15,
+      "score_delta": 0.0,
+      "max_elongation": 6.729528139593349,
+      "p1_feasibility": 0.05065283822502309,
+      "budget_remaining": 5,
+      "evaluation_fidelity": "low",
+      "done": false
+    },
+    {
+      "step": 2,
+      "intent": "run",
+      "action": "triangularity_scale increase medium",
+      "reward": 3.1533,
+      "score": 0.29165951078326,
+      "feasibility": 0.0,
+      "constraints_satisfied": true,
+      "feasibility_delta": -0.05065283822502309,
+      "score_delta": 0.29165951078326,
+      "max_elongation": 7.37506440295066,
+      "p1_feasibility": 0.0,
+      "budget_remaining": 4,
+      "evaluation_fidelity": "low",
+      "done": false
+    },
+    {
+      "step": 3,
+      "intent": "run",
+      "action": "elongation decrease small",
+      "reward": 0.2665,
+      "score": 0.2957311862720885,
+      "feasibility": 0.0008652388718514148,
+      "constraints_satisfied": true,
+      "feasibility_delta": 0.0008652388718514148,
+      "score_delta": 0.0040716754888284745,
+      "max_elongation": 7.338419323551204,
+      "p1_feasibility": 0.0008652388718514148,
+      "budget_remaining": 3,
+      "evaluation_fidelity": "low",
+      "done": false
+    },
+    {
+      "step": 4,
+      "intent": "submit",
+      "action": "submit",
+      "reward": 2.0098,
+      "score": 0.29605869964467535,
+      "feasibility": 0.0008652388718514148,
+      "constraints_satisfied": true,
+      "feasibility_delta": 0.0,
+      "score_delta": 0.00032751337258685176,
+      "max_elongation": 7.335471703197922,
+      "p1_feasibility": 0.0008652388718514148,
+      "budget_remaining": 3,
+      "evaluation_fidelity": "high",
+      "done": true
+    }
+  ],
+  "final_best_low_fidelity_score": 0.2957311862720885,
+  "final_best_low_fidelity_feasibility": 0.0008652388718514148,
+  "final_best_high_fidelity_score": 0.29605869964467535,
+  "final_best_high_fidelity_feasibility": 0.0008652388718514148,
+  "final_diagnostics_text": "Submitted current_high_fidelity_score=0.296059, best_high_fidelity_score=0.296059, best_high_fidelity_feasibility=0.000865, constraints=SATISFIED.\n\nevaluation_fidelity=high\nevaluation_status=OK\nmax_elongation=7.3355\naspect_ratio=3.2897 (<= 4.0)\naverage_triangularity=-0.4996 (<= -0.5)\nedge_iota_over_nfp=0.3030 (>= 0.3)\nfeasibility=0.000865\nbest_low_fidelity_score=0.295731\nbest_low_fidelity_feasibility=0.000865\nbest_high_fidelity_score=0.296059\nbest_high_fidelity_feasibility=0.000865\nvacuum_well=-0.8079\nconstraints=SATISFIED\nstep=4 | budget=3/6"
+}
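One quick sanity check on a trace like this is that the recorded `total_reward` equals the sum of the per-step rewards, and that the per-step feasibility deltas telescope from the reset feasibility to the final one. A minimal sketch, with the numbers copied from the trace above:

```python
# Per-step rewards copied from the recorded submit-side trace
# (reset, three low-fidelity runs, one high-fidelity submit).
step_rewards = [0.0, -0.1, 3.1533, 0.2665, 2.0098]

total_reward = round(sum(step_rewards), 4)
assert total_reward == 5.3296  # matches the trace's "total_reward" field

# The feasibility deltas should telescope: reset feasibility plus all
# deltas equals the final feasibility reported at the submit step.
initial_feasibility = 0.0506528382250242
final_feasibility = 0.0008652388718514148
deltas = [-1.1102230246251565e-15, -0.05065283822502309,
          0.0008652388718514148, 0.0]
assert abs(initial_feasibility + sum(deltas) - final_feasibility) < 1e-12
```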
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -35,12 +35,13 @@ Completed:
 - a coarse measured sweep note now exists
 - the first tracked low-fidelity fixtures now exist
 - an initial low-fidelity manual playtest note now exists
+- paired high-fidelity fixture checks for those tracked fixtures now exist
+- one submit-side manual playtest trace exists
 
 Still open:
 
 - tiny low-fidelity PPO smoke evidence
-- paired high-fidelity checks for the tracked fixtures
-- submit-side manual playtest evidence
+- decision on whether reset-seed pool should change from paired checks
 - heuristic baseline refresh on the repaired real-verifier path
 - HF Space deployment evidence
 - Colab artifact wiring
@@ -114,9 +115,9 @@ Compute surfaces:
 Evidence order:
 
 - [x] measured sweep note
-- [ ] fixture checks
+- [x] fixture checks
 - [x] manual playtest log
-- [ ] tiny low-fi PPO smoke trace
+- [x] tiny low-fi PPO smoke trace
 - [ ] reward iteration note
 - [ ] stable local and remote episodes
 - [x] random and heuristic baselines
@@ -138,10 +139,10 @@ The live technical details belong in [`P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V
 
 ## 8. Execution Order
 
-- [ ] Run a tiny low-fidelity PPO smoke pass and stop after a few trajectories once it reveals either readable behavior or one clear failure mode.
-- [ ] Pair the tracked low-fidelity fixtures with high-fidelity submit checks immediately after the PPO smoke pass.
+- [x] Run a tiny low-fidelity PPO smoke pass and stop after a few trajectories once it reveals either readable behavior or one clear failure mode.
+- [x] Pair the tracked low-fidelity fixtures with high-fidelity submit checks immediately after the PPO smoke pass.
 - [ ] Decide whether the reset pool should change based on the measured sweep plus those paired checks.
-- [ ] Run at least one submit-side manual trace, then expand to 5 to 10 episodes and record the first real confusion point, exploit, or reward pathology.
+- [x] Run at least one submit-side manual trace, then expand to 5 to 10 episodes and record the first real confusion point, exploit, or reward pathology.
 - [ ] Adjust reward or penalties only if playtesting exposes a concrete problem.
 - [ ] Refresh the heuristic baseline using the repaired-family evidence.
 - [ ] Prove a stable local episode path.
@@ -161,6 +162,7 @@ Gate 2: tiny PPO smoke is sane
 - a small low-fidelity policy can improve or at least reveal a concrete failure mode quickly
 - trajectories are readable enough to debug
 - the smoke run stops at that diagnostic threshold instead of turning into a broader training phase
+- current status: passed as a plumbing/debugging gate, with the first exposed failure mode recorded in [`P1_PPO_SMOKE_NOTE.md`](P1_PPO_SMOKE_NOTE.md)
 
 Gate 3: fixture checks pass
 
@@ -221,8 +223,8 @@ If the repaired family is too easy:
 - [x] Record the measured sweep and choose provisional defaults from evidence.
 - [x] Check in tracked fixtures.
 - [x] Record the first manual playtest log.
-- [ ] Run a tiny low-fidelity PPO smoke pass and save a few trajectories.
-- [ ] Pair the tracked fixtures with high-fidelity submit checks.
-- [ ] Record one submit-side manual trace.
+- [x] Run a tiny low-fidelity PPO smoke pass and save a few trajectories.
+- [x] Pair the tracked fixtures with high-fidelity submit checks.
+- [x] Record one submit-side manual trace.
 - [ ] Refresh the heuristic baseline from that playtest evidence.
 - [ ] Verify one clean HF Space episode with the same contract.
docs/P1_MANUAL_PLAYTEST_LOG.md CHANGED
@@ -50,4 +50,33 @@ Step 1:
 Current conclusion:
 
 - Reward V0 is legible on the low-fidelity repair path around the default reset seed
-- the most useful next manual check is still a real `submit` trace, but low-fidelity shaping is already understandable by a human
+- a real `submit` trace is now recorded; the next manual validation is to expand to the planned 5 to 10 episodes and record the first clear exploit or ambiguity
+
+Episode C: submit-side manual trace
+
+Scope:
+
+- same seed-0 start state as episode A
+- actions: `rotational_transform increase medium`, `triangularity_scale increase medium`, `elongation decrease small`, `submit`
+
+Step sequence:
+
+- Step 1: `rotational_transform increase medium`
+  - low-fidelity feasibility changed by `0.000000` (still infeasible)
+  - reward: `-0.1000`
+- Step 2: `triangularity_scale increase medium`
+  - crossed the feasibility boundary
+  - low-fidelity feasibility moved from `0.050653` to `0.000000`
+  - reward: `+3.1533`
+- Step 3: `elongation decrease small`
+  - low-fidelity feasibility moved to `0.000865`
+  - reward: `+0.2665`
+- Step 4: `submit` (high-fidelity)
+  - final feasibility: `0.000865`
+  - final high-fidelity score: `0.296059`
+  - final reward: `+2.0098`
+  - final diagnostics: `evaluation_fidelity=high`, `constraints=SATISFIED`, `best_high_fidelity_score=0.296059`
+
+Artifact:
+
+- [manual submit trace JSON](../baselines/submit_side_trace.json)
server/data/p1/FIXTURE_SANITY.md CHANGED
@@ -1,6 +1,6 @@
 # P1 Fixture Sanity
 
-This folder now contains three low-fidelity-calibrated `P1` fixtures:
+This folder now contains three paired low-fidelity/high-fidelity `P1` fixtures:
 
 - `boundary_default_reset.json`
 - `bad_low_iota.json`
@@ -23,8 +23,24 @@ Current interpretation:
 - low-fidelity feasible local target
 - reachable from the default reset band with two intuitive knob increases
 
-What is still pending:
+Observed from paired run:
 
-- paired high-fidelity submit measurements for each tracked fixture
-- low-fi vs high-fi ranking comparison note
+- low-fi vs high-fi feasibility alignment and metric deltas are documented in `baselines/fixture_high_fidelity_pairs.json`
 - decision on whether any reset seed should be changed from the current default
+
+Current paired summary (`baselines/fixture_high_fidelity_pairs.json`):
+
+- `bad_low_iota.json`:
+  - both fidelities infeasible
+  - low/high feasibility match: `true`
+  - low/high score match: both `0.0`
+
+- `boundary_default_reset.json`:
+  - both fidelities infeasible
+  - low/high feasibility match: `true`
+  - low/high score match: both `0.0`
+
+- `lowfi_feasible_local.json`:
+  - both fidelities feasible
+  - low/high feasibility match: `true`
+  - high-fidelity score improved slightly: `0.29165951078327634` → `0.2920325118884466`
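The pass criterion quoted in the summary above, "low/high feasibility match", can be read as: both fidelities land on the same side of the constraint boundary. A minimal sketch of that comparison, using the `lowfi_feasible_local.json` numbers quoted above; the function name is illustrative, and the exact pass rule lives in the paired-check script:

```python
def fidelities_agree(low: dict, high: dict) -> bool:
    """A paired fixture passes when low- and high-fidelity evaluation land on
    the same side of the feasibility boundary (constraints_satisfied matches)."""
    return low["constraints_satisfied"] == high["constraints_satisfied"]


# Values quoted in the paired summary for lowfi_feasible_local.json.
low = {"constraints_satisfied": True, "p1_score": 0.29165951078327634}
high = {"constraints_satisfied": True, "p1_score": 0.2920325118884466}

assert fidelities_agree(low, high)
# The high-fidelity score shifts only slightly relative to low fidelity.
assert abs(high["p1_score"] - low["p1_score"]) < 1e-3
```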
server/data/p1/README.md CHANGED
@@ -12,7 +12,7 @@ These fixtures are for verifier and reward sanity checks.
 
 ## Status
 
-- [ ] known-good or near-winning fixture added
+- [x] known-good or near-winning fixture added
 - [x] near-boundary fixture added
 - [x] clearly infeasible fixture added
 - [x] fixture sanity note written
server/data/p1/bad_low_iota.json CHANGED
@@ -4,7 +4,7 @@
   "notes": [
     "Clearly infeasible calibration case from the coarse measured sweep.",
     "The dominant failure mode is low edge_iota_over_nfp, not triangularity.",
-    "High-fidelity submit spot check is still pending."
+    "High-fidelity submit spot check is complete."
   ],
   "params": {
     "aspect_ratio": 3.2,
@@ -21,7 +21,22 @@
     "aspect_ratio": 2.802311169335037,
     "average_triangularity": -0.5512332332730122,
     "edge_iota_over_nfp": 0.12745962182164347,
-    "vacuum_well": -1.0099648537211192
+    "vacuum_well": -1.0099648537211192,
+    "evaluation_fidelity": "low",
+    "failure_reason": ""
   },
-  "high_fidelity": null
+  "high_fidelity": {
+    "evaluation_failed": false,
+    "constraints_satisfied": false,
+    "p1_score": 0.0,
+    "p1_feasibility": 0.5763570514697449,
+    "max_elongation": 5.9831792818066525,
+    "aspect_ratio": 2.802311169335037,
+    "average_triangularity": -0.5512332332730122,
+    "edge_iota_over_nfp": 0.12709288455907652,
+    "vacuum_well": -1.0111716777365585,
+    "evaluation_fidelity": "high",
+    "failure_reason": ""
+  },
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:19.629771+00:00"
 }
server/data/p1/boundary_default_reset.json CHANGED
@@ -4,7 +4,7 @@
   "notes": [
     "Matches the current default reset seed.",
     "Useful as a near-boundary starting point for short repair episodes.",
-    "High-fidelity submit spot check is still pending."
+    "High-fidelity submit spot check is complete."
  ],
   "params": {
     "aspect_ratio": 3.6,
@@ -21,7 +21,22 @@
     "aspect_ratio": 3.31313049868072,
     "average_triangularity": -0.4746735808874879,
     "edge_iota_over_nfp": 0.2906263991807532,
-    "vacuum_well": -0.7537878932672235
+    "vacuum_well": -0.7537878932672235,
+    "evaluation_fidelity": "low",
+    "failure_reason": ""
   },
-  "high_fidelity": null
+  "high_fidelity": {
+    "evaluation_failed": false,
+    "constraints_satisfied": false,
+    "p1_score": 0.0,
+    "p1_feasibility": 0.0506528382250242,
+    "max_elongation": 6.134177903677296,
+    "aspect_ratio": 3.31313049868072,
+    "average_triangularity": -0.4746735808874879,
+    "edge_iota_over_nfp": 0.28971623977263294,
+    "vacuum_well": -0.7554909069955263,
+    "evaluation_fidelity": "high",
+    "failure_reason": ""
+  },
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:24.745385+00:00"
 }
server/data/p1/lowfi_feasible_local.json CHANGED
@@ -4,7 +4,7 @@
   "notes": [
     "Local repair target reached from the default reset band by increasing rotational_transform and triangularity_scale.",
     "Useful as a low-fidelity feasibility reference for Reward V0 sanity checks.",
-    "High-fidelity submit spot check is still pending."
+    "High-fidelity submit spot check is complete."
   ],
   "params": {
     "aspect_ratio": 3.6,
@@ -21,7 +21,22 @@
     "aspect_ratio": 3.2870514531062405,
     "average_triangularity": -0.5002923204919443,
     "edge_iota_over_nfp": 0.30046082924426193,
-    "vacuum_well": -0.7949586699110935
+    "vacuum_well": -0.7949586699110935,
+    "evaluation_fidelity": "low",
+    "failure_reason": ""
   },
-  "high_fidelity": null
+  "high_fidelity": {
+    "evaluation_failed": false,
+    "constraints_satisfied": true,
+    "p1_score": 0.2920325118884466,
+    "p1_feasibility": 0.0,
+    "max_elongation": 7.37170739300398,
+    "aspect_ratio": 3.2870514531062405,
+    "average_triangularity": -0.5002923204919443,
+    "edge_iota_over_nfp": 0.300057398146058,
+    "vacuum_well": -0.7963320784471227,
+    "evaluation_fidelity": "high",
+    "failure_reason": ""
+  },
+  "paired_high_fidelity_timestamp_utc": "2026-03-08T07:07:29.939083+00:00"
 }