Commit: daba1b9
Parent(s): 61fc39b

feat: align p1 environment with repo plan
Files changed:

- AGENTS.md +9 -0
- README.md +28 -8
- TODO.md +81 -47
- baselines/compare.py +7 -7
- baselines/heuristic_agent.py +30 -23
- baselines/random_agent.py +6 -8
- docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +9 -0
- docs/PIVOT_P1_ROTATING_ELLIPSE.md +1 -1
- fusion_lab/models.py +36 -16
- server/app.py +10 -9
- server/environment.py +231 -115
- server/physics.py +88 -132
- uv.lock +0 -0
AGENTS.md CHANGED

````diff
@@ -51,6 +51,7 @@ Do not leave silent divergence.
 - `SSOT`: keep one canonical definition for the environment contract, reward semantics, and task wording.
 - `SOLID`: keep modules focused, interfaces clear, and responsibilities separated.
 - `Occam's Razor`: when two approaches work, prefer the one with fewer moving parts and fewer assumptions.
+- `No Fallout`: keep refactors atomic. Do not leave stale schemas, stale consumers, or half-migrated task terms behind.
 
 ## Working Rules
 
@@ -62,6 +63,8 @@ Do not leave silent divergence.
 - Do not optimize notebook/training work ahead of local environment stability, remote environment stability, and baseline comparisons.
 - Do not create new planning loops around decisions that are already locked in the SSOT docs unless a hard blocker appears.
 - Treat supporting decision records as rationale, not as a fresh task queue.
+- Do not leave fallout after contract changes. If a schema, action, reward, or task term changes, update dependent files in the same task so the repo stays coherent.
+- Do not leave stale consumers behind after refactors. Task summaries, baselines, notebooks, and docs must either match the new contract or be deliberately updated.
 
 ## Environment Contract Rules
 
@@ -109,6 +112,12 @@ If a human cannot act coherently from the observation, fix the environment contract first.
 
 For scoped changes, prefer the smallest relevant checks first.
 
+## Environment and Tooling
+
+- This repo uses `uv` as the package and environment manager.
+- Prefer `uv sync`, `uv run`, and `uv lock` for local work, Northflank, and HF Space builds.
+- Do not introduce `conda`-specific setup into this repo unless a real blocker forces it and the change is documented.
+
 Current useful commands:
 
 ```bash
````
README.md CHANGED

```diff
@@ -14,14 +14,35 @@ Training is supporting evidence. The environment is the product.
 
 ## Current Status
 
-This repository is the clean hackathon workspace. The detailed planning docs live in …
+This repository is the clean hackathon workspace. The detailed planning docs live in `docs/FUSION_DESIGN_LAB_PLAN_V2.md`, `docs/FUSION_DELIVERABLES_MAP.md`, and `docs/FUSION_NEXT_12_HOURS_CHECKLIST.md`.
 
 Implementation status:
 
 - `P1` is locked as the benchmark task
 - docs are aligned to fresh `P1` wiring in this repo
-- shared models and server/client entry points …
-- the …
+- shared models, baselines, and server/client entry points now reflect the locked `P1` contract
+- the current environment uses a synthetic `P1` evaluator; the next runtime step is swapping in `constellaration` as the verifier of record
+
+## Execution Status
+
+- [x] Lock the `P1` contract in code
+- [x] Rewrite shared models to the rotating-ellipse `P1` schema
+- [x] Rewrite the environment loop to the rotating-ellipse `P1` schema
+- [x] Update the API/task surface to match `P1`
+- [x] Update baseline agents to the `P1` contract
+- [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
+- [x] Run an initial baseline comparison on the current synthetic `P1` branch state
+- [ ] Replace the synthetic evaluator with `constellaration`
+- [ ] Add tracked `P1` fixtures under `server/data/p1/`
+- [ ] Run manual playtesting and record the first reward pathology
+- [ ] Deploy the real environment to HF Space
+
+## Known Gaps
+
+- The current evaluator in `server/physics.py` is a synthetic proxy for `P1`, not the official `constellaration` verifier yet.
+- `BASELINE_PARAMS` is intentionally repairable but currently infeasible at reset; do not describe it as a feasible anchor.
+- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
+- The first local baseline run is only a synthetic-proxy sanity check; heuristic beat random on 20/20 seeded episodes, but this should be re-run after `constellaration` wiring.
 
 Current mode:
 
@@ -84,11 +105,10 @@ uv sync --extra notebooks
 - import `constellaration`
 - run one rotating-ellipse generation plus one low-fidelity verifier call
 - write an artifact to persistent storage
-3. …
-4. …
-5. Add …
-6. …
-7. Run manual playtest episodes before heavy training work.
+3. Replace the synthetic evaluator in `server/physics.py` with `constellaration`-based `P1` verification.
+4. Add tracked `P1` fixtures under `server/data/p1`.
+5. Add the Colab notebook under `training/notebooks`.
+6. Run manual playtest episodes before heavy training work.
 
 These are implementation steps, not another planning phase.
 
```
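The terminal-reward asymmetry noted in Known Gaps (budget exhaustion pays less than an explicit `submit`) can be sketched as a tiny reward rule. This is an illustrative sketch only, not the repo's `server/environment.py` code; the function name, the `timeout_discount` parameter, and the 0.5 value are assumptions, and it assumes a non-negative score.

```python
# Hypothetical sketch of the terminal-reward asymmetry described above:
# budget exhaustion ends the episode with a smaller reward than an explicit
# submit, so an agent is never better off idling until the budget runs out.
# Names and the 0.5 discount are illustrative assumptions; assumes score >= 0.

def terminal_reward(best_score: float, submitted: bool, timeout_discount: float = 0.5) -> float:
    """Return the end-of-episode reward for a terminal transition."""
    if submitted:
        return best_score  # full credit for a deliberate submit
    return best_score * timeout_discount  # smaller credit on budget exhaustion
```

Keeping the discount strictly below 1.0 is what preserves the ordering the Known Gaps bullet asks to protect during reward tuning.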
TODO.md CHANGED

````diff
@@ -4,18 +4,33 @@ This is the execution tracker for the hackathon repo.
 
 Use this file for day-of build progress. Use the linked docs for rationale, sequencing, and submission framing:
 
-- [Plan V2](…
-- [Deliverables Map](…
-- [Next 12 Hours Checklist](…
-- [P1 Pivot Record](…
-- [Repo Guardrails](…
+- [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
+- [Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
+- [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
+- [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
+- [Repo Guardrails](AGENTS.md)
 
 Priority source:
 
-- [Plan V2](…
-- [Next 12 Hours Checklist](…
+- [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md) is the planning SSOT
+- [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md) is the execution order SSOT
 - this file should track execution progress only
 
+## Current State
+
+- [x] `P1` strategy is locked
+- [x] shared models reflect the rotating-ellipse `P1` contract
+- [x] environment loop reflects the rotating-ellipse `P1` contract
+- [x] API/task surface reflects `P1`
+- [x] baselines reflect the `P1` contract
+- [x] repo docs call out the synthetic evaluator honestly
+- [x] post-terminal guard in `step()`
+- [ ] `constellaration` verifier wiring
+- [ ] tracked `P1` fixtures
+- [ ] manual playtest log
+- [x] settle the non-submit terminal reward policy
+- [x] baseline comparison has been run once on the current synthetic `P1` branch state
+
 ## Execution Graph
 
 ```mermaid
@@ -34,82 +49,99 @@ flowchart TD
 
 ## Hour 0-2
 
-- […
+- [x] Lock the exact `P1` environment contract
 Goal:
 freeze observation schema, action schema, episode loop, terminal conditions, and `Reward V0`
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
 - [ ] Pass the Northflank smoke test
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
-[training/notebooks/README.md](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md),
+[training/notebooks/README.md](training/notebooks/README.md)
 
 ## Fresh Wiring
 
-- […
+- [x] Rewrite the shared models to the locked `P1` contract
 Files:
-[fusion_lab/models.py](…
-[Plan V2](…
+[fusion_lab/models.py](fusion_lab/models.py),
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 
-- […
+- [x] Rewrite the environment loop to the locked `P1` contract
 Files:
-[server/environment.py](…
-[Plan V2](…
-[P1 Pivot Record](…
+[server/environment.py](server/environment.py),
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 
-- […
+- [x] Add a post-terminal guard to the environment loop
 Files:
-[server/…
-…
-…
+[server/environment.py](server/environment.py)
+Goal:
+reject or no-op any `step()` call after terminal state so budget and step count do not drift past episode end
 
-- [ ] …
+- [ ] Replace the synthetic physics path with `constellaration` wiring
 Files:
-[server/…
-[…
+[server/physics.py](server/physics.py),
+[server/Dockerfile](server/Dockerfile),
+[pyproject.toml](pyproject.toml)
+
+- [x] Update the API/task surface to match `P1`
+Files:
+[server/app.py](server/app.py),
+[README.md](README.md)
 
 ## Validation and Reward
 
 - [ ] Add 1-2 tracked `P1` fixtures
 Files:
-[server/data/p1/README.md](…
-[P1 Pivot Record](…
+[server/data/p1/README.md](server/data/p1/README.md),
+[P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 
 - [ ] Run fixture sanity checks
 Goal:
 confirm verifier outputs, objective direction, and reward ordering
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
 - [ ] Manual-playtest 5-10 episodes
 Goal:
 verify a human can act coherently and surface at least one pathology or ambiguity
 Related:
-[Plan V2](…
-[Deliverables Map](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
 
 - [ ] Update reward from `V0` to `V1` if playtesting reveals a real pathology
 Goal:
 keep a short exploit -> fix -> behavior improvement story
 Related:
-[AGENTS.md](…
-[Plan V2](…
+[AGENTS.md](AGENTS.md),
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
+
+- [x] Decide the non-submit terminal reward policy
+Goal:
+budget exhaustion now yields a smaller end-of-episode reward than `submit`, so non-submitting agents still get terminal feedback without outranking explicit submit behavior
+Files:
+[server/environment.py](server/environment.py),
+[README.md](README.md)
 
 ## Baselines
 
-- […
+- [x] Implement the random baseline
+Files:
+[baselines/random_agent.py](baselines/random_agent.py),
+[baselines/compare.py](baselines/compare.py)
+
+- [x] Implement the heuristic baseline
 Files:
-[baselines/…
-[baselines/compare.py](…
+[baselines/heuristic_agent.py](baselines/heuristic_agent.py),
+[baselines/compare.py](baselines/compare.py)
 
-- […
+- [x] Run the baseline comparison on the current `P1` branch state
 Files:
-[baselines/…
-[baselines/compare.py](/Users/suhjungdae/code/fusion-design-lab/baselines/compare.py)
+[baselines/compare.py](baselines/compare.py)
 
 - [ ] Save one comparison trace that is presentation-ready
 Goal:
@@ -119,12 +151,12 @@ flowchart TD
 
 - [ ] Deploy the environment to HF Space
 Related:
-[Deliverables Map](…
-[README.md](…
+[Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md),
+[README.md](README.md)
 
 - [ ] Create the thin public Colab notebook
 Files:
-[training/notebooks/README.md](…
+[training/notebooks/README.md](training/notebooks/README.md)
 
 - [ ] Record the 1-minute demo
 Goal:
@@ -132,12 +164,12 @@ flowchart TD
 
 - [ ] Finalize the public README
 Files:
-[README.md](…
+[README.md](README.md)
 
 - [ ] Only add training evidence if it is actually persuasive
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
 ## Guardrails
 
@@ -145,3 +177,5 @@ flowchart TD
 - [ ] Do not port the old `ai-sci-feasible-designs` harness
 - [ ] Do not let notebook or demo work outrun environment evidence
 - [ ] Do not add training-first complexity before manual playtesting
+- [ ] Do not describe the current synthetic evaluator as the official verifier integration
+- [ ] Do not describe the current baseline reset state as already feasible
````
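The post-terminal guard tracked above ("reject or no-op any `step()` call after terminal state so budget and step count do not drift past episode end") can be sketched in a few lines. This is a minimal illustrative class, not the repo's `server/environment.py`; the field names follow the observation schema in this commit, but the class and return shape are assumptions.

```python
# Minimal sketch of a post-terminal guard for an environment step() method.
# Illustrative only; field names (done, budget_remaining, step_number) mirror
# the observation schema in this commit, the rest is assumed.

class GuardedEnv:
    def __init__(self, budget: int = 6):
        self.done = False
        self.budget_remaining = budget
        self.step_number = 0

    def step(self, action: str) -> dict:
        if self.done:
            # No-op after terminal: return the frozen terminal view so budget
            # and step count cannot drift past episode end.
            return {"done": True, "reward": 0.0, "step": self.step_number}
        self.step_number += 1
        self.budget_remaining -= 1
        if action == "submit" or self.budget_remaining <= 0:
            self.done = True
        return {"done": self.done, "reward": 0.0, "step": self.step_number}
```

The guard is checked before any counters mutate, which is the property the TODO item asks for.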
baselines/compare.py CHANGED

```diff
@@ -14,27 +14,27 @@ def main(n_episodes: int = 20) -> None:
 
     random_rewards: list[float] = []
     heuristic_rewards: list[float] = []
-    …
-    …
+    random_best_scores: list[float] = []
+    heuristic_best_scores: list[float] = []
 
     for i in range(n_episodes):
         rr, rt = random_episode(env, seed=i)
         random_rewards.append(rr)
-        …
+        random_best_scores.append(rt[-1]["best_score"])
 
         hr, ht = heuristic_episode(env, seed=i)
         heuristic_rewards.append(hr)
-        …
+        heuristic_best_scores.append(ht[-1]["best_score"])
 
     r_mean = sum(random_rewards) / len(random_rewards)
     h_mean = sum(heuristic_rewards) / len(heuristic_rewards)
-    …
-    …
+    r_score = sum(random_best_scores) / len(random_best_scores)
+    h_score = sum(heuristic_best_scores) / len(heuristic_best_scores)
 
     print(f"{'Metric':<25} {'Random':>12} {'Heuristic':>12}")
     print("-" * 51)
     print(f"{'Mean reward':<25} {r_mean:>+12.4f} {h_mean:>+12.4f}")
-    print(f"{'Mean best …
+    print(f"{'Mean best P1 score':<25} {r_score:>12.6f} {h_score:>12.6f}")
     print(f"{'Episodes':<25} {n_episodes:>12d} {n_episodes:>12d}")
     print()
 
```
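The aligned table in `main()` relies on f-string format specifiers. A standalone sketch with made-up reward values shows why the separator is exactly 51 dashes:

```python
# Standalone sketch of the comparison-table formatting above, using made-up
# values. '<25' left-aligns the label in 25 columns; '>+12.4f' right-aligns
# a signed float in 12 columns. Each row is therefore
# 25 + 1 + 12 + 1 + 12 = 51 characters wide, matching the 51-dash rule.
r_mean, h_mean = -0.1234, 0.5678
header = f"{'Metric':<25} {'Random':>12} {'Heuristic':>12}"
rule = "-" * 51
row = f"{'Mean reward':<25} {r_mean:>+12.4f} {h_mean:>+12.4f}"
print(header)
print(rule)
print(row)
```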
baselines/heuristic_agent.py CHANGED

```diff
@@ -1,8 +1,8 @@
 """Heuristic baseline agent for the stellarator design environment.
 
 Strategy: guided perturbations informed by domain knowledge.
-1. …
-2. …
+1. Push elongation upward to improve triangularity.
+2. Nudge rotational transform upward to stay on the iota side of feasibility.
 3. Use restore_best to recover from any worsening.
 4. Submit before exhausting budget.
 """
@@ -14,12 +14,12 @@ import sys
 from fusion_lab.models import StellaratorAction
 from server.environment import StellaratorEnvironment
 
-STRATEGY: list[tuple[str, str, str…
-    ("…
-    ("…
-    ("…
-    ("…
-    ("…
+STRATEGY: list[tuple[str, str, str]] = [
+    ("elongation", "increase", "medium"),
+    ("elongation", "increase", "small"),
+    ("rotational_transform", "increase", "small"),
+    ("aspect_ratio", "decrease", "small"),
+    ("rotational_transform", "increase", "small"),
 ]
 
 
@@ -28,33 +28,40 @@ def heuristic_episode(
 ) -> tuple[float, list[dict[str, object]]]:
     obs = env.reset(seed=seed)
     total_reward = 0.0
-    trace: list[dict[str, object]] = [{"step": 0, "…
-    prev_best = …
+    trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
+    prev_best = (
+        int(obs.best_feasibility <= 0.01),
+        obs.best_score if obs.best_feasibility <= 0.01 else -obs.best_feasibility,
+    )
 
-    for …
+    for parameter, direction, magnitude in STRATEGY:
         if obs.done or obs.budget_remaining <= 1:
             break
 
         action = StellaratorAction(
             intent="run",
-            …
+            parameter=parameter,
             direction=direction,
             magnitude=magnitude,
-            restart=restart,
         )
         obs = env.step(action)
         total_reward += obs.reward or 0.0
         trace.append(
             {
                 "step": len(trace),
-                "action": f"{…
-                "…
-                "…
+                "action": f"{parameter} {direction} {magnitude}",
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
 
-        …
+        current_best = (
+            int(obs.best_feasibility <= 0.01),
+            obs.best_score if obs.best_feasibility <= 0.01 else -obs.best_feasibility,
+        )
+
+        if current_best < prev_best and obs.budget_remaining > 1:
             restore = StellaratorAction(intent="restore_best")
             obs = env.step(restore)
             total_reward += obs.reward or 0.0
@@ -62,13 +69,13 @@ def heuristic_episode(
             {
                 "step": len(trace),
                 "action": "restore_best",
-                "…
-                "…
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
 
-    prev_best = …
+        prev_best = current_best
 
     if not obs.done:
         submit = StellaratorAction(intent="submit")
@@ -78,8 +85,8 @@ def heuristic_episode(
             {
                 "step": len(trace),
                 "action": "submit",
-                "…
-                "…
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
@@ -97,7 +104,7 @@ def main(n_episodes: int = 20) -> None:
         rewards.append(total_reward)
         print(
             f"Episode {i:3d}: steps={len(trace) - 1} "
-            f"…
+            f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
             f"reward={total_reward:+.4f}"
         )
 
```
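The `prev_best` / `current_best` tuples above exploit Python's lexicographic tuple comparison: the first element marks feasibility, so any feasible design outranks any infeasible one, and ties break on score (feasible) or on smaller constraint violation (infeasible). A small standalone sketch of that ranking, with the same 0.01 threshold as the agent and illustrative values:

```python
# Sketch of the (feasibility, score) ranking tuple used by the heuristic
# above. Lexicographic comparison: any feasible design (first element 1)
# outranks any infeasible one (first element 0); feasible designs then
# compare by score, infeasible ones by -feasibility (smaller violation wins).
FEAS_TOL = 0.01  # same feasibility threshold the agent uses

def rank(best_score: float, best_feasibility: float) -> tuple[int, float]:
    feasible = best_feasibility <= FEAS_TOL
    return (int(feasible), best_score if feasible else -best_feasibility)

feasible_low = rank(best_score=0.2, best_feasibility=0.0)      # -> (1, 0.2)
feasible_high = rank(best_score=0.9, best_feasibility=0.0)     # -> (1, 0.9)
barely_infeasible = rank(best_score=0.9, best_feasibility=0.05)  # -> (0, -0.05)
badly_infeasible = rank(best_score=0.9, best_feasibility=2.0)    # -> (0, -2.0)
```

Note the high raw score of `barely_infeasible` never rescues it past any feasible design, which is exactly the ordering the restore-on-regression check depends on.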
baselines/random_agent.py CHANGED

```diff
@@ -8,10 +8,9 @@ import sys
 from fusion_lab.models import StellaratorAction
 from server.environment import StellaratorEnvironment
 
-…
+PARAMETERS = ["aspect_ratio", "elongation", "rotational_transform"]
 DIRECTIONS = ["increase", "decrease"]
 MAGNITUDES = ["small", "medium", "large"]
-RESTARTS = ["hot", "cold"]
 
 
 def random_episode(
@@ -20,7 +19,7 @@ def random_episode(
     rng = random.Random(seed)
     obs = env.reset(seed=seed)
     total_reward = 0.0
-    trace: list[dict[str, object]] = [{"step": 0, "…
+    trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
 
     while not obs.done:
         if obs.budget_remaining <= 0:
@@ -28,10 +27,9 @@ def random_episode(
         else:
             action = StellaratorAction(
                 intent="run",
-                …
+                parameter=rng.choice(PARAMETERS),
                 direction=rng.choice(DIRECTIONS),
                 magnitude=rng.choice(MAGNITUDES),
-                restart=rng.choice(RESTARTS),
             )
         obs = env.step(action)
         total_reward += obs.reward or 0.0
@@ -39,8 +37,8 @@ def random_episode(
             {
                 "step": len(trace),
                 "action": action.intent,
-                "…
-                "…
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
@@ -58,7 +56,7 @@ def main(n_episodes: int = 20) -> None:
         rewards.append(total_reward)
         print(
             f"Episode {i:3d}: steps={len(trace) - 1} "
-            f"…
+            f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
             f"reward={total_reward:+.4f}"
         )
 
```
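The random baseline seeds a dedicated `random.Random(seed)` per episode, so a seeded episode replays the exact same action sequence independent of global RNG state. A standalone sketch of that sampling pattern (the `sample_actions` helper is hypothetical; the choice lists match the agent's):

```python
import random

# Sketch of the per-episode seeding pattern used by the random baseline:
# a local random.Random(seed) instance makes each seeded episode
# reproducible without touching the global random module state.
PARAMETERS = ["aspect_ratio", "elongation", "rotational_transform"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

def sample_actions(seed: int, n: int = 5) -> list[tuple[str, str, str]]:
    rng = random.Random(seed)  # local RNG: no shared global state
    return [
        (rng.choice(PARAMETERS), rng.choice(DIRECTIONS), rng.choice(MAGNITUDES))
        for _ in range(n)
    ]
```

This is why `compare.py` can run random and heuristic episodes on the same seeds and treat the comparison as paired.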
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED

```diff
@@ -6,6 +6,15 @@ This checklist turns the updated deliverables map and Plan V2 into concrete execution steps.
 
 Do not expand scope beyond one stable task. Training is supporting evidence, not the main story.
 
+## Current Branch Status
+
+- [x] `P1` task is locked
+- [x] rotating-ellipse `P1` contract is implemented in the working tree
+- [x] baselines and API surface have been moved to the `P1` contract
+- [x] add a post-terminal guard in `step()`
+- [ ] replace the synthetic evaluator with `constellaration`
+- [ ] add tracked fixtures and manual playtest evidence
+
 ## Plan V2 Inheritance
 
 Carry these rules through the whole checklist:
```
docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED

```diff
@@ -220,7 +220,7 @@ If constellaration deployment fails (Docker build, HF Spaces issues):
 
 Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
 
-1. **…
+1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
 3. **Baseline comparison:** add only if manual playtesting shows a second start state is useful
 
```
fusion_lab/models.py
CHANGED

@@ -3,46 +3,66 @@ from __future__ import annotations
 
 from typing import Literal
 
 from openenv.core import Action, Observation, State
-from pydantic import Field
+from pydantic import BaseModel, Field
 
 ActionIntent = Literal["run", "submit", "restore_best"]
+ParameterName = Literal["aspect_ratio", "elongation", "rotational_transform"]
 DirectionName = Literal["increase", "decrease"]
 MagnitudeName = Literal["small", "medium", "large"]
+
+
+class RotatingEllipseParams(BaseModel):
+    aspect_ratio: float
+    elongation: float
+    rotational_transform: float
 
 
 class StellaratorAction(Action):
     intent: ActionIntent
+    parameter: ParameterName | None = None
     direction: DirectionName | None = None
     magnitude: MagnitudeName | None = None
-    restart: RestartMode | None = None
     reasoning: str = ""
 
 
 class StellaratorObservation(Observation):
     diagnostics_text: str = ""
+    max_elongation: float = 0.0
     aspect_ratio: float = 0.0
+    average_triangularity: float = 0.0
+    edge_iota_over_nfp: float = 0.0
+    p1_score: float = 0.0
+    p1_feasibility: float = 0.0
+    vacuum_well: float = 0.0
     step_number: int = 0
     budget_remaining: int = 6
+    best_score: float = 0.0
+    best_feasibility: float = float("inf")
     constraints_satisfied: bool = True
     target_spec: str = ""
 
 
 class StellaratorState(State):
+    current_params: RotatingEllipseParams = Field(
+        default_factory=lambda: RotatingEllipseParams(
+            aspect_ratio=3.5,
+            elongation=1.5,
+            rotational_transform=0.4,
+        )
+    )
+    best_params: RotatingEllipseParams = Field(
+        default_factory=lambda: RotatingEllipseParams(
+            aspect_ratio=3.5,
+            elongation=1.5,
+            rotational_transform=0.4,
+        )
+    )
+    initial_score: float = 0.0
+    best_score: float = 0.0
+    current_feasibility: float = float("inf")
+    best_feasibility: float = float("inf")
     budget_total: int = 6
     budget_remaining: int = 6
+    episode_done: bool = False
     constraints_satisfied: bool = True
     history: list[str] = Field(default_factory=list)
server/app.py
CHANGED

@@ -4,10 +4,11 @@ from openenv.core import create_fastapi_app
 
 from fusion_lab.models import StellaratorAction, StellaratorObservation
 from server.environment import (
+    ASPECT_RATIO_MAX,
+    AVERAGE_TRIANGULARITY_MAX,
     BUDGET,
+    EDGE_IOTA_OVER_NFP_MIN,
+    N_FIELD_PERIODS,
     StellaratorEnvironment,
 )
 
@@ -21,18 +22,18 @@ app = create_fastapi_app(
 
 @app.get("/task")
 def task_summary() -> dict[str, object]:
     return {
-        "description": "
+        "description": "Optimize the P1 benchmark with a rotating-ellipse parameterization.",
         "constraints": {
+            "aspect_ratio_max": ASPECT_RATIO_MAX,
+            "average_triangularity_max": AVERAGE_TRIANGULARITY_MAX,
+            "edge_iota_over_nfp_min": EDGE_IOTA_OVER_NFP_MIN,
         },
+        "n_field_periods": N_FIELD_PERIODS,
        "budget": BUDGET,
        "actions": ["run", "submit", "restore_best"],
+        "parameters": ["aspect_ratio", "elongation", "rotational_transform"],
        "directions": ["increase", "decrease"],
        "magnitudes": ["small", "medium", "large"],
-        "restart_modes": ["hot", "cold"],
    }
server/environment.py
CHANGED

@@ -1,47 +1,62 @@
 from __future__ import annotations
 
+from random import Random
 from typing import Any, Final, Optional
 
 from openenv.core import Environment as BaseEnvironment
 
 from fusion_lab.models import (
+    RotatingEllipseParams,
     StellaratorAction,
     StellaratorObservation,
     StellaratorState,
 )
-from server.physics import
+from server.physics import (
+    ASPECT_RATIO_MAX,
+    AVERAGE_TRIANGULARITY_MAX,
+    EDGE_IOTA_OVER_NFP_MIN,
+    FEASIBILITY_TOLERANCE,
+    EvaluationMetrics,
+    evaluate_params,
+)
 
 BUDGET: Final[int] = 6
+N_FIELD_PERIODS: Final[int] = 3
+
+PARAMETER_RANGES: Final[dict[str, tuple[float, float]]] = {
+    "aspect_ratio": (2.0, 8.0),
+    "elongation": (1.0, 5.0),
+    "rotational_transform": (0.1, 1.0),
+}
+
+PARAMETER_DELTAS: Final[dict[str, dict[str, float]]] = {
+    "aspect_ratio": {"small": 0.1, "medium": 0.3, "large": 0.8},
+    "elongation": {"small": 0.1, "medium": 0.3, "large": 0.8},
+    "rotational_transform": {"small": 0.02, "medium": 0.05, "large": 0.15},
+}
+
+BASELINE_PARAMS: Final[RotatingEllipseParams] = RotatingEllipseParams(
+    aspect_ratio=3.5,
+    elongation=1.5,
+    rotational_transform=0.4,
+)
 
 TARGET_SPEC: Final[str] = (
-    "Constraints: aspect ratio
+    "Optimize the P1 benchmark using a rotating-ellipse parameterization. "
+    "Constraints: aspect ratio <= 4.0, average triangularity <= -0.5, "
+    "edge rotational transform / n_field_periods >= 0.3. "
     "Budget: 6 evaluations."
 )
 
 
-def check_constraints(diag: Diagnostics) -> bool:
-    ar_lo, ar_hi = ASPECT_RATIO_RANGE
-    iota_lo, iota_hi = IOTA_EDGE_RANGE
-    return (
-        ar_lo <= diag.aspect_ratio <= ar_hi
-        and iota_lo <= diag.iota_edge <= iota_hi
-        and diag.volume >= VOLUME_MIN
-    )
-
-
 class StellaratorEnvironment(
     BaseEnvironment[StellaratorAction, StellaratorObservation, StellaratorState]
 ):
     def __init__(self) -> None:
         super().__init__()
-        self._engine = PhysicsEngine()
         self._state = StellaratorState()
-        self.
+        self._last_metrics: EvaluationMetrics | None = None
+        self._rng = Random()
 
     def reset(
         self,
@@ -49,22 +64,27 @@ class StellaratorEnvironment(
         episode_id: Optional[str] = None,
         **kwargs: Any,
     ) -> StellaratorObservation:
+        self._rng = Random(seed)
+        params = self._initial_params(seed)
+        metrics = evaluate_params(params)
         self._state = StellaratorState(
             episode_id=episode_id,
             step_count=0,
+            current_params=params,
+            best_params=params,
+            initial_score=metrics.p1_score,
+            best_score=metrics.p1_score,
+            current_feasibility=metrics.p1_feasibility,
+            best_feasibility=metrics.p1_feasibility,
             budget_total=BUDGET,
             budget_remaining=BUDGET,
+            episode_done=False,
+            constraints_satisfied=metrics.constraints_satisfied,
         )
-        self.
+        self._last_metrics = metrics
         return self._build_observation(
+            metrics,
+            action_summary="Episode started from the rotating-ellipse baseline.",
         )
 
     def step(
@@ -73,7 +93,15 @@ class StellaratorEnvironment(
         timeout_s: Optional[float] = None,
         **kwargs: Any,
     ) -> StellaratorObservation:
-        self._state.
+        if self._state.episode_done or self._state.budget_remaining <= 0:
+            metrics = self._last_metrics or evaluate_params(self._state.current_params)
+            return self._build_observation(
+                metrics,
+                action_summary=("Episode already ended. Call reset() before sending more actions."),
+                reward=0.0,
+                done=True,
+            )
+
         self._state.step_count += 1
 
         if action.intent == "submit":
@@ -91,108 +119,131 @@ class StellaratorEnvironment(
     # ------------------------------------------------------------------
 
     def _handle_run(self, action: StellaratorAction) -> StellaratorObservation:
-        if not all([action.
+        if not all([action.parameter, action.direction, action.magnitude]):
            return self._handle_invalid_run()
 
        self._state.budget_remaining -= 1
+        params = self._apply_action(
+            params=self._state.current_params,
+            parameter=action.parameter,
            direction=action.direction,
            magnitude=action.magnitude,
-            restart=action.restart or "hot",
        )
-        if diag.qs_residual < self._state.best_qs:
-            self._state.best_qs = diag.qs_residual
-        self._state.constraints_satisfied = satisfied
+        metrics = evaluate_params(params)
+        self._state.current_params = params
+        self._state.current_feasibility = metrics.p1_feasibility
+        self._state.constraints_satisfied = metrics.constraints_satisfied
+        self._update_best(params, metrics)
 
        done = self._state.budget_remaining <= 0
-        reward = self._compute_reward(
-        summary = self._summary_run(action,
+        reward = self._compute_reward(metrics, action.intent, done)
+        summary = self._summary_run(action, metrics)
        self._state.history.append(summary)
-        self.
+        self._last_metrics = metrics
+        self._state.episode_done = done
 
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=done,
        )
 
    def _handle_submit(self) -> StellaratorObservation:
-        summary = self._summary_submit(satisfied)
+        metrics = self._last_metrics or evaluate_params(self._state.current_params)
+        reward = self._compute_reward(metrics, "submit", done=True)
+        summary = self._summary_submit(metrics)
        self._state.history.append(summary)
+        self._state.episode_done = True
 
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=True,
        )
 
    def _handle_restore(self) -> StellaratorObservation:
        self._state.budget_remaining -= 1
-        self._state.constraints_satisfied = satisfied
+        self._state.current_params = self._state.best_params
+        metrics = evaluate_params(self._state.current_params)
+        self._state.current_feasibility = metrics.p1_feasibility
+        self._state.constraints_satisfied = metrics.constraints_satisfied
 
        done = self._state.budget_remaining <= 0
-        reward = self._compute_reward(
+        reward = self._compute_reward(metrics, "restore_best", done)
+        summary = (
+            "Restored the best-known design. "
+            f"Score={metrics.p1_score:.6f}, feasibility={metrics.p1_feasibility:.6f}."
+        )
        self._state.history.append(summary)
-        self.
+        self._last_metrics = metrics
+        self._state.episode_done = done
 
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=done,
        )
 
    def _handle_invalid_run(self) -> StellaratorObservation:
        self._state.budget_remaining -= 1
-        satisfied = check_constraints(diag)
+        metrics = self._last_metrics or evaluate_params(self._state.current_params)
        done = self._state.budget_remaining <= 0
-        summary = "Invalid run action:
+        summary = "Invalid run action: parameter, direction, and magnitude are required."
        self._state.history.append(summary)
+        self._state.episode_done = done
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=-1.0,
+            done=done,
        )
 
    # ------------------------------------------------------------------
    # Reward V0
    # ------------------------------------------------------------------
 
-    def _compute_reward(
+    def _compute_reward(
+        self,
+        metrics: EvaluationMetrics,
+        intent: str,
+        done: bool,
+    ) -> float:
+        previous_metrics = self._last_metrics or metrics
        reward = 0.0
 
-        if diag.converged and not check_constraints(diag):
-            reward -= 2.0
+        if metrics.constraints_satisfied and not previous_metrics.constraints_satisfied:
+            reward += 3.0
+        if previous_metrics.constraints_satisfied and not metrics.constraints_satisfied:
+            reward -= 3.0
+
+        if metrics.constraints_satisfied:
+            reward += (previous_metrics.max_elongation - metrics.max_elongation) * 10.0
+        else:
+            reward += (previous_metrics.p1_feasibility - metrics.p1_feasibility) * 5.0
 
        if intent != "submit":
            reward -= 0.1
 
        if intent == "submit":
-            if self._state.
+            if metrics.constraints_satisfied and self._state.best_score > self._state.initial_score:
+                improvement_ratio = (self._state.best_score - self._state.initial_score) / max(
+                    1.0 - self._state.initial_score, 1e-6
+                )
+                budget_efficiency = self._state.budget_remaining / self._state.budget_total
+                reward += 5.0 * improvement_ratio + budget_efficiency
            else:
                reward -= 1.0
+        elif done:
+            if metrics.constraints_satisfied and self._state.best_score > self._state.initial_score:
+                improvement_ratio = (self._state.best_score - self._state.initial_score) / max(
+                    1.0 - self._state.initial_score, 1e-6
+                )
+                reward += 2.0 * improvement_ratio
+            else:
+                reward -= 0.5
 
        return round(reward, 4)
 
@@ -202,8 +253,7 @@ class StellaratorEnvironment(
 
    def _build_observation(
        self,
-        satisfied: bool,
+        metrics: EvaluationMetrics,
        action_summary: str,
        reward: float | None = None,
        done: bool = False,
@@ -211,29 +261,30 @@ class StellaratorEnvironment(
        text_lines = [
            action_summary,
            "",
+            f"max_elongation={metrics.max_elongation:.4f} | best_score={self._state.best_score:.6f}",
+            f"aspect_ratio={metrics.aspect_ratio:.4f} (<= {ASPECT_RATIO_MAX:.1f})",
+            f"average_triangularity={metrics.average_triangularity:.4f} (<= {AVERAGE_TRIANGULARITY_MAX:.1f})",
+            f"edge_iota_over_nfp={metrics.edge_iota_over_nfp:.4f} (>= {EDGE_IOTA_OVER_NFP_MIN:.1f})",
+            f"feasibility={metrics.p1_feasibility:.6f} | best_feasibility={self._state.best_feasibility:.6f}",
+            f"vacuum_well={metrics.vacuum_well:.4f}",
+            f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
+            f"step={self._state.step_count} | budget={self._state.budget_remaining}/{self._state.budget_total}",
        ]
 
        return StellaratorObservation(
            diagnostics_text="\n".join(text_lines),
+            max_elongation=metrics.max_elongation,
+            aspect_ratio=metrics.aspect_ratio,
+            average_triangularity=metrics.average_triangularity,
+            edge_iota_over_nfp=metrics.edge_iota_over_nfp,
+            p1_score=metrics.p1_score,
+            p1_feasibility=metrics.p1_feasibility,
+            vacuum_well=metrics.vacuum_well,
            step_number=self._state.step_count,
            budget_remaining=self._state.budget_remaining,
+            best_score=self._state.best_score,
+            best_feasibility=self._state.best_feasibility,
+            constraints_satisfied=metrics.constraints_satisfied,
            target_spec=TARGET_SPEC,
            reward=reward,
            done=done,
@@ -243,20 +294,85 @@ class StellaratorEnvironment(
    # Action summaries
    # ------------------------------------------------------------------
 
-    def _summary_run(self, action: StellaratorAction,
+    def _summary_run(self, action: StellaratorAction, metrics: EvaluationMetrics) -> str:
+        assert action.parameter is not None
+        assert action.direction is not None
+        assert action.magnitude is not None
+        previous_metrics = self._last_metrics or metrics
+        if metrics.constraints_satisfied:
+            delta = previous_metrics.max_elongation - metrics.max_elongation
+            objective_summary = (
+                f"max_elongation changed by {delta:+.4f} to {metrics.max_elongation:.4f}."
+            )
+        else:
+            delta = previous_metrics.p1_feasibility - metrics.p1_feasibility
+            objective_summary = (
+                f"feasibility changed by {delta:+.6f} to {metrics.p1_feasibility:.6f}."
+            )
+        return (
+            f"Applied {action.parameter} {action.direction} {action.magnitude}. {objective_summary}"
+        )
 
-    def _summary_submit(self,
-        status = "Constraints satisfied." if satisfied else "Constraints VIOLATED."
-        improvement = self._state.initial_qs - self._state.best_qs
+    def _summary_submit(self, metrics: EvaluationMetrics) -> str:
        return (
+            f"Submitted design with best_score={self._state.best_score:.6f}, "
+            f"best_feasibility={self._state.best_feasibility:.6f}, "
+            f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}."
        )
+
+    def _initial_params(self, seed: int | None) -> RotatingEllipseParams:
+        if seed is None:
+            return BASELINE_PARAMS
+        rng = Random(seed)
+        return RotatingEllipseParams(
+            aspect_ratio=self._clamp(
+                BASELINE_PARAMS.aspect_ratio + rng.uniform(-0.1, 0.1),
+                parameter="aspect_ratio",
+            ),
+            elongation=self._clamp(
+                BASELINE_PARAMS.elongation + rng.uniform(-0.1, 0.1),
+                parameter="elongation",
+            ),
+            rotational_transform=self._clamp(
+                BASELINE_PARAMS.rotational_transform + rng.uniform(-0.015, 0.015),
+                parameter="rotational_transform",
+            ),
+        )
+
+    def _apply_action(
+        self,
+        params: RotatingEllipseParams,
+        parameter: str,
+        direction: str,
+        magnitude: str,
+    ) -> RotatingEllipseParams:
+        delta = PARAMETER_DELTAS[parameter][magnitude]
+        signed_delta = delta if direction == "increase" else -delta
+
+        next_values = params.model_dump()
+        next_values[parameter] = self._clamp(
+            next_values[parameter] + signed_delta,
+            parameter=parameter,
+        )
+        return RotatingEllipseParams.model_validate(next_values)
+
+    def _clamp(self, value: float, *, parameter: str) -> float:
+        lower, upper = PARAMETER_RANGES[parameter]
+        return min(max(value, lower), upper)
+
+    def _update_best(self, params: RotatingEllipseParams, metrics: EvaluationMetrics) -> None:
+        current_rank = self._candidate_rank(metrics)
+        best_rank = (
+            (1, self._state.best_score)
+            if self._state.best_feasibility <= FEASIBILITY_TOLERANCE
+            else (0, -self._state.best_feasibility)
+        )
+        if current_rank > best_rank:
+            self._state.best_params = params
+            self._state.best_score = metrics.p1_score
+            self._state.best_feasibility = metrics.p1_feasibility
+
+    def _candidate_rank(self, metrics: EvaluationMetrics) -> tuple[int, float]:
+        if metrics.constraints_satisfied:
+            return (1, metrics.p1_score)
+        return (0, -metrics.p1_feasibility)
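The `_update_best` / `_candidate_rank` pair implements a lexicographic ordering: any constraint-satisfying design outranks every infeasible one, feasible designs then compare by `p1_score` (higher wins), and infeasible designs compare by violation magnitude (smaller wins, hence the negation). A self-contained sketch of that ordering using plain tuples (candidate names and values are made up for illustration):

```python
def candidate_rank(score: float, feasibility: float, tolerance: float = 0.01) -> tuple[int, float]:
    # Feasible designs get a leading 1 and compare by score;
    # infeasible ones get a leading 0 and compare by -violation,
    # so a smaller violation produces a larger (better) tuple.
    if feasibility <= tolerance:
        return (1, score)
    return (0, -feasibility)


candidates = [
    ("infeasible_far",  0.0, 0.50),   # large constraint violation
    ("infeasible_near", 0.0, 0.05),   # almost feasible
    ("feasible_low",    0.55, 0.0),
    ("feasible_high",   0.80, 0.0),
]
best = max(candidates, key=lambda c: candidate_rank(c[1], c[2]))
print(best[0])  # feasible_high
```

Because Python compares tuples element by element, a single `max()` over these ranks reproduces the two-phase "repair first, then improve" preference without any branching in the caller.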
server/physics.py
CHANGED

@@ -1,141 +1,97 @@
 from __future__ import annotations
 
-import random
-from dataclasses import dataclass, field
+from dataclasses import dataclass
 from typing import Final
 
+from fusion_lab.models import RotatingEllipseParams
 
+ASPECT_RATIO_MAX: Final[float] = 4.0
+AVERAGE_TRIANGULARITY_MAX: Final[float] = -0.5
+EDGE_IOTA_OVER_NFP_MIN: Final[float] = 0.3
+FEASIBILITY_TOLERANCE: Final[float] = 0.01
-    "zs12": -0.02,
-}
-
-OPTIMAL_COEFFS: Final[dict[str, float]] = {
-    "rc10": 1.02,
-    "rc11": 0.135,
-    "zs11": 0.115,
-    "zs12": -0.035,
-}
-
-MAGNITUDE_DELTAS: Final[dict[str, float]] = {
-    "small": 0.005,
-    "medium": 0.02,
-    "large": 0.05,
-}
 
 
 @dataclass(frozen=True)
-class
+class EvaluationMetrics:
+    max_elongation: float
     aspect_ratio: float
+    average_triangularity: float
+    edge_iota_over_nfp: float
+    p1_score: float
+    p1_feasibility: float
+    constraints_satisfied: bool
+    vacuum_well: float
+
+
+def _normalized_violation(value: float, *, limit: float, direction: str) -> float:
+    if direction == "max":
+        return max((value - limit) / max(abs(limit), 1e-6), 0.0)
+    return max((limit - value) / max(abs(limit), 1e-6), 0.0)
+
+
+def evaluate_params(params: RotatingEllipseParams) -> EvaluationMetrics:
+    aspect_ratio = round(params.aspect_ratio, 4)
+    average_triangularity = round(
+        -0.2
+        - 0.35 * (params.elongation - 1.0)
+        - 0.2 * max(0.0, 0.35 - params.rotational_transform),
+        4,
+    )
+    edge_iota_over_nfp = round(
+        params.rotational_transform
+        - 0.05 * max(0.0, params.aspect_ratio - ASPECT_RATIO_MAX)
+        + 0.03 * (params.elongation - 1.5),
+        4,
+    )
+    max_elongation = round(
+        params.elongation
+        + 0.18 * (params.aspect_ratio - 3.4) ** 2
+        + 0.8 * abs(params.rotational_transform - 0.42)
+        + 0.2,
+        4,
+    )
+    vacuum_well = round(
+        0.03
+        + 0.02 * (4.0 - min(params.aspect_ratio, 4.0))
+        + 0.015 * (params.rotational_transform - 0.3)
+        - 0.01 * abs(params.elongation - 1.7),
+        4,
+    )
+
+    aspect_ratio_violation = _normalized_violation(
+        aspect_ratio,
+        limit=ASPECT_RATIO_MAX,
+        direction="max",
+    )
+    triangularity_violation = _normalized_violation(
+        average_triangularity,
+        limit=AVERAGE_TRIANGULARITY_MAX,
+        direction="max",
+    )
+    iota_violation = _normalized_violation(
+        edge_iota_over_nfp,
+        limit=EDGE_IOTA_OVER_NFP_MIN,
+        direction="min",
+    )
+
+    p1_feasibility = round(
+        max(aspect_ratio_violation, triangularity_violation, iota_violation),
+        6,
+    )
+    constraints_satisfied = p1_feasibility <= FEASIBILITY_TOLERANCE
+    p1_score = (
+        round(1.0 - min(max((max_elongation - 1.0) / 9.0, 0.0), 1.0), 6)
+        if constraints_satisfied
+        else 0.0
+    )
+
+    return EvaluationMetrics(
+        max_elongation=max_elongation,
+        aspect_ratio=aspect_ratio,
+        average_triangularity=average_triangularity,
+        edge_iota_over_nfp=edge_iota_over_nfp,
+        p1_score=p1_score,
+        p1_feasibility=p1_feasibility,
+        constraints_satisfied=constraints_satisfied,
+        vacuum_well=vacuum_well,
+    )
-
-        magnetic_well_depth=round(magnetic_well, 4),
-        converged=converged,
-    )
-
-    def _compute_qs_residual(self) -> float:
-        d = {k: self.coeffs[k] - OPTIMAL_COEFFS[k] for k in OPTIMAL_COEFFS}
-        quadratic = (
-            2.0 * d["rc10"] ** 2
-            + 8.0 * d["rc11"] ** 2
-            + 8.0 * d["zs11"] ** 2
-            + 15.0 * d["zs12"] ** 2
-        )
-        cross = 4.0 * d["rc11"] * d["zs11"] - 3.0 * d["rc10"] * d["zs12"]
-        noise = self._rng.gauss(0, 0.0003)
-        return max(quadratic + cross + 0.002 + noise, 0.001)
-
-    def _simulate_convergence(self, magnitude: str, restart: str) -> bool:
-        fail_prob = {"small": 0.02, "medium": 0.08, "large": 0.20}[magnitude]
-        if restart == "hot":
-            fail_prob *= 0.5
-        for key, val in self.coeffs.items():
-            deviation = abs(val - BASELINE_COEFFS[key])
-            if deviation > 0.1:
-                fail_prob += 0.15
-            elif deviation > 0.05:
-                fail_prob += 0.05
-        return self._rng.random() > min(fail_prob, 0.8)
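`evaluate_params` maps the surrogate `max_elongation` onto a bounded score only when the design is feasible: elongation 1.0 maps to a score of 1.0, elongation 10.0 or worse maps to 0.0, linearly in between. A standalone sketch of just that mapping (the function name echoes the `p1_score` field from the diff; it is extracted here for illustration, not a public API of this repo):

```python
def p1_score(max_elongation: float, constraints_satisfied: bool) -> float:
    # Linear map: max_elongation 1.0 -> score 1.0, 10.0 -> score 0.0,
    # clamped to [0, 1]; infeasible designs score 0 regardless of shape.
    if not constraints_satisfied:
        return 0.0
    return round(1.0 - min(max((max_elongation - 1.0) / 9.0, 0.0), 1.0), 6)


print(p1_score(1.0, True))    # 1.0
print(p1_score(5.5, True))    # 0.5
print(p1_score(5.5, False))   # 0.0
```

Gating the score on feasibility is what keeps the reward in `server/environment.py` two-phase: infeasible designs can only earn reward by shrinking their feasibility violation, never by trading it for a better elongation.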
uv.lock
CHANGED

The diff for this file is too large to render.
See raw diff