Commit: daba1b9
Parent(s): 61fc39b

feat: align p1 environment with repo plan
Files changed:

- AGENTS.md +9 -0
- README.md +28 -8
- TODO.md +81 -47
- baselines/compare.py +7 -7
- baselines/heuristic_agent.py +30 -23
- baselines/random_agent.py +6 -8
- docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +9 -0
- docs/PIVOT_P1_ROTATING_ELLIPSE.md +1 -1
- fusion_lab/models.py +36 -16
- server/app.py +10 -9
- server/environment.py +231 -115
- server/physics.py +88 -132
- uv.lock +0 -0
AGENTS.md CHANGED

````diff
@@ -51,6 +51,7 @@ Do not leave silent divergence.
 - `SSOT`: keep one canonical definition for the environment contract, reward semantics, and task wording.
 - `SOLID`: keep modules focused, interfaces clear, and responsibilities separated.
 - `Occam's Razor`: when two approaches work, prefer the one with fewer moving parts and fewer assumptions.
+- `No Fallout`: keep refactors atomic. Do not leave stale schemas, stale consumers, or half-migrated task terms behind.
 
 ## Working Rules
 
@@ -62,6 +63,8 @@ Do not leave silent divergence.
 - Do not optimize notebook/training work ahead of local environment stability, remote environment stability, and baseline comparisons.
 - Do not create new planning loops around decisions that are already locked in the SSOT docs unless a hard blocker appears.
 - Treat supporting decision records as rationale, not as a fresh task queue.
+- Do not leave fallout after contract changes. If a schema, action, reward, or task term changes, update dependent files in the same task so the repo stays coherent.
+- Do not leave stale consumers behind after refactors. Task summaries, baselines, notebooks, and docs must either match the new contract or be deliberately updated.
 
 ## Environment Contract Rules
 
@@ -109,6 +112,12 @@ If a human cannot act coherently from the observation, fix the environment contract first.
 
 For scoped changes, prefer the smallest relevant checks first.
 
+## Environment and Tooling
+
+- This repo uses `uv` as the package and environment manager.
+- Prefer `uv sync`, `uv run`, and `uv lock` for local work, Northflank, and HF Space builds.
+- Do not introduce `conda`-specific setup into this repo unless a real blocker forces it and the change is documented.
+
 Current useful commands:
 
 ```bash
````
README.md CHANGED

```diff
@@ -14,14 +14,35 @@ Training is supporting evidence. The environment is the product.
 
 ## Current Status
 
-This repository is the clean hackathon workspace. The detailed planning docs live in …
+This repository is the clean hackathon workspace. The detailed planning docs live in `docs/FUSION_DESIGN_LAB_PLAN_V2.md`, `docs/FUSION_DELIVERABLES_MAP.md`, and `docs/FUSION_NEXT_12_HOURS_CHECKLIST.md`.
 
 Implementation status:
 
 - `P1` is locked as the benchmark task
 - docs are aligned to fresh `P1` wiring in this repo
-- shared models and server/client entry points …
-- the …
+- shared models, baselines, and server/client entry points now reflect the locked `P1` contract
+- the current environment uses a synthetic `P1` evaluator; the next runtime step is swapping in `constellaration` as the verifier of record
+
+## Execution Status
+
+- [x] Lock the `P1` contract in code
+- [x] Rewrite shared models to the rotating-ellipse `P1` schema
+- [x] Rewrite the environment loop to the rotating-ellipse `P1` schema
+- [x] Update the API/task surface to match `P1`
+- [x] Update baseline agents to the `P1` contract
+- [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
+- [x] Run an initial baseline comparison on the current synthetic `P1` branch state
+- [ ] Replace the synthetic evaluator with `constellaration`
+- [ ] Add tracked `P1` fixtures under `server/data/p1/`
+- [ ] Run manual playtesting and record the first reward pathology
+- [ ] Deploy the real environment to HF Space
+
+## Known Gaps
+
+- The current evaluator in `server/physics.py` is a synthetic proxy for `P1`, not the official `constellaration` verifier yet.
+- `BASELINE_PARAMS` is intentionally repairable but currently infeasible at reset; do not describe it as a feasible anchor.
+- Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
+- The first local baseline run is only a synthetic-proxy sanity check; heuristic beat random on 20/20 seeded episodes, but this should be re-run after `constellaration` wiring.
 
 Current mode:
 
@@ -84,11 +105,10 @@ uv sync --extra notebooks
 - import `constellaration`
 - run one rotating-ellipse generation plus one low-fidelity verifier call
 - write an artifact to persistent storage
-3. …
-4. …
-5. Add …
-6. …
-7. Run manual playtest episodes before heavy training work.
+3. Replace the synthetic evaluator in `server/physics.py` with `constellaration`-based `P1` verification.
+4. Add tracked `P1` fixtures under `server/data/p1`.
+5. Add the Colab notebook under `training/notebooks`.
+6. Run manual playtest episodes before heavy training work.
 
 These are implementation steps, not another planning phase.
 
```
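The terminal-reward asymmetry noted in Known Gaps (budget exhaustion pays less than an explicit `submit`) can be sketched as a tiny reward rule. This is an illustrative sketch only, not the repo's `server/environment.py` code; the function name, the `timeout_discount` parameter, and the 0.5 value are assumptions, and it assumes a non-negative score.

```python
# Hypothetical sketch of the terminal-reward asymmetry described above:
# budget exhaustion ends the episode with a smaller reward than an explicit
# submit, so an agent is never better off idling until the budget runs out.
# Names and the 0.5 discount are illustrative assumptions; assumes score >= 0.

def terminal_reward(best_score: float, submitted: bool, timeout_discount: float = 0.5) -> float:
    """Return the end-of-episode reward for a terminal transition."""
    if submitted:
        return best_score  # full credit for a deliberate submit
    return best_score * timeout_discount  # smaller credit on budget exhaustion
```

Keeping the discount strictly below 1.0 is what preserves the ordering the Known Gaps bullet asks to protect during reward tuning.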
TODO.md CHANGED

````diff
@@ -4,18 +4,33 @@ This is the execution tracker for the hackathon repo.
 
 Use this file for day-of build progress. Use the linked docs for rationale, sequencing, and submission framing:
 
-- [Plan V2](…
-- [Deliverables Map](…
-- [Next 12 Hours Checklist](…
-- [P1 Pivot Record](…
-- [Repo Guardrails](…
+- [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
+- [Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
+- [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
+- [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
+- [Repo Guardrails](AGENTS.md)
 
 Priority source:
 
-- [Plan V2](…
-- [Next 12 Hours Checklist](…
+- [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md) is the planning SSOT
+- [Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md) is the execution order SSOT
 - this file should track execution progress only
 
+## Current State
+
+- [x] `P1` strategy is locked
+- [x] shared models reflect the rotating-ellipse `P1` contract
+- [x] environment loop reflects the rotating-ellipse `P1` contract
+- [x] API/task surface reflects `P1`
+- [x] baselines reflect the `P1` contract
+- [x] repo docs call out the synthetic evaluator honestly
+- [x] post-terminal guard in `step()`
+- [ ] `constellaration` verifier wiring
+- [ ] tracked `P1` fixtures
+- [ ] manual playtest log
+- [x] settle the non-submit terminal reward policy
+- [x] baseline comparison has been run once on the current synthetic `P1` branch state
+
 ## Execution Graph
 
 ```mermaid
@@ -34,82 +49,99 @@ flowchart TD
 
 ## Hour 0-2
 
-- […
+- [x] Lock the exact `P1` environment contract
 Goal:
 freeze observation schema, action schema, episode loop, terminal conditions, and `Reward V0`
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
 - [ ] Pass the Northflank smoke test
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
-[training/notebooks/README.md](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md),
+[training/notebooks/README.md](training/notebooks/README.md)
 
 ## Fresh Wiring
 
-- […
+- [x] Rewrite the shared models to the locked `P1` contract
 Files:
-[fusion_lab/models.py](…
-[Plan V2](…
+[fusion_lab/models.py](fusion_lab/models.py),
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 
-- […
+- [x] Rewrite the environment loop to the locked `P1` contract
 Files:
-[server/environment.py](…
-[Plan V2](…
-[P1 Pivot Record](…
+[server/environment.py](server/environment.py),
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 
-- […
+- [x] Add a post-terminal guard to the environment loop
 Files:
-[server/…
-…
-…
+[server/environment.py](server/environment.py)
+Goal:
+reject or no-op any `step()` call after terminal state so budget and step count do not drift past episode end
 
-- [ ] …
+- [ ] Replace the synthetic physics path with `constellaration` wiring
 Files:
-[server/…
-[…
+[server/physics.py](server/physics.py),
+[server/Dockerfile](server/Dockerfile),
+[pyproject.toml](pyproject.toml)
+
+- [x] Update the API/task surface to match `P1`
+Files:
+[server/app.py](server/app.py),
+[README.md](README.md)
 
 ## Validation and Reward
 
 - [ ] Add 1-2 tracked `P1` fixtures
 Files:
-[server/data/p1/README.md](…
-[P1 Pivot Record](…
+[server/data/p1/README.md](server/data/p1/README.md),
+[P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
 
 - [ ] Run fixture sanity checks
 Goal:
 confirm verifier outputs, objective direction, and reward ordering
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
 - [ ] Manual-playtest 5-10 episodes
 Goal:
 verify a human can act coherently and surface at least one pathology or ambiguity
 Related:
-[Plan V2](…
-[Deliverables Map](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md)
 
 - [ ] Update reward from `V0` to `V1` if playtesting reveals a real pathology
 Goal:
 keep a short exploit -> fix -> behavior improvement story
 Related:
-[AGENTS.md](…
-[Plan V2](…
+[AGENTS.md](AGENTS.md),
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
+
+- [x] Decide the non-submit terminal reward policy
+Goal:
+budget exhaustion now yields a smaller end-of-episode reward than `submit`, so non-submitting agents still get terminal feedback without outranking explicit submit behavior
+Files:
+[server/environment.py](server/environment.py),
+[README.md](README.md)
 
 ## Baselines
 
-- […
+- [x] Implement the random baseline
+Files:
+[baselines/random_agent.py](baselines/random_agent.py),
+[baselines/compare.py](baselines/compare.py)
+
+- [x] Implement the heuristic baseline
 Files:
-[baselines/…
-[baselines/compare.py](…
+[baselines/heuristic_agent.py](baselines/heuristic_agent.py),
+[baselines/compare.py](baselines/compare.py)
 
-- […
+- [x] Run the baseline comparison on the current `P1` branch state
 Files:
-[baselines/…
-[baselines/compare.py](/Users/suhjungdae/code/fusion-design-lab/baselines/compare.py)
+[baselines/compare.py](baselines/compare.py)
 
 - [ ] Save one comparison trace that is presentation-ready
 Goal:
@@ -119,12 +151,12 @@ flowchart TD
 
 - [ ] Deploy the environment to HF Space
 Related:
-[Deliverables Map](…
-[README.md](…
+[Deliverables Map](docs/FUSION_DELIVERABLES_MAP.md),
+[README.md](README.md)
 
 - [ ] Create the thin public Colab notebook
 Files:
-[training/notebooks/README.md](…
+[training/notebooks/README.md](training/notebooks/README.md)
 
 - [ ] Record the 1-minute demo
 Goal:
@@ -132,12 +164,12 @@ flowchart TD
 
 - [ ] Finalize the public README
 Files:
-[README.md](…
+[README.md](README.md)
 
 - [ ] Only add training evidence if it is actually persuasive
 Related:
-[Plan V2](…
-[Next 12 Hours Checklist](…
+[Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md),
+[Next 12 Hours Checklist](docs/FUSION_NEXT_12_HOURS_CHECKLIST.md)
 
 ## Guardrails
 
@@ -145,3 +177,5 @@ flowchart TD
 - [ ] Do not port the old `ai-sci-feasible-designs` harness
 - [ ] Do not let notebook or demo work outrun environment evidence
 - [ ] Do not add training-first complexity before manual playtesting
+- [ ] Do not describe the current synthetic evaluator as the official verifier integration
+- [ ] Do not describe the current baseline reset state as already feasible
````
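The post-terminal guard tracked above ("reject or no-op any `step()` call after terminal state so budget and step count do not drift past episode end") can be sketched in a few lines. This is a minimal illustrative class, not the repo's `server/environment.py`; the field names follow the observation schema in this commit, but the class and return shape are assumptions.

```python
# Minimal sketch of a post-terminal guard for an environment step() method.
# Illustrative only; field names (done, budget_remaining, step_number) mirror
# the observation schema in this commit, the rest is assumed.

class GuardedEnv:
    def __init__(self, budget: int = 6):
        self.done = False
        self.budget_remaining = budget
        self.step_number = 0

    def step(self, action: str) -> dict:
        if self.done:
            # No-op after terminal: return the frozen terminal view so budget
            # and step count cannot drift past episode end.
            return {"done": True, "reward": 0.0, "step": self.step_number}
        self.step_number += 1
        self.budget_remaining -= 1
        if action == "submit" or self.budget_remaining <= 0:
            self.done = True
        return {"done": self.done, "reward": 0.0, "step": self.step_number}
```

The guard is checked before any counters mutate, which is the property the TODO item asks for.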
baselines/compare.py CHANGED

```diff
@@ -14,27 +14,27 @@ def main(n_episodes: int = 20) -> None:
 
     random_rewards: list[float] = []
     heuristic_rewards: list[float] = []
-    …
-    …
+    random_best_scores: list[float] = []
+    heuristic_best_scores: list[float] = []
 
     for i in range(n_episodes):
         rr, rt = random_episode(env, seed=i)
         random_rewards.append(rr)
-        …
+        random_best_scores.append(rt[-1]["best_score"])
 
         hr, ht = heuristic_episode(env, seed=i)
         heuristic_rewards.append(hr)
-        …
+        heuristic_best_scores.append(ht[-1]["best_score"])
 
     r_mean = sum(random_rewards) / len(random_rewards)
     h_mean = sum(heuristic_rewards) / len(heuristic_rewards)
-    …
-    …
+    r_score = sum(random_best_scores) / len(random_best_scores)
+    h_score = sum(heuristic_best_scores) / len(heuristic_best_scores)
 
     print(f"{'Metric':<25} {'Random':>12} {'Heuristic':>12}")
     print("-" * 51)
     print(f"{'Mean reward':<25} {r_mean:>+12.4f} {h_mean:>+12.4f}")
-    print(f"{'Mean best …
+    print(f"{'Mean best P1 score':<25} {r_score:>12.6f} {h_score:>12.6f}")
     print(f"{'Episodes':<25} {n_episodes:>12d} {n_episodes:>12d}")
     print()
 
```
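The aligned table in `main()` relies on f-string format specifiers. A standalone sketch with made-up reward values shows why the separator is exactly 51 dashes:

```python
# Standalone sketch of the comparison-table formatting above, using made-up
# values. '<25' left-aligns the label in 25 columns; '>+12.4f' right-aligns
# a signed float in 12 columns. Each row is therefore
# 25 + 1 + 12 + 1 + 12 = 51 characters wide, matching the 51-dash rule.
r_mean, h_mean = -0.1234, 0.5678
header = f"{'Metric':<25} {'Random':>12} {'Heuristic':>12}"
rule = "-" * 51
row = f"{'Mean reward':<25} {r_mean:>+12.4f} {h_mean:>+12.4f}"
print(header)
print(rule)
print(row)
```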
baselines/heuristic_agent.py CHANGED

```diff
@@ -1,8 +1,8 @@
 """Heuristic baseline agent for the stellarator design environment.
 
 Strategy: guided perturbations informed by domain knowledge.
-1. …
-2. …
+1. Push elongation upward to improve triangularity.
+2. Nudge rotational transform upward to stay on the iota side of feasibility.
 3. Use restore_best to recover from any worsening.
 4. Submit before exhausting budget.
 """
@@ -14,12 +14,12 @@ import sys
 from fusion_lab.models import StellaratorAction
 from server.environment import StellaratorEnvironment
 
-STRATEGY: list[tuple[str, str, str…
-    ("…
-    ("…
-    ("…
-    ("…
-    ("…
+STRATEGY: list[tuple[str, str, str]] = [
+    ("elongation", "increase", "medium"),
+    ("elongation", "increase", "small"),
+    ("rotational_transform", "increase", "small"),
+    ("aspect_ratio", "decrease", "small"),
+    ("rotational_transform", "increase", "small"),
 ]
 
 
@@ -28,33 +28,40 @@ def heuristic_episode(
 ) -> tuple[float, list[dict[str, object]]]:
     obs = env.reset(seed=seed)
     total_reward = 0.0
-    trace: list[dict[str, object]] = [{"step": 0, "…
-    prev_best = …
+    trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
+    prev_best = (
+        int(obs.best_feasibility <= 0.01),
+        obs.best_score if obs.best_feasibility <= 0.01 else -obs.best_feasibility,
+    )
 
-    for …
+    for parameter, direction, magnitude in STRATEGY:
         if obs.done or obs.budget_remaining <= 1:
             break
 
         action = StellaratorAction(
             intent="run",
-            …
+            parameter=parameter,
             direction=direction,
             magnitude=magnitude,
-            restart=restart,
         )
         obs = env.step(action)
         total_reward += obs.reward or 0.0
         trace.append(
             {
                 "step": len(trace),
-                "action": f"{…
-                "…
-                "…
+                "action": f"{parameter} {direction} {magnitude}",
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
 
-        …
+        current_best = (
+            int(obs.best_feasibility <= 0.01),
+            obs.best_score if obs.best_feasibility <= 0.01 else -obs.best_feasibility,
+        )
+
+        if current_best < prev_best and obs.budget_remaining > 1:
             restore = StellaratorAction(intent="restore_best")
             obs = env.step(restore)
             total_reward += obs.reward or 0.0
@@ -62,13 +69,13 @@ def heuristic_episode(
             {
                 "step": len(trace),
                 "action": "restore_best",
-                "…
-                "…
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
 
-    prev_best = …
+        prev_best = current_best
 
     if not obs.done:
         submit = StellaratorAction(intent="submit")
@@ -78,8 +85,8 @@ def heuristic_episode(
             {
                 "step": len(trace),
                 "action": "submit",
-                "…
-                "…
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
@@ -97,7 +104,7 @@ def main(n_episodes: int = 20) -> None:
         rewards.append(total_reward)
         print(
             f"Episode {i:3d}: steps={len(trace) - 1} "
-            f"…
+            f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
             f"reward={total_reward:+.4f}"
         )
 
```
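The `prev_best` / `current_best` tuples above exploit Python's lexicographic tuple comparison: the first element marks feasibility, so any feasible design outranks any infeasible one, and ties break on score (feasible) or on smaller constraint violation (infeasible). A small standalone sketch of that ranking, with the same 0.01 threshold as the agent and illustrative values:

```python
# Sketch of the (feasibility, score) ranking tuple used by the heuristic
# above. Lexicographic comparison: any feasible design (first element 1)
# outranks any infeasible one (first element 0); feasible designs then
# compare by score, infeasible ones by -feasibility (smaller violation wins).
FEAS_TOL = 0.01  # same feasibility threshold the agent uses

def rank(best_score: float, best_feasibility: float) -> tuple[int, float]:
    feasible = best_feasibility <= FEAS_TOL
    return (int(feasible), best_score if feasible else -best_feasibility)

feasible_low = rank(best_score=0.2, best_feasibility=0.0)      # -> (1, 0.2)
feasible_high = rank(best_score=0.9, best_feasibility=0.0)     # -> (1, 0.9)
barely_infeasible = rank(best_score=0.9, best_feasibility=0.05)  # -> (0, -0.05)
badly_infeasible = rank(best_score=0.9, best_feasibility=2.0)    # -> (0, -2.0)
```

Note the high raw score of `barely_infeasible` never rescues it past any feasible design, which is exactly the ordering the restore-on-regression check depends on.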
baselines/random_agent.py CHANGED

```diff
@@ -8,10 +8,9 @@ import sys
 from fusion_lab.models import StellaratorAction
 from server.environment import StellaratorEnvironment
 
-…
+PARAMETERS = ["aspect_ratio", "elongation", "rotational_transform"]
 DIRECTIONS = ["increase", "decrease"]
 MAGNITUDES = ["small", "medium", "large"]
-RESTARTS = ["hot", "cold"]
 
 
 def random_episode(
@@ -20,7 +19,7 @@ def random_episode(
     rng = random.Random(seed)
     obs = env.reset(seed=seed)
     total_reward = 0.0
-    trace: list[dict[str, object]] = [{"step": 0, "…
+    trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
 
     while not obs.done:
         if obs.budget_remaining <= 0:
@@ -28,10 +27,9 @@ def random_episode(
         else:
             action = StellaratorAction(
                 intent="run",
-                …
+                parameter=rng.choice(PARAMETERS),
                 direction=rng.choice(DIRECTIONS),
                 magnitude=rng.choice(MAGNITUDES),
-                restart=rng.choice(RESTARTS),
             )
         obs = env.step(action)
         total_reward += obs.reward or 0.0
@@ -39,8 +37,8 @@ def random_episode(
             {
                 "step": len(trace),
                 "action": action.intent,
-                "…
-                "…
+                "score": obs.p1_score,
+                "best_score": obs.best_score,
                 "reward": obs.reward,
             }
         )
@@ -58,7 +56,7 @@ def main(n_episodes: int = 20) -> None:
         rewards.append(total_reward)
         print(
             f"Episode {i:3d}: steps={len(trace) - 1} "
-            f"…
+            f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
             f"reward={total_reward:+.4f}"
         )
 
```
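The random baseline seeds a dedicated `random.Random(seed)` per episode, so a seeded episode replays the exact same action sequence independent of global RNG state. A standalone sketch of that sampling pattern (the `sample_actions` helper is hypothetical; the choice lists match the agent's):

```python
import random

# Sketch of the per-episode seeding pattern used by the random baseline:
# a local random.Random(seed) instance makes each seeded episode
# reproducible without touching the global random module state.
PARAMETERS = ["aspect_ratio", "elongation", "rotational_transform"]
DIRECTIONS = ["increase", "decrease"]
MAGNITUDES = ["small", "medium", "large"]

def sample_actions(seed: int, n: int = 5) -> list[tuple[str, str, str]]:
    rng = random.Random(seed)  # local RNG: no shared global state
    return [
        (rng.choice(PARAMETERS), rng.choice(DIRECTIONS), rng.choice(MAGNITUDES))
        for _ in range(n)
    ]
```

This is why `compare.py` can run random and heuristic episodes on the same seeds and treat the comparison as paired.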
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED

```diff
@@ -6,6 +6,15 @@ This checklist turns the updated deliverables map and Plan V2 into concrete execution steps.
 
 Do not expand scope beyond one stable task. Training is supporting evidence, not the main story.
 
+## Current Branch Status
+
+- [x] `P1` task is locked
+- [x] rotating-ellipse `P1` contract is implemented in the working tree
+- [x] baselines and API surface have been moved to the `P1` contract
+- [x] add a post-terminal guard in `step()`
+- [ ] replace the synthetic evaluator with `constellaration`
+- [ ] add tracked fixtures and manual playtest evidence
+
 ## Plan V2 Inheritance
 
 Carry these rules through the whole checklist:
```
docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED

```diff
@@ -220,7 +220,7 @@ If constellaration deployment fails (Docker build, HF Spaces issues):
 
 Start with 1-2 rotating-ellipse configurations for sanity checks and expand only if the implementation needs more coverage:
 
-1. **…
+1. **Repairable baseline anchor:** aspect_ratio=3.5, elongation=1.5, rotational_transform=0.4 — intentionally infeasible at reset but close enough to support short repair/improvement episodes
 2. **Infeasible reference:** aspect_ratio=5.0, elongation=3.0, rotational_transform=0.2 — expected to violate constraints
 3. **Baseline comparison:** add only if manual playtesting shows a second start state is useful
 
```
fusion_lab/models.py
CHANGED

@@ -3,46 +3,66 @@ from __future__ import annotations
 
 from typing import Literal
 
 from openenv.core import Action, Observation, State
-from pydantic import Field
+from pydantic import BaseModel, Field
 
 ActionIntent = Literal["run", "submit", "restore_best"]
+ParameterName = Literal["aspect_ratio", "elongation", "rotational_transform"]
 DirectionName = Literal["increase", "decrease"]
 MagnitudeName = Literal["small", "medium", "large"]
+
+
+class RotatingEllipseParams(BaseModel):
+    aspect_ratio: float
+    elongation: float
+    rotational_transform: float
 
 
 class StellaratorAction(Action):
     intent: ActionIntent
+    parameter: ParameterName | None = None
     direction: DirectionName | None = None
     magnitude: MagnitudeName | None = None
-    restart: RestartMode | None = None
     reasoning: str = ""
 
 
 class StellaratorObservation(Observation):
     diagnostics_text: str = ""
+    max_elongation: float = 0.0
     aspect_ratio: float = 0.0
+    average_triangularity: float = 0.0
+    edge_iota_over_nfp: float = 0.0
+    p1_score: float = 0.0
+    p1_feasibility: float = 0.0
+    vacuum_well: float = 0.0
     step_number: int = 0
     budget_remaining: int = 6
+    best_score: float = 0.0
+    best_feasibility: float = float("inf")
     constraints_satisfied: bool = True
     target_spec: str = ""
 
 
 class StellaratorState(State):
+    current_params: RotatingEllipseParams = Field(
+        default_factory=lambda: RotatingEllipseParams(
+            aspect_ratio=3.5,
+            elongation=1.5,
+            rotational_transform=0.4,
+        )
+    )
+    best_params: RotatingEllipseParams = Field(
+        default_factory=lambda: RotatingEllipseParams(
+            aspect_ratio=3.5,
+            elongation=1.5,
+            rotational_transform=0.4,
+        )
+    )
+    initial_score: float = 0.0
+    best_score: float = 0.0
+    current_feasibility: float = float("inf")
+    best_feasibility: float = float("inf")
     budget_total: int = 6
     budget_remaining: int = 6
+    episode_done: bool = False
     constraints_satisfied: bool = True
     history: list[str] = Field(default_factory=list)
server/app.py
CHANGED

@@ -4,10 +4,11 @@ from openenv.core import create_fastapi_app
 
 from fusion_lab.models import StellaratorAction, StellaratorObservation
 from server.environment import (
+    ASPECT_RATIO_MAX,
+    AVERAGE_TRIANGULARITY_MAX,
     BUDGET,
+    EDGE_IOTA_OVER_NFP_MIN,
+    N_FIELD_PERIODS,
     StellaratorEnvironment,
 )
 
@@ -21,18 +22,18 @@ app = create_fastapi_app(
 
 @app.get("/task")
 def task_summary() -> dict[str, object]:
     return {
-        "description": "
+        "description": "Optimize the P1 benchmark with a rotating-ellipse parameterization.",
         "constraints": {
+            "aspect_ratio_max": ASPECT_RATIO_MAX,
+            "average_triangularity_max": AVERAGE_TRIANGULARITY_MAX,
+            "edge_iota_over_nfp_min": EDGE_IOTA_OVER_NFP_MIN,
         },
+        "n_field_periods": N_FIELD_PERIODS,
        "budget": BUDGET,
        "actions": ["run", "submit", "restore_best"],
+        "parameters": ["aspect_ratio", "elongation", "rotational_transform"],
        "directions": ["increase", "decrease"],
        "magnitudes": ["small", "medium", "large"],
-        "restart_modes": ["hot", "cold"],
    }
server/environment.py
CHANGED

@@ -1,47 +1,62 @@
 from __future__ import annotations
 
+from random import Random
 from typing import Any, Final, Optional
 
 from openenv.core import Environment as BaseEnvironment
 
 from fusion_lab.models import (
+    RotatingEllipseParams,
     StellaratorAction,
     StellaratorObservation,
     StellaratorState,
 )
-from server.physics import
+from server.physics import (
+    ASPECT_RATIO_MAX,
+    AVERAGE_TRIANGULARITY_MAX,
+    EDGE_IOTA_OVER_NFP_MIN,
+    FEASIBILITY_TOLERANCE,
+    EvaluationMetrics,
+    evaluate_params,
+)
 
 BUDGET: Final[int] = 6
+N_FIELD_PERIODS: Final[int] = 3
+
+PARAMETER_RANGES: Final[dict[str, tuple[float, float]]] = {
+    "aspect_ratio": (2.0, 8.0),
+    "elongation": (1.0, 5.0),
+    "rotational_transform": (0.1, 1.0),
+}
+
+PARAMETER_DELTAS: Final[dict[str, dict[str, float]]] = {
+    "aspect_ratio": {"small": 0.1, "medium": 0.3, "large": 0.8},
+    "elongation": {"small": 0.1, "medium": 0.3, "large": 0.8},
+    "rotational_transform": {"small": 0.02, "medium": 0.05, "large": 0.15},
+}
+
+BASELINE_PARAMS: Final[RotatingEllipseParams] = RotatingEllipseParams(
+    aspect_ratio=3.5,
+    elongation=1.5,
+    rotational_transform=0.4,
+)
 
 TARGET_SPEC: Final[str] = (
-    "Constraints: aspect ratio
+    "Optimize the P1 benchmark using a rotating-ellipse parameterization. "
+    "Constraints: aspect ratio <= 4.0, average triangularity <= -0.5, "
+    "edge rotational transform / n_field_periods >= 0.3. "
     "Budget: 6 evaluations."
 )
 
 
-def check_constraints(diag: Diagnostics) -> bool:
-    ar_lo, ar_hi = ASPECT_RATIO_RANGE
-    iota_lo, iota_hi = IOTA_EDGE_RANGE
-    return (
-        ar_lo <= diag.aspect_ratio <= ar_hi
-        and iota_lo <= diag.iota_edge <= iota_hi
-        and diag.volume >= VOLUME_MIN
-    )
-
-
 class StellaratorEnvironment(
     BaseEnvironment[StellaratorAction, StellaratorObservation, StellaratorState]
 ):
     def __init__(self) -> None:
         super().__init__()
-        self._engine = PhysicsEngine()
         self._state = StellaratorState()
-        self.
+        self._last_metrics: EvaluationMetrics | None = None
+        self._rng = Random()
 
     def reset(
         self,
@@ -49,22 +64,27 @@ class StellaratorEnvironment(
         episode_id: Optional[str] = None,
         **kwargs: Any,
     ) -> StellaratorObservation:
+        self._rng = Random(seed)
+        params = self._initial_params(seed)
+        metrics = evaluate_params(params)
         self._state = StellaratorState(
             episode_id=episode_id,
             step_count=0,
+            current_params=params,
+            best_params=params,
+            initial_score=metrics.p1_score,
+            best_score=metrics.p1_score,
+            current_feasibility=metrics.p1_feasibility,
+            best_feasibility=metrics.p1_feasibility,
             budget_total=BUDGET,
             budget_remaining=BUDGET,
+            episode_done=False,
+            constraints_satisfied=metrics.constraints_satisfied,
         )
-        self.
+        self._last_metrics = metrics
         return self._build_observation(
+            metrics,
+            action_summary="Episode started from the rotating-ellipse baseline.",
         )
 
     def step(
@@ -73,7 +93,15 @@ class StellaratorEnvironment(
         timeout_s: Optional[float] = None,
         **kwargs: Any,
     ) -> StellaratorObservation:
-        self._state.
+        if self._state.episode_done or self._state.budget_remaining <= 0:
+            metrics = self._last_metrics or evaluate_params(self._state.current_params)
+            return self._build_observation(
+                metrics,
+                action_summary=("Episode already ended. Call reset() before sending more actions."),
+                reward=0.0,
+                done=True,
+            )
+
         self._state.step_count += 1
 
         if action.intent == "submit":
@@ -91,108 +119,131 @@ class StellaratorEnvironment(
     # ------------------------------------------------------------------
 
     def _handle_run(self, action: StellaratorAction) -> StellaratorObservation:
-        if not all([action.
+        if not all([action.parameter, action.direction, action.magnitude]):
            return self._handle_invalid_run()
 
        self._state.budget_remaining -= 1
+        params = self._apply_action(
+            params=self._state.current_params,
+            parameter=action.parameter,
            direction=action.direction,
            magnitude=action.magnitude,
-            restart=action.restart or "hot",
        )
-        if diag.qs_residual < self._state.best_qs:
-            self._state.best_qs = diag.qs_residual
-        self._state.constraints_satisfied = satisfied
+        metrics = evaluate_params(params)
+        self._state.current_params = params
+        self._state.current_feasibility = metrics.p1_feasibility
+        self._state.constraints_satisfied = metrics.constraints_satisfied
+        self._update_best(params, metrics)
 
        done = self._state.budget_remaining <= 0
-        reward = self._compute_reward(
-        summary = self._summary_run(action,
+        reward = self._compute_reward(metrics, action.intent, done)
+        summary = self._summary_run(action, metrics)
        self._state.history.append(summary)
-        self.
+        self._last_metrics = metrics
+        self._state.episode_done = done
 
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=done,
        )
 
    def _handle_submit(self) -> StellaratorObservation:
-        summary = self._summary_submit(satisfied)
+        metrics = self._last_metrics or evaluate_params(self._state.current_params)
+        reward = self._compute_reward(metrics, "submit", done=True)
+        summary = self._summary_submit(metrics)
        self._state.history.append(summary)
+        self._state.episode_done = True
 
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=True,
        )
 
    def _handle_restore(self) -> StellaratorObservation:
        self._state.budget_remaining -= 1
-        self._state.constraints_satisfied = satisfied
+        self._state.current_params = self._state.best_params
+        metrics = evaluate_params(self._state.current_params)
+        self._state.current_feasibility = metrics.p1_feasibility
+        self._state.constraints_satisfied = metrics.constraints_satisfied
 
        done = self._state.budget_remaining <= 0
-        reward = self._compute_reward(
+        reward = self._compute_reward(metrics, "restore_best", done)
+        summary = (
+            "Restored the best-known design. "
+            f"Score={metrics.p1_score:.6f}, feasibility={metrics.p1_feasibility:.6f}."
+        )
        self._state.history.append(summary)
-        self.
+        self._last_metrics = metrics
+        self._state.episode_done = done
 
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=reward,
+            done=done,
        )
 
    def _handle_invalid_run(self) -> StellaratorObservation:
        self._state.budget_remaining -= 1
-        satisfied = check_constraints(diag)
+        metrics = self._last_metrics or evaluate_params(self._state.current_params)
        done = self._state.budget_remaining <= 0
-        summary = "Invalid run action:
+        summary = "Invalid run action: parameter, direction, and magnitude are required."
        self._state.history.append(summary)
+        self._state.episode_done = done
        return self._build_observation(
+            metrics,
+            action_summary=summary,
+            reward=-1.0,
+            done=done,
        )
 
    # ------------------------------------------------------------------
    # Reward V0
    # ------------------------------------------------------------------
 
-    def _compute_reward(
+    def _compute_reward(
+        self,
+        metrics: EvaluationMetrics,
+        intent: str,
+        done: bool,
+    ) -> float:
+        previous_metrics = self._last_metrics or metrics
        reward = 0.0
 
-        if diag.converged and not check_constraints(diag):
-            reward -= 2.0
+        if metrics.constraints_satisfied and not previous_metrics.constraints_satisfied:
+            reward += 3.0
+        if previous_metrics.constraints_satisfied and not metrics.constraints_satisfied:
+            reward -= 3.0
+
+        if metrics.constraints_satisfied:
+            reward += (previous_metrics.max_elongation - metrics.max_elongation) * 10.0
+        else:
+            reward += (previous_metrics.p1_feasibility - metrics.p1_feasibility) * 5.0
 
        if intent != "submit":
            reward -= 0.1
 
        if intent == "submit":
-            if self._state.
+            if metrics.constraints_satisfied and self._state.best_score > self._state.initial_score:
+                improvement_ratio = (self._state.best_score - self._state.initial_score) / max(
+                    1.0 - self._state.initial_score, 1e-6
+                )
+                budget_efficiency = self._state.budget_remaining / self._state.budget_total
+                reward += 5.0 * improvement_ratio + budget_efficiency
            else:
                reward -= 1.0
+        elif done:
+            if metrics.constraints_satisfied and self._state.best_score > self._state.initial_score:
+                improvement_ratio = (self._state.best_score - self._state.initial_score) / max(
+                    1.0 - self._state.initial_score, 1e-6
+                )
+                reward += 2.0 * improvement_ratio
+            else:
+                reward -= 0.5
 
        return round(reward, 4)
 
@@ -202,8 +253,7 @@ class StellaratorEnvironment(
 
    def _build_observation(
        self,
-        satisfied: bool,
+        metrics: EvaluationMetrics,
        action_summary: str,
        reward: float | None = None,
        done: bool = False,
@@ -211,29 +261,30 @@ class StellaratorEnvironment(
        text_lines = [
            action_summary,
            "",
+            f"max_elongation={metrics.max_elongation:.4f} | best_score={self._state.best_score:.6f}",
+            f"aspect_ratio={metrics.aspect_ratio:.4f} (<= {ASPECT_RATIO_MAX:.1f})",
+            f"average_triangularity={metrics.average_triangularity:.4f} (<= {AVERAGE_TRIANGULARITY_MAX:.1f})",
+            f"edge_iota_over_nfp={metrics.edge_iota_over_nfp:.4f} (>= {EDGE_IOTA_OVER_NFP_MIN:.1f})",
+            f"feasibility={metrics.p1_feasibility:.6f} | best_feasibility={self._state.best_feasibility:.6f}",
+            f"vacuum_well={metrics.vacuum_well:.4f}",
+            f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
+            f"step={self._state.step_count} | budget={self._state.budget_remaining}/{self._state.budget_total}",
        ]
 
        return StellaratorObservation(
            diagnostics_text="\n".join(text_lines),
+            max_elongation=metrics.max_elongation,
+            aspect_ratio=metrics.aspect_ratio,
+            average_triangularity=metrics.average_triangularity,
+            edge_iota_over_nfp=metrics.edge_iota_over_nfp,
+            p1_score=metrics.p1_score,
+            p1_feasibility=metrics.p1_feasibility,
+            vacuum_well=metrics.vacuum_well,
            step_number=self._state.step_count,
            budget_remaining=self._state.budget_remaining,
+            best_score=self._state.best_score,
+            best_feasibility=self._state.best_feasibility,
+            constraints_satisfied=metrics.constraints_satisfied,
            target_spec=TARGET_SPEC,
            reward=reward,
            done=done,
@@ -243,20 +294,85 @@ class StellaratorEnvironment(
    # Action summaries
    # ------------------------------------------------------------------
 
-    def _summary_run(self, action: StellaratorAction,
+    def _summary_run(self, action: StellaratorAction, metrics: EvaluationMetrics) -> str:
+        assert action.parameter is not None
+        assert action.direction is not None
+        assert action.magnitude is not None
+        previous_metrics = self._last_metrics or metrics
+        if metrics.constraints_satisfied:
+            delta = previous_metrics.max_elongation - metrics.max_elongation
+            objective_summary = (
+                f"max_elongation changed by {delta:+.4f} to {metrics.max_elongation:.4f}."
+            )
+        else:
+            delta = previous_metrics.p1_feasibility - metrics.p1_feasibility
+            objective_summary = (
+                f"feasibility changed by {delta:+.6f} to {metrics.p1_feasibility:.6f}."
+            )
+        return (
+            f"Applied {action.parameter} {action.direction} {action.magnitude}. {objective_summary}"
+        )
 
-    def _summary_submit(self,
-        status = "Constraints satisfied." if satisfied else "Constraints VIOLATED."
-        improvement = self._state.initial_qs - self._state.best_qs
+    def _summary_submit(self, metrics: EvaluationMetrics) -> str:
        return (
+            f"Submitted design with best_score={self._state.best_score:.6f}, "
+            f"best_feasibility={self._state.best_feasibility:.6f}, "
+            f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}."
        )
+
+    def _initial_params(self, seed: int | None) -> RotatingEllipseParams:
+        if seed is None:
+            return BASELINE_PARAMS
+        rng = Random(seed)
+        return RotatingEllipseParams(
+            aspect_ratio=self._clamp(
+                BASELINE_PARAMS.aspect_ratio + rng.uniform(-0.1, 0.1),
+                parameter="aspect_ratio",
+            ),
+            elongation=self._clamp(
+                BASELINE_PARAMS.elongation + rng.uniform(-0.1, 0.1),
+                parameter="elongation",
+            ),
+            rotational_transform=self._clamp(
+                BASELINE_PARAMS.rotational_transform + rng.uniform(-0.015, 0.015),
+                parameter="rotational_transform",
+            ),
+        )
+
+    def _apply_action(
+        self,
+        params: RotatingEllipseParams,
+        parameter: str,
+        direction: str,
+        magnitude: str,
+    ) -> RotatingEllipseParams:
+        delta = PARAMETER_DELTAS[parameter][magnitude]
+        signed_delta = delta if direction == "increase" else -delta
+
+        next_values = params.model_dump()
+        next_values[parameter] = self._clamp(
+            next_values[parameter] + signed_delta,
+            parameter=parameter,
+        )
+        return RotatingEllipseParams.model_validate(next_values)
+
+    def _clamp(self, value: float, *, parameter: str) -> float:
+        lower, upper = PARAMETER_RANGES[parameter]
+        return min(max(value, lower), upper)
+
+    def _update_best(self, params: RotatingEllipseParams, metrics: EvaluationMetrics) -> None:
+        current_rank = self._candidate_rank(metrics)
+        best_rank = (
+            (1, self._state.best_score)
+            if self._state.best_feasibility <= FEASIBILITY_TOLERANCE
+            else (0, -self._state.best_feasibility)
+        )
+        if current_rank > best_rank:
+            self._state.best_params = params
+            self._state.best_score = metrics.p1_score
+            self._state.best_feasibility = metrics.p1_feasibility
+
+    def _candidate_rank(self, metrics: EvaluationMetrics) -> tuple[int, float]:
+        if metrics.constraints_satisfied:
+            return (1, metrics.p1_score)
+        return (0, -metrics.p1_feasibility)
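The `_update_best` / `_candidate_rank` pair implements a lexicographic ordering: any constraint-satisfying design outranks every infeasible one, feasible designs then compare by `p1_score` (higher wins), and infeasible designs compare by violation magnitude (smaller wins, hence the negation). A self-contained sketch of that ordering using plain tuples (candidate names and values are made up for illustration):

```python
def candidate_rank(score: float, feasibility: float, tolerance: float = 0.01) -> tuple[int, float]:
    # Feasible designs get a leading 1 and compare by score;
    # infeasible ones get a leading 0 and compare by -violation,
    # so a smaller violation produces a larger (better) tuple.
    if feasibility <= tolerance:
        return (1, score)
    return (0, -feasibility)


candidates = [
    ("infeasible_far",  0.0, 0.50),   # large constraint violation
    ("infeasible_near", 0.0, 0.05),   # almost feasible
    ("feasible_low",    0.55, 0.0),
    ("feasible_high",   0.80, 0.0),
]
best = max(candidates, key=lambda c: candidate_rank(c[1], c[2]))
print(best[0])  # feasible_high
```

Because Python compares tuples element by element, a single `max()` over these ranks reproduces the two-phase "repair first, then improve" preference without any branching in the caller.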
server/physics.py
CHANGED

@@ -1,141 +1,97 @@
 from __future__ import annotations
 
-import random
-from dataclasses import dataclass, field
+from dataclasses import dataclass
 from typing import Final
 
+from fusion_lab.models import RotatingEllipseParams
 
+ASPECT_RATIO_MAX: Final[float] = 4.0
+AVERAGE_TRIANGULARITY_MAX: Final[float] = -0.5
+EDGE_IOTA_OVER_NFP_MIN: Final[float] = 0.3
+FEASIBILITY_TOLERANCE: Final[float] = 0.01
-    "zs12": -0.02,
-}
-
-OPTIMAL_COEFFS: Final[dict[str, float]] = {
-    "rc10": 1.02,
-    "rc11": 0.135,
-    "zs11": 0.115,
-    "zs12": -0.035,
-}
-
-MAGNITUDE_DELTAS: Final[dict[str, float]] = {
-    "small": 0.005,
-    "medium": 0.02,
-    "large": 0.05,
-}
 
 
 @dataclass(frozen=True)
-class
+class EvaluationMetrics:
+    max_elongation: float
     aspect_ratio: float
+    average_triangularity: float
+    edge_iota_over_nfp: float
+    p1_score: float
+    p1_feasibility: float
+    constraints_satisfied: bool
+    vacuum_well: float
+
+
+def _normalized_violation(value: float, *, limit: float, direction: str) -> float:
+    if direction == "max":
+        return max((value - limit) / max(abs(limit), 1e-6), 0.0)
+    return max((limit - value) / max(abs(limit), 1e-6), 0.0)
+
+
+def evaluate_params(params: RotatingEllipseParams) -> EvaluationMetrics:
+    aspect_ratio = round(params.aspect_ratio, 4)
+    average_triangularity = round(
+        -0.2
+        - 0.35 * (params.elongation - 1.0)
+        - 0.2 * max(0.0, 0.35 - params.rotational_transform),
+        4,
+    )
+    edge_iota_over_nfp = round(
+        params.rotational_transform
+        - 0.05 * max(0.0, params.aspect_ratio - ASPECT_RATIO_MAX)
+        + 0.03 * (params.elongation - 1.5),
+        4,
+    )
+    max_elongation = round(
+        params.elongation
+        + 0.18 * (params.aspect_ratio - 3.4) ** 2
+        + 0.8 * abs(params.rotational_transform - 0.42)
+        + 0.2,
+        4,
+    )
+    vacuum_well = round(
+        0.03
+        + 0.02 * (4.0 - min(params.aspect_ratio, 4.0))
+        + 0.015 * (params.rotational_transform - 0.3)
+        - 0.01 * abs(params.elongation - 1.7),
+        4,
+    )
+
+    aspect_ratio_violation = _normalized_violation(
+        aspect_ratio,
+        limit=ASPECT_RATIO_MAX,
+        direction="max",
+    )
+    triangularity_violation = _normalized_violation(
+        average_triangularity,
+        limit=AVERAGE_TRIANGULARITY_MAX,
+        direction="max",
+    )
+    iota_violation = _normalized_violation(
+        edge_iota_over_nfp,
+        limit=EDGE_IOTA_OVER_NFP_MIN,
+        direction="min",
+    )
+
+    p1_feasibility = round(
+        max(aspect_ratio_violation, triangularity_violation, iota_violation),
+        6,
+    )
+    constraints_satisfied = p1_feasibility <= FEASIBILITY_TOLERANCE
+    p1_score = (
+        round(1.0 - min(max((max_elongation - 1.0) / 9.0, 0.0), 1.0), 6)
+        if constraints_satisfied
+        else 0.0
+    )
+
+    return EvaluationMetrics(
+        max_elongation=max_elongation,
+        aspect_ratio=aspect_ratio,
+        average_triangularity=average_triangularity,
+        edge_iota_over_nfp=edge_iota_over_nfp,
+        p1_score=p1_score,
+        p1_feasibility=p1_feasibility,
+        constraints_satisfied=constraints_satisfied,
+        vacuum_well=vacuum_well,
+    )
-
-        magnetic_well_depth=round(magnetic_well, 4),
-        converged=converged,
-    )
-
-    def _compute_qs_residual(self) -> float:
-        d = {k: self.coeffs[k] - OPTIMAL_COEFFS[k] for k in OPTIMAL_COEFFS}
-        quadratic = (
-            2.0 * d["rc10"] ** 2
-            + 8.0 * d["rc11"] ** 2
-            + 8.0 * d["zs11"] ** 2
-            + 15.0 * d["zs12"] ** 2
-        )
-        cross = 4.0 * d["rc11"] * d["zs11"] - 3.0 * d["rc10"] * d["zs12"]
-        noise = self._rng.gauss(0, 0.0003)
-        return max(quadratic + cross + 0.002 + noise, 0.001)
-
-    def _simulate_convergence(self, magnitude: str, restart: str) -> bool:
-        fail_prob = {"small": 0.02, "medium": 0.08, "large": 0.20}[magnitude]
-        if restart == "hot":
-            fail_prob *= 0.5
-        for key, val in self.coeffs.items():
-            deviation = abs(val - BASELINE_COEFFS[key])
-            if deviation > 0.1:
-                fail_prob += 0.15
-            elif deviation > 0.05:
-                fail_prob += 0.05
-        return self._rng.random() > min(fail_prob, 0.8)
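`evaluate_params` maps the surrogate `max_elongation` onto a bounded score only when the design is feasible: elongation 1.0 maps to a score of 1.0, elongation 10.0 or worse maps to 0.0, linearly in between. A standalone sketch of just that mapping (the function name echoes the `p1_score` field from the diff; it is extracted here for illustration, not a public API of this repo):

```python
def p1_score(max_elongation: float, constraints_satisfied: bool) -> float:
    # Linear map: max_elongation 1.0 -> score 1.0, 10.0 -> score 0.0,
    # clamped to [0, 1]; infeasible designs score 0 regardless of shape.
    if not constraints_satisfied:
        return 0.0
    return round(1.0 - min(max((max_elongation - 1.0) / 9.0, 0.0), 1.0), 6)


print(p1_score(1.0, True))    # 1.0
print(p1_score(5.5, True))    # 0.5
print(p1_score(5.5, False))   # 0.0
```

Gating the score on feasibility is what keeps the reward in `server/environment.py` two-phase: infeasible designs can only earn reward by shrinking their feasibility violation, never by trading it for a better elongation.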
uv.lock
CHANGED

The diff for this file is too large to render.
See raw diff