# P1 Environment Contract V1

**Role:** Live technical contract SSOT for the current implementation phase
**Planning dependency:** [`FUSION_DESIGN_LAB_PLAN_V2.md`](./FUSION_DESIGN_LAB_PLAN_V2.md)
**Evidence dependency:** [`P1_PARAMETERIZATION_DEEPDIVE.md`](./P1_PARAMETERIZATION_DEEPDIVE.md)

## 1. Scope

This document defines the live technical contract for:

- [`server/physics.py`](../server/physics.py)
- [`fusion_lab/models.py`](../fusion_lab/models.py)
- [`server/environment.py`](../server/environment.py)
- [`server/app.py`](../server/app.py)

If the observation schema, action schema, episode flow, terminal conditions, or reward semantics change, update this file in the same task.

## 2. Design Split

Keep three layers separate:

1. boundary builder
2. official verifier
3. environment

Boundary builder owns:

- the repaired low-dimensional family
- rotating-ellipse seed generation
- explicit triangularity control injection

Official verifier owns:

- boundary in, metrics out
- official `P1` feasibility semantics
- objective direction and score ordering
- low-fidelity live evaluation mode
- optional higher-fidelity offline validation mode
- explicit failure results when VMEC or forward-model evaluation fails

Environment owns:

- reset pool
- discrete actions
- episode budget
- best-state tracking
- reward shaping

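The following is a minimal sketch of the three-layer split. It assumes Python `typing.Protocol` interfaces. `BoundaryBuilder`, `Verifier`, and `EnvironmentCore` are hypothetical names. `Any` stands in for the real boundary type (`SurfaceRZFourier` in the verifier contract). The method names mirror the boundary-based signatures listed under the verifier contract. The environment reaches geometry and scoring only through the two interfaces, and it decides nothing beyond fidelity and budget.

```python
from __future__ import annotations

from typing import Any, Mapping, Protocol


class BoundaryBuilder(Protocol):
    """Layer 1 (hypothetical interface): knob values in, boundary out."""

    def build_boundary_from_params(self, params: Mapping[str, float]) -> Any: ...


class Verifier(Protocol):
    """Layer 2 (hypothetical interface): boundary in, metrics out."""

    def evaluate_boundary(self, boundary: Any, fidelity: str) -> Mapping[str, float]: ...


class EnvironmentCore:
    """Layer 3: owns budget and episode state, never physics details."""

    def __init__(self, builder: BoundaryBuilder, verifier: Verifier, budget: int = 6) -> None:
        self.builder = builder
        self.verifier = verifier
        self.budget_remaining = budget

    def evaluate_params(self, params: Mapping[str, float]) -> Mapping[str, float]:
        # Geometry and scoring are delegated; the environment only picks the fidelity.
        boundary = self.builder.build_boundary_from_params(params)
        return self.verifier.evaluate_boundary(boundary, fidelity="low_fidelity")
```

The environment holds only the interfaces. As a result, tests can swap in fakes for the builder and verifier without touching VMEC.
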
## 3. Boundary Family

The historical 3-knob upstream rotating-ellipse family is not the live contract.

The live controllable knobs are:

- `aspect_ratio`
- `elongation`
- `rotational_transform`
- `triangularity_scale`

Rules:

- stay low-dimensional and human-playable
- treat the current family as rotating-ellipse-derived, not as the plain upstream rotating ellipse
- the coarse measured sweep is now recorded, but changes to the reset seeds or the budget should still wait for paired high-fidelity fixture checks

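This sketch models the four live knobs as a frozen dataclass with per-knob clamping. The bounds in `HYPOTHETICAL_BOUNDS` are placeholders, because the repaired-family ranges are still listed as open measurements. `FamilyParams` and `nudge` are illustrative names, not the repository's API.

```python
from __future__ import annotations

from dataclasses import dataclass, replace

# Hypothetical bounds: the repaired-family ranges are still an open measurement.
HYPOTHETICAL_BOUNDS: dict[str, tuple[float, float]] = {
    "aspect_ratio": (3.0, 4.5),
    "elongation": (1.0, 2.5),
    "rotational_transform": (1.0, 2.0),
    "triangularity_scale": (0.0, 1.0),
}


@dataclass(frozen=True)
class FamilyParams:
    """The four live knobs of the rotating-ellipse-derived family."""

    aspect_ratio: float
    elongation: float
    rotational_transform: float
    triangularity_scale: float

    def nudge(self, name: str, delta: float) -> tuple[FamilyParams, bool]:
        """Move one knob by delta, clamp it to its bounds, and report whether it clamped."""
        if name not in HYPOTHETICAL_BOUNDS:
            raise ValueError(f"not a live knob: {name}")
        low, high = HYPOTHETICAL_BOUNDS[name]
        target = getattr(self, name) + delta
        value = min(high, max(low, target))
        # replace() builds a new frozen instance; the original stays untouched.
        return replace(self, **{name: value}), value != target
```

Returning a clamped flag alongside the new values is one way to feed the clamped-move telemetry that the observation contract requires.
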
## 4. Action Contract

`intent` is one of:

- `run`
- `submit`
- `restore_best`

For `run`, the action also includes:

- `parameter`: one of `aspect_ratio | elongation | rotational_transform | triangularity_scale`
- `direction`: `increase | decrease`
- `magnitude`: `small | medium | large`

Constraints:

- keep the discrete interaction style
- do not expose the full Fourier action space as the primary environment
- do not use action complexity to compensate for missing clarity elsewhere

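The action schema above can be sketched as a validated frozen dataclass. This is not the actual `fusion_lab/models.py` definition, and `Action` is a hypothetical name. Rejecting `run` fields on `submit` and `restore_best` is a strictness choice of this sketch, not a stated rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, Optional, get_args

Intent = Literal["run", "submit", "restore_best"]
Parameter = Literal["aspect_ratio", "elongation", "rotational_transform", "triangularity_scale"]
Direction = Literal["increase", "decrease"]
Magnitude = Literal["small", "medium", "large"]


@dataclass(frozen=True)
class Action:
    """Discrete action: `run` needs all three run fields, other intents need none."""

    intent: Intent
    parameter: Optional[Parameter] = None
    direction: Optional[Direction] = None
    magnitude: Optional[Magnitude] = None

    def __post_init__(self) -> None:
        if self.intent not in get_args(Intent):
            raise ValueError(f"unknown intent: {self.intent!r}")
        run_fields = (self.parameter, self.direction, self.magnitude)
        if self.intent == "run":
            allowed = (get_args(Parameter), get_args(Direction), get_args(Magnitude))
            if any(value not in ok for value, ok in zip(run_fields, allowed)):
                raise ValueError("run needs a valid parameter, direction, and magnitude")
        elif any(value is not None for value in run_fields):
            raise ValueError(f"{self.intent} does not take run fields")
```
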
## 5. Observation Contract

The observation must stay metric-centered and human-readable.

Required fields:

- `max_elongation`
- `aspect_ratio`
- `average_triangularity`
- `edge_iota_over_nfp`
- `aspect_ratio_violation`
- `triangularity_violation`
- `iota_violation`
- `dominant_constraint`
- `p1_feasibility`
- `p1_score`
- `constraints_satisfied`
- `vacuum_well`
- `evaluation_fidelity`
- `evaluation_failed`
- `failure_reason`
- `step_number`
- `budget_remaining`
- `no_progress_steps`
- `best_low_fidelity_score`
- `best_low_fidelity_feasibility`
- `target_spec`
- `diagnostics_text`
- `reward_breakdown`
- `action_monitor`
- `episode_total_reward`
- `trajectory_summary`

Interpretation rules:

- live environment metrics must be labeled as low-fidelity
- best-state reporting should reflect the single live reward surface
- the observation must be understandable without hidden state
- normalized constraint-violation telemetry must follow the official `P1` constraint scales
- the dominant active constraint must be visible so a human can explain repair-phase rewards
- reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
- action telemetry must expose parameter values before and after the action, including clamped, no-op, and repeat-state moves
- anti-stagnation state that can change reward must be visible in structured observation fields, not only free text

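This is a minimal sketch of how the normalized violation fields, `p1_feasibility`, and `dominant_constraint` could be derived from the three constrained metrics. It assumes a simple normalization: the excess past the limit, divided by a per-constraint scale and clipped at zero. It takes feasibility as the maximum normalized violation, as described under Reward V2. The limits and scales in `EXAMPLE_SPECS` are placeholders. The official values must come from `GeometricalProblem`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintSpec:
    metric: str            # observation metric field, e.g. "aspect_ratio"
    violation_field: str   # observation violation field, e.g. "aspect_ratio_violation"
    limit: float
    scale: float           # assumed normalization scale; use the official one in practice
    upper: bool            # True: metric <= limit, False: metric >= limit


def normalized_violation(value: float, spec: ConstraintSpec) -> float:
    """Assumed form: excess past the limit divided by the scale, clipped at zero."""
    excess = value - spec.limit if spec.upper else spec.limit - value
    return max(0.0, excess / spec.scale)


def constraint_telemetry(metrics: dict[str, float], specs: list[ConstraintSpec]) -> dict[str, object]:
    violations = {s.violation_field: normalized_violation(metrics[s.metric], s) for s in specs}
    worst = max(violations, key=violations.__getitem__)
    feasibility = violations[worst]  # feasibility is the largest normalized violation
    return {
        **violations,
        "p1_feasibility": feasibility,
        "dominant_constraint": worst if feasibility > 0.0 else None,
    }


# Placeholder limits and scales for illustration only; read the real values from the verifier.
EXAMPLE_SPECS = [
    ConstraintSpec("aspect_ratio", "aspect_ratio_violation", 4.0, 4.0, True),
    ConstraintSpec("average_triangularity", "triangularity_violation", -0.5, 0.5, True),
    ConstraintSpec("edge_iota_over_nfp", "iota_violation", 0.3, 0.3, False),
]
```
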
## 6. Episode Flow

1. Reset from one frozen repaired-family seed or a small frozen seed set.
2. Evaluate the initial state with low fidelity and return the first observation.
3. On `run`, perturb one controllable parameter and re-evaluate with low fidelity.
4. On `restore_best`, revert to the best known low-fidelity state, re-evaluate, and consume budget.
5. On `submit`, re-evaluate the current state with low fidelity, consume budget, and end the episode.
6. End the episode on `submit` or budget exhaustion.

Failure semantics:

- failed evaluations still consume budget
- failed evaluations produce visible failure observations
- failed evaluations apply a documented penalty
- the environment must not silently convert failures into success paths

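The flow and failure rules above can be sketched as a small state machine. This is a simplified illustration, not `server/environment.py`, with four simplifications:

- `run` takes a raw `delta` instead of a direction and magnitude
- best-state ranking uses feasibility alone
- all reward terms except the failure penalty are omitted
- `FAILURE_PENALTY` is a hypothetical value

The default budget of 6 matches the value still under review in the open measurements.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Optional

Params = Dict[str, float]
FAILURE_PENALTY = -1.0  # hypothetical; the contract only requires a documented value


@dataclass
class EvalResult:
    feasibility: float  # lower is better in this simplified sketch
    failed: bool = False


class EpisodeSketch:
    """Illustrative run / restore_best / submit loop with a fixed budget."""

    def __init__(self, seed: Params, evaluate: Callable[[Params], EvalResult], budget: int = 6) -> None:
        self.params = dict(seed)
        self.evaluate = evaluate
        self.budget_remaining = budget
        self.done = False
        first = evaluate(self.params)  # initial low-fidelity evaluation, no budget spent
        self.best_params: Optional[Params] = None if first.failed else dict(self.params)
        self.best_feasibility = float("inf") if first.failed else first.feasibility

    def step(self, intent: str, parameter: Optional[str] = None, delta: float = 0.0) -> tuple[EvalResult, float]:
        if self.done:
            raise RuntimeError("episode already ended")
        if intent == "run":
            if parameter not in self.params:
                raise ValueError(f"unknown parameter: {parameter!r}")
            self.params[parameter] += delta
        elif intent == "restore_best":
            if self.best_params is not None:
                self.params = dict(self.best_params)
        elif intent != "submit":
            raise ValueError(f"unknown intent: {intent!r}")
        result = self.evaluate(self.params)  # every intent re-evaluates at low fidelity
        self.budget_remaining -= 1           # and consumes budget, even when evaluation fails
        if result.failed:
            reward = FAILURE_PENALTY         # visible penalty, never a silent success path
        else:
            reward = 0.0                     # real reward terms omitted in this sketch
            if result.feasibility < self.best_feasibility:
                self.best_feasibility = result.feasibility
                self.best_params = dict(self.params)
        self.done = intent == "submit" or self.budget_remaining <= 0
        return result, reward
```
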
## 7. Terminal Contract

At termination, the environment must provide:

- final best design metrics
- final feasibility status
- total reward
- a short human-readable trajectory summary
- the final reward breakdown and action telemetry for the terminal step

Terminal reporting rules:

- keep submit-time reporting on the same live low-fidelity truth surface as the rest of the episode
- keep any higher-fidelity validation artifacts explicitly outside the live environment observation contract

## 8. Verifier Contract

The verifier of record is `constellaration.problems.GeometricalProblem`.

The implementation must preserve:

- objective direction
- constraint direction
- feasibility semantics
- score ordering

The verifier should stay boundary-based:

- `build_boundary_from_params(...) -> SurfaceRZFourier`
- `evaluate_boundary(boundary, fidelity) -> EvaluationMetrics`

Do not treat parameterization-specific logic as verifier truth.

VMEC preset mapping:

- `run`, `restore_best`, and `submit` use the `low_fidelity` VMEC preset (~0.6 s, tolerant convergence)
- higher-fidelity validation uses the `from_boundary_resolution` VMEC preset (~4 s, adaptive convergence matching the boundary Fourier resolution) outside the live environment loop
- the `high_fidelity` VMEC preset (minimum 10 modes, strict convergence) is not used because it does not converge on the current `mpol=3, ntor=3` boundaries

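The preset mapping can be pinned down in one helper so the live loop cannot reach an offline preset by accident. `vmec_preset` is a hypothetical helper that returns preset names as strings only. How those names become constellaration VMEC settings is not shown here.

```python
from __future__ import annotations

LIVE_INTENTS = frozenset({"run", "restore_best", "submit"})
LIVE_PRESET = "low_fidelity"
OFFLINE_PRESET = "from_boundary_resolution"
# "high_fidelity" is deliberately unreachable: it does not converge on mpol=3, ntor=3 boundaries.


def vmec_preset(intent: str | None = None, *, offline_validation: bool = False) -> str:
    """Return the VMEC preset name for a live intent or an offline validation run."""
    if offline_validation:
        if intent is not None:
            raise ValueError("offline validation is not tied to a live intent")
        return OFFLINE_PRESET
    if intent not in LIVE_INTENTS:
        raise ValueError(f"not a live intent: {intent!r}")
    return LIVE_PRESET
```
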
Training and evaluation rule:

- use the live low-fidelity environment contract, including explicit `submit`, as the RL surface
- the standard repository notebook and `training/llm_rollout.py` workflows should stay aligned to that same action and reward contract
- keep higher-fidelity validation in offline scripts, paired fixture checks, and final evidence artifacts
- do not reintroduce a separate high-fidelity submit path into the live environment unless the contract is deliberately redefined

## 9. Reward V2

`Reward V2` keeps the verifier-native structure from `Reward V1` and adds a small amount of
trajectory-aware shaping. `Reward V1` fixed the main coarse-signal pathology from `Reward V0`:
pure `Δ official_feasibility` was too coarse because official feasibility is a max over
normalized constraint violations, so useful repair steps on non-dominant constraints could be
nearly invisible to the reward.

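The following worked example of the `Reward V0` pathology uses invented violation values. Repairing the aspect-ratio violation leaves the maximum unchanged. The coarse delta is therefore zero, while the per-constraint delta still records the repair.

```python
from __future__ import annotations


def official_feasibility(violations: dict[str, float]) -> float:
    """Official-style feasibility: the largest normalized violation."""
    return max(violations.values())


def per_constraint_gain(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Positive entries mean that constraint's normalized violation shrank."""
    return {name: before[name] - after[name] for name in before}


before = {"triangularity": 0.30, "aspect_ratio": 0.10, "edge_iota": 0.05}
after = {"triangularity": 0.30, "aspect_ratio": 0.00, "edge_iota": 0.05}

coarse_signal = official_feasibility(before) - official_feasibility(after)
fine_signal = per_constraint_gain(before, after)
print(coarse_signal, fine_signal)
# 0.0 {'triangularity': 0.0, 'aspect_ratio': 0.1, 'edge_iota': 0.0}
```
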
The remaining `Reward V1` pathologies were not verifier mismatches; they came from short-horizon shaping:

- the agent got no extra signal for setting a new best infeasible point
- progress into the near-feasible region (`p1_feasibility` below `0.02`) earned no milestone signal unless the design crossed the full feasibility boundary
- feasible improvements only saw step-to-step objective deltas, not "new best feasible score" progress
- repeated local loops or three-step stagnation had no explicit penalty beyond the normal step cost

Target behavior:

- infeasible to feasible crossing gets a clear positive bonus
- feasible to infeasible regression gets a clear penalty
- when both states are infeasible, reduced official feasibility violation should still help
- on low-fidelity `run` steps, reaching a new best feasibility value while still infeasible should help
- entering the near-feasible corridor (`p1_feasibility <= 0.02`) should get a small bounded bonus
- when both states are infeasible, reduced normalized triangularity violation should help the most
- when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
- when both states are feasible, lower `max_elongation` should help
- on low-fidelity `run` steps, beating the previous best feasible score should help
- larger `run` actions should pay a larger step cost than smaller `run` actions
- `restore_best` should keep a flat non-submit step cost
- repeated local revisits without improvement should pay a small penalty
- three non-improving steps in a row should pay a small stagnation penalty
- `submit` should be better than passive exhaustion when the design is genuinely improved
- recovery after a failed evaluation may receive a modest bounded bonus

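This is a minimal sketch of how the Reward V2 target behaviors could combine into named terms whose sum is the scalar reward. Every coefficient in `W`, `RUN_COST`, and `RESTORE_COST` is a hypothetical placeholder. `Eval`, `Context`, and `reward_v2_sketch` are illustrative names, not the repository's implementation. The sketch makes two simplifications:

- feasible designs are ranked by lower `max_elongation` as a stand-in for the verifier's `p1_score` ordering
- callers are assumed to track revisits, streaks, and best values themselves

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, Optional

# Every coefficient below is a hypothetical placeholder, not a tuned Reward V2 value.
W = {
    "cross_to_feasible": 3.0, "fall_to_infeasible": -3.0, "feasibility_delta": 1.0,
    "triangularity": 2.0, "aspect_ratio": 1.0, "edge_iota": 1.0,
    "new_best_infeasible": 0.3, "near_feasible": 0.5, "elongation_delta": 1.0,
    "new_best_feasible": 0.5, "revisit": -0.1, "stagnation": -0.2,
    "recovery": 0.2, "submit_improved": 1.0, "failure": -1.0,
}
RUN_COST = {"small": -0.01, "medium": -0.02, "large": -0.04}  # larger moves cost more
RESTORE_COST = -0.01                                          # flat non-submit cost
NEAR_FEASIBLE = 0.02                                          # corridor threshold from the contract


@dataclass
class Eval:
    feasible: bool                # verifier's feasibility verdict
    feasibility: float            # max normalized violation
    violations: Dict[str, float]  # keys: triangularity, aspect_ratio, edge_iota
    max_elongation: float
    failed: bool = False


@dataclass
class Context:
    intent: str                                   # run | restore_best | submit
    magnitude: Optional[str] = None               # only used for run
    best_infeasible: float = float("inf")         # best feasibility seen so far
    best_feasible_elongation: float = float("inf")
    revisited_without_gain: bool = False
    non_improving_streak: int = 0                 # consecutive non-improving steps, this one included
    improved_over_seed: bool = False


def reward_v2_sketch(prev: Eval, cur: Eval, ctx: Context) -> Dict[str, float]:
    """Return named reward terms; the scalar reward is their sum."""
    if cur.failed:
        return {"failure": W["failure"]}
    terms: Dict[str, float] = {}
    if prev.failed:
        terms["recovery"] = W["recovery"]
    elif cur.feasible and not prev.feasible:
        terms["cross_to_feasible"] = W["cross_to_feasible"]
    elif prev.feasible and not cur.feasible:
        terms["fall_to_infeasible"] = W["fall_to_infeasible"]
    elif not cur.feasible:  # both infeasible: reward shrinking violations
        terms["feasibility_delta"] = W["feasibility_delta"] * (prev.feasibility - cur.feasibility)
        for name in ("triangularity", "aspect_ratio", "edge_iota"):
            terms[name] = W[name] * (prev.violations[name] - cur.violations[name])
        if prev.feasibility > NEAR_FEASIBLE >= cur.feasibility:
            terms["near_feasible"] = W["near_feasible"]
    else:  # both feasible: reward lower max elongation
        terms["elongation_delta"] = W["elongation_delta"] * (prev.max_elongation - cur.max_elongation)

    if ctx.intent == "run":
        if not cur.feasible and cur.feasibility < ctx.best_infeasible:
            terms["new_best_infeasible"] = W["new_best_infeasible"]
        if cur.feasible and cur.max_elongation < ctx.best_feasible_elongation:
            terms["new_best_feasible"] = W["new_best_feasible"]
        terms["step_cost"] = RUN_COST[ctx.magnitude]
    elif ctx.intent == "restore_best":
        terms["step_cost"] = RESTORE_COST
    elif ctx.intent == "submit" and ctx.improved_over_seed:
        terms["submit_improved"] = W["submit_improved"]

    if ctx.revisited_without_gain:
        terms["revisit"] = W["revisit"]
    if ctx.non_improving_streak >= 3:
        terms["stagnation"] = W["stagnation"]
    return terms
```

The terms are returned as a dictionary, so the scalar reward and the `reward_breakdown` telemetry both come from the same values and stay in sync.
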
Rules:

- keep reward scalar and verifier-driven
- keep the infeasible shaping tied to official normalized constraint violations, not family-name priors
- do not add family-specific reward shaping from `scadena`, `CreativeEngineer`, `Samet`, or `egodos`
- do not use reward complexity to compensate for blocked parameterization, poor seeds, or unclear observations

## 10. Reset and Fixture Policy

Reset policy:

- start with exact frozen seeds
- keep `n_field_periods = 3`
- prefer a small reproducible seed set

Each seed should be:

- reproducible
- near enough to the feasible boundary to make the budget meaningful
- not already solved

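This sketch shows a reproducible reset policy. It assumes Python's `random.Random` seeded per episode and a caller-supplied evaluator. The two seeds in `FROZEN_SEEDS` and the `max_start_feasibility` threshold are hypothetical, because the real reset pool is still an open measurement. `check_seed_pool` enforces the "not already solved" and "near the boundary" requirements as hard checks.

```python
from __future__ import annotations

import random
from typing import Callable, Dict, Sequence, Tuple

N_FIELD_PERIODS = 3
Params = Dict[str, float]

# Hypothetical seeds: the real frozen reset pool is still an open measurement.
FROZEN_SEEDS: Tuple[Params, ...] = (
    {"aspect_ratio": 3.6, "elongation": 1.4, "rotational_transform": 1.5, "triangularity_scale": 0.4},
    {"aspect_ratio": 3.8, "elongation": 1.6, "rotational_transform": 1.4, "triangularity_scale": 0.5},
)


def pick_seed(episode_seed: int, pool: Sequence[Params] = FROZEN_SEEDS) -> Params:
    """Reproducible choice: the same episode_seed always yields the same start."""
    rng = random.Random(episode_seed)
    return dict(rng.choice(pool))  # copy so an episode cannot mutate the frozen pool


def check_seed_pool(
    pool: Sequence[Params],
    evaluate: Callable[[Params], Tuple[float, bool]],
    max_start_feasibility: float,
) -> None:
    """Reject seeds that are already solved or too far from the feasible boundary."""
    for index, seed in enumerate(pool):
        feasibility, feasible = evaluate(seed)
        if feasible:
            raise ValueError(f"seed {index} is already solved")
        if feasibility > max_start_feasibility:
            raise ValueError(f"seed {index} is too far from the feasible boundary")
```
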
Fixture policy:

- track good, boundary, and clearly bad references
- use fixtures for verifier and reward sanity checks
- do not turn fixture mining into a separate broad project

## 11. Open Measurements

These items remain open until measured on the repaired family:

- exact repaired-family range bounds
- exact `triangularity_scale` deltas
- exact `rotational_transform` bounds
- exact reset seed pool
- whether the budget should stay at 6 or change

## 12. Out of Scope

- porting the old `ai-sci-feasible-designs` harness
- broad Fourier-mode action space as the main environment
- complicated reward shaping before playtest evidence
- a wider task family than the single stellarator environment