# P1 Environment Contract V1

**Role:** Live technical contract SSOT for the current implementation phase
**Planning dependency:** [`FUSION_DESIGN_LAB_PLAN_V2.md`](./FUSION_DESIGN_LAB_PLAN_V2.md)
**Evidence dependency:** [`P1_PARAMETERIZATION_DEEPDIVE.md`](P1_PARAMETERIZATION_DEEPDIVE.md)

## 1. Scope

This document defines the live technical contract for:

- [`server/physics.py`](../server/physics.py)
- [`fusion_lab/models.py`](../fusion_lab/models.py)
- [`server/environment.py`](../server/environment.py)
- [`server/app.py`](../server/app.py)

If the observation schema, action schema, episode flow, terminal conditions, or reward semantics change, update this file in the same task.

## 2. Design Split

Keep three layers separate:

1. boundary builder
2. official verifier
3. environment

The boundary builder owns:

- the repaired low-dimensional family
- rotating-ellipse seed generation
- explicit triangularity control injection

The official verifier owns:

- boundary in, metrics out
- official `P1` feasibility semantics
- objective direction and score ordering
- low-fidelity live evaluation mode
- optional higher-fidelity offline validation mode
- explicit failure results when VMEC or forward-model evaluation fails

The environment owns:

- reset pool
- discrete actions
- episode budget
- best-state tracking
- reward shaping

## 3. Boundary Family

The historical 3-knob upstream rotating-ellipse family is not the live contract. The live controllable knobs are:

- `aspect_ratio`
- `elongation`
- `rotational_transform`
- `triangularity_scale`

Rules:

- stay low-dimensional and human-playable
- treat the current family as rotating-ellipse-derived, not a plain upstream rotating ellipse
- the coarse measured sweep is now recorded, but reset-seed changes and any budget changes should still wait for paired high-fidelity fixture checks

## 4. Action Contract

`intent` is one of:

- `run`
- `submit`
- `restore_best`

For `run`, the action also includes:

- `parameter`: one of `aspect_ratio | elongation | rotational_transform | triangularity_scale`
- `direction`: `increase | decrease`
- `magnitude`: `small | medium | large`

Constraints:

- keep the discrete interaction style
- do not expose the full Fourier action space as the primary environment
- do not use action complexity to compensate for missing clarity elsewhere

## 5. Observation Contract

The observation must stay metric-centered and human-readable.

Required fields:

- `max_elongation`
- `aspect_ratio`
- `average_triangularity`
- `edge_iota_over_nfp`
- `aspect_ratio_violation`
- `triangularity_violation`
- `iota_violation`
- `dominant_constraint`
- `p1_feasibility`
- `p1_score`
- `constraints_satisfied`
- `vacuum_well`
- `evaluation_fidelity`
- `evaluation_failed`
- `failure_reason`
- `step_number`
- `budget_remaining`
- `no_progress_steps`
- `best_low_fidelity_score`
- `best_low_fidelity_feasibility`
- `target_spec`
- `diagnostics_text`
- `reward_breakdown`
- `action_monitor`
- `episode_total_reward`
- `trajectory_summary`

Interpretation rules:

- live environment metrics must be labeled as low-fidelity
- best-state reporting should reflect the single live reward surface
- the observation must be understandable without hidden state
- normalized constraint-violation telemetry must follow the official `P1` constraint scales
- the dominant active constraint must be visible so a human can explain repair-phase rewards
- reward telemetry must expose which bonuses, penalties, and shaping terms contributed to the scalar reward
- action telemetry must expose parameter values before and after the action, including clamped, no-op, and repeat-state moves
- anti-stagnation state that can change reward must be visible in structured observation fields, not only in free text

## 6. Episode Flow

1. Reset from one frozen repaired-family seed or a small frozen seed set.
2. Evaluate the initial state with low fidelity and return the first observation.
3. On `run`, perturb one controllable parameter and re-evaluate with low fidelity.
4. On `restore_best`, revert to the best known low-fidelity state, re-evaluate, and consume budget.
5. On `submit`, re-evaluate the current state with low fidelity, consume budget, and end the episode.
6. End the episode on `submit` or budget exhaustion.

Failure semantics:

- failed evaluations still consume budget
- failed evaluations produce visible failure observations
- failed evaluations apply a documented penalty
- the environment must not silently convert failures into success paths

## 7. Terminal Contract

At termination, the environment must provide:

- final best design metrics
- final feasibility status
- total reward
- a short human-readable trajectory summary
- the final reward breakdown and action telemetry for the terminal step

Terminal reporting rules:

- keep submit-time reporting on the same live low-fidelity truth surface as the rest of the episode
- keep any higher-fidelity validation artifacts explicitly outside the live environment observation contract

## 8. Verifier Contract

The verifier of record is `constellaration.problems.GeometricalProblem`. The implementation must preserve:

- objective direction
- constraint direction
- feasibility semantics
- score ordering

The verifier should stay boundary-based:

- `build_boundary_from_params(...) -> SurfaceRZFourier`
- `evaluate_boundary(boundary, fidelity) -> EvaluationMetrics`

Do not treat parameterization-specific logic as verifier truth.
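The boundary-based verifier split above can be illustrated with a minimal sketch. In the real implementation the boundary is a `SurfaceRZFourier` and the metrics come from `constellaration.problems.GeometricalProblem`; the dataclasses and the `combine_violations` helper below are hypothetical stand-ins, shown only to make the official `P1` aggregation concrete — feasibility as the maximum over normalized constraint violations, with the dominant constraint exposed for the observation:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryParams:
    """The four live controllable knobs from section 3."""
    aspect_ratio: float
    elongation: float
    rotational_transform: float
    triangularity_scale: float


@dataclass(frozen=True)
class EvaluationMetrics:
    """Illustrative subset of verifier outputs (names follow the
    observation contract; the real type lives in the verifier layer)."""
    aspect_ratio_violation: float
    triangularity_violation: float
    iota_violation: float
    p1_feasibility: float
    dominant_constraint: str


def combine_violations(aspect: float, tri: float, iota: float) -> EvaluationMetrics:
    """Aggregate normalized violations the way official P1 feasibility
    does: the feasibility value is the worst (maximum) violation, and
    the dominant constraint is whichever attains that maximum."""
    violations = {
        "aspect_ratio": aspect,
        "triangularity": tri,
        "iota": iota,
    }
    dominant = max(violations, key=violations.get)
    return EvaluationMetrics(
        aspect_ratio_violation=aspect,
        triangularity_violation=tri,
        iota_violation=iota,
        p1_feasibility=max(violations.values()),
        dominant_constraint=dominant,
    )
```

This is why a repair step on a non-dominant constraint can be invisible to raw `Δ official_feasibility`: the max only moves when the dominant violation moves, which is the coarse-signal pathology the reward section addresses.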
VMEC preset mapping:

- `run`, `restore_best`, and `submit` use the `low_fidelity` VMEC preset (~0.6 s, tolerant convergence)
- higher-fidelity validation uses the `from_boundary_resolution` VMEC preset (~4 s, adaptive convergence matching the boundary Fourier resolution) outside the live environment loop
- the `high_fidelity` VMEC preset (minimum 10 modes, strict convergence) is not used because it does not converge on the current `mpol=3, ntor=3` boundaries

Training and evaluation rule:

- use the live low-fidelity environment contract, including explicit `submit`, as the RL surface
- the standard repository notebook and `training/llm_rollout.py` workflows should stay aligned to that same action and reward contract
- keep higher-fidelity validation in offline scripts, paired fixture checks, and final evidence artifacts
- do not reintroduce a separate high-fidelity submit path into the live environment unless the contract is deliberately redefined

## 9. Reward V2

`Reward V2` keeps the verifier-native structure from `Reward V1` and adds a small amount of trajectory-aware shaping.

`Reward V1` fixed the main coarse-signal pathology from `Reward V0`: pure `Δ official_feasibility` was too coarse because official feasibility is a max over normalized constraint violations, so useful repair steps on non-dominant constraints could be nearly invisible to the reward.

The remaining `Reward V1` pathology was not verifier mismatch.
It was short-horizon shaping:

- the agent got no extra signal for setting a new best infeasible point
- near-feasible progress below `0.02` had no milestone signal unless it crossed the full feasible boundary
- feasible improvements only saw step-to-step objective deltas, not "new best feasible score" progress
- repeated local loops or three-step stagnation had no explicit penalty beyond the normal step cost

Target behavior:

- infeasible-to-feasible crossing gets a clear positive bonus
- feasible-to-infeasible regression gets a clear penalty
- when both states are infeasible, reduced official feasibility violation should still help
- on low-fidelity `run` steps, setting a new best infeasible feasibility should help
- entering the near-feasible corridor around `p1_feasibility <= 0.02` should get a small bounded bonus
- when both states are infeasible, reduced normalized triangularity violation should help the most
- when both states are infeasible, reduced normalized aspect-ratio and edge-iota violations should also help
- when both states are feasible, lower `max_elongation` should help
- on low-fidelity `run` steps, beating the previous best feasible score should help
- larger `run` actions should pay a larger step cost than smaller `run` actions
- `restore_best` should keep a flat non-submit step cost
- repeated local revisits without improvement should pay a small penalty
- three non-improving steps in a row should pay a small stagnation penalty
- `submit` should be better than passive exhaustion when the design is genuinely improved
- recovery after a failed evaluation may receive a modest bounded bonus

Rules:

- keep the reward scalar and verifier-driven
- keep the infeasible shaping tied to official normalized constraint violations, not family-name priors
- do not add family-specific reward shaping from `scadena`, `CreativeEngineer`, `Samet`, or `egodos`
- do not use reward complexity to compensate for blocked parameterization, poor seeds, or unclear observations

## 10. Reset and Fixture Policy

Reset policy:

- start with exact frozen seeds
- keep `n_field_periods = 3`
- prefer a small reproducible seed set

Each seed should be:

- reproducible
- near enough to the feasible boundary to make the budget meaningful
- not already solved

Fixture policy:

- track good, boundary, and clearly bad references
- use fixtures for verifier and reward sanity checks
- do not turn fixture mining into a separate broad project

## 11. Open Measurements

These items remain open until measured on the repaired family:

- exact repaired-family range bounds
- exact `triangularity_scale` deltas
- exact `rotational_transform` bounds
- exact reset seed pool
- whether the budget should stay at 6 or change

## 12. Out of Scope

- porting the old `ai-sci-feasible-designs` harness
- broad Fourier-mode action space as the main environment
- complicated reward shaping before playtest evidence
- a wider task family than the single stellarator environment
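For illustration only: the feasibility-crossing, dense infeasible-repair, and near-feasible corridor terms from the `Reward V2` section can be sketched as a pure function of before/after verifier state. Every coefficient below is a hypothetical placeholder, not a contract value, and the best-state, step-cost, stagnation, and feasible-objective terms are deliberately omitted:

```python
def shaped_reward(
    feasible_before: bool,
    feasible_after: bool,
    feasibility_before: float,
    feasibility_after: float,
) -> float:
    """Illustrative Reward V2 core: crossing bonus/penalty, a dense term
    on reduced official feasibility violation while infeasible, and a
    bounded bonus for entering the near-feasible corridor.
    All coefficients are hypothetical placeholders, not contract values."""
    CROSSING_BONUS = 1.0        # infeasible -> feasible crossing
    REGRESSION_PENALTY = -1.0   # feasible -> infeasible regression
    NEAR_FEASIBLE_BONUS = 0.1   # entering p1_feasibility <= 0.02 corridor
    DENSE_SCALE = 1.0           # weight on reduced violation while infeasible

    reward = 0.0
    if not feasible_before and feasible_after:
        reward += CROSSING_BONUS
    elif feasible_before and not feasible_after:
        reward += REGRESSION_PENALTY
    if not feasible_before and not feasible_after:
        # dense progress signal on the official normalized violation
        reward += DENSE_SCALE * (feasibility_before - feasibility_after)
        if feasibility_after <= 0.02 < feasibility_before:
            reward += NEAR_FEASIBLE_BONUS
    return reward
```

Keeping this a pure function of verifier state is one way to honor the "scalar and verifier-driven" rule: every shaping term can be logged into `reward_breakdown` without hidden environment state.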