fusion-design-lab / docs /P1_PARAMETERIZATION_DEEPDIVE.md
CreativeEngineer's picture
feat: add verifier-native reward v1
2fccde8
# P1 Parameterization Deep-Dive
**Date:** March 7, 2026
**Role:** Evidence and rationale record
**Status:** Supporting doc, not a live planning or contract SSOT
This document keeps the durable evidence behind the repaired low-dimensional `P1` environment:
- why the historical 3-knob family failed
- what the original winning session actually did
- what the recorded 4-knob sweep proved
- why the current environment is intentionally a playable stepping stone rather than a leaderboard-matching optimizer
## 1. Structural Blocker
### Symptom
The old 3-parameter action space:
- `aspect_ratio`
- `elongation`
- `rotational_transform`
could not satisfy the `P1` constraints under the real `constellaration` verifier path.
### Evidence
A 125-point grid sweep over the historical 3-knob range produced `0/125` feasible designs.
Observed behavior:
- `average_triangularity` stayed near `+0.005`
- `p1_feasibility` stayed near `1.00995`
- varying `n_field_periods` did not resolve the blocker
### Root Cause
`generate_rotating_ellipse(aspect_ratio, elongation, rotational_transform, n_field_periods)` does not meaningfully expose the Fourier mode that controls triangularity.
The historical `rotational_transform` range was also too low to reach the `abs(edge_iota_over_nfp) >= 0.3` requirement reliably.
## 2. Original Winning Session
The original successful `P1` path in `ai-sci-feasible-designs` did not rely on the raw 3-knob family alone.
The winning session:
1. built a low-dimensional sweep with a fourth knob
2. found feasible seeds quickly
3. refined around those seeds with stronger optimizers
4. used leaderboard-quality anchors later in the pipeline
### Missing Fourth Knob
The historical script added `tri_scale` by injecting the `m=2, n=0` Fourier mode after generating the base rotating-ellipse shape.
That missing triangularity control is the key reason the raw 3-knob family was structurally blocked.
### Recovered Useful Ranges
The original script used substantially different useful ranges than the blocked runtime:
```text
aspect_ratio: [3.0, 3.6]
elongation: [1.4, 2.2]
rotational_transform: [1.5, 2.2]
tri_scale: [0.55, 0.8]
```
## 3. Harness Campaign Comparison
Recorded `P1` campaign runs in `ai-sci-feasible-designs` also found zero feasible candidates.
That failure does not disprove the repaired low-dimensional path. It mostly shows that the campaign guidance and search style diverged from the winning approach:
- the campaigns pushed the agent away from broad low-dimensional exploration
- the winning session did broad sweeps and large early moves
- the campaign path used richer Fourier candidates, but not the same successful cold-start behavior
## 4. Recorded 4-Knob Sweep
A recorded 4-knob sweep using explicit triangularity injection showed that the repaired family can reach `P1` feasibility.
Recorded sweep family:
```text
aspect_ratio: [3.2, 3.8]
elongation: [1.2, 1.8]
rotational_transform: [1.2, 1.8]
tri_scale: [0.4, 0.7]
n_field_periods: 3
mpol / ntor: 3 / 3
```
What that sweep established:
- explicit triangularity control fixes the structural blocker
- repaired-family feasibility is reachable in principle
- repaired-family defaults still need measured calibration before they should be narrated as stable
## 5. Verifier Alignment Evidence
The current runtime verifier alignment is sound:
- the official `GeometricalProblem` API is used for feasibility and objective semantics
- score conversion matches the official `P1` objective direction
- the runtime split is boundary-based: build boundary first, then evaluate boundary
- low-fidelity `run` and high-fidelity `submit` are treated as separate truth surfaces
This matters because the repair belongs in the boundary family, not in redefined verifier semantics.
## 6. Reward Implications
The repaired family changes what is possible, but it does not justify a complicated reward.
The main reward conclusions remain:
- keep reward tied to official verifier scalars
- keep feasibility-first behavior
- do not add per-constraint or knob-specific shaping
- tune from playtest evidence, not from theory alone
## 7. Why The Environment Is Still Valid
The repaired 4-knob family is not a leaderboard-matching optimizer. That is acceptable for this repo.
The purpose of the environment is:
- teach and evaluate constrained design behavior
- keep the observation/action/reward loop legible
- preserve an explainable path from action to verifier feedback
The winning high-fidelity score chase used a much richer downstream optimization story. This repo does not need to reproduce that full pipeline to be a valid hackathon environment artifact.
## 8. Design Implications Kept From This Analysis
- keep multiple frozen reset seeds rather than one memorized starting state
- keep reward based on official scalars rather than hand-coded constraint bonuses
- keep known winners as calibration fixtures, not direct reward targets
- keep domain knowledge in seeds and fixtures, not in opaque reward tricks
## 9. Primary References
Fusion Design Lab:
- [`server/physics.py`](../server/physics.py)
- [`server/environment.py`](../server/environment.py)
- [`fusion_lab/models.py`](../fusion_lab/models.py)
- [`docs/P1_ENV_CONTRACT_V1.md`](P1_ENV_CONTRACT_V1.md)
Reference repo:
- `ai-sci-feasible-designs/docs/harness/raw-session.md`
- historical `scripts/search_p1_lowdim.py`
- `ai-sci-feasible-designs/docs/P1_SCORE_CHASE_NOTES.md`
- `P1_CAMPAIGN_POSTMORTEM.md`