CreativeEngineer commited on
Commit
714d655
·
1 Parent(s): e8134ed

docs: clarify measured sweep provenance

Browse files
Files changed (1) hide show
  1. docs/MEASURED_SWEEP_REPORT.md +31 -13
docs/MEASURED_SWEEP_REPORT.md CHANGED
@@ -18,17 +18,18 @@ Plan V2 §21 and Deliverables Map priority #1.
18
 
19
  ### Procedure
20
 
21
- 1. **Timed a single evaluation** on local CPU to establish per-eval cost:
22
- ~2.5s (vs ~0.6s on the Northflank H100).
 
23
 
24
  2. **Ran a broad 3-point grid sweep** (`baselines/measured_sweep.py --grid-points 3`)
25
  over the full `PARAMETER_RANGES` to check crash zones, feasibility coverage,
26
- and per-parameter trends. 81 configurations, ~2.7 minutes.
27
 
28
- 3. **Ran a targeted fine-grid sweep** over the known feasible zone
29
- (`rot_transform ∈ [1.50, 1.80]`, `tri_scale ∈ [0.55, 0.65]`) to map the
30
- exact feasibility boundary, crash gradient, and candidate reset seeds.
31
- 315 configurations, ~13 minutes.
32
 
33
  4. **Validated the 3 current reset seeds** individually against the live
34
  `build_boundary_from_params` + `evaluate_boundary` code path.
@@ -36,13 +37,14 @@ Plan V2 §21 and Deliverables Map priority #1.
36
  5. **Checked delta reachability**: whether 6 steps at maximum delta can traverse
37
  each parameter range end-to-end.
38
 
39
- 6. **Cross-referenced** results against the prior 256-point sweep documented in
40
- `docs/P1_PARAMETERIZATION_DEEPDIVE.md` §4.
41
 
42
  ### Tools
43
 
44
- - Script: `baselines/measured_sweep.py` (committed, reusable with `--grid-points`
45
- and `--output-dir` flags)
 
46
  - Code path: `server.physics.build_boundary_from_params` →
47
  `server.physics.evaluate_boundary` (low-fidelity)
48
  - Runtime: local CPU (Darwin), `uv run python`
@@ -52,7 +54,7 @@ Plan V2 §21 and Deliverables Map priority #1.
52
  ## 1. Sweep Configuration
53
 
54
  Two sweeps were run on local CPU using the live `build_boundary_from_params` +
55
- `evaluate_boundary` code path (low-fidelity VMEC, ~2.5s per eval on local CPU).
56
 
57
  ### Broad sweep (3-point grid)
58
 
@@ -151,6 +153,11 @@ with `tri_scale < 0.60` are feasible. The binding constraint is always
151
 
152
  ### Top feasible designs (by score)
153
 
 
 
 
 
 
154
  | AR | elong | rt | ts | score | feas | elong_out | tri | iota |
155
  |----|-------|------|------|---------|---------|-----------|---------|--------|
156
  | 3.6 | 1.2 | 1.55 | 0.60 | 0.3446 | 0.0046 | 6.8985 | -0.4977 | 0.2989 |
@@ -260,5 +267,16 @@ crash avoidance from the 1.75-1.80 boundary.
260
  ## 8. Artifacts
261
 
262
  - `baselines/sweep_results/measured_sweep_20260308T050043Z.json` — broad 81-point sweep
 
263
  - `baselines/sweep_results/targeted_sweep.json` — targeted 315-point sweep
264
- - Prior 256-point sweep data in `docs/P1_PARAMETERIZATION_DEEPDIVE.md` §4
 
 
 
 
 
 
 
 
 
 
 
18
 
19
  ### Procedure
20
 
21
+ 1. **Timed a single evaluation** on local CPU to establish approximate per-eval
22
+ cost. Observed ~2.5s per low-fidelity evaluation (timing metadata is now
23
+ saved to the JSON artifact by the sweep script for future runs).
24
 
25
  2. **Ran a broad 3-point grid sweep** (`baselines/measured_sweep.py --grid-points 3`)
26
  over the full `PARAMETER_RANGES` to check crash zones, feasibility coverage,
27
+ and per-parameter trends. 81 configurations.
28
 
29
+ 3. **Ran a targeted fine-grid sweep** (`baselines/measured_sweep.py --targeted`)
30
+ over the known feasible zone (`rot_transform ∈ [1.50, 1.80]`,
31
+ `tri_scale [0.55, 0.65]`) to map the exact feasibility boundary, crash
32
+ gradient, and candidate reset seeds. 315 configurations.
33
 
34
  4. **Validated the 3 current reset seeds** individually against the live
35
  `build_boundary_from_params` + `evaluate_boundary` code path.
 
37
  5. **Checked delta reachability**: whether 6 steps at maximum delta can traverse
38
  each parameter range end-to-end.
39
 
40
+ 6. **Cross-referenced** results against the prior recorded 4-knob sweep
41
+ documented in `docs/P1_PARAMETERIZATION_DEEPDIVE.md` §4.
42
 
43
  ### Tools
44
 
45
+ - Script: `baselines/measured_sweep.py`
46
+ - broad grid: `--grid-points N` (evenly spaced across `PARAMETER_RANGES`)
47
+ - targeted: `--targeted` (pre-defined value set around the known feasible zone)
48
  - Code path: `server.physics.build_boundary_from_params` →
49
  `server.physics.evaluate_boundary` (low-fidelity)
50
  - Runtime: local CPU (Darwin), `uv run python`
 
54
  ## 1. Sweep Configuration
55
 
56
  Two sweeps were run on local CPU using the live `build_boundary_from_params` +
57
+ `evaluate_boundary` code path (low-fidelity VMEC).
58
 
59
  ### Broad sweep (3-point grid)
60
 
 
153
 
154
  ### Top feasible designs (by score)
155
 
156
+ Note: the `P1` verifier uses a 1% feasibility tolerance (`FEASIBILITY_TOLERANCE = 0.01`
157
+ in `server/physics.py`). Designs with `feas ≤ 0.01` are considered feasible by
158
+ `GeometricalProblem.is_feasible()`. Rows below showing `feas=0.0046` are feasible
159
+ under this tolerance even though their raw feasibility is nonzero.
160
+
161
  | AR | elong | rt | ts | score | feas | elong_out | tri | iota |
162
  |----|-------|------|------|---------|---------|-----------|---------|--------|
163
  | 3.6 | 1.2 | 1.55 | 0.60 | 0.3446 | 0.0046 | 6.8985 | -0.4977 | 0.2989 |
 
267
  ## 8. Artifacts
268
 
269
  - `baselines/sweep_results/measured_sweep_20260308T050043Z.json` — broad 81-point sweep
270
+ (produced by `baselines/measured_sweep.py --grid-points 3`)
271
  - `baselines/sweep_results/targeted_sweep.json` — targeted 315-point sweep
272
+ (produced during the initial investigation session before the `--targeted`
273
+ mode was committed; running `baselines/measured_sweep.py --targeted` now
274
+ reproduces equivalent sweep coverage/results under a timestamped
275
+ `measured_sweep_*.json` filename, but with a different JSON schema)
276
+ - Prior recorded 4-knob sweep data referenced in `docs/P1_PARAMETERIZATION_DEEPDIVE.md` §4
277
+
278
+ Provenance note: the committed JSON artifacts predate the `metadata` block added
279
+ to the sweep script. Future runs include `metadata.elapsed_seconds` and
280
+ `metadata.seconds_per_eval`. The existing broad sweep artifact contains only
281
+ `analysis` and `results`; the existing targeted sweep artifact is a raw list of
282
+ rows without a `metadata` or `analysis` wrapper.