Commit ba716cf
Parent(s): 16065b1

docs: sync status trackers to verifier state

Files changed:
- README.md +10 -7
- TODO.md +10 -2
- baselines/README.md +9 -0
- demo/README.md +7 -0
- docs/FUSION_DELIVERABLES_MAP.md +19 -11
- docs/FUSION_DESIGN_LAB_PLAN_V2.md +18 -1
- docs/FUSION_NEXT_12_HOURS_CHECKLIST.md +3 -0
- docs/PIVOT_P1_ROTATING_ELLIPSE.md +20 -2
- server/data/README.md +4 -0
- server/data/p1/README.md +7 -0
- training/README.md +6 -0
- training/notebooks/README.md +6 -0
README.md
CHANGED

````diff
@@ -22,6 +22,7 @@ Implementation status:
 - docs are aligned to fresh `P1` wiring in this repo
 - shared models, baselines, and server/client entry points now reflect the locked `P1` contract
 - the current environment uses `constellaration` for low-fidelity `run` steps and high-fidelity `submit` evaluation
+- the remaining runtime work is fixture coverage, manual playtesting, heuristic refresh, and deployment evidence
 
 ## Execution Status
 
@@ -35,6 +36,8 @@ Implementation status:
 - [x] Replace the synthetic evaluator with `constellaration`
 - [ ] Add tracked `P1` fixtures under `server/data/p1/`
 - [ ] Run manual playtesting and record the first reward pathology
+- [ ] Refresh the heuristic baseline for the real verifier path
+- [ ] Pass the Northflank smoke test on the H100 workspace
 - [ ] Deploy the real environment to HF Space
 
 ## Known Gaps
@@ -47,7 +50,7 @@ Implementation status:
 Current mode:
 
 - strategic task choice is already locked
-- the next work is
+- the next work is fixtures, manual playtesting, heuristic refresh, smoke validation, and deployment
 - new planning text should only appear when a real blocker forces a decision change
 
 ## Planned Repository Layout
@@ -100,15 +103,15 @@ uv sync --extra notebooks
 
 ## Immediate Next Steps
 
-1.
-2.
+1. Add tracked `P1` fixtures under `server/data/p1`.
+2. Run manual playtest episodes and record the first real reward pathology, if any.
+3. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
+4. Pass a Northflank smoke test:
    - import `constellaration`
    - run one rotating-ellipse generation plus one low-fidelity verifier call
    - write an artifact to persistent storage
-
-
-5. Add the Colab notebook under `training/notebooks`.
-6. Run manual playtest episodes before heavy training work.
+5. Deploy the environment to HF Space.
+6. Add the Colab notebook under `training/notebooks`.
 
 These are implementation steps, not another planning phase.
````
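The smoke-test step in the new `Immediate Next Steps` (import `constellaration`, one rotating-ellipse generation, one low-fidelity verifier call, one artifact write to persistent storage) could be sketched as a small harness. This is a sketch only: the generation and evaluation callables are injected as parameters because the exact `constellaration` signatures are not pinned down here; on Northflank they would wrap the rotating-ellipse generator and the low-fidelity forward model mentioned elsewhere in this commit.

```python
import json
import time
from pathlib import Path


def run_smoke_test(generate_fn, evaluate_fn, artifact_dir: Path) -> Path:
    """Run one generation plus one low-fidelity evaluation and persist the result.

    generate_fn and evaluate_fn are injected so the harness itself stays
    dependency-free; in the real smoke test they would wrap the
    constellaration generation and low-fidelity verifier calls.
    """
    started = time.time()
    boundary = generate_fn()         # one rotating-ellipse generation
    metrics = evaluate_fn(boundary)  # one low-fidelity verifier call

    artifact = {
        "boundary": boundary,
        "metrics": metrics,
        "elapsed_s": round(time.time() - started, 3),
    }
    artifact_dir.mkdir(parents=True, exist_ok=True)
    out_path = artifact_dir / "smoke_test.json"
    out_path.write_text(json.dumps(artifact, indent=2))  # persistent-storage write
    return out_path


if __name__ == "__main__":
    # Stub callables stand in for the real physics backend.
    path = run_smoke_test(
        generate_fn=lambda: {"aspect_ratio": 4.0, "elongation": 1.2},
        evaluate_fn=lambda boundary: {"score": 0.5, "feasibility": 1.0},
        artifact_dir=Path("/tmp/fusion_artifacts"),
    )
    print(path)
```

Passing the smoke test then reduces to checking that the returned path exists and the JSON round-trips.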
TODO.md
CHANGED

````diff
@@ -36,9 +36,9 @@ Priority source:
 
 ```mermaid
 flowchart TD
-  A["Northflank Smoke Test"] -->
+  A["Northflank Smoke Test"] --> E["Fixture Checks"]
   B["P1 Contract Lock"] --> D["P1 Models + Environment"]
-  C --> D
+  C["constellaration Physics Wiring"] --> D
   D --> E["Fixture Checks"]
   E --> F["Manual Playtest"]
   F --> G["Reward V1"]
@@ -121,6 +121,13 @@ flowchart TD
   [AGENTS.md](AGENTS.md),
   [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
 
+- [ ] Write down whether `Reward V0` survives unchanged
+  Goal:
+  if playtesting does not reveal a real pathology, record that outcome explicitly instead of forcing a `V1`
+  Related:
+  [README.md](README.md),
+  [Plan V2](docs/FUSION_DESIGN_LAB_PLAN_V2.md)
+
 - [x] Decide the non-submit terminal reward policy
   Goal:
   budget exhaustion now yields a smaller end-of-episode reward than `submit`, so non-submitting agents still get terminal feedback without outranking explicit submit behavior
@@ -184,3 +191,4 @@ flowchart TD
 - [ ] Do not add training-first complexity before manual playtesting
 - [ ] Do not describe low-fidelity `run` metrics as equivalent to high-fidelity `submit` results
 - [ ] Do not describe the current baseline reset state as already feasible
+- [ ] Do not force a `Reward V1` story if `Reward V0` survives manual playtesting
````
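The already-decided non-submit terminal reward policy in TODO.md (budget exhaustion yields a smaller end-of-episode reward than `submit`) could be sketched as a single function. The function name and the 0.5 discount factor are illustrative assumptions, not the repo's actual values; the only property the policy requires is that a non-submit terminal can never outrank an explicit submit at the same best score.

```python
def terminal_reward(best_score: float, submitted: bool,
                    non_submit_discount: float = 0.5) -> float:
    """End-of-episode reward under the locked non-submit policy (sketch).

    An explicit `submit` earns the full best score. Running out of budget
    still yields terminal feedback, but scaled down so a non-submitting
    agent can never outrank one that submits. The 0.5 discount is an
    illustrative choice, not the repo's tuned value.
    """
    if submitted:
        return best_score
    return non_submit_discount * best_score
```

For any positive `best_score`, `terminal_reward(s, True)` strictly exceeds `terminal_reward(s, False)`, which is exactly the ordering the TODO item asks for.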
baselines/README.md
CHANGED

````diff
@@ -1,5 +1,14 @@
 Random and heuristic baselines will live here.
 
+## Status
+
+- [x] random baseline exists
+- [x] heuristic baseline exists
+- [x] baseline comparison script exists
+- [x] baseline comparison rerun completed on the real verifier path
+- [ ] heuristic refreshed after the real-verifier rerun
+- [ ] presentation-ready comparison trace exported
+
 The first baseline milestone is:
 
 - one random agent
````
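The random-vs-heuristic comparison tracked above could be sketched as follows. Everything here is a stand-in under stated assumptions: `toy_score` replaces the real low-fidelity verifier, the `TARGET` point is an invented "known-good parameter region", and the real comparison script would roll out against the `constellaration`-backed environment instead.

```python
import random

# Invented known-good parameter point; stands in for the real P1 target region.
TARGET = {"aspect_ratio": 4.0, "elongation": 2.0, "rotational_transform": 0.3}


def toy_score(params: dict) -> float:
    """Toy stand-in for the low-fidelity verifier: higher is better, max 0."""
    return -sum((params[k] - TARGET[k]) ** 2 for k in TARGET)


def random_agent(params: dict, rng: random.Random) -> dict:
    """Perturb one randomly chosen parameter."""
    key = rng.choice(list(params))
    return {**params, key: params[key] + rng.uniform(-0.5, 0.5)}


def heuristic_agent(params: dict, rng: random.Random) -> dict:
    """Greedy nudge of every parameter toward the known-good region."""
    return {k: v + 0.2 * (TARGET[k] - v) for k, v in params.items()}


def rollout(agent, steps: int = 20, seed: int = 0) -> float:
    """Run one episode and return the best score seen."""
    rng = random.Random(seed)
    params = {"aspect_ratio": 6.0, "elongation": 1.0, "rotational_transform": 0.1}
    best = toy_score(params)
    for _ in range(steps):
        params = agent(params, rng)
        best = max(best, toy_score(params))
    return best


if __name__ == "__main__":
    print("random   :", rollout(random_agent))
    print("heuristic:", rollout(heuristic_agent))
```

The comparison table in the milestone is then just these two numbers per seed; on the toy landscape the greedy heuristic should clearly beat random search.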
demo/README.md
CHANGED

````diff
@@ -1,5 +1,12 @@
 Demo assets belong here.
 
+## Status
+
+- [ ] stable episode capture exported
+- [ ] reward-iteration story note exported
+- [ ] baseline comparison figure exported
+- [ ] final 1-minute script drafted
+
 Expected contents:
 
 - one stable episode capture
````
docs/FUSION_DELIVERABLES_MAP.md
CHANGED

````diff
@@ -6,6 +6,16 @@ Northflank is the recommended compute workspace behind those artifacts. HF Space
 
 Use this map to sequence execution, not to reopen already-locked task choices.
 
+## Current Branch Status
+
+- [x] `P1` contract is frozen in code
+- [x] official `constellaration` verifier loop is wired
+- [x] baseline comparison has been rerun on the real verifier path
+- [ ] tracked fixtures are checked in
+- [ ] manual playtest evidence exists
+- [ ] heuristic baseline has been refreshed for the real verifier path
+- [ ] HF Space deployment is live
+
 ## Deliverables Tree
 
 ```mermaid
@@ -90,14 +100,12 @@ flowchart LR
 
 ## Priority Order
 
-1.
-2.
-3.
-4.
-5.
-6.
-7.
-8.
-9.
-10. Record the demo around environment clarity, verifier fidelity, reward shaping, and one stable trajectory.
-11. Polish the repo only after the artifacts are real.
+1. Add tracked fixtures and run fixture sanity checks.
+2. Manual-playtest the environment and record the first real pathology, if any.
+3. Refresh the heuristic baseline from that evidence.
+4. Bring up the Northflank H100 workspace with persistent storage.
+5. Pass the Northflank smoke test.
+6. Make one stable OpenEnv `P1` task work remotely with clear, reproducible rules.
+7. Use the notebook to show traces and comparisons; include training only if it adds signal.
+8. Record the demo around environment clarity, verifier fidelity, reward shaping, and one stable trajectory.
+9. Polish the repo only after the artifacts are real.
````
docs/FUSION_DESIGN_LAB_PLAN_V2.md
CHANGED

````diff
@@ -4,6 +4,19 @@
 **Track:** Statement 3.1 (World Modeling — Professional Tasks)
 **Status:** Judge-aligned plan with `P1` locked
 
+## 0. Current Branch Status
+
+- [x] `P1` task family is locked
+- [x] rotating-ellipse `P1` contract is implemented in code
+- [x] real `constellaration` verifier wiring is in place
+- [x] low-fidelity `run` plus high-fidelity `submit` split is documented
+- [x] post-terminal `step()` guard is in place
+- [x] baseline comparison has been rerun on the real verifier path
+- [ ] tracked `P1` fixtures are added
+- [ ] manual playtest evidence is recorded
+- [ ] heuristic baseline is refreshed for the real verifier path
+- [ ] HF Space deployment evidence is recorded
+
 ## 1. Submission Thesis
 
 We are not primarily submitting "a trained model for fusion."
@@ -154,7 +167,7 @@ Allowed reuse:
 
 Implementation handoff:
 
-- the remaining work is now
+- the remaining work is now fixture coverage, manual playtesting, heuristic refresh, smoke validation, and deployment
 - do not treat supporting decision notes as a new planning backlog
 
 ## 8.1 Compute Surfaces
@@ -296,6 +309,10 @@ We should expect at least some of these:
 
 The reward is only acceptable after we test for those behaviors.
 
+Important execution rule:
+
+- if manual playtesting does not reveal a real pathology, keep `Reward V0` and document that outcome rather than forcing a `Reward V1`
+
 ## 12. Verifier and Reward Fixture Checks
 
 Before training, we should validate environment wiring with a few fixed fixtures.
````
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md
CHANGED

````diff
@@ -13,7 +13,9 @@ Do not expand scope beyond one stable task. Training is supporting evidence, not
 - [x] baselines and API surface have been moved to the `P1` contract
 - [x] add a post-terminal guard in `step()`
 - [x] replace the synthetic evaluator with `constellaration`
+- [x] re-run baselines on the real verifier path
 - [ ] add tracked fixtures and manual playtest evidence
+- [ ] refresh the heuristic baseline after the real-verifier rerun
 
 ## Plan V2 Inheritance
 
@@ -103,6 +105,7 @@ Transition rule:
 - bad behavior
 - refinement to reward V1
 - improved behavior
+8. If no real pathology appears, record that `Reward V0` survived playtesting and move on.
 
 Exit condition: you can explain why the environment now rewards the intended behavior.
 
````
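The post-terminal guard in `step()` that this checklist ticks off could look like the following minimal sketch. The class and exception names are illustrative, not the repo's actual ones; the point is only the behavior: once an episode ends (via `submit` or budget exhaustion), further `step()` calls fail loudly instead of silently producing rewards.

```python
class EpisodeAlreadyTerminatedError(RuntimeError):
    """Raised when step() is called after the episode has ended."""


class GuardedEnv:
    """Minimal sketch of a post-terminal guard around an env's step().

    Illustrative only: the real environment wraps its P1 transition
    logic the same way, but with the actual observation and reward types.
    """

    def __init__(self, budget: int = 3):
        self.budget = budget
        self.steps_taken = 0
        self.done = False

    def step(self, action: str) -> dict:
        if self.done:
            # The guard: refuse to advance a finished episode.
            raise EpisodeAlreadyTerminatedError(
                "step() called after terminal state; call reset() first"
            )
        self.steps_taken += 1
        if action == "submit" or self.steps_taken >= self.budget:
            self.done = True
        return {"done": self.done, "steps": self.steps_taken}

    def reset(self) -> None:
        self.steps_taken = 0
        self.done = False
```

The guard matters because an agent (or a buggy client) that keeps stepping a finished episode would otherwise accumulate rewards the verifier never sanctioned.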
docs/PIVOT_P1_ROTATING_ELLIPSE.md
CHANGED

````diff
@@ -6,6 +6,15 @@
 
 Use this file as rationale for the pivot, not as a fresh planning queue. Once the pivot is accepted, implementation should follow the SSOT plan docs.
 
+## Current Branch Status
+
+- [x] pivot accepted
+- [x] rotating-ellipse `P1` contract is implemented
+- [x] `constellaration` verifier path is wired
+- [ ] tracked fixtures are added
+- [ ] manual playtest evidence is recorded
+- [ ] heuristic baseline is refreshed for the real verifier path
+
 ## Decision
 
 Pivot the OpenEnv environment to use the official ConStellaration P1 benchmark with real VMEC physics, scoped to the rotating-ellipse low-dimensional parameter space.
@@ -147,7 +156,6 @@ current_params: {aspect_ratio, elongation, rotational_transform}
 best_params: {aspect_ratio, elongation, rotational_transform}
 initial_score: float
 best_score: float
-current_feasibility: float
 best_feasibility: float
 history: list[str]
 ```
@@ -171,13 +179,17 @@ history: list[str]
 
 ### Phase 1: Physics Backend (~1 hour)
 
+Status: done.
+
 Rewrite `server/physics.py` to wrap:
 - `constellaration.initial_guess.generate_rotating_ellipse` for boundary generation
 - `constellaration.forward_model.forward_model` with low-fi settings for evaluation
-- `constellaration.problems.GeometricalProblem` for official P1 scoring on
+- `constellaration.problems.GeometricalProblem` for official P1 scoring on every evaluation
 
 ### Phase 2: Environment Contract (~1 hour)
 
+Status: done.
+
 Update `server/environment.py`:
 - New observation schema with P1 metrics
 - New action schema for rotating-ellipse perturbations
@@ -188,6 +200,8 @@ Update `fusion_lab/models.py` for new schemas.
 
 ### Phase 3: Manual Playtest (~30 min)
 
+Status: open.
+
 Validate hypothesis: "6 actions is enough."
 - Play 5-10 episodes manually
 - Log: can a human reach feasibility? Improve elongation?
@@ -196,12 +210,16 @@ Validate hypothesis: "6 actions is enough."
 
 ### Phase 4: Baselines (~30 min)
 
+Status: partial. Baselines exist, but the heuristic needs refresh on the real verifier path.
+
 - Random agent
 - Heuristic agent (greedy toward known-good parameter region)
 - Comparison table
 
 ### Phase 5: Deploy + Evidence (~2 hours)
 
+Status: open.
+
 - Update Dockerfile/deps for constellaration
 - `openenv validate` + `openenv push`
 - Colab notebook connecting to live environment
````
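The observation schema in the pivot doc (after this commit drops `current_feasibility`) maps naturally onto a dataclass. The class name is an assumption for illustration; the field names and types come directly from the schema block above.

```python
from dataclasses import dataclass, field


@dataclass
class P1Observation:
    """Sketch of the pivot doc's observation schema (class name illustrative).

    Mirrors the post-commit schema, which keeps best_feasibility but
    no longer carries current_feasibility.
    """
    current_params: dict  # aspect_ratio, elongation, rotational_transform
    best_params: dict
    initial_score: float
    best_score: float
    best_feasibility: float
    history: list = field(default_factory=list)
```

A client-side episode loop would then read `best_score` and `best_feasibility` off this object after every `run` step, appending each action to `history`.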
server/data/README.md
CHANGED

````diff
@@ -1,3 +1,7 @@
 Baseline VMEC inputs and related static assets belong here.
 
 Do not commit generated solver outputs or large transient artifacts.
+
+## Status
+
+- [ ] tracked `P1` fixture assets added under `server/data/p1/`
````
server/data/p1/README.md
CHANGED

````diff
@@ -10,4 +10,11 @@ Intended contents:
 
 These fixtures are for verifier and reward sanity checks.
 
+## Status
+
+- [ ] known-good or near-winning fixture added
+- [ ] near-boundary fixture added
+- [ ] clearly infeasible fixture added
+- [ ] fixture sanity note written
+
 Do not copy the old `ai-sci-feasible-designs` harness here. Reuse only the specific JSON artifacts needed for the fresh `P1` environment.
````
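The three fixture tiers above imply a simple ordering check for verifier and reward sanity. The sketch below assumes a hypothetical fixture layout in which each JSON carries precomputed `score` and `feasibility` fields; the real fixture files and keys may differ.

```python
import json
from pathlib import Path


def check_fixture_ordering(fixture_dir: Path) -> None:
    """Sanity-check the three P1 fixture tiers (hypothetical layout).

    Assumes each fixture JSON carries precomputed `score` and
    `feasibility` fields. A known-good fixture should outscore the
    near-boundary one, and the infeasible one should fail feasibility.
    Raises AssertionError if the ordering the fixtures exist to pin
    down does not hold.
    """
    def load(name: str) -> dict:
        return json.loads((fixture_dir / name).read_text())

    good = load("known_good.json")
    boundary = load("near_boundary.json")
    infeasible = load("infeasible.json")

    assert good["feasibility"] >= boundary["feasibility"]
    assert good["score"] > boundary["score"]
    assert infeasible["feasibility"] < 1.0
```

Running this against freshly regenerated fixtures is one concrete form the "fixture sanity note" could take.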
training/README.md
CHANGED

````diff
@@ -1,3 +1,9 @@
 Training and evaluation notebooks belong here.
 
 This repository treats notebooks as supporting evidence for the environment, not the primary product.
+
+## Status
+
+- [ ] Northflank notebook artifacts saved
+- [ ] Colab notebook saved
+- [ ] training evidence included only if it is persuasive
````
training/notebooks/README.md
CHANGED

````diff
@@ -12,6 +12,12 @@ Recommended split:
 - Northflank notebook: main compute workspace on the team H100
 - Colab notebook: thin public artifact required by the hackathon
 
+## Status
+
+- [ ] Northflank smoke notebook note saved
+- [ ] manual-playtest notebook or trace notebook saved
+- [ ] thin public Colab notebook saved
+
 Operational defaults:
 
 - use the same Python dependency set as the repo runtime
````