CreativeEngineer committed on
Commit 6deaccc · 1 Parent(s): eb446cf

refactor: align p1 runtime contract and baseline reporting

README.md CHANGED
@@ -30,8 +30,8 @@ Implementation status:
 ## Execution Status
 
 - [x] Lock the `P1` contract in code
-- [x] Rewrite shared models to the rotating-ellipse `P1` schema
-- [x] Rewrite the environment loop to the rotating-ellipse `P1` schema
+- [x] Rewrite shared models to the repaired low-dimensional `P1` schema
+- [x] Rewrite the environment loop to the repaired low-dimensional `P1` schema
 - [x] Update the API/task surface to match `P1`
 - [x] Update baseline agents to the `P1` contract
 - [x] Add a post-terminal guard so `step()` is a no-op after `done=True`
@@ -59,7 +59,7 @@ Implementation status:
 - `run` uses low-fidelity `constellaration` metrics, while `submit` re-evaluates the current design with high-fidelity `skip_qi`; do not present step-time metrics as final submission metrics.
 - VMEC failure semantics are now explicit in the runtime path. Failed evaluations cost budget, produce a visible failure observation, and apply a penalty.
 - Terminal reward/reporting now uses a fidelity-consistent basis: `submit` compares against high-fidelity reference state instead of low-fidelity rollout score state.
-- `best_score` and `best_feasibility` are currently context-dependent in observations: run-time views reflect low-fidelity rollout state, while submit-time views can reflect high-fidelity best state. Keep that distinction explicit in docs, traces, and baseline interpretation until the contract is simplified further.
+- Observation best-state reporting is now split explicitly between low-fidelity rollout state and high-fidelity submit state; baseline traces and demo copy should use those explicit fields rather than infer a mixed best-state story.
 - Budget exhaustion now returns a smaller terminal reward than explicit `submit`; keep that asymmetry when tuning reward so agents still prefer deliberate submission.
 - The real-verifier baseline rerun showed the old heuristic is no longer useful as-is: over 5 seeded episodes, both agents stayed at `0.0` mean best score and the heuristic underperformed random on reward. The heuristic needs redesign after the repaired parameterization and manual playtesting.
 
@@ -121,12 +121,13 @@ uv sync --extra notebooks
 ## Immediate Next Steps
 
 1. Run a small measured sweep on the repaired family to choose useful ranges, deltas, and reset seeds.
-2. Add tracked `P1` fixtures under `server/data/p1`.
-3. Run manual playtest episodes and record the first real reward pathology, if any.
-4. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
-5. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
-6. Deploy the environment to HF Space.
-7. Add the Colab notebook under `training/notebooks`.
+2. Verify that observation semantics are human-readable and that low-fi `run` versus high-fi `submit` best-state reporting is not ambiguous.
+3. Add tracked `P1` fixtures under `server/data/p1`.
+4. Run manual playtest episodes and record the first real reward pathology, if any.
+5. Refresh the heuristic baseline using manual playtest evidence, then save one comparison trace.
+6. Use the passing Northflank H100 setup to produce remote traces and comparisons from the real verifier path.
+7. Deploy the environment to HF Space.
+8. Add the Colab notebook under `training/notebooks`.
 
 These are implementation steps, not another planning phase.
 
 
TODO.md CHANGED
@@ -20,8 +20,8 @@ Priority source:
 ## Current State
 
 - [x] `P1` strategy is locked
-- [x] shared models reflect the rotating-ellipse `P1` contract
-- [x] environment loop reflects the rotating-ellipse `P1` contract
+- [x] shared models reflect the repaired low-dimensional `P1` contract
+- [x] environment loop reflects the repaired low-dimensional `P1` contract
 - [x] API/task surface reflects `P1`
 - [x] baselines reflect the `P1` contract
 - [x] repo docs call out the low-fi/high-fi `constellaration` split honestly
@@ -74,7 +74,7 @@ flowchart TD
 
 - [x] Verify that the current 3-knob family can or cannot approach P1 feasibility
   Goal:
-  decide whether parameterization repair is a blocker before more reward work
+  resolve the historical gating question about whether parameterization repair was required before more reward work
   Related:
   [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md),
   [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
@@ -164,12 +164,18 @@ flowchart TD
   Related:
   [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
 
+- [x] Clarify or split fidelity-dependent best-state observation fields
+  Goal:
+  replace ambiguous mixed best-state reporting with explicit low-fidelity and high-fidelity best-state fields before fixture evidence or baseline comparisons
+  Related:
+  [P1 Environment Contract](docs/P1_ENV_CONTRACT_V1.md)
+
 - [ ] Add 1-2 tracked `P1` fixtures
   Files:
   [server/data/p1/README.md](server/data/p1/README.md),
   [P1 Pivot Record](docs/PIVOT_P1_ROTATING_ELLIPSE.md)
   Note:
-  add fixtures only after the parameterization repair produces a meaningful near-boundary region
+  add fixtures only after the repaired family is calibrated into a meaningful near-boundary region
 
 - [ ] Run fixture sanity checks
   Goal:
baselines/compare.py CHANGED
@@ -14,27 +14,36 @@ def main(n_episodes: int = 20) -> None:
 
     random_rewards: list[float] = []
     heuristic_rewards: list[float] = []
-    random_best_scores: list[float] = []
-    heuristic_best_scores: list[float] = []
+    random_final_scores: list[float] = []
+    heuristic_final_scores: list[float] = []
+    random_feasible: list[int] = []
+    heuristic_feasible: list[int] = []
 
     for i in range(n_episodes):
         rr, rt = random_episode(env, seed=i)
+        _require_submit_fidelity(rt[-1], baseline_name="random")
         random_rewards.append(rr)
-        random_best_scores.append(rt[-1]["best_score"])
+        random_final_scores.append(rt[-1]["score"])
+        random_feasible.append(1 if rt[-1]["constraints_satisfied"] else 0)
 
         hr, ht = heuristic_episode(env, seed=i)
+        _require_submit_fidelity(ht[-1], baseline_name="heuristic")
         heuristic_rewards.append(hr)
-        heuristic_best_scores.append(ht[-1]["best_score"])
+        heuristic_final_scores.append(ht[-1]["score"])
+        heuristic_feasible.append(1 if ht[-1]["constraints_satisfied"] else 0)
 
     r_mean = sum(random_rewards) / len(random_rewards)
     h_mean = sum(heuristic_rewards) / len(heuristic_rewards)
-    r_score = sum(random_best_scores) / len(random_best_scores)
-    h_score = sum(heuristic_best_scores) / len(heuristic_best_scores)
+    r_score = sum(random_final_scores) / len(random_final_scores)
+    h_score = sum(heuristic_final_scores) / len(heuristic_final_scores)
+    r_feasible = sum(random_feasible)
+    h_feasible = sum(heuristic_feasible)
 
     print(f"{'Metric':<25} {'Random':>12} {'Heuristic':>12}")
     print("-" * 51)
     print(f"{'Mean reward':<25} {r_mean:>+12.4f} {h_mean:>+12.4f}")
-    print(f"{'Mean best P1 score':<25} {r_score:>12.6f} {h_score:>12.6f}")
+    print(f"{'Mean final P1 score':<25} {r_score:>12.6f} {h_score:>12.6f}")
+    print(f"{'Feasible finals':<25} {r_feasible:>12d} {h_feasible:>12d}")
     print(f"{'Episodes':<25} {n_episodes:>12d} {n_episodes:>12d}")
     print()
 
@@ -42,6 +51,14 @@ def main(n_episodes: int = 20) -> None:
     print(f"Heuristic wins: {wins}/{n_episodes} episodes ({100 * wins / n_episodes:.0f}%)")
 
 
+def _require_submit_fidelity(final_step: dict[str, object], *, baseline_name: str) -> None:
+    fidelity = final_step["evaluation_fidelity"]
+    if fidelity != "high":
+        raise ValueError(
+            f"{baseline_name} baseline ended on {fidelity!r} instead of high-fidelity submit."
+        )
+
+
 if __name__ == "__main__":
     n = int(sys.argv[1]) if len(sys.argv) > 1 else 20
     main(n)
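The fidelity guard added above can be exercised in isolation. A minimal sketch, assuming final trace entries are dicts carrying an `evaluation_fidelity` key; the trace dicts below are hypothetical stand-ins for real episode traces:

```python
# Standalone sketch of the submit-fidelity guard pattern from compare.py.
def require_submit_fidelity(final_step: dict[str, object], *, baseline_name: str) -> None:
    """Reject a trace whose final step was not a high-fidelity submit."""
    fidelity = final_step.get("evaluation_fidelity")
    if fidelity != "high":
        raise ValueError(
            f"{baseline_name} baseline ended on {fidelity!r} instead of high-fidelity submit."
        )

# Hypothetical traces: one ends with a high-fi submit, one ends low-fi.
good_trace = [{"step": 0, "score": 0.0}, {"step": 1, "score": 0.42, "evaluation_fidelity": "high"}]
bad_trace = [{"step": 0, "score": 0.0}, {"step": 1, "score": 0.10, "evaluation_fidelity": "low"}]

require_submit_fidelity(good_trace[-1], baseline_name="random")  # passes silently
try:
    require_submit_fidelity(bad_trace[-1], baseline_name="heuristic")
except ValueError as err:
    print(err)
```

Failing loudly here keeps a low-fidelity rollout score from silently entering the "Mean final P1 score" row.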
baselines/heuristic_agent.py CHANGED
@@ -13,10 +13,19 @@ def heuristic_episode(
 ) -> tuple[float, list[dict[str, object]]]:
     obs = env.reset(seed=seed)
     total_reward = 0.0
-    trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
+    trace: list[dict[str, object]] = [
+        {
+            "step": 0,
+            "score": obs.p1_score,
+            "evaluation_fidelity": obs.evaluation_fidelity,
+            "constraints_satisfied": obs.constraints_satisfied,
+        }
+    ]
 
     while not obs.done:
-        action = _choose_action(obs)
+        action = (
+            StellaratorAction(intent="submit") if obs.budget_remaining <= 1 else _choose_action(obs)
+        )
         obs = env.step(action)
         total_reward += obs.reward or 0.0
         trace.append(
@@ -24,7 +33,8 @@ def heuristic_episode(
                 "step": len(trace),
                 "action": _action_label(action),
                 "score": obs.p1_score,
-                "best_score": obs.best_score,
+                "evaluation_fidelity": obs.evaluation_fidelity,
+                "constraints_satisfied": obs.constraints_satisfied,
                 "reward": obs.reward,
                 "failure": obs.evaluation_failed,
             }
@@ -95,7 +105,8 @@ def main(n_episodes: int = 20) -> None:
        rewards.append(total_reward)
        print(
            f"Episode {i:3d}: steps={len(trace) - 1} "
-            f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
+            f"final_score={final['score']:.6f} fidelity={final['evaluation_fidelity']} "
+            f"constraints={'yes' if final['constraints_satisfied'] else 'no'} "
            f"reward={total_reward:+.4f}"
        )
112
 
baselines/random_agent.py CHANGED
@@ -24,15 +24,25 @@ def random_episode(
     rng = random.Random(seed)
     obs = env.reset(seed=seed)
     total_reward = 0.0
-    trace: list[dict[str, object]] = [{"step": 0, "score": obs.p1_score}]
+    trace: list[dict[str, object]] = [
+        {
+            "step": 0,
+            "score": obs.p1_score,
+            "evaluation_fidelity": obs.evaluation_fidelity,
+            "constraints_satisfied": obs.constraints_satisfied,
+        }
+    ]
 
     while not obs.done:
-        action = StellaratorAction(
-            intent="run",
-            parameter=rng.choice(PARAMETERS),
-            direction=rng.choice(DIRECTIONS),
-            magnitude=rng.choice(MAGNITUDES),
-        )
+        if obs.budget_remaining <= 1:
+            action = StellaratorAction(intent="submit")
+        else:
+            action = StellaratorAction(
+                intent="run",
+                parameter=rng.choice(PARAMETERS),
+                direction=rng.choice(DIRECTIONS),
+                magnitude=rng.choice(MAGNITUDES),
+            )
         obs = env.step(action)
         total_reward += obs.reward or 0.0
         trace.append(
@@ -40,7 +50,9 @@ def random_episode(
                 "step": len(trace),
                 "action": action.intent,
                 "score": obs.p1_score,
-                "best_score": obs.best_score,
+                "evaluation_fidelity": obs.evaluation_fidelity,
+                "constraints_satisfied": obs.constraints_satisfied,
+                "evaluation_failed": obs.evaluation_failed,
                 "reward": obs.reward,
             }
         )
@@ -58,7 +70,8 @@ def main(n_episodes: int = 20) -> None:
        rewards.append(total_reward)
        print(
            f"Episode {i:3d}: steps={len(trace) - 1} "
-            f"final_score={final['score']:.6f} best_score={final['best_score']:.6f} "
+            f"final_score={final['score']:.6f} fidelity={final['evaluation_fidelity']} "
+            f"constraints={'yes' if final['constraints_satisfied'] else 'no'} "
            f"reward={total_reward:+.4f}"
        )
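Both baselines now spend the last budget unit on an explicit `submit`. The rule can be sketched against a hypothetical stub observation (the real `StellaratorAction` and observation types live in the repo; `Obs` below is a stand-in):

```python
from dataclasses import dataclass

@dataclass
class Obs:
    # Hypothetical stand-in for the real observation; only the field
    # the policy rule needs is modeled here.
    budget_remaining: int

def choose_intent(obs: Obs) -> str:
    # Spend the last budget unit on a deliberate high-fidelity submit,
    # since budget exhaustion pays a smaller terminal reward than submit.
    return "submit" if obs.budget_remaining <= 1 else "run"

intents = [choose_intent(Obs(budget_remaining=b)) for b in (6, 3, 2, 1)]
print(intents)  # → ['run', 'run', 'run', 'submit']
```

This keeps every baseline episode ending on the high-fidelity path, which is what the comparison script's fidelity guard asserts.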
 
docs/FUSION_DESIGN_LAB_PLAN_V2.md CHANGED
@@ -244,8 +244,10 @@ The observation should expose:
 - `failure_reason`
 - `step_number`
 - `budget_remaining`
-- `best_score`
-- `best_feasibility`
+- `best_low_fidelity_score`
+- `best_low_fidelity_feasibility`
+- `best_high_fidelity_score`
+- `best_high_fidelity_feasibility`
 - `target_spec`
 - concise textual summary of the last action outcome in `diagnostics_text`
 
@@ -253,10 +255,9 @@ The observation must be interpretable by a human without additional hidden state
 
 Current runtime note:
 
-- `best_score` and `best_feasibility` are not yet fully split by fidelity in the observation schema
-- low-fidelity run observations display rollout best state
-- high-fidelity submit observations may display high-fidelity best state instead
-- keep that distinction explicit in docs and traces until the contract is simplified further
+- the live observation surface now exposes explicit low-fidelity and high-fidelity best-state fields
+- low-fi run steps and high-fi submit steps no longer overload one generic `best_score` field
+- traces and baselines should use the explicit fields instead of reconstructing a mixed best-state story
 
 ### Action Space
 
@@ -625,10 +626,11 @@ Deliverables:
 
 ### Phase 2
 
-Freeze initial fixtures and manual-playtest the environment.
+Audit observation clarity, then freeze initial fixtures and manual-playtest the environment.
 
 Deliverables:
 
+- observation semantics note covering low-fi vs high-fi reporting and best-state fields
 - one good or near-boundary fixture
 - bad fixtures
 - 5 to 10 episode logs
docs/FUSION_NEXT_12_HOURS_CHECKLIST.md CHANGED
@@ -98,35 +98,40 @@ Transition rule:
 ## Hour 2-4: Verify Wiring, Then Manual Playtest
 
 1. Run a small measured sweep on the repaired family before freezing defaults.
-2. Run fixture checks:
+2. Audit observation clarity:
+   - low-fi `run` metrics are clearly labeled
+   - high-fi `submit` metrics are clearly labeled
+   - low-fidelity and high-fidelity best-state fields are explicit and human-readable
+3. Run fixture checks:
    - known-good or near-winning design
    - near-boundary designs
    - clearly bad designs
    - do not rely on the current default baseline params as the only starting point
-3. Confirm:
+4. Confirm:
    - verifier outputs are sane
    - reward ordering is sane
    - objective direction is correct
-4. Manually play 5 to 10 episodes.
-5. Log for each step:
+5. Manually play 5 to 10 episodes.
+6. Log for each step:
    - observation
    - chosen action
   - expected effect
   - returned reward
   - confusion or exploit if observed
-6. Identify at least one bad incentive or exploit.
-7. Patch reward or penalty logic immediately.
-8. Write the reward shaping story:
+7. Identify at least one bad incentive or exploit.
+8. Patch reward or penalty logic immediately.
+9. Write the reward shaping story:
   - initial reward V0
   - bad behavior
   - refinement to reward V1
   - improved behavior
-9. If no real pathology appears, record that `Reward V0` survived playtesting and move on.
+10. If no real pathology appears, record that `Reward V0` survived playtesting and move on.
 
 Exit condition: you can explain why the environment now rewards the intended behavior.
 
 Artifacts:
 - measured range and delta note
+- observation semantics note
 - fixture check note
 - manual playtest log
 - reward shaping note
docs/P1_ENV_CONTRACT_V1.md CHANGED
@@ -160,8 +160,10 @@ Keep:
 - `failure_reason`
 - `step_number`
 - `budget_remaining`
-- `best_score`
-- `best_feasibility`
+- `best_low_fidelity_score`
+- `best_low_fidelity_feasibility`
+- `best_high_fidelity_score`
+- `best_high_fidelity_feasibility`
 - `target_spec`
 - `diagnostics_text`
 
@@ -170,7 +172,7 @@ Add clarity about fidelity:
 - low-fidelity step-time metrics should be labeled as such
 - high-fidelity submit-time metrics should be labeled as such
 - do not expose them as if they are the same truth surface
-- in the current runtime, `best_score` and `best_feasibility` can switch meaning with fidelity context, so traces and baselines should not treat them as one invariant metric yet
+- the live runtime should expose separate low-fidelity and high-fidelity best-state fields instead of overloading one generic best-state metric
 
 This can be done either by:
 
@@ -182,7 +184,7 @@ The minimum requirement is that a reader can tell whether a metric came from low
 Current repo state:
 
 - the live observation surface now exposes evaluation fidelity and failure state
-- the exact naming can still be refined after playtesting, but low-fi vs high-fi is no longer implicit
+- the live observation surface now exposes separate low-fidelity and high-fidelity best-state fields
 - terminal reward/reporting is now fidelity-consistent: `submit` compares against high-fi reference state instead of low-fi rollout score state
 
 ## Reward V0
@@ -217,7 +219,7 @@ Do not use reward complexity to compensate for missing action expressivity or mi
 
 Additional fidelity rule:
 
-- do not compare a high-fidelity submit score against low-fidelity `initial_score` or `best_score` state
+- do not compare a high-fidelity submit score against low-fidelity baseline state
 - terminal reward and submit summaries should use a fidelity-consistent basis
 
 ## Reset Strategy
docs/PIVOT_P1_ROTATING_ELLIPSE.md CHANGED
@@ -147,16 +147,18 @@ evaluation_failed: bool
 failure_reason: str
 step_number: int
 budget_remaining: int
-best_score: float
-best_feasibility: float
+best_low_fidelity_score: float
+best_low_fidelity_feasibility: float
+best_high_fidelity_score: float | None
+best_high_fidelity_feasibility: float | None
 target_spec: str
 ```
 
 Current requirement:
 
 - the observation and diagnostics text should make the low-fi vs high-fi distinction explicit
-- in the current runtime, `best_score` and `best_feasibility` may reflect low-fidelity rollout state during `run` and high-fidelity best state during `submit`
-- do not narrate those fields as one fidelity-independent quantity until the contract is simplified further
+- best-state reporting should be split explicitly between low-fidelity rollout state and high-fidelity submit state
+- do not narrate low-fi and high-fi best-state fields as one combined metric
 
 ### Reward V0
 
@@ -195,9 +197,12 @@ Current execution note:
 step_count: int
 current_params: {aspect_ratio, elongation, rotational_transform, triangularity_scale}
 best_params: {aspect_ratio, elongation, rotational_transform, triangularity_scale}
-initial_score: float
-best_score: float
-best_feasibility: float
+initial_low_fidelity_score: float
+initial_high_fidelity_score: float | None
+best_low_fidelity_score: float
+best_low_fidelity_feasibility: float
+best_high_fidelity_score: float | None
+best_high_fidelity_feasibility: float | None
 history: list[str]
 ```
 
fusion_lab/models.py CHANGED
@@ -24,6 +24,15 @@ class LowDimBoundaryParams(BaseModel):
     triangularity_scale: float
 
 
+def default_low_dim_boundary_params() -> LowDimBoundaryParams:
+    return LowDimBoundaryParams(
+        aspect_ratio=3.6,
+        elongation=1.4,
+        rotational_transform=1.5,
+        triangularity_scale=0.55,
+    )
+
+
 class StellaratorAction(Action):
     intent: ActionIntent
     parameter: ParameterName | None = None
@@ -46,43 +55,24 @@ class StellaratorObservation(Observation):
     failure_reason: str = ""
     step_number: int = 0
     budget_remaining: int = 6
-    best_score: float = 0.0
-    best_feasibility: float = float("inf")
+    best_low_fidelity_score: float = 0.0
+    best_low_fidelity_feasibility: float = float("inf")
+    best_high_fidelity_score: float | None = None
+    best_high_fidelity_feasibility: float | None = None
     constraints_satisfied: bool = True
     target_spec: str = ""
 
 
 class StellaratorState(State):
-    initial_params: LowDimBoundaryParams = Field(
-        default_factory=lambda: LowDimBoundaryParams(
-            aspect_ratio=3.6,
-            elongation=1.4,
-            rotational_transform=1.6,
-            triangularity_scale=0.55,
-        )
-    )
-    current_params: LowDimBoundaryParams = Field(
-        default_factory=lambda: LowDimBoundaryParams(
-            aspect_ratio=3.6,
-            elongation=1.4,
-            rotational_transform=1.6,
-            triangularity_scale=0.55,
-        )
-    )
-    best_params: LowDimBoundaryParams = Field(
-        default_factory=lambda: LowDimBoundaryParams(
-            aspect_ratio=3.6,
-            elongation=1.4,
-            rotational_transform=1.6,
-            triangularity_scale=0.55,
-        )
-    )
-    initial_score: float = 0.0
+    initial_params: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
+    current_params: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
+    best_params: LowDimBoundaryParams = Field(default_factory=default_low_dim_boundary_params)
+    initial_low_fidelity_score: float = 0.0
     initial_high_fidelity_score: float | None = None
-    best_score: float = 0.0
-    best_feasibility: float = float("inf")
+    best_low_fidelity_score: float = 0.0
+    best_low_fidelity_feasibility: float = float("inf")
     best_high_fidelity_score: float | None = None
-    best_high_fidelity_feasibility: float = float("inf")
+    best_high_fidelity_feasibility: float | None = None
     budget_total: int = 6
     budget_remaining: int = 6
     episode_done: bool = False
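The refactor above collapses three duplicated `lambda` defaults into one shared factory. The same pattern, sketched with stdlib dataclasses so it runs without pydantic (pydantic's `Field(default_factory=...)` behaves analogously):

```python
from dataclasses import dataclass, field

@dataclass
class LowDimBoundaryParams:
    aspect_ratio: float
    elongation: float
    rotational_transform: float
    triangularity_scale: float

def default_low_dim_boundary_params() -> LowDimBoundaryParams:
    # One shared factory instead of three copy-pasted lambdas keeps the
    # default design in a single place.
    return LowDimBoundaryParams(
        aspect_ratio=3.6,
        elongation=1.4,
        rotational_transform=1.5,
        triangularity_scale=0.55,
    )

@dataclass
class StellaratorState:
    initial_params: LowDimBoundaryParams = field(default_factory=default_low_dim_boundary_params)
    current_params: LowDimBoundaryParams = field(default_factory=default_low_dim_boundary_params)
    best_params: LowDimBoundaryParams = field(default_factory=default_low_dim_boundary_params)

state = StellaratorState()
# Each field gets its own instance, so mutating one does not alias the others.
state.current_params.elongation = 2.0
print(state.initial_params.elongation)  # → 1.4
```

Because the factory is called once per field, the three param sets start equal but never share identity, which the in-place `best_params` updates in the environment loop rely on.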
hackathan_raw_guidance.md DELETED
@@ -1,239 +0,0 @@
1
- ## **OpenEnv Hackathon Participant Guide**
2
-
3
- Welcome to the [OpenEnv Hackathon](https://cerebralvalley.ai/e/open-env-hackathon), hacker! 👋 We’re thrilled to have you on board.
4
-
5
- This guide is your all-in-one resource for the event, including schedule, rules, technical resources, problem statements, judging information, and more. Please read this carefully; most answers can be found here.
6
-
7
- ## **1. Join the [PyTorch Discord Server](https://discord.gg/VBcf6VtfY6)**
8
-
9
- - You’ll be given a Hackathon Participant role by an admin, which will give you access to the hackathon-specific channels.
10
-
11
- - Here, you’ll be able to interact with hackers and sponsors, introduce yourselves, and form teams (for a maximum team size of **3**).
12
-
13
- - If you don't receive your role within **24 hours of joining,** please ping @CV.
14
-
15
- - Please submit your Discord username below so we can grant you the role
16
-
17
- [linkEmbed]
18
-
19
- ## **2. Location**
20
-
21
- **|** Shack15 (1 Ferry Building, Suite 201, San Francisco CA. 94111)
22
-
23
- - **Venue Access:** Shack15 is on the 2nd floor of the Ferry Building. Go up the Ferry Building elevator to the second floor, and turn left. Here you will see the main entrance to Shack15. 
24
-
25
- - **Parking:** Parking near the Ferry Building is extremely limited. Consider parking farther out and taking Uber, Lyft, or Public Transportation. 
26
-
27
- [youtube]
28
-
29
- ## **3. WiFi Information**
30
-
31
- - **Username:** SHACK15_Members
32
-
33
- - **Password:** M3mb3r$4L!f3
34
-
35
- ## **4. Hackathon Schedule**
36
-
37
- **Saturday, March 7 (Outline)**
38
-
39
- - **9:00 AM:** Doors Open •󠁏 Breakfast Served •󠁏 Team Formation
40
-
41
- - **10:00 AM – 11:30AM**: Kick-off presentations with Meta, Hugging Face, UC Berkeley, CoreWeave, OpenPipe, Unsloth AI, Fleet AI, Mercor, Scaler AI Labs, Snorkel AI, Patronus AI, Halluminate and Scale AI
42
-
43
- - **11:30 AM:** Hacking Begins
44
-
45
- - **1:00 PM:** Lunch Served
46
-
47
- - **6:00 PM:** Dinner Served
48
-
49
- - **10:00 PM:** Doors Close •󠁏 Re-entry not permitted
50
-
51
- **Sunday, March 8 (Outline)**
52
-
53
- - **9:00AM:** Doors Open •󠁏 Breakfast Served
54
-
55
- - **1:00PM:** Hacking stops •󠁏 Submissions Due
56
-
57
- - **1:15PM:** First Round Judging Begins
58
-
59
- - **2:00PM:** Lunch Served
60
-
61
- - **3:00PM:** Final Round Judging Begins
62
-
63
- - **4:00PM:** Winners Announced and Closing
64
-
65
- - **5:00PM:** Doors Close
66
-
67
- All presentation slides can be found here
68
-
69
- [linkEmbed]
70
-
71
- ## **5. Hackathon and Submission Rules**
-
- To keep things fair and aligned with our goals, all teams must follow these rules:
-
- - **Open Source:** Please ensure your repository is public.
-
- - **New Work Only:** All projects must be started from scratch during the hackathon; no previous work is allowed.
-
- - **Team Size:** Teams may have up to **3** members.
-
- - **Banned Projects:** Projects will be disqualified if they violate legal, ethical, or platform policies, or use code, data, or assets you do not have the rights to.
-
- - Your project **must** use OpenEnv (stable release 0.2.1) deployed on HF Spaces.
-
- - You must show a minimal training script for your environment using Unsloth or HF TRL in Colab.
-
- - You must upload a **one-minute** demo video to YouTube talking about your submission.
-
- ## **6. Hackathon Problem Statements**
-
- Your project must address at least **one of the five** required problem statements.
-
- - Some problem statements include **optional partner-sponsored sub-problem statements**, which are additional focus areas related to the main theme.
-
- - Your project may align with **multiple partner sub-problem statements**, but you can only be **judged for a maximum of two**. Please **select up to two** when submitting.
-
- - Projects that match these partner sub-problem statements are eligible for **extra partner prizes**, judged separately from the main track winners.
-
- - Each partner sub-problem statement carries a prize of **$10,000 USD**.
-
- **Statement 1: Multi-Agent Interactions**
-
- Environments for this theme involve cooperation, competition, negotiation, and coalition formation. Learning from these environments will enable agents to model the beliefs and incentives of others in partially observable settings, driving theory-of-mind reasoning and emergent strategic behavior.
-
- - **Expected Outcome:** an environment that can be used to train multi-agent task handling in an LLM
-
- - **Example Environments:** Market simulations, compute-allocation negotiations, collaborative puzzle worlds, mixed cooperative/competitive strategy games.
-
- - **Partner Sub-Themes:**
-   - **Fleet AI:** Scalable Oversight: Environments that train oversight agents to monitor, analyze, and explain the behavior of other AI agents operating in complex, multi-agent settings.
-   - **Halluminate:** Multi-Actor Environments: Build a realistic environment where an agent interacts with and manages multiple actors (agents) to discover and achieve the task.
-
- **Statement 2: (Super) Long-Horizon Planning & Instruction Following**
-
- You will build environments that require deep, multi-step reasoning with sparse or delayed rewards. The goal is to enable agents to decompose goals, track state over extended trajectories, and recover from early mistakes, pushing beyond shallow next-token reasoning toward structured planning and durable internal representations.
-
- - **Expected Outcome:** an environment that can capture and improve LLM behaviour on challenging long-horizon tasks that need long-running sessions beyond context-memory limits.
-
- - **Example Environments:** Research-planning simulators, large-scale codebase refactoring tasks, strategic resource management worlds, long-horizon logistics optimization, extremely complicated long-horizon instruction following (e.g., 300 instructions scattered around).
-
- - **Partner Sub-Themes:**
-   - **Mercor:** Make an environment with capped/uncapped rewards where frontier model rewards scale with token output.
-   - **Scale AI:** Environments for long-horizon workflows for non-code use cases within a business setting, focusing on either Sales, Project Management, or HR & IT.
-
- **Statement 3: World Modeling**
-
- - **Statement 3.1: Professional Tasks:** Here you will develop environments that require real interaction with tools, APIs, or dynamic systems, where the model is expected to do real, hard work instead of exploiting shortcuts to arrive at the desired outcome. Learning from these environments will enable agents to maintain consistent internal state, update beliefs based on outcomes, and orchestrate multi-step workflows. The goal is to strengthen causal reasoning and persistent world models.
-   - **Expected Outcome:** an environment capturing the nuances of a defined partially observable world and improving LLM interaction with it
-
-   - **Example Environments:** Dynamic browser/API ecosystems, enterprise applications, scientific workflow loops (papers → code → experiments), economic simulations with feedback, tool-discovery benchmarks.
-
-   - **Partner Sub-Theme:**
-     - **Scaler AI Labs:** Multi-App RL Environment for Enterprise Workflows: Create RL environments that demonstrate complex workflows, business-rule nuances, etc. in a large enterprise.
-
- - **Statement 3.2: Personalized Tasks:** Here you will develop an environment that offers real personalized task handling: imagine replying to personal messages, resolving dinner conflicts caused by work conflicts, or replying to tough emails. Think of any personal-assistant task.
-   - **Expected Outcome:** an environment that gives the model a realistic simulation of handling personal tasks and conflicts, and managing them as delegations
-
-   - **Example Environments:** Executive-assistant meeting planner, dinner and drive planning, email and message replying, etc.
-
-   - **Partner Sub-Theme:**
-     - **Patronus AI:** Consumer Workflows with Schema Drift: Multi-step consumer workflow environments where the underlying data schemas, API contracts, and T&Cs/policies/rules change.
-
- **Statement 4: Self-Improvement**
-
- The focus here is to create environments where agents can learn to generate new challenges, escalate difficulty, and improve through self-play or adaptive curricula. Rather than optimizing fixed tasks, the goal is for agents to learn to drive their own capability growth. The objective is recursive skill amplification.
-
- - **Expected Outcome:** an environment for improving self-play of an LLM over a defined set of tasks
-
- - **Example Environments:** Self-play negotiation arenas, auto-generated math/proof tasks, evolving coding competitions, adaptive RL curricula.
-
- - **Partner Sub-Theme:**
-   - **Snorkel AI:** Simulated Experts-in-the-Loop: An environment that simulates interactions with real subject-matter experts, with changing requirements and preferences.
-
- **Statement 5: Wild Card - Impress Us!**
-
- We do not want to limit your focus if your idea doesn’t fit the boxes above; we want, and WILL reward, out-of-the-box tasks. Please be creative, but remember to submit work that meaningfully adds value to LLM training on a concrete task.
-
- More details about each theme can be found here:
-
- [linkEmbed]
-
- ## **7. CV Hackathon Winners**
-
- [linkEmbed]
-
- ## **8. OpenEnv Provided Resources**
-
- **Please read through the entire slideshow here. It includes:**
-
- - OpenEnv Fundamentals and Architecture
- - Local Dev, Docker, and HF Spaces Deployment
- - OpenEnv in Practice
- - Training (TRL & Unsloth)
- - How to Access Infrastructure (including the GPU Request Form)
-
- [linkEmbed]
-
- ## **9. Partner Provided Resources**
-
- - **Unsloth AI Resources**
-   - <https://unsloth.ai/docs/get-started/unsloth-notebooks#grpo-reasoning-rl-notebooks>
- - **Mercor Resources**
-   - Dataset: <https://huggingface.co/datasets/mercor/apex-agents>
-   - Archipelago repo to run the eval: <https://github.com/Mercor-Intelligence/archipelago>
-   - APEX-Agents paper: <https://arxiv.org/abs/2601.14242>
- - **Hugging Face Resources**
-   - **$30** in Compute and Inference Credits
-   - To claim your credits, set up an HF account here: <https://huggingface.co/join>
-   - Then, follow this link: <https://huggingface.co/openenv-community>
-   - You will be granted **$30** of compute and inference credits!
- - **Northflank Resources**
-   - Each team gets an H100
-   - Northflank instructions:
-
- [linkEmbed]
-
-   - Join the Northflank Discord channel for any questions
-   - Please fill out this form:
-
- [linkEmbed]
-
- - **Cursor Resources**
-   - **$50** in Cursor Credits; **apply below**
-
- [linkEmbed]
-
- ## **10. Judging & Submissions**
-
- Judging will take place on **Sunday, March 8**. The judges will evaluate your **technical demos** in the categories below. _Show us what you have built_ to solve our problem statements; please **do not** show us a presentation. We'll be checking to ensure your project was built **entirely during the event**; no previous work is allowed.
-
- **|** **Teams should submit [here](https://cerebralvalley.ai/e/openenv-hackathon-sf/hackathon/submit) when they have completed hacking.** In the submission form, you will have to upload a **one-minute** demo video on YouTube talking about your submission. You must also show a minimal training script for your environment using Unsloth or HF TRL in Colab.
-
- **Please ensure your project uses** OpenEnv (stable release 0.2.1) deployed on HF Spaces.
-
- [linkEmbed]
-
- **Judging Criteria**
-
- - **Environment Innovation (40%) -** Is the environment novel, creative, or challenging? Does it meaningfully test the agent’s behavior?
- - **Storytelling (30%) -** Does the team clearly explain the problem, environment, and agent behavior? Is the demo engaging and easy to follow?
- - **Training Script Showing Improvement in Rewards (20%) -** Does the demo provide observable evidence of training progress (reward curves, metrics, or before/after behavior)?
- - **Reward and Training Pipeline Setup (10%) -** Is the reward logic coherent, and does the pipeline produce meaningful improvement in the agent’s inference (how it acts in the environment)?
-
- **Judging Process**
-
- **|** Judging proceeds in two rounds:
-
- - Hackers will be assigned groups of judges; ~3 minutes to pitch followed by 1-2 minutes of Q/A.
-
- - The top **six** teams in the ranking will demo on stage to a panel of judges; ~3 minutes to pitch followed by 2-3 minutes of Q/A.
-
- ## **11. Prizes**
-
- - **1st Place:** $15,000 USD Cash
-
- - **2nd Place:** $9,000 USD Cash
-
- - **3rd Place:** $6,000 USD Cash
server/contract.py ADDED
@@ -0,0 +1,26 @@
+ from __future__ import annotations
+
+ from typing import Final
+
+ from fusion_lab.models import LowDimBoundaryParams, default_low_dim_boundary_params
+
+ N_FIELD_PERIODS: Final[int] = 3
+ DEFAULT_RESET_SEED: Final[LowDimBoundaryParams] = default_low_dim_boundary_params()
+
+ RESET_SEEDS: Final[tuple[LowDimBoundaryParams, ...]] = (
+     DEFAULT_RESET_SEED,
+     LowDimBoundaryParams(
+         aspect_ratio=3.4,
+         elongation=1.4,
+         rotational_transform=1.6,
+         triangularity_scale=0.55,
+     ),
+     LowDimBoundaryParams(
+         aspect_ratio=3.8,
+         elongation=1.4,
+         rotational_transform=1.5,
+         triangularity_scale=0.55,
+     ),
+ )
+
+ SMOKE_TEST_PARAMS: Final[LowDimBoundaryParams] = DEFAULT_RESET_SEED
server/environment.py CHANGED
@@ -71,10 +71,9 @@ class StellaratorEnvironment(
             initial_params=params,
             current_params=params,
             best_params=params,
-            initial_score=metrics.p1_score,
-            best_score=metrics.p1_score,
-            best_feasibility=metrics.p1_feasibility,
-            best_high_fidelity_feasibility=float("inf"),
+            initial_low_fidelity_score=metrics.p1_score,
+            best_low_fidelity_score=metrics.p1_score,
+            best_low_fidelity_feasibility=metrics.p1_feasibility,
             budget_total=BUDGET,
             budget_remaining=BUDGET,
             episode_done=False,
@@ -151,13 +150,13 @@
 
     def _handle_submit(self) -> StellaratorObservation:
         metrics = self._evaluate_params(self._state.current_params, fidelity="high")
-        initial_submit_metrics = self._initial_high_fidelity_metrics()
+        initial_submit_score = self._initial_high_fidelity_score()
         best_submit_metrics = self._refresh_best_high_fidelity_metrics(metrics)
         reward = self._compute_reward(
             metrics,
             "submit",
             done=True,
-            initial_reference_score=initial_submit_metrics.p1_score,
+            initial_reference_score=initial_submit_score,
         )
         summary = self._summary_submit(metrics, best_submit_metrics)
         self._state.history.append(summary)
@@ -253,7 +252,7 @@
         base_score = (
             initial_reference_score
             if initial_reference_score is not None
-            else self._state.initial_score
+            else self._state.initial_low_fidelity_score
         )
         improved = metrics.constraints_satisfied and metrics.p1_score > base_score
         if improved:
@@ -274,6 +273,10 @@
         reward: float | None = None,
         done: bool = False,
     ) -> StellaratorObservation:
+        best_low_fidelity_score = self._state.best_low_fidelity_score
+        best_low_fidelity_feasibility = self._state.best_low_fidelity_feasibility
+        best_high_fidelity_score = self._state.best_high_fidelity_score
+        best_high_fidelity_feasibility = self._state.best_high_fidelity_feasibility
         text_lines = [
             action_summary,
             "",
@@ -284,13 +287,20 @@
             text_lines.append(f"failure_reason={metrics.failure_reason}")
         text_lines.extend(
             [
-                f"max_elongation={metrics.max_elongation:.4f} | best_score={self._display_best_score(metrics):.6f}",
+                f"max_elongation={metrics.max_elongation:.4f}",
                 f"aspect_ratio={metrics.aspect_ratio:.4f} (<= {ASPECT_RATIO_MAX:.1f})",
                 f"average_triangularity={metrics.average_triangularity:.4f} (<= {AVERAGE_TRIANGULARITY_MAX:.1f})",
                 f"edge_iota_over_nfp={metrics.edge_iota_over_nfp:.4f} (>= {EDGE_IOTA_OVER_NFP_MIN:.1f})",
+                f"feasibility={metrics.p1_feasibility:.6f}",
+                f"best_low_fidelity_score={best_low_fidelity_score:.6f}",
+                f"best_low_fidelity_feasibility={best_low_fidelity_feasibility:.6f}",
                 (
-                    f"feasibility={metrics.p1_feasibility:.6f} | "
-                    f"best_feasibility={self._display_best_feasibility(metrics):.6f}"
+                    "best_high_fidelity_score="
+                    f"{self._format_optional_metric(best_high_fidelity_score)}"
+                ),
+                (
+                    "best_high_fidelity_feasibility="
+                    f"{self._format_optional_metric(best_high_fidelity_feasibility)}"
                 ),
                 f"vacuum_well={metrics.vacuum_well:.4f}",
                 f"constraints={'SATISFIED' if metrics.constraints_satisfied else 'VIOLATED'}",
@@ -312,8 +322,10 @@
             failure_reason=metrics.failure_reason,
             step_number=self._state.step_count,
             budget_remaining=self._state.budget_remaining,
-            best_score=self._display_best_score(metrics),
-            best_feasibility=self._display_best_feasibility(metrics),
+            best_low_fidelity_score=best_low_fidelity_score,
+            best_low_fidelity_feasibility=best_low_fidelity_feasibility,
+            best_high_fidelity_score=best_high_fidelity_score,
+            best_high_fidelity_feasibility=best_high_fidelity_feasibility,
             constraints_satisfied=metrics.constraints_satisfied,
             target_spec=TARGET_SPEC,
             reward=reward,
@@ -372,7 +384,7 @@
             return f"Restore-best failed during low-fidelity evaluation: {metrics.failure_reason}"
         return (
             "Restored the best-known design. "
-            f"Score={metrics.p1_score:.6f}, feasibility={metrics.p1_feasibility:.6f}."
+            f"Low-fidelity score={metrics.p1_score:.6f}, feasibility={metrics.p1_feasibility:.6f}."
         )
 
     def _initial_params(self, seed: int | None) -> LowDimBoundaryParams:
@@ -427,12 +439,12 @@
             and self._last_metrics.evaluation_failed
         )
 
-    def _initial_high_fidelity_metrics(self) -> EvaluationMetrics:
+    def _initial_high_fidelity_score(self) -> float:
         if self._state.initial_high_fidelity_score is not None:
-            return self._evaluate_params(self._state.initial_params, fidelity="high")
+            return self._state.initial_high_fidelity_score
         metrics = self._evaluate_params(self._state.initial_params, fidelity="high")
         self._state.initial_high_fidelity_score = metrics.p1_score
-        return metrics
+        return metrics.p1_score
 
     def _refresh_best_high_fidelity_metrics(
         self,
@@ -446,21 +458,10 @@
         self._state.best_high_fidelity_feasibility = best_metrics.p1_feasibility
         return best_metrics
 
-    def _display_best_score(self, metrics: EvaluationMetrics) -> float:
-        if (
-            metrics.evaluation_fidelity == "high"
-            and self._state.best_high_fidelity_score is not None
-        ):
-            return self._state.best_high_fidelity_score
-        return self._state.best_score
-
-    def _display_best_feasibility(self, metrics: EvaluationMetrics) -> float:
-        if (
-            metrics.evaluation_fidelity == "high"
-            and self._state.best_high_fidelity_score is not None
-        ):
-            return self._state.best_high_fidelity_feasibility
-        return self._state.best_feasibility
+    def _format_optional_metric(self, value: float | None) -> str:
+        if value is None:
+            return "n/a"
+        return f"{value:.6f}"
 
     def _update_best(self, params: LowDimBoundaryParams, metrics: EvaluationMetrics) -> None:
         if metrics.evaluation_failed:
@@ -470,11 +471,11 @@
             (1, metrics.p1_score) if metrics.constraints_satisfied else (0, -metrics.p1_feasibility)
         )
         best = (
-            (1, self._state.best_score)
-            if self._state.best_feasibility <= FEASIBILITY_TOLERANCE
-            else (0, -self._state.best_feasibility)
+            (1, self._state.best_low_fidelity_score)
+            if self._state.best_low_fidelity_feasibility <= FEASIBILITY_TOLERANCE
+            else (0, -self._state.best_low_fidelity_feasibility)
         )
         if current > best:
             self._state.best_params = params
-            self._state.best_score = metrics.p1_score
-            self._state.best_feasibility = metrics.p1_feasibility
+            self._state.best_low_fidelity_score = metrics.p1_score
+            self._state.best_low_fidelity_feasibility = metrics.p1_feasibility
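The `_update_best` hunk above keeps the lexicographic (feasible-flag, score) comparison intact while renaming the state fields to their low-fidelity names. A standalone sketch of that ordering, with the `FEASIBILITY_TOLERANCE` value assumed here rather than taken from the repo:

```python
# Sketch of the best-design ranking used in _update_best: a feasible design
# (flag 1) always outranks an infeasible one (flag 0); ties on the flag fall
# back to the second tuple element, which is the raw score for feasible
# designs and the negated constraint violation for infeasible ones.
FEASIBILITY_TOLERANCE = 1e-6  # assumed value for illustration


def rank_key(score: float, feasibility: float) -> tuple[int, float]:
    feasible = feasibility <= FEASIBILITY_TOLERANCE
    return (1, score) if feasible else (0, -feasibility)


# A feasible design beats any infeasible one regardless of raw score.
assert rank_key(0.1, 0.0) > rank_key(0.9, 0.5)
# Among infeasible designs, the smaller constraint violation wins.
assert rank_key(0.0, 0.2) > rank_key(0.0, 0.4)
```

Negating the feasibility gap on the infeasible branch is what lets a plain tuple `>` comparison prefer "closer to feasible" without any custom comparator.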
tests/test_repo_scaffold.py DELETED
@@ -1,9 +0,0 @@
- from server.environment import TASK, environment_status
-
-
- def test_environment_scaffold_status() -> None:
-     assert environment_status() == "scaffolded"
-
-
- def test_task_budget_is_fixed() -> None:
-     assert TASK["budget"] == 6