Spaces:

CreativeEngineer
/

fusion-design-lab

Running on CPU Upgrade

fusion-design-lab / docs /P1_MANUAL_PLAYTEST_LOG.md

feat: add verifier-native reward v1

2fccde8 8 days ago

3.07 kB

P1 Manual Playtest Log

Scope:

initial low-fidelity manual sanity check from the current default reset seed
focus: can a human read the observation, choose a plausible move, and see a legible reward change?

Episode A: repair toward feasibility

Start state:

seed: 0
params: aspect_ratio=3.6, elongation=1.4, rotational_transform=1.5, triangularity_scale=0.55
low-fidelity feasibility: 0.050653
low-fidelity score: 0.0
constraints satisfied: false

Step 1:

action: increase rotational_transform by medium
expectation: improve iota without changing triangularity much
result: feasibility stayed effectively flat at 0.050653
reward: -0.1
interpretation: legible but weak; this move alone does not solve the boundary issue

Step 2:

action: increase triangularity_scale by medium
expectation: push the boundary over the triangularity threshold
result: low-fidelity feasibility moved to 0.0
result: low-fidelity score moved to 0.291660
constraints satisfied: true
reward: +3.1533
interpretation: good reward behavior; the feasibility crossing was clearly positive and easy to understand

Episode B: move the wrong way

Start state:

Step 1:

action: decrease triangularity_scale by medium
expectation: worsen triangularity and move away from feasibility
result: feasibility worsened to 0.107113
reward: -0.3823
interpretation: good negative signal; the environment penalized an obviously bad move without needing a complicated reward term

Current conclusion:

At the time of this initial playtest, Reward V0 was legible on the low-fidelity repair path around the default reset seed
a real submit trace is now recorded; next manual validation is to extend beyond the initial 5-10 episode path and look for one clear exploit or ambiguity

Episode C: submit-side manual trace

Scope:

same seed-0 start state as episode A
actions: rotational_transform increase medium, triangularity_scale increase medium, elongation decrease small, submit

Step sequence:

Step 1: rotational_transform increase medium
- low-fidelity feasibility changed by 0.000000 (still infeasible)
- reward: -0.1000
Step 2: triangularity_scale increase medium
- crossed feasibility boundary
- low-fidelity feasibility moved from 0.050653 to 0.000000
- reward: +3.1533
Step 3: elongation decrease small
- low-fidelity feasibility moved to 0.000865
- reward: +0.2665
Step 4: submit (high-fidelity)
- final feasibility: 0.000865
- final high-fidelity score: 0.296059
- final reward: +2.0098
- final diagnostics evaluation_fidelity=high, constraints=SATISFIED, best_high_fidelity_score=0.296059

Artifact:

manual submit trace JSON Note: this is a historical submit-side artifact from the earlier Reward V0 / pre-telemetry contract surface. Keep it as supporting evidence for the old submit path, not as the current Reward V1 observation-format example.