Commit History

feat: prompt-comparison section + retry logic + scale-up to 15 tasks × {mini,nano} × 3 prompts
6ca9a91
verified

TheUnicat committed on

fix: re-judge experiment rollouts after credit-exhaustion; retry logic in batched + single-criterion judge
8410720
verified

TheUnicat committed on

feat: V₀=0.5 baseline + det ceilings, prompt-pill UI, 60 experiment rollouts
2cd2802
verified

TheUnicat committed on

feat: per-turn state value + turn score on Demo (state_v1 trajectories baked into 228 rollouts)
e7cdebc
verified

TheUnicat committed on

stream rollout messages per-turn via env_response hook (no more end-of-rollout replay)
59ea1c7
verified

TheUnicat committed on

purge stale standalone rollouts, mirror to current 228
97cbf70
verified

TheUnicat committed on

default judge → opus 4.7; expose configured judge via /api/health
9578447
verified

TheUnicat committed on

fix: move RUNS_DIR to /app/runs (HF persistent /data hides image-baked content)
5b90296
verified

TheUnicat committed on

swap to 228-rollout baseline + app.py cors regex
7da70ba
verified

TheUnicat committed on

cors: regex support
21b09dc
verified

TheUnicat committed on

fix: handle /app/app.py layout in HF Space
44699a2
verified

TheUnicat committed on

restore spivak PDF + 5 spivak tasks
1c77bbf
verified

TheUnicat committed on

deploy
b8dc460
verified

TheUnicat committed on

initial commit
aa460d6
verified

TheUnicat committed on