Commit History

docs: publish real SFT log_history and add provenance note
93f985b

vx7sh committed on

docs: publish real SFT log_history and add provenance note
1a3f787

vx7sh committed on

docs: add final blog post for submission
f7e8508

vx7sh committed on

docs: add notebook and concise proof screenshots
4828f2f

vx7sh committed on

docs: add training reproduction notebook link
aaf2446

vx7sh committed on

docs: add terminal proof screenshots
1be44ff
verified

v4xsh committed on

docs: add final training evidence plots
f12d4f1

vx7sh committed on

docs: finalize phase-aware training evidence
d36ec16

vx7sh committed on

fix(eval): add phase-aware hard and cascade action selection
1e79bac

vx7sh committed on

fix(training): allow grpo script to import app config
c83ba8e

vx7sh committed on

fix(grpo): keep reward rollouts aligned with prompt seeds
977c01a

vx7sh committed on

fix(grpo): align hard and cascade training with final eval
96b891c

vx7sh committed on

fix(grpo): gate hard-task rollouts on correct first action
72d562f

vx7sh committed on

fix(grpo): salvage malformed action prefixes for reward scoring
41f1550

vx7sh committed on

fix(grpo): prefix-anchored generation + continuous reward ladder
3a9c2b7

vx7sh committed on

fix(grpo): pre-apply chat template and tighten generation params
10ca05a

vx7sh committed on

fix(docker): use python:3.13-slim to fix build (audioop-lts requires >=3.13)
3e6e1ca

vx7sh committed on

fix(rl): cycle patch files in hard/cascade rollouts and trim default RL pool
ec08742

vx7sh committed on

docs(evidence): add multi-agent eval, long-horizon trace, and constrained eval logs
9731ebe

vx7sh committed on

feat(rl): GRPO continuation from SFT adapter with multi-step task rollouts
678d74f

vx7sh committed on

feat(curriculum): adaptive difficulty for telemetry, masking, and secondary failures
3928ed0

vx7sh committed on

test(reward): make reward audit run in-process when server is offline
26ea725

vx7sh committed on

fix(reward): preserve task pass thresholds and switch MER to efficiency multiplier
00c2406

vx7sh committed on

feat(env): add fleet_coordination multi-agent task
39931d5

vx7sh committed on

Harden submission evidence and reward integrity
3360325

vx7sh committed on

Document training evidence and improve model evaluation
7fb0542

vx7sh committed on

Add training evidence summary
186b8b1

vx7sh committed on

Document training results and artifacts
d9cb5e7

vx7sh committed on

Use constrained action scoring for model evaluation
f93b5c1

vx7sh committed on

Use multi-step SFT pipeline for reliable training evidence
775670f

vx7sh committed on

Add SFT warmup and model evaluation pipeline
73f137c

vx7sh committed on

Improve GRPO rewards and add model evaluation
37b9396

vx7sh committed on

Handle chat completions in GRPO reward parser
ade5d78

vx7sh committed on

Use vanilla TRL GRPO training stack
e530995

vx7sh committed on

Patch Unsloth GRPO text-only trainer attributes
0cebfea

vx7sh committed on

Fix Hugging Face Space and GRPO training config
53d7ae2

vx7sh committed on

Fix Hugging Face Space Docker config
b5314ec

vx7sh committed on

Add Hugging Face Spaces config
a3ef87a

vx7sh committed on

chore(grading): raise medium pass threshold to 0.7 and document audit notes
76e100b

vx7sh committed on

feat(eval): add seed variance reporting script
95b7a07

vx7sh committed on

feat(eval): add benchmark harness and reward integrity artifacts
ee2f27b

vx7sh committed on

test: add coalition and anti-gaming grader coverage
7d2be3f

vx7sh committed on

feat(env): add curriculum, challenge generation, coalition, and black-swan mechanics
edc6488

vx7sh committed on

chore(submission): add baseline snapshot and validation script
d3e5b9b

vx7sh committed on

chore(deps): pin full runtime requirements
f54a1f4

vx7sh committed on

feat(env): add robustness, uncertainty consensus, and partial observability
487d9c3

vx7sh committed on

Ship Round 2 manifest/docs, dashboard, and GRPO training pipeline
ff665de

vx7sh committed on

Add Fleet supervisor-worker delegation layer with /delegate API and tests
0f99e53

vx7sh committed on

Add cascade grader and regression tests for cascade task
ce9bc2c

vx7sh committed on

Track LLM token usage and send token_count in actions
319df08

vx7sh committed on