SystemTruth / notebooks

Commit History

finalization: blog + README + execution rewrite, drop 3B + openclaw shim
0058c94

Madhav189 committed on

notebook: broaden Cell 10 SFT-only load except clause
c0ea16e

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: fix GRPO prompts - apply chat template before passing to TRL
215c8ad

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: install openenv-core (and other repo deps) in Cell 0
091927e

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: switch Cell 0 to Unsloth's official uv pattern + harden Cells 10-12
b2632ec

Madhav189 Claude Opus 4.7 (1M context) committed on

training: precision rewrite to prevent SFT collapse + GRPO variance starvation
42ab8f0

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: use Qwen2.5-3B-Instruct (has chat template) not the base model
2214e76

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: align with Unsloth-recommended TRL 0.22.2
65d2643

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: show pip install progress (drop -q flag)
93e560c

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: rewrite for robustness and resumability
17dba36

Madhav189 Claude Opus 4.7 (1M context) committed on

notebook: fix TRL >=1.0 API breakage
32e423b

Madhav189 Claude Opus 4.7 (1M context) committed on

hackathon sprint: grader collapse + coliseum rename + training pipeline
c9baa73

Madhav189 Claude Opus 4.7 (1M context) committed on

data: 30 expert episodes + training notebook
f337985

Madhav189 committed on

Tier-escalating sre-gym v3.0 - compute → horizon → realism
2733f3f

Madhav189 Claude Opus 4.7 (1M context) committed on