Commit History

GRPO v3 (filtered data + staged reward + stronger HP) - reasons 7/10, val 50.9 pct
bdb4dc1 (verified), committed by Salma204

GRPO v1 (thinking ON) - val 51.5 pct, replaces SFT v1_off
0100b42 (verified), committed by Salma204

Revert to v1 - best CI score 45%
aed4a8c (verified), committed by Salma204

Phase 1 Wikipedia + Phase 2 SFT v3 - 55.2% val set, 8/10 validation samples
58b60cd (verified), committed by Salma204

Delete model.safetensors.index.json
60a77d3 (verified), committed by lemfender

Delete model-00002-of-00002.safetensors
a61dd5d (verified), committed by lemfender

Delete model-00001-of-00002.safetensors
5e4b4b1 (verified), committed by lemfender

Upload folder using huggingface_hub
c889476 (verified), committed by lemfender

Upload folder using huggingface_hub
fde83fe (verified), committed by lemfender

initial commit
cbd6917 (verified), committed by lemfender