Update README with detailed data pipeline and reproduction steps bab0696 verified NotoriousH2 committed on Mar 19
SFT + RS-SFT + GRPO (500 steps, beta=0.04). GSM8K ~46.2% cbc68ed verified NotoriousH2 committed on Mar 19