CL-From-Nothing/Nemotron-Cascade-8B_minesweeper_offline_5K_rewrite_offline_rewrite Viewer • Updated 1 day ago • 5.01k • 4
CL-From-Nothing/Nemotron-Cascade-8B_sudoku_reasoning_offline_rewrite Viewer • Updated 1 day ago • 14.1k • 3
CL-From-Nothing/sft_training_sudoku_level_3_stitch_train_half_mask-parquet_nemotron-cascade-8b-mathrl_epoch_3 8B • Updated 2 days ago • 39
CL-From-Nothing/sudoku-stitch-Nemotron-Cascade-8B-MathRL-Student Viewer • Updated 5 days ago • 14.1k • 8
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 41