CL-From-Nothing/DeepSeek-R1-0528-Qwen3-8B_kukurasu_train_2samples Viewer • Updated 6 days ago • 20k • 5
CL-From-Nothing/Nemotron-Cascade-8B_minesweeper_offline_5K_rewrite_offline_rewrite Viewer • Updated 10 days ago • 5.01k • 7
CL-From-Nothing/Nemotron-Cascade-8B_sudoku_reasoning_offline_rewrite Viewer • Updated 10 days ago • 14.1k • 10
SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers Paper • 2602.05115 • Published Feb 4 • 18
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 42