FlowRL: Matching Reward Distributions for LLM Reasoning • Paper • 2509.15207 • Published Sep 18, 2025 • 116
instruction-pretrain/ft-instruction-synthesizer-collection • Viewer • Updated Mar 1, 2025 • 249k • 218 • 63
instruction-pretrain/general-instruction-augmented-corpora • Preview • Updated Mar 1, 2025 • 3.41k • 20
Instruction Pre-Training: Language Models are Supervised Multitask Learners • Paper • 2406.14491 • Published Jun 20, 2024 • 96