SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training • Paper • 2501.17161 • Published Jan 28 • 123
mistral-community/Mixtral-8x22B-Instruct-v0.1-4bit • Text Generation • 143B • Updated Jul 1, 2024 • 137 • 11