Collections
Discover the best community collections!
Collections trending this week
- RL + Transformer = A General-Purpose Problem Solver
  Paper • 2501.14176 • Published • 28
- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 31
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
  Paper • 2501.17161 • Published • 124
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
  Paper • 2412.12098 • Published • 4
- deepseek-ai/DeepSeek-R1
  Text Generation • 685B • Updated • 1.33M • 13.1k
- deepseek-ai/DeepSeek-R1-Zero
  Text Generation • Updated • 5.3k • 948
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
  Text Generation • Updated • 92.5k • 752
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  Text Generation • 33B • Updated • 959k • 1.53k