X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests
Paper • 2601.06953
E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models
JustRL: Scaling a 1.5B LLM with a Simple RL Recipe