From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space Paper • 2604.14142 • Published Apr 15 • 30
OpenWorldLib: A Unified Codebase and Definition of Advanced World Models Paper • 2604.04707 • Published Apr 6 • 204
DataFlex: A Unified Framework for Data-Centric Dynamic Training of Large Language Models Paper • 2603.26164 • Published Mar 27 • 365
Advancing Block Diffusion Language Models for Test-Time Scaling Paper • 2602.09555 • Published Feb 10 • 4
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 78
Advancing Block Diffusion Language Models for Test-Time Scaling Paper • 2602.09555 • Published Feb 10 • 4
OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration Paper • 2602.08344 • Published Feb 9 • 5
OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration Paper • 2602.08344 • Published Feb 9 • 5
OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration Paper • 2602.08344 • Published Feb 9 • 5
ToolSafe: Enhancing Tool Invocation Safety of LLM-based agents via Proactive Step-level Guardrail and Feedback Paper • 2601.10156 • Published Jan 15 • 26
Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies Paper • 2512.19673 • Published Dec 22, 2025 • 66