GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 11 days ago • 56
PhoneHarness: Harnessing Phone-Use Agents through Mixed GUI, CLI, and Tool Actions Paper • 2606.14832 • Published 15 days ago • 12
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 11 days ago • 56
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 11 days ago • 56
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning Paper • 2604.16029 • Published Apr 17 • 23
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning Paper • 2604.16029 • Published Apr 17 • 23
The Station: An Open-World Environment for AI-Driven Discovery Paper • 2511.06309 • Published Nov 9, 2025 • 37
CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling Paper • 2510.04204 • Published Oct 5, 2025 • 21
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs Paper • 2509.09174 • Published Sep 11, 2025 • 62
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models Paper • 2508.10751 • Published Aug 14, 2025 • 29
Re:Form -- Reducing Human Priors in Scalable Formal Software Verification with RL in LLMs: A Preliminary Study on Dafny Paper • 2507.16331 • Published Jul 22, 2025 • 22
MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos Paper • 2507.05675 • Published Jul 8, 2025 • 27
ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published Jun 22, 2025 • 67
QFFT, Question-Free Fine-Tuning for Adaptive Reasoning Paper • 2506.12860 • Published Jun 15, 2025 • 18