MultiBind: A Benchmark for Attribute Misbinding in Multi-Subject Generation Paper • 2603.21937 • Published 3 days ago • 6
ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model Paper • 2603.22281 • Published 3 days ago • 12
Attend Before Attention: Efficient and Scalable Video Understanding via Autoregressive Gazing Paper • 2603.12254 • Published 14 days ago • 20
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published 3 days ago • 117
Controlled Self-Evolution for Algorithmic Code Optimization Paper • 2601.07348 • Published Jan 12 • 116
KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions Paper • 2601.04745 • Published Jan 8 • 59
MemGovern: Enhancing Code Agents through Learning from Governed Human Experiences Paper • 2601.06789 • Published Jan 11 • 80
FLUX-Reason-6M & PRISM-Bench: A Million-Scale Text-to-Image Reasoning Dataset and Comprehensive Benchmark Paper • 2509.09680 • Published Sep 11, 2025 • 44
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published Sep 28, 2025 • 49
EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs Paper • 2509.09174 • Published Sep 11, 2025 • 62
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 79
Towards a Unified View of Large Language Model Post-Training Paper • 2509.04419 • Published Sep 4, 2025 • 76
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs Paper • 2510.09201 • Published Oct 10, 2025 • 50
Every Attention Matters: An Efficient Hybrid Architecture for Long-Context Reasoning Paper • 2510.19338 • Published Oct 22, 2025 • 117
The End of Manual Decoding: Towards Truly End-to-End Language Models Paper • 2510.26697 • Published Oct 30, 2025 • 119
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 127