Generation Enhances Understanding in Unified Multimodal Models via Multi-Representation Generation Paper • 2601.21406 • Published 1 day ago • 3
Everything in Its Place: Benchmarking Spatial Intelligence of Text-to-Image Models Paper • 2601.20354 • Published 3 days ago • 96
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published 1 day ago • 45
Less is More: Optimizing Function Calling for LLM Execution on Edge Devices Paper • 2411.15399 • Published Nov 23, 2024 • 1
Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective Article • Published 4 days ago • 37
iFSQ: Improving FSQ for Image Generation with 1 Line of Code Paper • 2601.17124 • Published 7 days ago • 30
Behavior Knowledge Merge in Reinforced Agentic Models Paper • 2601.13572 • Published 11 days ago • 23
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems Paper • 2601.11004 • Published 15 days ago • 30
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 8 days ago • 51
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention Paper • 2510.04212 • Published Oct 5, 2025 • 24
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 126