Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance • Paper • 2606.19195 • Published 9 days ago • 135
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding • Paper • 2605.27365 • Published May 26 • 144
From Context to Skills: Can Language Models Learn from Context Skillfully? • Paper • 2604.27660 • Published May 3 • 171
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents • Paper • 2604.26752 • Published Apr 29 • 112
3AM: Segment Anything with Geometric Consistency in Videos • Paper • 2601.08831 • Published Jan 13 • 34
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer • Paper • 2511.22699 • Published Nov 27, 2025 • 247