X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction Paper • 2605.05765 • Published 6 days ago • 17
Improved Visual-Spatial Reasoning via R1-Zero-Like Training Paper • 2504.00883 • Published Apr 1, 2025 • 67
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1, 2025 • 96
MLCM: Multistep Consistency Distillation of Latent Diffusion Model Paper • 2406.05768 • Published Jun 9, 2024 • 13