VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers Paper • 2408.17131 • Published Aug 30, 2024 • 11
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting Paper • 2503.08668 • Published Aug 3, 2025
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering Paper • 2604.08209 • Published 2 days ago • 16
InCoder-32B-Thinking: Industrial Code World Model for Thinking Paper • 2604.03144 • Published 8 days ago • 222
HopChain: Multi-Hop Data Synthesis for Generalizable Vision-Language Reasoning Paper • 2603.17024 • Published 24 days ago • 108
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Paper • 2505.21457 • Published May 27, 2025 • 16
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21, 2025 • 78
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories Paper • 2503.08625 • Published Mar 11, 2025 • 27