Vision
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
Paper
• 2506.22434
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
Paper
• 2507.13348
RewardDance: Reward Scaling in Visual Generation
Paper
• 2509.08826
Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
Paper
• 2510.18876
Back to Basics: Let Denoising Generative Models Denoise
Paper
• 2511.13720
Diversity Has Always Been There in Your Visual Autoregressive Models
Paper
• 2511.17074
The Image as Its Own Reward: Reinforcement Learning with Adversarial Reward for Image Generation
Paper
• 2511.20256
World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models
Paper
• 2511.22787
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields
Paper
• 2601.03252
DINO-SAE: DINO Spherical Autoencoder for High-Fidelity Image Reconstruction and Generation
Paper
• 2601.22904
Unified Personalized Reward Model for Vision Generation
Paper
• 2602.02380
Zooming without Zooming: Region-to-Image Distillation for Fine-Grained Multimodal Perception
Paper
• 2602.11858
DeepImageSearch: Benchmarking Multimodal Agents for Context-Aware Image Retrieval in Visual Histories
Paper
• 2602.10809
PyVision-RL: Forging Open Agentic Vision Models via RL
Paper
• 2602.20739
Reading, Not Thinking: Understanding and Bridging the Modality Gap When Text Becomes Pixels in Multimodal LLMs
Paper
• 2603.09095