Vision-aligned Latent Reasoning for Multi-modal Large Language Model Paper • 2602.04476 • Published 8 days ago
Contrastive Representation Regularization for Vision-Language-Action Models Paper • 2510.01711 • Published Oct 2, 2025
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22, 2024