VideoNSA: Native Sparse Attention Scales Video Understanding Paper • 2510.02295 • Published Oct 2, 2025 • 10
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 212
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Dec 31, 2025 • 557
STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing Paper • 2506.22868 • Published Jun 28, 2025 • 5
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective Paper • 2505.15045 • Published May 21, 2025 • 55
Unofficial Mamba2 for Hf Transformers Collection Just the original weights converted to be compatible with transformers. • 5 items • Updated Oct 16, 2024 • 2
Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published May 8, 2025 • 87
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4, 2025 • 66
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach Paper • 2502.03639 • Published Feb 5, 2025 • 9