LiteFrame: Efficient Vision Encoders Unlock Frame Scaling in Video LLMs Paper • 2605.17260 • Published 18 days ago • 25
view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 903
view article Article Streaming datasets: 100x More Efficient +3 andito, lhoestq, burtenshaw, pcuenq, merve • Oct 27, 2025 • 86
VideoNSA: Native Sparse Attention Scales Video Understanding Paper • 2510.02295 • Published Oct 2, 2025 • 10
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 224
STR-Match: Matching SpatioTemporal Relevance Score for Training-Free Video Editing Paper • 2506.22868 • Published Jun 28, 2025 • 5
Diffusion vs. Autoregressive Language Models: A Text Embedding Perspective Paper • 2505.15045 • Published May 21, 2025 • 56
Unofficial Mamba2 for Hf Transformers Collection Just the original weights converted to be compatible with transformers. • 5 items • Updated Oct 16, 2024 • 2
Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published May 8, 2025 • 89
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4, 2025 • 66
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach Paper • 2502.03639 • Published Feb 5, 2025 • 9
TVG: A Training-free Transition Video Generation Method with Diffusion Models Paper • 2408.13413 • Published Aug 24, 2024 • 14