Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 15 days ago
(1D) Ordered Tokens Enable Efficient Test-Time Search Paper • 2604.15453 • Published 26 days ago
VGGRPO: Towards World-Consistent Video Generation with 4D Latent Reward Paper • 2603.26599 • Published Mar 27, 2026
HiStream: Efficient High-Resolution Video Generation via Redundancy-Eliminated Streaming Paper • 2512.21338 • Published Dec 24, 2025
OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory Paper • 2512.07802 • Published Dec 8, 2025
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published Dec 1, 2025
Multimodality Helps Few-shot 3D Point Cloud Semantic Segmentation Paper • 2410.22489 • Published Oct 29, 2024
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model Paper • 2503.16282 • Published Mar 20, 2025