Collections
Collections including paper arxiv:2601.03233
-
Lightricks/LTX-2
Image-to-Video • Updated • 497k • 752 -
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 44 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 86 -
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 43
-
DocLLM: A layout-aware generative language model for multimodal document understanding
Paper • 2401.00908 • Published • 189 -
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
Paper • 2401.00849 • Published • 17 -
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Paper • 2311.05437 • Published • 51 -
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
Paper • 2311.00571 • Published • 43
-
Masked Audio Generation using a Single Non-Autoregressive Transformer
Paper • 2401.04577 • Published • 44 -
YOLO-World: Real-Time Open-Vocabulary Object Detection
Paper • 2401.17270 • Published • 43 -
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Paper • 2402.05054 • Published • 29 -
LTX-2: Efficient Joint Audio-Visual Foundation Model
Paper • 2601.03233 • Published • 86