Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time Paper • 2606.15631 • Published 13 days ago • 16
MuCo Collection MuCo: Multi-turn Contrastive Learning for Multimodal Embedding Model [CVPR 2026] • 4 items • Updated Apr 13 • 2
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 155
Exploring Conditions for Diffusion models in Robotic Control Paper • 2510.15510 • Published Oct 17, 2025 • 40
Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs Paper • 2510.13251 • Published Oct 15, 2025 • 14
Token Bottleneck: One Token to Remember Dynamics Paper • 2507.06543 • Published Jul 9, 2025 • 20
HyperCLOVA X SEED Collection HyperCLOVA X SEED is NAVER's lightweight open-source lineup with a strong focus on Korean language performance • 6 items • Updated Dec 24, 2025 • 42
ProLIP Collection Official ProLIP weights, Probabilistic Language-Image Pre-Training (ICLR 2025) • 7 items • Updated Apr 18, 2025 • 10
MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation Paper • 2411.19067 • Published Nov 28, 2024 • 8
Cosmos-Tokenizer1 Collection This collection is archived. https://huggingface.co/collections/nvidia/cosmos3 • 22 items • Updated 15 days ago • 44
Unified Speech-Text Pretraining for Spoken Dialog Modeling Paper • 2402.05706 • Published Feb 8, 2024 • 7
Rethinking Spatial Dimensions of Vision Transformers Paper • 2103.16302 • Published Mar 30, 2021 • 2
RDNet Collection DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs [ECCV 2024] • 9 items • Updated Oct 16, 2024 • 3
rope-vit Collection Rotary Position Embedding for Vision Transformer [ECCV 2024] • 22 items • Updated Oct 16, 2024 • 5
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs Paper • 2403.19588 • Published Mar 28, 2024 • 4
Rotary Position Embedding for Vision Transformer Paper • 2403.13298 • Published Mar 20, 2024 • 6