WorldVLA Collection https://github.com/alibaba-damo-academy/WorldVLA • 8 items • Updated Jun 25, 2025 • 1
JinaVDR (Visual Document Retrieval) Collection max. ~1000 images and OCR text included • 42 items • Updated Jul 20, 2025 • 8
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7, 2025 • 29
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding Paper • 2505.17012 • Published May 22, 2025 • 12
Training-Free Reasoning and Reflection in MLLMs Paper • 2505.16151 • Published May 22, 2025 • 9
SpaceThinker Collection Test Time Compute for Quantitative Spatial Reasoning using synthetic reasoning traces from 3D scene graphs • 7 items • Updated Oct 23, 2025 • 2
Cosmos-Transfer1 Collection ⚠️ This collection is archived; see https://huggingface.co/collections/nvidia/cosmos-transfer25 • 6 items • Updated 18 days ago • 30
Cosmos Collection ⚠️ This collection is archived; see https://huggingface.co/collections/nvidia/nvidia-cosmos-2 • 31 items • Updated 18 days ago • 299
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 31, 2025 • 227
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated Dec 23, 2025 • 86
LLM-Neo Collection Model hub for LLM-Neo, including Llama3.1-Neo-1B-100w and Minitron-4B-Depth-Neo-10w. • 3 items • Updated Nov 20, 2024 • 6
VLM Judge Distillation Collection Distilling the 13B SpaceLLaVA VLM-as-a-Judge into a Florence-2 model to efficiently quality-filter spatial VQA datasets such as OpenSpaces • 4 items • Updated Nov 14, 2024 • 1
DepthPro Models Collection Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • 4 items • Updated Aug 25, 2025 • 12
OpenSpaces VLMs Collection VLMs fine-tuned for spatial VQA using the OpenSpaces dataset. • 5 items • Updated Mar 30, 2025 • 2
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated Dec 23, 2025 • 309
SpaceVLMs Collection Features VLMs fine-tuned for enhanced spatial reasoning using a synthetic data pipeline similar to SpatialVLM's. • 11 items • Updated Feb 13, 2025 • 6
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 29
LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning Paper • 2309.06440 • Published Sep 12, 2023 • 10