view article Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Jul 9, 2025 • 772
view article Article Understanding Gemma 3n: How MatFormer Gives You Many Models in One Jun 26, 2025 • 49
Hunyuan3D 2.5: Towards High-Fidelity 3D Assets Generation with Ultimate Details Paper • 2506.16504 • Published Jun 19, 2025 • 31
view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 May 21, 2025 • 251
BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation Paper • 2506.07530 • Published Jun 9, 2025 • 20
SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published Jun 2, 2025 • 148
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published Jun 3, 2025 • 53
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published Apr 3, 2025 • 51
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning Paper • 2503.21620 • Published Mar 27, 2025 • 62
Eagle Collection Eagle is a family of frontier vision-language models with data-centric strategies. The model supports both HD image and long-context video input. • 15 items • Updated 1 day ago • 38
view article Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control +2 Feb 4, 2025 • 188
view article Article Llama can now see and run on your device - welcome Llama 3.2 +5 Sep 25, 2024 • 191
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Jul 10, 2025 • 87
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10, 2024 • 60