---
license: cc-by-nc-sa-4.0
base_model:
- Qwen/Qwen2.5-VL-3B-Instruct
tags:
- robotics
- vision-language-action-model
- vision-language-model
library_name: transformers
---
# Model Card for InternVLA-M1

## Description
**InternVLA-M1** is an open-source, end-to-end **vision–language–action (VLA) framework** for building and researching generalist robot policies. The checkpoints in this repository were pretrained on the System2 dataset.
- 🌐 Homepage: [InternVLA-M1 Project Page](https://internrobotics.github.io/internvla-m1.github.io/)
- 💻 Codebase: [InternVLA-M1 GitHub Repo](https://github.com/InternRobotics/InternVLA-M1)
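
As a rough orientation, the snippet below shows one way such a checkpoint could be loaded with 🤗 Transformers. This is a minimal sketch, assuming the checkpoint retains the Qwen2.5-VL architecture of its base model; the repo id is illustrative, and the GitHub codebase above is the authoritative entry point for training and inference.

```python
# Minimal loading sketch, NOT the official API. Assumptions: the checkpoint
# keeps the Qwen2.5-VL architecture of its base model, and the repo id below
# is a placeholder. See the InternVLA-M1 GitHub repo for supported usage.
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "InternRobotics/InternVLA-M1"  # hypothetical repo id

# Processor handles image preprocessing and chat-template tokenization.
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # reduce memory on supported GPUs
    device_map="auto",           # requires `accelerate`
)
```
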
## Citation
```
@misc{internvla2025,
  title     = {InternVLA-M1: Latent Spatial Grounding for Instruction-Following Robotic Manipulation},
  author    = {InternVLA-M1 Contributors},
  year      = {2025},
  booktitle = {arXiv},
}
```