---
license: cc-by-nc-sa-4.0
tags:
- robotics
- vision-language-action-model
- vision-language-model
---

# Model Card for InternVLA-M1-Pretrain-RT-1-Bridge

## Description:

**InternVLA-M1** is an open-source, end-to-end **vision–language–action (VLA) framework** for building and researching generalist robot policies. The checkpoints in this repository were trained on the RT-1 and Bridge datasets.

- 🌐 Homepage: [InternVLA-M1 Project Page](https://internrobotics.github.io/internvla-m1.github.io/)
- 💻 Codebase: [InternVLA-M1 GitHub Repo](https://github.com/InternRobotics/InternVLA-M1)

![image/png](https://github.com/InternRobotics/InternVLA-M1/raw/InternVLA-M1/assets/teaser.png)

## Quick Start

```python
# ===== system2 demo =====
from io import BytesIO

import requests
from PIL import Image

from InternVLA.model.framework.M1 import InternVLA_M1


def load_image_from_url(url: str) -> Image.Image:
    """Download an image over HTTP and return it as an RGB PIL image."""
    resp = requests.get(url, timeout=15)
    resp.raise_for_status()
    img = Image.open(BytesIO(resp.content)).convert("RGB")
    return img


saved_model_path = "/PATH//checkpoints/steps_50000_pytorch_model.pt"
internVLA_M1 = InternVLA_M1.from_pretrained(saved_model_path)

# Use the raw file URL: a GitHub /blob/ URL returns an HTML page, not image bytes.
image_url = "https://github.com/InternRobotics/InternVLA-M1/raw/InternVLA-M1/assets/table.jpeg"
image = load_image_from_url(image_url)
question = "give the bbox for the apple."
response = internVLA_M1.chat_with_M1(image, question)

# ===== predict_action demo =====
import torch  # needed for the CUDA availability check below

# Construct input: batch size = 1; only view1 is passed to the model below
view1 = load_image_from_url(image_url)
view2 = view1.copy()  # second view, not used in this example

batch_images = [[view1]]  # List[List[PIL.Image]]
instructions = ["pick up the apple and place it on the plate."]

if torch.cuda.is_available():
    internVLA_M1 = internVLA_M1.to("cuda")

# Predict actions
pred = internVLA_M1.predict_action(
    batch_images=batch_images,
    instructions=instructions,
    cfg_scale=1.5,
    use_ddim=True,
    num_ddim_steps=10,
)
normalized_actions = pred["normalized_actions"]  # [B, T, action_dim]
```

## Citation

```
@misc{internvla2024,
  title     = {InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy},
  author    = {InternVLA-M1 Contributors},
  year      = {2025},
  booktitle = {arXiv},
}
```
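
## Post-processing the predicted actions

The Quick Start ends with `normalized_actions` of shape `[B, T, action_dim]`, which usually has to be mapped back to robot units before execution. The sketch below illustrates one common scheme, assuming the values lie in `[-1, 1]` and are rescaled linearly per dimension. This model card does not state the normalization actually used, so the helper `unnormalize_action` and the bounds `ACTION_LOW` / `ACTION_HIGH` are hypothetical placeholders, not part of the InternVLA-M1 API. Take the real statistics from the training configuration of the checkpoint.

```python
from typing import List, Sequence


def unnormalize_action(
    normalized: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
) -> List[float]:
    """Map one action vector from [-1, 1] back to [low, high] per dimension.

    Assumption: linear min/max-style normalization. Verify this against the
    checkpoint's training configuration before using it on a robot.
    """
    if not (len(normalized) == len(low) == len(high)):
        raise ValueError("normalized, low and high must have the same length")
    out = []
    for x, lo, hi in zip(normalized, low, high):
        x = max(-1.0, min(1.0, x))  # clip to the assumed normalized range
        out.append(0.5 * (x + 1.0) * (hi - lo) + lo)
    return out


# Hypothetical per-dimension bounds for a 7-D action (not from this model card).
ACTION_LOW = [-0.05] * 6 + [0.0]
ACTION_HIGH = [0.05] * 6 + [1.0]

# Stand-in for one sample of normalized actions, shape [T, action_dim].
chunk = [
    [0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 1.0],
    [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0],  # 2.0 is out of range and gets clipped
]
actions = [unnormalize_action(step, ACTION_LOW, ACTION_HIGH) for step in chunk]
```

With a real prediction, `chunk` would be one element of `normalized_actions` converted to a nested list (for example via `.tolist()` if it is a tensor or array, which is also an assumption about the return type).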