OpenTrackVLA 🤖 👀

Visual Navigation & Following for Everyone.

OpenTrackVLA is a fully open-source Vision-Language-Action (VLA) stack that turns monocular video and natural-language instructions into actionable, short-horizon waypoints.

While we explore massive backbones (8B/30B) internally, this repository is dedicated to democratizing embodied AI. We have intentionally released our highly efficient 0.6B checkpoint along with the full training pipeline.

🚀 Why OpenTrackVLA?

  • Fully Open Source: We release the model weights, inference code, and the training stack, not just the inference wrapper.
  • Accessible: Designed to be reproduced, fine-tuned, and deployed on affordable compute.
  • Multimodal Control: Combines learned priors with visual input to guide real or simulated robots via simple text prompts.

Acknowledgment: OpenTrackVLA builds on the ideas introduced by the original TrackVLA project. Their partially-open release inspired this community-driven effort to keep the ecosystem open so researchers and developers can continue improving the stack together.

Demo In Action

The system processes video history and text instructions to predict future waypoints. Below are examples of the tracker in action:

[Video: tracked clip 1] [Video: tracked clip 2]
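The planner's output is a short sequence of waypoints. As an illustrative sketch only (the `Waypoint` type and the robot-frame convention below are our assumptions; the real output schema is defined in the repository), a consumer of such predictions might look like this:

```python
import math
from dataclasses import dataclass

@dataclass
class Waypoint:
    # Assumed convention: planar offset in the robot frame, in meters.
    x: float
    y: float

def heading_to(wp: Waypoint) -> float:
    """Yaw (radians) the robot should turn toward to reach a waypoint."""
    return math.atan2(wp.y, wp.x)

# A waypoint straight ahead requires no turn; one to the left requires +pi/2.
```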

This directory contains the HuggingFace-friendly export of the OpenTrackVLA planner.
Full project (code, datasets, training pipeline): https://github.com/om-ai-lab/OpenTrackVLA


Downloading from HuggingFace

Python

from transformers import AutoModel

model = AutoModel.from_pretrained("omlab/opentrackvla-qwen06b").eval()
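To fetch the export once and reuse it offline (for example, to point HF_MODEL_DIR at a local copy), `huggingface_hub.snapshot_download` can prefetch the files. A minimal sketch; the helper name is ours:

```python
def prefetch_weights(repo_id="omlab/opentrackvla-qwen06b", local_dir=None):
    """Download (or reuse the cached copy of) the HF export and return
    the local directory path, suitable for HF_MODEL_DIR."""
    # Imported lazily so the helper can be defined without the
    # dependency installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```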

Habitat evaluation using this export


trained_agent.py prefers HuggingFace weights when either env var is set:

  • HF_MODEL_DIR=/abs/path/to/open_trackvla_hf (already downloaded)
  • HF_MODEL_ID=omlab/opentrackvla-qwen06b (auto-download via huggingface_hub)
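The resolution order can be sketched as follows (assumption: a local HF_MODEL_DIR takes precedence over HF_MODEL_ID; the authoritative logic lives in trained_agent.py):

```python
import os

def resolve_model_source(env=None):
    """Sketch of the env-var precedence described above: an already
    downloaded directory wins, then a hub ID for auto-download, else
    fall back to repository-local checkpoints."""
    env = os.environ if env is None else env
    if env.get("HF_MODEL_DIR"):
        return ("local_dir", env["HF_MODEL_DIR"])
    if env.get("HF_MODEL_ID"):
        return ("hub_id", env["HF_MODEL_ID"])
    return ("checkpoint", None)
```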

Example:

HF_MODEL_ID=omlab/opentrackvla-qwen06b bash eval.sh