OpenVLA-Micro

Small-vision VLA for CPU robot deployment, trained on LIBERO-90.

This repo is structured so it can be used as a plain source checkout, a Python package, or a Hugging Face model/code bundle. Large weight files are meant to stay out of GitHub history and be hosted separately when needed.

| Component | Detail |
|---|---|
| Vision | DINOv2-S (384d, 256 patches) + SigLIP-B/16 (768d, 196 patches) |
| Projector | ShimMLP(384→2048→8704) + ShimMLP(768→2048→8704) → Concat → Linear(8704→896) → GELU → Linear(896→896) |
| LLM | Qwen2.5-0.5B (896 hidden, 151k vocab, 256 extra tokens) |
| Action | 7-DoF, discretized into 256 bins per dim, minmax de-normalization |
| Trainable | 38.1M params (shim MLPs + LoRA rank 8 on projector) |
| Frozen | DINOv2, SigLIP, Qwen2.5 (all layers + lm_head + embed_tokens) |
| Training | 5000 steps, batch 64, LR 2e-4 with 200-step warmup → cosine to 1e-5 |
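
The trainable-parameter figure can be roughly cross-checked from the layer sizes listed above. The sketch below is only an estimate under stated assumptions: each ShimMLP is taken to be two linear layers with biases, and LoRA is taken to add two bias-free rank-8 matrices to each of the two projector Linear layers. None of this is read from the model code, and the helper names are hypothetical.

```python
def linear_params(d_in, d_out, bias=True):
    """Parameter count of a dense layer (weights plus optional bias)."""
    return d_in * d_out + (d_out if bias else 0)


def shim_mlp_params(d_in, d_hidden, d_out):
    """Assumed ShimMLP layout: Linear(d_in, d_hidden) then Linear(d_hidden, d_out)."""
    return linear_params(d_in, d_hidden) + linear_params(d_hidden, d_out)


def lora_params(d_in, d_out, rank):
    """Assumed LoRA layout: A is (rank x d_in), B is (d_out x rank), no biases."""
    return rank * (d_in + d_out)


# One shim per vision encoder: DINOv2-S (384d) and SigLIP-B/16 (768d).
shim_total = shim_mlp_params(384, 2048, 8704) + shim_mlp_params(768, 2048, 8704)

# Rank-8 adapters on Linear(8704 -> 896) and Linear(896 -> 896).
lora_total = lora_params(8704, 896, 8) + lora_params(896, 896, 8)

trainable = shim_total + lora_total
print(f"{trainable / 1e6:.1f}M")  # 38.1M
```

Under these assumptions the shim MLPs account for almost all of the trainable parameters, and the LoRA adapters add under 0.1M.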

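The learning-rate schedule in the Training row can be sketched as a function of the step index. This assumes a linear warmup from near zero to the 2e-4 peak over the first 200 steps, followed by a half-cosine decay to 1e-5 at step 5000. The exact warmup shape and the name `lr_at` are assumptions, not taken from train_shim.py.

```python
import math

PEAK_LR, MIN_LR = 2e-4, 1e-5
WARMUP_STEPS, TOTAL_STEPS = 200, 5000


def lr_at(step):
    """Learning rate at a 0-indexed optimizer step (assumed schedule shape)."""
    if step < WARMUP_STEPS:
        # Linear warmup: reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Cosine decay from PEAK_LR down to MIN_LR over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


for step in (0, 199, 200, 2600, 5000):
    print(step, lr_at(step))
```
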
Inference

Install as a package:

pip install -e .

Run the CLI:

openvla-micro --checkpoint openvla-micro-merged.pt --image demo.jpg "pick up the red block"

Or call the model from Python:

from PIL import Image
from modeling_openvla_micro import OpenVLAMicro

model = OpenVLAMicro.from_pretrained("openvla-micro-merged.pt", device="cpu")
model.eval()

image = Image.open("demo.jpg").convert("RGB")
action = model.predict_action(image, "pick up the red block")
print(action)  # [dx, dy, dz, droll, dpitch, dyaw, gripper]

The checkpoint argument can also be a Hugging Face repo ID if that repo contains openvla-micro-merged.pt or openvla-micro-distill.pt.
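
One way such a lookup could work is sketched below: treat the argument as a local file if it exists, otherwise try each known filename in the Hub repo. `resolve_checkpoint` is a hypothetical helper, and the preference for the merged checkpoint over the distill one is an assumption; the package's actual resolution logic may differ. `hf_hub_download` comes from the `huggingface_hub` library and is imported lazily, so local-only use does not need it installed.

```python
import os

# Filenames named in this README; the search order is an assumption.
CANDIDATE_FILENAMES = ("openvla-micro-merged.pt", "openvla-micro-distill.pt")


def resolve_checkpoint(checkpoint):
    """Return a local path for either a local file or a Hub repo ID (hypothetical helper)."""
    if os.path.isfile(checkpoint):
        return checkpoint

    # Only needed when the argument is not a local file.
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import EntryNotFoundError

    for filename in CANDIDATE_FILENAMES:
        try:
            # Downloads to the local Hub cache and returns the cached path.
            return hf_hub_download(repo_id=checkpoint, filename=filename)
        except EntryNotFoundError:
            continue  # This file is not in the repo; try the next candidate.
    raise FileNotFoundError(
        f"No known checkpoint file found in repo {checkpoint!r}: {CANDIDATE_FILENAMES}"
    )
```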

Data

The checkpoint includes normalization statistics for the following dataset:

  • libero_90: 7-DoF end-effector deltas, 256-bin tokenization
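
To illustrate how 256-bin tokens and min-max statistics can combine into a continuous action, here is a minimal sketch. It assumes that bins are uniform over [-1, 1], that each bin decodes to its center, and that the per-dimension min/max come from the checkpoint's normalization statistics. The function names and the example statistics are hypothetical, and the mapping from LLM token IDs to bin indices is not shown.

```python
N_BINS = 256


def bin_to_unit(bin_index):
    """Map a bin index in [0, N_BINS - 1] to that bin's center in [-1, 1]."""
    return -1.0 + (2 * bin_index + 1) / N_BINS


def denormalize(bins, mins, maxs):
    """Min-max de-normalization: rescale each value from [-1, 1] to [min, max]."""
    return [
        lo + 0.5 * (bin_to_unit(b) + 1.0) * (hi - lo)
        for b, lo, hi in zip(bins, mins, maxs)
    ]


# Hypothetical statistics for a 7-DoF action (not the real libero_90 values).
mins = [-1.0] * 6 + [0.0]
maxs = [1.0] * 7
bins = [0, 255, 128, 64, 192, 127, 255]

action = denormalize(bins, mins, maxs)
print(action)  # [dx, dy, dz, droll, dpitch, dyaw, gripper]
```

Decoding to bin centers keeps the round-trip error within half a bin width, which is 1/256 of each dimension's range under these assumptions.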

Citation

Built on openvla-mini and MiniVLA.

@misc{openvla-micro-2026,
  author = {},
  title = {OpenVLA-Micro: Small-vision VLA for CPU Robot Deployment},
  year = {2026},
}