OpenVLA-Micro
Small-vision VLA for CPU robot deployment, trained on LIBERO-90.
This repo is structured so it can be used as a plain source checkout, a Python package, or a Hugging Face model/code bundle. Large weight files are meant to stay out of GitHub history and be hosted separately when needed.
| Component | Detail |
|---|---|
| Vision | DINOv2-S (384d, 256 patches) + SigLIP-B/16 (768d, 196 patches) |
| Projector | ShimMLP(384→2048→8704) + ShimMLP(768→2048→8704) → Concat → Linear(8704→896) → GELU → Linear(896→896) |
| LLM | Qwen2.5-0.5B (896 hidden, 151k vocab, 256 extra tokens) |
| Action | 7-DoF, discretized into 256 bins per dim, minmax de-normalization |
| Trainable | 38.1M params (shim MLPs + LoRA rank 8 on projector) |
| Frozen | DINOv2, SigLIP, Qwen2.5 (all layers + lm_head + embed_tokens) |
| Training | 5000 steps, batch 64, LR 2e-4 w/ 200-step warmup → cosine to 1e-5 |
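The learning-rate schedule in the Training row can be sketched in a few lines. This is a minimal illustration, not the training script itself: it assumes the warmup is linear, that the cosine decay runs over the remaining 4800 steps, and the names `learning_rate`, `PEAK_LR`, `FINAL_LR`, `WARMUP_STEPS` and `TOTAL_STEPS` are made up for the example.

```python
import math

# Values taken from the Training row of the table above.
PEAK_LR = 2e-4
FINAL_LR = 1e-5
WARMUP_STEPS = 200
TOTAL_STEPS = 5000


def learning_rate(step: int) -> float:
    """Return the learning rate for a 0-indexed optimizer step.

    Assumption: linear warmup to PEAK_LR, then cosine decay to FINAL_LR.
    """
    if step < WARMUP_STEPS:
        # Linear ramp that reaches PEAK_LR on the last warmup step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine


if __name__ == "__main__":
    for step in (0, 199, 200, 2600, 5000):
        print(step, learning_rate(step))
```

Under these assumptions the rate peaks at the end of warmup and reaches the 1e-5 floor at step 5000.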
Inference
Install as a package:
pip install -e .
Run the CLI:
openvla-micro --checkpoint openvla-micro-merged.pt --image demo.jpg "pick up the red block"
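Or call the model from Python:
from PIL import Image
from modeling_openvla_micro import OpenVLAMicro
# Load the merged checkpoint on CPU and switch to inference mode.
model = OpenVLAMicro.from_pretrained("openvla-micro-merged.pt", device="cpu")
model.eval()
# The model takes an RGB image and a natural-language instruction.
image = Image.open("demo.jpg").convert("RGB")
action = model.predict_action(image, "pick up the red block")
print(action)  # [dx, dy, dz, droll, dpitch, dyaw, gripper]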
The checkpoint argument can also be a Hugging Face repo ID if that repo contains openvla-micro-merged.pt or openvla-micro-distill.pt.
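One way to support both forms is to check for a local file first and fall back to the Hub. The sketch below is an assumption about how such a lookup could work, not the package's actual loader: `resolve_checkpoint`, `CANDIDATE_FILENAMES`, the `download` parameter and the merged-before-distill order are hypothetical. `hf_hub_download` is the real `huggingface_hub` function.

```python
from pathlib import Path

# Filenames named in the text above; the lookup order is an assumption.
CANDIDATE_FILENAMES = ("openvla-micro-merged.pt", "openvla-micro-distill.pt")


def resolve_checkpoint(checkpoint, download=None):
    """Return a local file path for a checkpoint path or a Hub repo ID.

    `download(repo_id, filename)` is a hypothetical hook, used here so the
    lookup can be tested offline; by default it wraps hf_hub_download.
    """
    path = Path(checkpoint)
    if path.is_file():
        # A local checkpoint wins over any Hub lookup.
        return str(path)

    if download is None:
        # Imported lazily so local-only use does not need huggingface_hub.
        from huggingface_hub import hf_hub_download

        def download(repo_id, filename):
            return hf_hub_download(repo_id=repo_id, filename=filename)

    last_error = None
    for filename in CANDIDATE_FILENAMES:
        try:
            return download(checkpoint, filename)
        except Exception as err:  # missing file, missing repo, network error
            last_error = err
    raise FileNotFoundError(
        f"no checkpoint file found for {checkpoint!r}"
    ) from last_error
```

The returned path could then be passed to `OpenVLAMicro.from_pretrained` as in the example above.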
Data
The checkpoint includes normalization statistics for the following dataset:
- libero_90: 7-DoF end-effector deltas, 256-bin tokenization
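To make the role of these statistics concrete, here is a rough sketch of 256-bin tokenization with minmax de-normalization. It is not taken from the checkpoint code. It assumes uniform bins over a normalized range of [-1, 1], decoding to bin centers, and per-dimension `action_min` / `action_max` statistics; the function names and example bounds are hypothetical, and the mapping between bin indices and LLM token IDs is left out.

```python
N_BINS = 256
BIN_WIDTH = 2.0 / N_BINS  # uniform bins over [-1, 1] (assumption)


def normalized_to_bin(value):
    """Map a normalized value in [-1, 1] to a bin index in [0, 255]."""
    clipped = max(-1.0, min(1.0, value))
    return min(N_BINS - 1, int((clipped + 1.0) / BIN_WIDTH))


def bin_to_normalized(bin_index):
    """Map a bin index back to the center of its bin in [-1, 1]."""
    if not 0 <= bin_index < N_BINS:
        raise ValueError(f"bin index out of range: {bin_index}")
    return -1.0 + (bin_index + 0.5) * BIN_WIDTH


def denormalize(normalized, low, high):
    """Minmax de-normalization: [-1, 1] back to [low, high]."""
    return low + 0.5 * (normalized + 1.0) * (high - low)


def decode_action(bin_indices, action_min, action_max):
    """Turn one bin index per dimension into a continuous action."""
    if not (len(bin_indices) == len(action_min) == len(action_max)):
        raise ValueError("bin indices and statistics must have equal length")
    return [
        denormalize(bin_to_normalized(b), lo, hi)
        for b, lo, hi in zip(bin_indices, action_min, action_max)
    ]


if __name__ == "__main__":
    # Hypothetical bounds for [dx, dy, dz, droll, dpitch, dyaw, gripper].
    action_min = [-1.0, -1.0, -1.0, -0.5, -0.5, -0.5, 0.0]
    action_max = [1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 1.0]
    print(decode_action([128, 0, 255, 64, 192, 127, 255], action_min, action_max))
```

With bin-center decoding, the quantization error per dimension is at most half a bin width in normalized units, scaled by that dimension's range.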
Citation
Built on openvla-mini and MiniVLA.
@misc{openvla-micro-2026,
author = {},
title = {OpenVLA-Micro: Small-vision VLA for CPU Robot Deployment},
year = {2026},
}