# OpenVLA-Micro

Small-vision VLA for CPU robot deployment, trained on LIBERO-90.

This repo is structured so it can be used as a plain source checkout, a Python package, or a Hugging Face model/code bundle. Large weight files are meant to stay out of GitHub history and be hosted separately when needed.

| Component | Detail |
|-----------|--------|
| Vision | DINOv2-S (384d, 256 patches) + SigLIP-B/16 (768d, 196 patches) |
| Projector | ShimMLP(384→2048→8704) + ShimMLP(768→2048→8704) → Concat → Linear(8704→896) → GELU → Linear(896→896) |
| LLM | Qwen2.5-0.5B (hidden size 896, 151k vocab, 256 extra tokens) |
| Action | 7-DoF, discretized into 256 bins per dimension, min-max de-normalization |
| Trainable | 38.1M params (shim MLPs + LoRA rank 8 on projector) |
| Frozen | DINOv2, SigLIP, Qwen2.5 (all layers + lm_head + embed_tokens) |
| Training | 5000 steps, batch size 64, LR 2e-4 with 200-step warmup → cosine decay to 1e-5 |
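
The training row describes a warmup-then-cosine learning-rate schedule. A minimal sketch of that shape is given here for illustration; the function name `lr_at`, the linear form of the warmup, and the 0-indexed step convention are assumptions, not details taken from this repo's training code.

```python
import math


def lr_at(step, peak_lr=2e-4, final_lr=1e-5, warmup_steps=200, total_steps=5000):
    """Learning rate at a 0-indexed optimizer step (hypothetical helper)."""
    if step < warmup_steps:
        # Assumption: linear ramp up to the peak over the warmup steps.
        return peak_lr * (step + 1) / warmup_steps
    # Cosine decay from peak_lr down to final_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return final_lr + 0.5 * (peak_lr - final_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Halfway through the decay (step 2600) the rate sits midway
    # between the peak and the floor, about 1.05e-4.
    for step in (0, 199, 200, 2600, 4999):
        print(step, lr_at(step))
```

In a PyTorch training loop, a function like this could be divided by the peak rate and passed to `torch.optim.lr_scheduler.LambdaLR` as the multiplier.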

## Inference

Install as a package:

```bash
pip install -e .
```

Run the CLI:

```bash
openvla-micro --checkpoint openvla-micro-merged.pt --image demo.jpg "pick up the red block"
```

Or call the model from Python:

```python
from PIL import Image

from modeling_openvla_micro import OpenVLAMicro

# Load the merged checkpoint on CPU and switch to inference mode.
model = OpenVLAMicro.from_pretrained("openvla-micro-merged.pt", device="cpu")
model.eval()

# Predict one action from an RGB image and a language instruction.
image = Image.open("demo.jpg").convert("RGB")
action = model.predict_action(image, "pick up the red block")
print(action)  # [dx, dy, dz, droll, dpitch, dyaw, gripper]
```

The checkpoint argument can also be a Hugging Face repo ID if that repo contains `openvla-micro-merged.pt` or `openvla-micro-distill.pt`.
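
One way such a path-or-repo-ID lookup could work is sketched below. This is not the package's actual loader: `resolve_checkpoint`, the `download` parameter, and the order in which the two filenames are tried are hypothetical. The only real library call is `huggingface_hub.hf_hub_download(repo_id=..., filename=...)`, which is imported lazily so the sketch also runs without that package when a local file is given.

```python
from pathlib import Path

# Filenames named in this README; the lookup order is an assumption.
CANDIDATE_FILENAMES = ("openvla-micro-merged.pt", "openvla-micro-distill.pt")


def resolve_checkpoint(checkpoint, download=None):
    """Return a local file path for a checkpoint path or Hub repo ID (hypothetical helper)."""
    path = Path(checkpoint)
    if path.is_file():
        # A local file wins over any Hub lookup.
        return str(path)

    if download is None:
        # Imported lazily so purely local use does not need huggingface_hub.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download

    last_error = None
    for filename in CANDIDATE_FILENAMES:
        try:
            return download(repo_id=checkpoint, filename=filename)
        except Exception as err:  # broad on purpose: this is only a sketch
            last_error = err
    raise FileNotFoundError(
        f"{checkpoint!r} is not a local file and no known checkpoint "
        f"file was found in a repo with that ID"
    ) from last_error
```

The injectable `download` callable keeps the function testable offline; in normal use it would be left at its default.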

## Data

The checkpoint includes normalization statistics for the following dataset:

- `libero_90`: 7-DoF end-effector deltas, 256-bin tokenization
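
To illustrate how 256-bin tokenization and min-max statistics fit together, the sketch below turns per-dimension bin indices back into continuous actions. It assumes 256 equal-width bins over [-1, 1] decoded at their centers; the actual binning, the mapping from the LLM's extra tokens to bin indices, and the statistics stored in the checkpoint may differ. `decode_action`, `q_min`, and `q_max` are hypothetical names, and the numbers in the usage lines are made up.

```python
import numpy as np

N_BINS = 256  # bins per action dimension, as stated in this README


def decode_action(bin_ids, q_min, q_max):
    """Map per-dimension bin indices (0..255) to continuous actions (hypothetical helper)."""
    bin_ids = np.clip(np.asarray(bin_ids, dtype=np.int64), 0, N_BINS - 1)
    # Assumption: equal-width bins on [-1, 1], decoded at the bin centers.
    edges = np.linspace(-1.0, 1.0, N_BINS + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    normalized = centers[bin_ids]
    # Min-max de-normalization: [-1, 1] -> [q_min, q_max] per dimension.
    q_min = np.asarray(q_min, dtype=np.float64)
    q_max = np.asarray(q_max, dtype=np.float64)
    return 0.5 * (normalized + 1.0) * (q_max - q_min) + q_min


if __name__ == "__main__":
    # Made-up statistics for a 7-DoF action: [dx, dy, dz, droll, dpitch, dyaw, gripper].
    q_min = np.array([-0.9, -0.9, -0.9, -0.3, -0.3, -0.3, 0.0])
    q_max = np.array([0.9, 0.9, 0.9, 0.3, 0.3, 0.3, 1.0])
    print(decode_action([128, 127, 0, 255, 64, 192, 255], q_min, q_max))
```

Decoding at bin centers keeps the worst-case quantization error at half a bin width, which is 1/256 of each dimension's range under these assumptions.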

## Citation

Built on [openvla-mini](https://github.com/openvla/openvla-mini) and [MiniVLA](https://github.com/rail-berkeley/TinyVLA).

```
@misc{openvla-micro-2026,
  author = {},
  title = {OpenVLA-Micro: Small-vision VLA for CPU Robot Deployment},
  year = {2026},
}
```