MiniVLA Fine-tuned on LAMPE Dataset (4DOF Actions)
This is a fine-tuned version of Stanford-ILIAD/minivla-vq-bridge-prismatic on the LAMPE dataset for 4-degree-of-freedom (4DOF) action prediction.
Model Details
- Base Model: Stanford-ILIAD/minivla-vq-bridge-prismatic
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 32
- Training Steps: 3000
- Final Training Loss: 0.4204
- Action Dimensions: 4DOF (Base, Joint2, Joint3, Joint4)
- Dataset: LAMPE Dataset (50 trajectories, 3410 transitions)
Training Configuration
- Batch Size: 8
- Learning Rate: 5e-4
- Gradient Accumulation: 1
- Max Steps: 3000
- Save Steps: 1000
- Image Augmentation: Enabled
- Optimizer: AdamW
Performance
Training Metrics (Step 3000)
- Loss: 0.4204
- Action Accuracy: 79.17%
- L1 Error: 0.0150
Validation Metrics (on training data)
- Mean L1 Error: 0.5250
- Per-Joint Errors:
  - Base: 1.3281 ± 0.6149
  - Joint2: 0.1392 ± 0.1120
  - Joint3: 0.1214 ± 0.0715
  - Joint4: 0.5114 ± 0.3704
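For context, the per-joint numbers are plain mean absolute (L1) errors between predicted and ground-truth 4DOF actions. A minimal sketch of how they could be recomputed (the arrays below are placeholders, not variables from validate.py):

```python
import numpy as np

# Placeholder arrays standing in for model predictions and ground truth
# collected over the validation set: shape (N, 4) = [Base, Joint2, Joint3, Joint4]
pred_actions = np.zeros((3410, 4))
gt_actions = np.zeros((3410, 4))

abs_err = np.abs(pred_actions - gt_actions)  # per-sample, per-joint L1 error
print(f"Mean L1 error: {abs_err.mean():.4f}")
for name, mean, std in zip(
    ["Base", "Joint2", "Joint3", "Joint4"], abs_err.mean(axis=0), abs_err.std(axis=0)
):
    print(f"{name}: {mean:.4f} ± {std:.4f}")
```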
Usage
Loading the Model (Recommended: Using LoRA Adapter)
Option 1: Using LoRA Adapter (Recommended - 68MB, faster to download)
```python
import json

from huggingface_hub import hf_hub_download
from prismatic.models.load import load_vla

# Load base model
vla = load_vla(
    "Stanford-ILIAD/minivla-vq-bridge-prismatic",
    load_for_training=False,
)

# Download LoRA adapter weights
adapter_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="adapter-weights/adapter_model.safetensors",
    local_dir="./adapters",
)
# Note: PEFT loading for Prismatic models may need custom handling.
# Alternative: load the checkpoint state dict and extract the LoRA weights.

# Load dataset statistics for action de-normalization
stats_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="dataset_statistics.json",
)
with open(stats_path, "r") as f:
    vla.norm_stats = json.load(f)

vla = vla.to("cuda:0")
vla.eval()
```
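The note above points out that applying the adapter to a Prismatic model may need custom handling. If PEFT cannot be used directly, one fallback is to merge the LoRA matrices into the base weights by hand. The sketch below is an assumption-laden illustration, not the verified loading path: it assumes PEFT-style parameter names (`lora_A` / `lora_B` pairs, optionally prefixed with `base_model.model.`) that line up with the base model's parameter names.

```python
from safetensors.torch import load_file

# Hypothetical manual merge: W' = W + scale * (B @ A) for every LoRA pair.
adapter_state = load_file(adapter_path)
base_state = vla.state_dict()
lora_scale = 1.0  # alpha / rank; set this to match the training configuration

for name, lora_a in adapter_state.items():
    if "lora_A" not in name:
        continue
    lora_b = adapter_state[name.replace("lora_A", "lora_B")]
    # Strip adapter prefix/suffix to recover the base parameter name (assumed naming)
    base_name = name.replace("base_model.model.", "").split(".lora_A")[0] + ".weight"
    if base_name in base_state:
        delta = (lora_b @ lora_a).to(base_state[base_name].dtype)
        base_state[base_name] += lora_scale * delta.to(base_state[base_name].device)

vla.load_state_dict(base_state, strict=False)
```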
Option 2: Using Full Checkpoint (~5GB)
```python
import json

import torch
from huggingface_hub import hf_hub_download
from prismatic.models.load import load_vla

# Load base model
vla = load_vla(
    "Stanford-ILIAD/minivla-vq-bridge-prismatic",
    load_for_training=False,
)

# Load fine-tuned checkpoint (large file!)
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="checkpoints/step-003000-loss=0.4204.pt",
)
checkpoint = torch.load(checkpoint_path, map_location="cuda:0")
vla.load_state_dict(checkpoint["model"], strict=False)

# Load dataset statistics for action de-normalization
stats_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="dataset_statistics.json",
)
with open(stats_path, "r") as f:
    vla.norm_stats = json.load(f)

vla = vla.to("cuda:0")
vla.eval()
```
Inference
```python
from PIL import Image

# Load an observation image and provide a language instruction
image = Image.open("path/to/image.jpg")
instruction = "move left"

# Predict action
action = vla.predict_action(
    image=image,
    instruction=instruction,
    unnorm_key="lampe_dataset",
    do_sample=False,
)

# Action is a 4DOF array: [Base, Joint2, Joint3, Joint4]
print(f"Predicted action: {action}")
```
Using the Validation Script
```bash
python validate.py
```
The validation script will:
- Auto-detect the latest checkpoint
- Load the model
- Validate on sample data
- Show interactive mode for frame-by-frame inspection
Deployment
The repository includes deployment scripts for running the model as a server and connecting from a Raspberry Pi client.
Server Deployment
Option 1: Using Pinggy Tunnel (Recommended - Works through most firewalls)
```bash
python3 start_server_with_pinggy.py \
  --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
  --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --port 8000
```
This will:
- Start the FastAPI server on port 8000
- Create a Pinggy tunnel using SSH over port 443 (an equivalent manual command is sketched below)
- Display a public URL (e.g., https://xxx.free.pinggy.link)
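If you would rather manage the tunnel yourself instead of using the wrapper script, Pinggy's standard SSH invocation looks roughly like the following (shown for illustration only; the script may use different options):

```bash
# Expose the local FastAPI server (port 8000) through a Pinggy tunnel over SSH on port 443
ssh -p 443 -R0:localhost:8000 a.pinggy.io
```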
Option 2: Using Cloudflare Tunnel
```bash
python3 start_server_with_tunnel.py \
  --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
  --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --port 8000
```
Option 3: Direct Server (for local testing)
```bash
python3 depoly_lampe.py \
  --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
  --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --port 8000 \
  --host 0.0.0.0
```
Raspberry Pi Client
On your Raspberry Pi, install dependencies:
```bash
pip install requests opencv-python-headless numpy json-numpy
```
Then run the client:
```bash
python3 rpi_client.py --server_url https://xxx.free.pinggy.link --camera_id 0 --instruction "turn left" --fps 5.0
```
The client will:
- Capture images from the camera
- Send them to the server for inference
- Receive predicted actions
- Execute actions on the robot (implement the `execute_action()` method; a minimal client loop is sketched below)
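The authoritative request format lives in depoly_lampe.py and rpi_client.py. As a rough illustration only, a minimal client loop might look like the sketch below; the /act endpoint name and the payload keys are assumptions borrowed from the common OpenVLA deployment convention, not verified against this repository:

```python
import time

import cv2
import json_numpy
import requests

json_numpy.patch()  # allow numpy arrays to pass through the standard json encoder

SERVER_URL = "https://xxx.free.pinggy.link"  # public tunnel URL from the server step
INSTRUCTION = "turn left"

def execute_action(action):
    # Placeholder: forward the 4DOF command [Base, Joint2, Joint3, Joint4]
    # to your motor controller here.
    print("Would execute:", action)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        continue
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Assumed endpoint and payload; check rpi_client.py for the real format
    response = requests.post(
        f"{SERVER_URL}/act",
        json={"image": image, "instruction": INSTRUCTION},
        timeout=10,
    )
    execute_action(response.json())
    time.sleep(1.0 / 5.0)  # ~5 FPS, matching the --fps argument above
```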
Testing the Deployment
Test local server:
```bash
python3 test_local_server.py --port 8000 --test_act
```
Test Cloudflare tunnel:
```bash
python3 test_cloudflare_connection.py --url https://xxx.trycloudflare.com --test_act
```
Test Pinggy tunnel:
```bash
python3 test_pinggy_connection.py --url https://xxx.free.pinggy.link --test_act
```
Deployment Files
- `depoly_lampe.py`: Main deployment server (FastAPI)
- `rpi_client.py`: Raspberry Pi client for camera capture and robot control
- `start_server_with_pinggy.py`: Server + Pinggy tunnel automation
- `start_server_with_tunnel.py`: Server + Cloudflare tunnel automation
- `test_local_server.py`: Test script for local server
- `test_cloudflare_connection.py`: Test script for Cloudflare tunnel
- `test_pinggy_connection.py`: Test script for Pinggy tunnel
Files Included
- `checkpoints/`: Model checkpoints saved during training
  - `step-001000-loss=X.XXXX.pt`
  - `step-002000-loss=X.XXXX.pt`
  - `step-003000-loss=0.4204.pt` (final checkpoint)
- `adapter-tmp/`: LoRA adapter weights (if using LoRA)
- `dataset_statistics.json`: Dataset statistics for action normalization
- `validate.py`: Validation script for testing the model
- `finetune.py`: Fine-tuning script (for reference)
- Deployment Scripts:
  - `depoly_lampe.py`: FastAPI deployment server
  - `rpi_client.py`: Raspberry Pi client for robot control
  - `start_server_with_pinggy.py`: Server + Pinggy tunnel automation
  - `start_server_with_tunnel.py`: Server + Cloudflare tunnel automation
  - `test_local_server.py`: Local server testing
  - `test_cloudflare_connection.py`: Cloudflare tunnel testing
  - `test_pinggy_connection.py`: Pinggy tunnel testing
Fine-tuning Details
The model was fine-tuned using the following command:
```bash
python3 -m torch.distributed.run --standalone --nnodes 1 --nproc-per-node 1 \
  vla-scripts/finetune.py \
  --vla_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --data_root_dir "dataset" \
  --dataset_name "lampe_dataset" \
  --dataset_statistics_path "dataset/dataset_statistics.json" \
  --run_root_dir "runs" \
  --adapter_tmp_dir "adapter-tmp" \
  --lora_rank 32 \
  --batch_size 8 \
  --grad_accumulation_steps 1 \
  --learning_rate 5e-4 \
  --max_steps 3000 \
  --save_steps 1000 \
  --image_aug True \
  --wandb_mode "offline"
```
Dataset
The model was fine-tuned on the LAMPE dataset with:
- Format: RLDS (Reinforcement Learning Datasets)
- Action Encoding: JOINT_POS (4DOF)
- Normalization: BOUNDS_Q99 (maps the [q01, q99] range of each action dimension to [-1, 1]; see the sketch after this list)
- Number of Trajectories: 50
- Number of Transitions: 3410
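For reference, BOUNDS_Q99 normalization maps each action dimension's 1st and 99th percentiles onto [-1, 1] and clips the rest; de-normalization inverts that map using the values stored in dataset_statistics.json. A minimal sketch (illustration only, not code from the repo):

```python
import numpy as np

def normalize_q99(action, q01, q99):
    # Map the [q01, q99] range of each action dimension to [-1, 1], clipping outliers
    scaled = 2.0 * (action - q01) / (q99 - q01) - 1.0
    return np.clip(scaled, -1.0, 1.0)

def unnormalize_q99(norm_action, q01, q99):
    # Invert the map: [-1, 1] back to [q01, q99]
    return 0.5 * (norm_action + 1.0) * (q99 - q01) + q01
```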
Codebase Modifications Required
To replicate this fine-tuning, you need to modify the OpenVLA-Mini codebase to support the lampe_dataset:
- `prismatic/vla/datasets/rlds/oxe/configs.py`: Add a `lampe_dataset` entry with `ActionEncoding.JOINT_POS` (4DOF)
- `prismatic/vla/datasets/rlds/oxe/transforms.py`: Add a `lampe_dataset_transform` function
- `prismatic/vla/datasets/rlds/oxe/mixtures.py`: Add `lampe_dataset` to `OXE_NAMED_MIXTURES`
- `prismatic/vla/datasets/rlds/oxe/materialize.py`: Modify to accept a `dataset_statistics` parameter
- `prismatic/vla/datasets/datasets.py`: Modify `RLDSDataset` to pass `dataset_statistics_dict`
The finetune.py script already includes all other necessary modifications (VQ tokenizer replacement, DinoSigLIP handling, etc.).
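For orientation, the OXE transform functions take a raw RLDS trajectory dict and return it with observation and action fields remapped to what the data pipeline expects. A hedged sketch of the general shape (the key names below are placeholders, not the actual LAMPE recording keys):

```python
from typing import Any, Dict

import tensorflow as tf

def lampe_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    # Sketch only: cast the 4DOF joint-position actions to float32 and expose
    # the robot state as proprioception. Real key names depend on how the
    # LAMPE episodes were recorded.
    trajectory["action"] = tf.cast(trajectory["action"], tf.float32)  # (T, 4)
    trajectory["observation"]["proprio"] = tf.cast(
        trajectory["observation"]["state"], tf.float32
    )
    return trajectory
```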
Important Notes
- Action Tokenizer: The model uses a standard action tokenizer (not VQ) for 4DOF actions. The VQ tokenizer from the base model was replaced during fine-tuning.
- Image Transform: The model uses the DinoSigLIP image transform, which returns a dict with "dino" and "siglip" keys.
- Checkpoint Format: Checkpoints are saved in Prismatic format (not HuggingFace format). Use `torch.load()` to load the state dict.
- Resume Training: To resume from a checkpoint, add `--resume_from_checkpoint "checkpoints/step-003000-loss=0.4204.pt"` to the fine-tuning command.
Citation
If you use this model, please cite:
```bibtex
@misc{minivla-lampe-finetuned,
  title        = {MiniVLA Fine-tuned on LAMPE Dataset},
  author       = {Your Name},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/minivla-lampe-4dof-finetuned}}
}
```
License
Apache 2.0 (same as base model)