MiniVLA Fine-tuned on LAMPE Dataset (4DOF Actions)
This is a fine-tuned version of Stanford-ILIAD/minivla-vq-bridge-prismatic on the LAMPE dataset for 4-degree-of-freedom (4DOF) action prediction.
Model Details
- Base Model: Stanford-ILIAD/minivla-vq-bridge-prismatic
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- LoRA Rank: 32
- Training Steps: 3000
- Final Training Loss: 0.4204
- Action Dimensions: 4DOF (Base, Joint2, Joint3, Joint4)
- Dataset: LAMPE Dataset (50 trajectories, 3410 transitions)
Training Configuration
- Batch Size: 8
- Learning Rate: 5e-4
- Gradient Accumulation: 1
- Max Steps: 3000
- Save Steps: 1000
- Image Augmentation: Enabled
- Optimizer: AdamW
Performance
Training Metrics (Step 3000)
- Loss: 0.4204
- Action Accuracy: 79.17%
- L1 Error: 0.0150
Validation Metrics (on training data)
- Mean L1 Error: 0.5250
- Per-Joint Errors:
  - Base: 1.3281 ± 0.6149
  - Joint2: 0.1392 ± 0.1120
  - Joint3: 0.1214 ± 0.0715
  - Joint4: 0.5114 ± 0.3704
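For context, the per-joint numbers are plain mean absolute (L1) errors between predicted and ground-truth 4DOF actions. A minimal sketch of how they could be recomputed (the arrays below are placeholders, not variables from validate.py):

```python
import numpy as np

# Placeholder arrays standing in for model predictions and ground truth
# collected over the validation set: shape (N, 4) = [Base, Joint2, Joint3, Joint4]
pred_actions = np.zeros((3410, 4))
gt_actions = np.zeros((3410, 4))

abs_err = np.abs(pred_actions - gt_actions)  # per-sample, per-joint L1 error
print(f"Mean L1 error: {abs_err.mean():.4f}")
for name, mean, std in zip(
    ["Base", "Joint2", "Joint3", "Joint4"], abs_err.mean(axis=0), abs_err.std(axis=0)
):
    print(f"{name}: {mean:.4f} ± {std:.4f}")
```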
Usage
Loading the Model (Recommended: Using LoRA Adapter)
Option 1: Using LoRA Adapter (Recommended - 68MB, faster to download)
```python
import json

from huggingface_hub import hf_hub_download
from prismatic.models.load import load_vla

# Load base model
vla = load_vla(
    "Stanford-ILIAD/minivla-vq-bridge-prismatic",
    load_for_training=False,
)

# Download LoRA adapter weights
adapter_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="adapter-weights/adapter_model.safetensors",
    local_dir="./adapters",
)
# Note: PEFT loading for Prismatic models may need custom handling.
# Alternative: load the checkpoint state dict and extract the LoRA weights.

# Load dataset statistics for action de-normalization
stats_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="dataset_statistics.json",
)
with open(stats_path, "r") as f:
    vla.norm_stats = json.load(f)

vla = vla.to("cuda:0")
vla.eval()
```
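The note above points out that applying the adapter to a Prismatic model may need custom handling. If PEFT cannot be used directly, one fallback is to merge the LoRA matrices into the base weights by hand. The sketch below is an assumption-laden illustration, not the verified loading path: it assumes PEFT-style parameter names (`lora_A` / `lora_B` pairs, optionally prefixed with `base_model.model.`) that line up with the base model's parameter names.

```python
from safetensors.torch import load_file

# Hypothetical manual merge: W' = W + scale * (B @ A) for every LoRA pair.
adapter_state = load_file(adapter_path)
base_state = vla.state_dict()
lora_scale = 1.0  # alpha / rank; set this to match the training configuration

for name, lora_a in adapter_state.items():
    if "lora_A" not in name:
        continue
    lora_b = adapter_state[name.replace("lora_A", "lora_B")]
    # Strip adapter prefix/suffix to recover the base parameter name (assumed naming)
    base_name = name.replace("base_model.model.", "").split(".lora_A")[0] + ".weight"
    if base_name in base_state:
        delta = (lora_b @ lora_a).to(base_state[base_name].dtype)
        base_state[base_name] += lora_scale * delta.to(base_state[base_name].device)

vla.load_state_dict(base_state, strict=False)
```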
Option 2: Using Full Checkpoint (~5GB)
```python
import json

import torch
from huggingface_hub import hf_hub_download
from prismatic.models.load import load_vla

# Load base model
vla = load_vla(
    "Stanford-ILIAD/minivla-vq-bridge-prismatic",
    load_for_training=False,
)

# Load fine-tuned checkpoint (large file!)
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="checkpoints/step-003000-loss=0.4204.pt",
)
checkpoint = torch.load(checkpoint_path, map_location="cuda:0")
vla.load_state_dict(checkpoint["model"], strict=False)

# Load dataset statistics for action de-normalization
stats_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="dataset_statistics.json",
)
with open(stats_path, "r") as f:
    vla.norm_stats = json.load(f)

vla = vla.to("cuda:0")
vla.eval()
```
Inference
```python
from PIL import Image

# Load an observation image and provide a language instruction
image = Image.open("path/to/image.jpg")
instruction = "move left"

# Predict action
action = vla.predict_action(
    image=image,
    instruction=instruction,
    unnorm_key="lampe_dataset",
    do_sample=False,
)

# Action is a 4DOF array: [Base, Joint2, Joint3, Joint4]
print(f"Predicted action: {action}")
```
Using the Validation Script
```bash
python validate.py
```
The validation script will:
- Auto-detect the latest checkpoint
- Load the model
- Validate on sample data
- Show interactive mode for frame-by-frame inspection
Deployment
The repository includes deployment scripts for running the model as a server and connecting from a Raspberry Pi client.
Server Deployment
Option 1: Using Pinggy Tunnel (Recommended - Works through most firewalls)
```bash
python3 start_server_with_pinggy.py \
  --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
  --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --port 8000
```
This will:
- Start the FastAPI server on port 8000
- Create a Pinggy tunnel using SSH over port 443 (an equivalent manual command is sketched below)
- Display a public URL (e.g., https://xxx.free.pinggy.link)
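If you would rather manage the tunnel yourself instead of using the wrapper script, Pinggy's standard SSH invocation looks roughly like the following (shown for illustration only; the script may use different options):

```bash
# Expose the local FastAPI server (port 8000) through a Pinggy tunnel over SSH on port 443
ssh -p 443 -R0:localhost:8000 a.pinggy.io
```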
Option 2: Using Cloudflare Tunnel
```bash
python3 start_server_with_tunnel.py \
  --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
  --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --port 8000
```
Option 3: Direct Server (for local testing)
```bash
python3 depoly_lampe.py \
  --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
  --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --port 8000 \
  --host 0.0.0.0
```
Raspberry Pi Client
On your Raspberry Pi, install dependencies:
```bash
pip install requests opencv-python-headless numpy json-numpy
```
Then run the client:
```bash
python3 rpi_client.py --server_url https://xxx.free.pinggy.link --camera_id 0 --instruction "turn left" --fps 5.0
```
The client will:
- Capture images from the camera
- Send them to the server for inference
- Receive predicted actions
- Execute actions on the robot (implement the `execute_action()` method; a minimal client loop is sketched below)
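The authoritative request format lives in depoly_lampe.py and rpi_client.py. As a rough illustration only, a minimal client loop might look like the sketch below; the /act endpoint name and the payload keys are assumptions borrowed from the common OpenVLA deployment convention, not verified against this repository:

```python
import time

import cv2
import json_numpy
import requests

json_numpy.patch()  # allow numpy arrays to pass through the standard json encoder

SERVER_URL = "https://xxx.free.pinggy.link"  # public tunnel URL from the server step
INSTRUCTION = "turn left"

def execute_action(action):
    # Placeholder: forward the 4DOF command [Base, Joint2, Joint3, Joint4]
    # to your motor controller here.
    print("Would execute:", action)

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        continue
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Assumed endpoint and payload; check rpi_client.py for the real format
    response = requests.post(
        f"{SERVER_URL}/act",
        json={"image": image, "instruction": INSTRUCTION},
        timeout=10,
    )
    execute_action(response.json())
    time.sleep(1.0 / 5.0)  # ~5 FPS, matching the --fps argument above
```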
Testing the Deployment
Test local server:
```bash
python3 test_local_server.py --port 8000 --test_act
```
Test Cloudflare tunnel:
```bash
python3 test_cloudflare_connection.py --url https://xxx.trycloudflare.com --test_act
```
Test Pinggy tunnel:
```bash
python3 test_pinggy_connection.py --url https://xxx.free.pinggy.link --test_act
```
Deployment Files
- `depoly_lampe.py`: Main deployment server (FastAPI)
- `rpi_client.py`: Raspberry Pi client for camera capture and robot control
- `start_server_with_pinggy.py`: Server + Pinggy tunnel automation
- `start_server_with_tunnel.py`: Server + Cloudflare tunnel automation
- `test_local_server.py`: Test script for local server
- `test_cloudflare_connection.py`: Test script for Cloudflare tunnel
- `test_pinggy_connection.py`: Test script for Pinggy tunnel
Files Included
- `checkpoints/`: Model checkpoints saved during training
  - `step-001000-loss=X.XXXX.pt`
  - `step-002000-loss=X.XXXX.pt`
  - `step-003000-loss=0.4204.pt` (final checkpoint)
- `adapter-tmp/`: LoRA adapter weights (if using LoRA)
- `dataset_statistics.json`: Dataset statistics for action normalization
- `validate.py`: Validation script for testing the model
- `finetune.py`: Fine-tuning script (for reference)
- Deployment Scripts:
  - `depoly_lampe.py`: FastAPI deployment server
  - `rpi_client.py`: Raspberry Pi client for robot control
  - `start_server_with_pinggy.py`: Server + Pinggy tunnel automation
  - `start_server_with_tunnel.py`: Server + Cloudflare tunnel automation
  - `test_local_server.py`: Local server testing
  - `test_cloudflare_connection.py`: Cloudflare tunnel testing
  - `test_pinggy_connection.py`: Pinggy tunnel testing
Fine-tuning Details
The model was fine-tuned using the following command:
```bash
python3 -m torch.distributed.run --standalone --nnodes 1 --nproc-per-node 1 \
  vla-scripts/finetune.py \
  --vla_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
  --data_root_dir "dataset" \
  --dataset_name "lampe_dataset" \
  --dataset_statistics_path "dataset/dataset_statistics.json" \
  --run_root_dir "runs" \
  --adapter_tmp_dir "adapter-tmp" \
  --lora_rank 32 \
  --batch_size 8 \
  --grad_accumulation_steps 1 \
  --learning_rate 5e-4 \
  --max_steps 3000 \
  --save_steps 1000 \
  --image_aug True \
  --wandb_mode "offline"
```
Dataset
The model was fine-tuned on the LAMPE dataset with:
- Format: RLDS (Reinforcement Learning Datasets)
- Action Encoding: JOINT_POS (4DOF)
- Normalization: BOUNDS_Q99 (maps the [q01, q99] range of each action dimension to [-1, 1]; see the sketch after this list)
- Number of Trajectories: 50
- Number of Transitions: 3410
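For reference, BOUNDS_Q99 normalization maps each action dimension's 1st and 99th percentiles onto [-1, 1] and clips the rest; de-normalization inverts that map using the values stored in dataset_statistics.json. A minimal sketch (illustration only, not code from the repo):

```python
import numpy as np

def normalize_q99(action, q01, q99):
    # Map the [q01, q99] range of each action dimension to [-1, 1], clipping outliers
    scaled = 2.0 * (action - q01) / (q99 - q01) - 1.0
    return np.clip(scaled, -1.0, 1.0)

def unnormalize_q99(norm_action, q01, q99):
    # Invert the map: [-1, 1] back to [q01, q99]
    return 0.5 * (norm_action + 1.0) * (q99 - q01) + q01
```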
Codebase Modifications Required
To replicate this fine-tuning, you need to modify the OpenVLA-Mini codebase to support the lampe_dataset:
- `prismatic/vla/datasets/rlds/oxe/configs.py`: Add a `lampe_dataset` entry with `ActionEncoding.JOINT_POS` (4DOF)
- `prismatic/vla/datasets/rlds/oxe/transforms.py`: Add a `lampe_dataset_transform` function
- `prismatic/vla/datasets/rlds/oxe/mixtures.py`: Add `lampe_dataset` to `OXE_NAMED_MIXTURES`
- `prismatic/vla/datasets/rlds/oxe/materialize.py`: Modify to accept a `dataset_statistics` parameter
- `prismatic/vla/datasets/datasets.py`: Modify `RLDSDataset` to pass `dataset_statistics_dict`
The finetune.py script already includes all other necessary modifications (VQ tokenizer replacement, DinoSigLIP handling, etc.).
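For orientation, the OXE transform functions take a raw RLDS trajectory dict and return it with observation and action fields remapped to what the data pipeline expects. A hedged sketch of the general shape (the key names below are placeholders, not the actual LAMPE recording keys):

```python
from typing import Any, Dict

import tensorflow as tf

def lampe_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    # Sketch only: cast the 4DOF joint-position actions to float32 and expose
    # the robot state as proprioception. Real key names depend on how the
    # LAMPE episodes were recorded.
    trajectory["action"] = tf.cast(trajectory["action"], tf.float32)  # (T, 4)
    trajectory["observation"]["proprio"] = tf.cast(
        trajectory["observation"]["state"], tf.float32
    )
    return trajectory
```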
Important Notes
- Action Tokenizer: The model uses a standard action tokenizer (not VQ) for 4DOF actions. The VQ tokenizer from the base model was replaced during fine-tuning.
- Image Transform: The model uses the DinoSigLIP image transform, which returns a dict with "dino" and "siglip" keys.
- Checkpoint Format: Checkpoints are saved in Prismatic format (not HuggingFace format). Use `torch.load()` to load the state dict.
- Resume Training: To resume from a checkpoint, add `--resume_from_checkpoint "checkpoints/step-003000-loss=0.4204.pt"` to the fine-tuning command.
Citation
If you use this model, please cite:
```bibtex
@misc{minivla-lampe-finetuned,
  title        = {MiniVLA Fine-tuned on LAMPE Dataset},
  author       = {Your Name},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/minivla-lampe-4dof-finetuned}}
}
```
License
Apache 2.0 (same as base model)