MiniVLA Fine-tuned on LAMPE Dataset (4DOF Actions)

This is a fine-tuned version of Stanford-ILIAD/minivla-vq-bridge-prismatic on the LAMPE dataset for 4-degree-of-freedom (4DOF) action prediction.

Model Details

  • Base Model: Stanford-ILIAD/minivla-vq-bridge-prismatic
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 32
  • Training Steps: 3000
  • Final Training Loss: 0.4204
  • Action Dimensions: 4DOF (Base, Joint2, Joint3, Joint4)
  • Dataset: LAMPE Dataset (50 trajectories, 3410 transitions)

Training Configuration

  • Batch Size: 8
  • Learning Rate: 5e-4
  • Gradient Accumulation: 1
  • Max Steps: 3000
  • Save Steps: 1000
  • Image Augmentation: Enabled
  • Optimizer: AdamW
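
For reference, the LoRA settings above correspond roughly to a PEFT configuration like the sketch below. The alpha value and target modules are assumptions; the authoritative values live in finetune.py.

from peft import LoraConfig, get_peft_model

# Assumed values mirroring the run name (r=32, dropout=0.0); lora_alpha and
# target_modules are guesses -- consult finetune.py for the exact settings.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules="all-linear",
    init_lora_weights="gaussian",
)
# vla = get_peft_model(vla, lora_config)  # wrap the base model before training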

Performance

Training Metrics (Step 3000)

  • Loss: 0.4204
  • Action Accuracy: 79.17%
  • L1 Error: 0.0150

Validation Metrics (on training data)

  • Mean L1 Error: 0.5250
  • Per-Joint Errors:
    • Base: 1.3281 ± 0.6149
    • Joint2: 0.1392 ± 0.1120
    • Joint3: 0.1214 ± 0.0715
    • Joint4: 0.5114 ± 0.3704
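
The per-joint numbers above are mean ± standard deviation of the absolute error on each action dimension, evaluated on the training trajectories. A minimal sketch of how such metrics can be computed from stacked predictions and ground-truth actions (variable names are illustrative):

import numpy as np

JOINT_NAMES = ["Base", "Joint2", "Joint3", "Joint4"]

def per_joint_l1(pred_actions, gt_actions):
    """pred_actions, gt_actions: arrays of shape (N, 4) in unnormalized joint units."""
    abs_err = np.abs(np.asarray(pred_actions) - np.asarray(gt_actions))
    for i, name in enumerate(JOINT_NAMES):
        print(f"{name}: {abs_err[:, i].mean():.4f} ± {abs_err[:, i].std():.4f}")
    print(f"Mean L1 error: {abs_err.mean():.4f}")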

Usage

Loading the Model

Option 1: Using LoRA Adapter (Recommended - 68MB, faster to download)

from prismatic.models.load import load_vla
from peft import PeftModel
import torch

# Load base model
vla = load_vla(
    "Stanford-ILIAD/minivla-vq-bridge-prismatic",
    load_for_training=False,
)

# Load LoRA adapter weights
from huggingface_hub import hf_hub_download
adapter_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="adapter-weights/adapter_model.safetensors",
    local_dir="./adapters"
)
# Note: PEFT loading for Prismatic models may need custom handling
# Alternative: Load checkpoint state dict and extract LoRA weights

# Load dataset statistics
import json
from huggingface_hub import hf_hub_download
stats_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="dataset_statistics.json"
)
with open(stats_path, "r") as f:
    vla.norm_stats = json.load(f)

vla = vla.to("cuda:0")
vla.eval()
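
PEFT's PeftModel.from_pretrained expects an adapter_config.json next to the weights, which is why the adapter may need custom handling here. One possible fallback, sketched below under stated assumptions (adapter key layout, rank 32, alpha 32), is to load the adapter tensors directly and fold the LoRA deltas into the matching base weights; verify the key names against the actual adapter file before relying on this.

from safetensors.torch import load_file

# Hypothetical sketch: merge LoRA deltas (W' = W + alpha/r * B @ A) into the base model.
# Key names and scaling are assumptions -- inspect adapter_model.safetensors to confirm.
lora_state = load_file(adapter_path)  # adapter_path from the snippet above
base_state = vla.state_dict()

rank, alpha = 32, 32  # assumed; check finetune.py / the adapter config
scaling = alpha / rank

for key in list(lora_state.keys()):
    if key.endswith("lora_A.weight"):
        prefix = key[: -len("lora_A.weight")]
        A = lora_state[key]
        B = lora_state[prefix + "lora_B.weight"]
        # Strip the assumed PEFT wrapper prefix to recover the base parameter name
        target = prefix.replace("base_model.model.", "") + "weight"
        if target in base_state:
            delta = (B @ A).to(device=base_state[target].device, dtype=base_state[target].dtype)
            base_state[target] = base_state[target] + scaling * delta

vla.load_state_dict(base_state, strict=False)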

Option 2: Using Full Checkpoint (~5GB)

from prismatic.models.load import load_vla
import torch

# Load base model
vla = load_vla(
    "Stanford-ILIAD/minivla-vq-bridge-prismatic",
    load_for_training=False,
)

# Load fine-tuned checkpoint (large file!)
from huggingface_hub import hf_hub_download
checkpoint_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="checkpoints/step-003000-loss=0.4204.pt"
)
checkpoint = torch.load(checkpoint_path, map_location="cuda:0")
vla.load_state_dict(checkpoint["model"], strict=False)

# Load dataset statistics
import json
stats_path = hf_hub_download(
    repo_id="YOUR_USERNAME/minivla-lampe-4dof-finetuned",
    filename="dataset_statistics.json"
)
with open(stats_path, "r") as f:
    vla.norm_stats = json.load(f)

vla = vla.to("cuda:0")
vla.eval()

Inference

from PIL import Image

# Load image and provide instruction
image = Image.open("path/to/image.jpg")
instruction = "move left"

# Predict action
action = vla.predict_action(
    image=image,
    instruction=instruction,
    unnorm_key="lampe_dataset",
    do_sample=False
)

# Action is a 4DOF array: [Base, Joint2, Joint3, Joint4]
print(f"Predicted action: {action}")

Using the Validation Script

python validate.py

The validation script will:

  • Auto-detect the latest checkpoint
  • Load the model
  • Validate on sample data
  • Show interactive mode for frame-by-frame inspection

Deployment

The repository includes deployment scripts for running the model as a server and connecting from a Raspberry Pi client.

Server Deployment

Option 1: Using Pinggy Tunnel (Recommended - Works through most firewalls)

python3 start_server_with_pinggy.py \
    --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
    --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
    --port 8000

This will:

  • Start the FastAPI server on port 8000
  • Create a Pinggy tunnel using SSH over port 443
  • Display a public URL (e.g., https://xxx.free.pinggy.link)

Option 2: Using Cloudflare Tunnel

python3 start_server_with_tunnel.py \
    --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
    --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
    --port 8000

Option 3: Direct Server (for local testing)

python3 depoly_lampe.py \
    --checkpoint_dir "runs/minivla-vq-bridge-prismatic+lampe_dataset+b8+lr-0.0005+lora-r32+dropout-0.0--image_aug" \
    --base_model_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
    --port 8000 \
    --host 0.0.0.0

Raspberry Pi Client

On your Raspberry Pi, install dependencies:

pip install requests opencv-python-headless numpy json-numpy

Then run the client:

python3 rpi_client.py \
    --server_url https://xxx.free.pinggy.link \
    --camera_id 0 \
    --instruction "turn left" \
    --fps 5.0

The client will:

  • Capture images from the camera
  • Send them to the server for inference
  • Receive predicted actions
  • Execute actions on the robot (you implement the execute_action() method; see the client sketch below)
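
A minimal client loop might look like the following sketch. It assumes the server exposes an /act endpoint accepting a json-numpy payload with "image" and "instruction" fields, in the style of the standard OpenVLA deployment server; check depoly_lampe.py and rpi_client.py for the actual endpoint and payload.

import cv2
import json_numpy
import requests

json_numpy.patch()  # let requests serialize numpy arrays inside JSON payloads

SERVER_URL = "https://xxx.free.pinggy.link"  # public tunnel URL from the server startup
INSTRUCTION = "turn left"

cap = cv2.VideoCapture(0)  # --camera_id 0
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("Could not read a frame from the camera")

# OpenCV captures BGR; the model expects RGB images
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

# Hypothetical endpoint and payload, mirroring the standard OpenVLA deployment API
response = requests.post(
    f"{SERVER_URL}/act",
    json={"image": rgb, "instruction": INSTRUCTION},
    timeout=30,
)
action = response.json()  # expected: 4DOF action [Base, Joint2, Joint3, Joint4]
print("Predicted action:", action)
# execute_action(action)  # implement robot control here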

Testing the Deployment

Test local server:

python3 test_local_server.py --port 8000 --test_act

Test Cloudflare tunnel:

python3 test_cloudflare_connection.py --url https://xxx.trycloudflare.com --test_act

Test Pinggy tunnel:

python3 test_pinggy_connection.py --url https://xxx.free.pinggy.link --test_act

Deployment Files

  • depoly_lampe.py: Main deployment server (FastAPI)
  • rpi_client.py: Raspberry Pi client for camera capture and robot control
  • start_server_with_pinggy.py: Server + Pinggy tunnel automation
  • start_server_with_tunnel.py: Server + Cloudflare tunnel automation
  • test_local_server.py: Test script for local server
  • test_cloudflare_connection.py: Test script for Cloudflare tunnel
  • test_pinggy_connection.py: Test script for Pinggy tunnel

Files Included

  • checkpoints/: Model checkpoints saved during training
    • step-001000-loss=X.XXXX.pt
    • step-002000-loss=X.XXXX.pt
    • step-003000-loss=0.4204.pt (final checkpoint)
  • adapter-tmp/: LoRA adapter weights (if using LoRA)
  • dataset_statistics.json: Dataset statistics for action normalization
  • validate.py: Validation script for testing the model
  • finetune.py: Fine-tuning script (for reference)
  • Deployment Scripts:
    • depoly_lampe.py: FastAPI deployment server
    • rpi_client.py: Raspberry Pi client for robot control
    • start_server_with_pinggy.py: Server + Pinggy tunnel automation
    • start_server_with_tunnel.py: Server + Cloudflare tunnel automation
    • test_local_server.py: Local server testing
    • test_cloudflare_connection.py: Cloudflare tunnel testing
    • test_pinggy_connection.py: Pinggy tunnel testing

Fine-tuning Details

The model was fine-tuned using the following command:

python3 -m torch.distributed.run \
    --standalone \
    --nnodes 1 \
    --nproc-per-node 1 \
    vla-scripts/finetune.py \
    --vla_path "Stanford-ILIAD/minivla-vq-bridge-prismatic" \
    --data_root_dir "dataset" \
    --dataset_name "lampe_dataset" \
    --dataset_statistics_path "dataset/dataset_statistics.json" \
    --run_root_dir "runs" \
    --adapter_tmp_dir "adapter-tmp" \
    --lora_rank 32 \
    --batch_size 8 \
    --grad_accumulation_steps 1 \
    --learning_rate 5e-4 \
    --max_steps 3000 \
    --save_steps 1000 \
    --image_aug True \
    --wandb_mode "offline"

Dataset

The model was fine-tuned on the LAMPE dataset with:

  • Format: RLDS (Reinforcement Learning Datasets)
  • Action Encoding: JOINT_POS (4DOF)
  • Normalization: BOUNDS_Q99 (maps the [q01, q99] range to [-1, 1]; see the sketch after this list)
  • Number of Trajectories: 50
  • Number of Transitions: 3410
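
BOUNDS_Q99 rescales each action dimension so that its 1st-99th percentile range maps onto [-1, 1], which keeps outlier joint commands from dominating the normalization. A minimal sketch of the mapping and its inverse, with hypothetical per-joint statistics (the real values are in dataset_statistics.json):

import numpy as np

def normalize_bounds_q99(action, q01, q99):
    """Map [q01, q99] to [-1, 1] per dimension, clipping values outside the range."""
    scaled = 2.0 * (np.asarray(action, dtype=np.float32) - q01) / (q99 - q01 + 1e-8) - 1.0
    return np.clip(scaled, -1.0, 1.0)

def unnormalize_bounds_q99(norm_action, q01, q99):
    """Inverse mapping, applied at inference time to recover joint commands."""
    return 0.5 * (np.asarray(norm_action) + 1.0) * (q99 - q01) + q01

# Hypothetical per-joint q01/q99 bounds for [Base, Joint2, Joint3, Joint4]
q01 = np.array([-1.5, -0.8, -0.6, -1.0])
q99 = np.array([ 1.5,  0.8,  0.6,  1.0])
print(normalize_bounds_q99([0.75, 0.0, -0.3, 0.5], q01, q99))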

Codebase Modifications Required

To replicate this fine-tuning, you need to modify the OpenVLA-Mini codebase to support the lampe_dataset:

  1. prismatic/vla/datasets/rlds/oxe/configs.py: Add lampe_dataset entry with ActionEncoding.JOINT_POS (4DOF)
  2. prismatic/vla/datasets/rlds/oxe/transforms.py: Add lampe_dataset_transform function
  3. prismatic/vla/datasets/rlds/oxe/mixtures.py: Add lampe_dataset to OXE_NAMED_MIXTURES
  4. prismatic/vla/datasets/rlds/oxe/materialize.py: Modify to accept dataset_statistics parameter
  5. prismatic/vla/datasets/datasets.py: Modify RLDSDataset to pass dataset_statistics_dict

The finetune.py script already includes all other necessary modifications (VQ tokenizer replacement, DinoSigLIP handling, etc.).
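
As an illustration of step 2 in the list above, an OXE-style transform for this dataset could look roughly like the sketch below. The key layout is an assumption and must match the actual RLDS feature spec of lampe_dataset.

from typing import Any, Dict

import tensorflow as tf

def lampe_dataset_transform(trajectory: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of the lampe_dataset transform; adjust keys to the real spec."""
    # Keep the 4 joint-position dimensions (Base, Joint2, Joint3, Joint4) as float32
    trajectory["action"] = tf.cast(trajectory["action"][:, :4], tf.float32)
    return trajectory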

Important Notes

  1. Action Tokenizer: The model uses a standard action tokenizer (not VQ) for 4DOF actions. The VQ tokenizer from the base model was replaced during fine-tuning.

  2. Image Transform: The model uses DinoSigLIP image transform which returns a dict with "dino" and "siglip" keys.

  3. Checkpoint Format: Checkpoints are saved in Prismatic format (not HuggingFace format). Use torch.load() to load the state dict.

  4. Resume Training: To resume training from a checkpoint:

    --resume_from_checkpoint "checkpoints/step-003000-loss=0.4204.pt"
    

Citation

If you use this model, please cite:

@misc{minivla-lampe-finetuned,
  title={MiniVLA Fine-tuned on LAMPE Dataset},
  author={Your Name},
  year={2025},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/minivla-lampe-4dof-finetuned}}
}

License

Apache 2.0 (same as base model)
