VLA-JEPA Fine-tuned on the Unfolding Robotics Dataset

Model Description

This model is a VLA-JEPA policy fine-tuned for bimanual shirt folding on the OpenArm robot.

Training Details

Slurm scripts and the training config for job submission on LANTA are provided in the repository.

  • Cross-embodiment transfer: DROID (7D single-arm) → OpenArm (16D bimanual)
  • Re-initialized layers: action_encoder, action_decoder, state_encoder
  • Frozen backbone: Qwen3-VL-2B (inference only)
  • Trainable params: 155M / 2.3B total
  • Optimizer: AdamW, lr=3.75e-5, weight_decay=0.01
  • Schedule: Cosine decay with warmup
  • Batch size: 128
  • Steps: 40000
  • Precision: BF16
  • RABC: Enabled (kappa=0.0265, SARM progress scores)
  • Normalization: QUANTILES for state and action
  • Training time: ~48-49 hours on 4 LANTA GPU nodes (4x A100 40GB SXM)
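
The schedule listed above (cosine decay with warmup, peak lr=3.75e-5, 40000 steps) can be sketched in a few lines of plain Python. This is only an illustration of the general shape, not the training code from the repository: the warmup length (`WARMUP_STEPS`) and the final learning rate (`MIN_LR`) are not stated in this card and are assumed here, and the linear warmup form and the helper name `lr_at` are hypothetical.

```python
import math

PEAK_LR = 3.75e-5      # from the training details above
TOTAL_STEPS = 40_000   # from the training details above
WARMUP_STEPS = 1_000   # ASSUMPTION: warmup length is not stated in this card
MIN_LR = 0.0           # ASSUMPTION: final learning rate is not stated in this card


def lr_at(step, peak_lr=PEAK_LR, total_steps=TOTAL_STEPS,
          warmup_steps=WARMUP_STEPS, min_lr=MIN_LR):
    """Linear warmup to peak_lr, then cosine decay to min_lr (hypothetical helper)."""
    if step < warmup_steps:
        # Ramp up linearly; the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, progress)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 500, 1_000, 20_500, 40_000):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at the end of warmup, falls to half the peak midway through the decay phase, and reaches `MIN_LR` at step 40000. The actual values should be taken from the training config in the repository.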

Loss Curve

[Figure: training loss curve]

Usage

from lerobot.policies import make_policy

# Load the fine-tuned policy by its repository ID
policy = make_policy(pretrained_name_or_path="chalkp/vla-jepa-folding")