license: mit
tags:
- robotics
- vision-language-action
- lora
- memoryvla
base_model: openvla/openvla-7b-prismatic
MemoryVLA — RealPushMultiT LoRA fine-tune
LoRA-only checkpoints from a fine-tune of MemoryVLA (siglip-224px+mx-bridge,
backbone prism-dinosiglip-224px+7b, initialised from
openvla/openvla-7b-prismatic step-295000) on the harrywang01/RealPushMultiT
dataset (240 demos / 341 077 timesteps).
Contents
Each step-NNNNNN-epoch-EE-loss=L.LLLL.pt is a compact subset of the full
training checkpoint, containing only the 40.83 M trainable parameters:
- LoRA adapters
  - LLaMA-2-7B (LLM backbone): r=8, α=16 on q_proj, v_proj
  - SigLIP (vision): r=8, α=16 on fused qkv
  - DiT action model: r=24, α=48 on attention qkv and perceiver cross-attention q/v
  - Cognitive memory bank retrieval cross-attn: r=24, α=48 on q_proj/k_proj/v_proj (with lora_cog_gate=True)
- modules_to_save (full small modules, trained outright)
  - action_model: x_embedder, t_embedder, z_embedder, final_layer
  - cog_mem_bank: timestep_encoder
  - per_mem_bank: entire module
  - per_compr (BottleneckSE): entire module
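The adapter settings listed above can be written down as a small data structure, which makes it easy to see that every group uses the same LoRA scale of α/r = 2. This is only an illustrative sketch: `LoRASpec`, `LORA_SPECS`, and the group keys are hypothetical names, and the target strings for the action model are descriptions taken from the list, not actual module names from the training code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LoRASpec:
    """Rank, alpha, and targets for one adapter group (hypothetical helper)."""
    rank: int
    alpha: int
    targets: Tuple[str, ...]

    @property
    def scale(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.alpha / self.rank


# Values copied from the list above; the dictionary keys are illustrative.
LORA_SPECS = {
    "llm": LoRASpec(8, 16, ("q_proj", "v_proj")),
    "vision": LoRASpec(8, 16, ("qkv",)),
    "action_model": LoRASpec(24, 48, ("attention qkv", "perceiver cross-attention q/v")),
    "cog_mem_bank": LoRASpec(24, 48, ("q_proj", "k_proj", "v_proj")),
}

for name, spec in LORA_SPECS.items():
    print(f"{name}: r={spec.rank}, alpha={spec.alpha}, scale={spec.scale}")
```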
Each file is ~163 MB (fp32). The full original checkpoint was ~33.5 GB; the
frozen base weights (LLaMA + SigLIP + DINOv2 + projector + non-trainable
linears) are not redistributed and must be loaded from
openvla/openvla-7b-prismatic.
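Because checkpoints follow the step-NNNNNN-epoch-EE-loss=L.LLLL.pt naming scheme, the step, epoch, and loss can be recovered from the filename alone. The sketch below assumes the scheme is exactly as written above; `parse_checkpoint_name` and `latest_checkpoint` are hypothetical helpers, and the filenames in the usage note are made up for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Pattern derived from the naming scheme step-NNNNNN-epoch-EE-loss=L.LLLL.pt
_CKPT_RE = re.compile(r"step-(\d+)-epoch-(\d+)-loss=(\d+\.\d+)\.pt$")


class CheckpointInfo(NamedTuple):
    step: int
    epoch: int
    loss: float
    filename: str


def parse_checkpoint_name(filename: str) -> Optional[CheckpointInfo]:
    """Return the step/epoch/loss encoded in a filename, or None if it does not match."""
    match = _CKPT_RE.search(filename)
    if match is None:
        return None
    step, epoch, loss = match.groups()
    return CheckpointInfo(int(step), int(epoch), float(loss), filename)


def latest_checkpoint(filenames: Iterable[str]) -> Optional[CheckpointInfo]:
    """Pick the entry with the highest step among the names that parse."""
    parsed = [parse_checkpoint_name(name) for name in filenames]
    valid = [info for info in parsed if info is not None]
    return max(valid, key=lambda info: info.step) if valid else None
```

For example, `latest_checkpoint(["step-000500-epoch-00-loss=0.1234.pt", "step-001000-epoch-01-loss=0.0987.pt"])` selects the second (hypothetical) file. Sorting by `info.loss` instead would pick the lowest logged training loss, which is not necessarily the best policy.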
File layout matches the training-time save format:
state = torch.load(path, map_location="cpu", weights_only=False)
# state == {"model": {"per_compr": {...}, "cog_mem_bank": {...}, ...}}
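A quick sanity check after loading is to count the stored values and compare against the 40.83 M parameters and ~163 MB quoted above. The sketch below assumes every leaf of the nested dictionary is a tensor-like object with a `numel()` method (as `torch.Tensor` has); `count_parameters` and `fp32_megabytes` are hypothetical helper names.

```python
from collections.abc import Mapping


def count_parameters(tree: Mapping) -> int:
    """Sum numel() over every tensor-like leaf of a nested dictionary."""
    total = 0
    for value in tree.values():
        if isinstance(value, Mapping):
            total += count_parameters(value)  # descend into sub-dictionaries
        elif hasattr(value, "numel"):
            total += value.numel()
    return total


def fp32_megabytes(num_params: int) -> float:
    """Storage for num_params float32 values, in MB (10**6 bytes)."""
    return num_params * 4 / 1e6
```

With `state` loaded as shown, `count_parameters(state["model"])` should land near 40.83 M if the file holds only the trainable tensors, and `fp32_megabytes(40_830_000)` gives 163.32, in line with the stated file size.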
To merge the weights back into a freshly built MemoryVLA, load the full base
checkpoint first. Then, for each submodule, take its state_dict(), update() it
with the matching keys from this file, and pass the result to load_state_dict().
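The merge step can be sketched as follows. This is a minimal illustration, not the project's own loader: it assumes each top-level key under "model" is also an attribute of the model object, that each such attribute exposes `state_dict()` and `load_state_dict()` as `torch.nn.Module` does, and that the LoRA adapters have already been injected so the adapter keys exist. `merge_partial_state` is a hypothetical name.

```python
from typing import Any, Dict, List, Mapping


def merge_partial_state(
    model: Any, partial: Mapping[str, Mapping[str, Any]]
) -> Dict[str, List[str]]:
    """Overlay saved tensors onto an already-loaded model, submodule by submodule.

    Returns, per submodule, the keys from the file that had no match in the
    model and were therefore skipped.
    """
    skipped: Dict[str, List[str]] = {}
    for name, tensors in partial.items():
        submodule = getattr(model, name)  # assumption: key == attribute name
        merged = dict(submodule.state_dict())
        skipped[name] = sorted(key for key in tensors if key not in merged)
        # Only overwrite keys the submodule already has.
        merged.update({k: v for k, v in tensors.items() if k in merged})
        submodule.load_state_dict(merged)
    return skipped
```

Assuming `vla` is the freshly built model with base weights loaded, usage would look like `skipped = merge_partial_state(vla, state["model"])`. A non-empty list in `skipped` usually means the adapter configuration of the rebuilt model does not match the one used for training.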
Training
- per_device_bs=12 × grad_accum=4 × 2 GPUs → global_bs=96
- max_steps=60 000 (LR=3e-4, sqrt-scaled from 2e-4 @ bs=32; cosine decay after 3 000 warmup steps)
- save_interval=500
- Instruction (constant per episode): "Push the T-shaped block to visit three different target locations on the tabletop, without visiting the same target more than once"
Hardware: 2× H100 80GB SXM5 (NVLink).
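The batch-size and learning-rate figures above can be reproduced with a few lines of arithmetic. The sketch below makes two assumptions the card does not state: warmup is linear from zero, and the cosine decay ends at zero at max_steps. All function names are hypothetical. Note that exact square-root scaling from 2e-4 at bs=32 to bs=96 gives about 3.46e-4, so the listed 3e-4 appears to be a rounded-down value.

```python
import math


def global_batch_size(per_device_bs: int, grad_accum: int, num_gpus: int) -> int:
    """Samples contributing to one optimizer step."""
    return per_device_bs * grad_accum * num_gpus


def sqrt_scaled_lr(base_lr: float, base_bs: int, new_bs: int) -> float:
    """Square-root scaling rule: lr grows with sqrt(new_bs / base_bs)."""
    return base_lr * math.sqrt(new_bs / base_bs)


def lr_at_step(
    step: int,
    peak_lr: float = 3e-4,
    warmup_steps: int = 3_000,
    max_steps: int = 60_000,
) -> float:
    """Linear warmup (assumed) followed by cosine decay to zero (assumed)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min((step - warmup_steps) / (max_steps - warmup_steps), 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


print(global_batch_size(12, 4, 2))     # 12 * 4 * 2
print(sqrt_scaled_lr(2e-4, 32, 96))    # 2e-4 * sqrt(3)
print(lr_at_step(3_000), lr_at_step(60_000))
```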