MemoryVLA Multi-Task LoRA Adapters: libero-100 base (Plan B)

LoRA-only fine-tune of shihao1895/memvla-libero-100 (a fully-trained MemoryVLA with DiT-L action expert on LIBERO-100 data) for 7 robosuite_pomdp manipulation tasks.

This is the Plan-B recipe. It starts from a domain-matched MemoryVLA checkpoint: LIBERO uses the same stack as the target tasks (robosuite, MuJoCo, the Panda arm, 7-DoF end-effector delta actions, the same gripper, 20 Hz control), so LoRA only has to perform a small domain adaptation rather than learn action generation from scratch. It replaces the prior Wr3ck1Am/memoryvla-multitask-lora recipe, which started from bare OpenVLA-7B and never got the DiT action heads to converge.

Recipe

  • Base: shihao1895/memvla-libero-100 (fully-trained MemoryVLA, 33.5 GB)
  • LoRA:
    • DiT-L attn: [q, v, out] r=16 α=32
    • LLaMA: [q_proj, v_proj] r=8 α=16
    • SigLIP vision: off
    • CogMem cross-attn: off
    • CogMem GateFusion: off
  • modules_to_save: [per_compr] only (robosuite_pomdp BottleneckSE)
  • Trainable: 8.62 M (0.103 % of 8.4 B)
  • Optimizer: AdamW lr 2e-4, constant scheduler, no warmup (matches official memvla recipe)
  • Hardware: 4× A100 80GB, FSDP full-shard, BF16 mixed precision + grad checkpointing
  • Global batch: 384 (per-device 96 × 4 GPU × accum 1)
  • Dataset: 330 k samples × 7 tasks, balanced sampling, image_aug=True
  • Save interval: every 500 steps
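
The headline numbers in the recipe can be cross-checked with a few lines of arithmetic. The sketch below only re-derives values stated in the list above (global batch, trainable fraction, and the standard LoRA scaling factor α/r). The helper `lora_param_count` and the 4096 × 4096 layer size are illustrative assumptions, not dimensions taken from the MemoryVLA code.

```python
# Recipe numbers copied from the list above.
per_device_batch = 96
num_gpus = 4
grad_accum_steps = 1
global_batch = per_device_batch * num_gpus * grad_accum_steps

trainable_params = 8.62e6   # 8.62 M
total_params = 8.4e9        # 8.4 B
trainable_pct = 100.0 * trainable_params / total_params

# LoRA scales its low-rank update by alpha / r; both adapter groups use 2.0.
dit_scale = 32 / 16
llama_scale = 16 / 8


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


print(global_batch)               # 384
print(round(trainable_pct, 3))    # 0.103
print(dit_scale, llama_scale)     # 2.0 2.0
# Hypothetical 4096 x 4096 projection at rank 8 (illustrative size only).
print(lora_param_count(4096, 4096, 8))  # 65536
```

Because both groups keep α = 2r, the effective update scale is the same for the DiT-L and LLaMA adapters even though their ranks differ.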

Tasks

| Task | Instruction | Image key |
|---|---|---|
| fruit_swap | swap magenta/blue blocks via empty colored region | peg_focus_view_image |
| button_lightbulb | press buttons L→R; turn off non-target; only target lit | agentview_image |
| find_soda | open drawer; if no soda close; else place on orange target | drawerview_image |
| insert_peg | try holes in random order without repetition until peg inserted | peg_focus_view_image |
| lego_stacking | try-stack to find stackable on top, non-stackable on bottom | peg_focus_view_image |
| uncover_block | lift covers without repetition until hidden red block found | agentview_image |
| open_doors | pull doors in random order to find openable one | doorview_image |

Dataset: harrywang01/image-tasks-all + standalone image-findsoda-fixed + image-legostacking-pegfocus.
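
For rollouts, each task reads its observation from a different camera key. A minimal sketch of that lookup is shown here. The task names and image keys are copied from the task list above, while the dictionary name `TASK_IMAGE_KEY` and the helper `image_key_for` are hypothetical and not part of the MemoryVLA repo.

```python
# Task -> observation image key, as listed in the Tasks section.
TASK_IMAGE_KEY = {
    "fruit_swap": "peg_focus_view_image",
    "button_lightbulb": "agentview_image",
    "find_soda": "drawerview_image",
    "insert_peg": "peg_focus_view_image",
    "lego_stacking": "peg_focus_view_image",
    "uncover_block": "agentview_image",
    "open_doors": "doorview_image",
}


def image_key_for(task: str) -> str:
    """Return the camera key for a task, failing loudly on unknown names."""
    try:
        return TASK_IMAGE_KEY[task]
    except KeyError:
        known = ", ".join(sorted(TASK_IMAGE_KEY))
        raise KeyError(f"unknown task {task!r}; expected one of: {known}") from None


print(image_key_for("find_soda"))  # drawerview_image
```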

Files

  • step-*.adapter: LoRA-only ckpts (~150 MB each), saved every 500 steps
  • config.yaml: full training config
  • dataset_statistics.json: per-task action mean/std + min/max
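
Since a checkpoint is written every 500 steps, a small helper for picking the most recent one can be handy. The sketch below assumes the step number follows `step-` directly and that files end in `.adapter`, as the `step-*.adapter` pattern suggests. The function name `latest_adapter` and the example file names are hypothetical, because the text after the step number is not specified here.

```python
import re
from typing import Iterable, Optional

# Assumed layout: "step-<digits><anything>.adapter".
_STEP_RE = re.compile(r"^step-(\d+).*\.adapter$")


def latest_adapter(filenames: Iterable[str]) -> Optional[str]:
    """Return the adapter file with the highest step number, or None."""
    best_step, best_name = -1, None
    for name in filenames:
        match = _STEP_RE.match(name)
        if match and int(match.group(1)) > best_step:
            best_step, best_name = int(match.group(1)), name
    return best_name


# Hypothetical file names, for illustration only.
files = ["step-00500-a.adapter", "step-01500-a.adapter",
         "step-01000-a.adapter", "config.yaml"]
print(latest_adapter(files))  # step-01500-a.adapter
```

In practice the list would come from something like `os.listdir` on the download directory.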

Quick Use (rollout)

import torch, sys
sys.path.insert(0, "third_party/MemoryVLA")
from vla import load_vla
from memory_diffusion_policy.policy.memoryvla_lora import (
    MemoryVLALoRAConfig, apply_memoryvla_lora,
)

# 1. Load base
vla = load_vla(
    model_id_or_path="<path to memvla-libero-100.pt>",
    load_for_training=False,
    future_action_window_size=15, action_model_type="DiT-L",
    per_token_size=256, mem_length=16, retrieval_layers=2,
    fusion_type="gate", consolidate_type="tome",
)

# 2. Apply LoRA wrap (matches this checkpoint's recipe)
cfg = MemoryVLALoRAConfig(
    enabled=True, r=16, alpha=32.0, dropout=0.05,
    dit_attn_targets=["q", "v", "out"],
    lora_llama=True, llama_r=8, llama_alpha=16, llama_targets=["q_proj", "v_proj"],
    lora_cog_cross=False, lora_cog_gate=False, lora_vision=False,
    modules_to_save_list=["per_compr"],
)
apply_memoryvla_lora(vla, cfg, log=...)

# 3. Load adapter
adapter = torch.load("step-XXXXX-...adapter", map_location="cpu")
vla.load_state_dict(adapter["adapter"], strict=False)

# 4. Inference: see the MemoryVLA repo eval scripts

Get the base ckpt: huggingface-cli download shihao1895/memvla-libero-100.
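
Because step 3 of the snippet above loads with strict=False, a mismatch between the LoRA config and the adapter file would pass silently. `torch.nn.Module.load_state_dict` returns a result with `missing_keys` and `unexpected_keys`. For an adapter-only state dict, the frozen base weights are expected to appear under missing keys, but unexpected keys should be empty. The helper below is a hypothetical sketch of that check, written against plain key lists so it does not depend on the model itself.

```python
from typing import Iterable


def verify_adapter_load(adapter_keys: Iterable[str],
                        unexpected_keys: Iterable[str]) -> int:
    """Raise if any adapter key had no matching parameter in the wrapped model.

    Returns the number of adapter tensors when every key matched.
    """
    unexpected = sorted(unexpected_keys)
    if unexpected:
        preview = ", ".join(unexpected[:5])
        raise RuntimeError(
            f"{len(unexpected)} adapter keys did not match the model, e.g. {preview}"
        )
    return len(list(adapter_keys))


# Intended use with the snippet above (not executed here):
#   result = vla.load_state_dict(adapter["adapter"], strict=False)
#   verify_adapter_load(adapter["adapter"].keys(), result.unexpected_keys)
```

A non-empty unexpected list usually means the wrap in step 2 targeted different modules than the ones the adapter was trained with.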
