pi0.5 Build Block Tower — RLT Stage 1 (Encoder-Decoder)

RL Token encoder-decoder trained on top of a frozen pi0.5 baseline VLA for building a block tower. Implements Stage 1 of the RL Token method (Xu et al., 2026): a lightweight transformer encoder-decoder compresses VLA prefix embeddings into a single RL token via autoregressive reconstruction.

Experiment

  • Objective: Train RLT encoder-decoder to produce a compact RL token representation from frozen VLA prefix embeddings.
  • VLA backbone: Baseline 55k checkpoint (pravsels/pi05-build-block-tower-baseline), frozen (rl_vla_loss_weight=0.0).
  • Encoder-decoder: 2-layer transformer, 8 heads, dim=2048, SwiGLU FFN.
  • Loss: Autoregressive reconstruction of VLA prefix embeddings (L2).
  • Steps: 10,000
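The Stage 1 objective above can be sketched in miniature. This is a toy stand-in, not the actual openpi/pi0.5 implementation: attention pooling stands in for the 2-layer transformer encoder, a single linear-tanh step stands in for the decoder, and all shapes, names, and the BOS convention are assumptions (the real model uses dim=2048).

```python
import numpy as np

rng = np.random.default_rng(0)
T, D = 6, 16  # toy prefix length and embedding dim (real run: dim=2048)

def attention_pool(E, w):
    """Compress T prefix embeddings into one 'RL token' via learned
    attention pooling (stand-in for the 2-layer transformer encoder)."""
    scores = E @ w                       # (T,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    return alpha @ E                     # (D,)

def decode_step(z, prev, W):
    """One autoregressive decoder step: predict embedding t from the
    RL token and the previous (teacher-forced) embedding."""
    return np.tanh(np.concatenate([z, prev]) @ W)  # (D,)

def stage1_loss(E, w, W):
    """Autoregressive L2 reconstruction of the frozen VLA prefix
    embeddings. The VLA gets no gradient (rl_vla_loss_weight=0.0)."""
    z = attention_pool(E, w)
    prev = np.zeros(D)                   # assumed BOS embedding
    sq_err = 0.0
    for t in range(T):
        pred = decode_step(z, prev, W)
        sq_err += np.sum((pred - E[t]) ** 2)
        prev = E[t]                      # teacher forcing
    return sq_err / T

E = rng.normal(size=(T, D))              # frozen VLA prefix embeddings
w = rng.normal(size=D)                   # encoder (pooling) params
W = rng.normal(size=(2 * D, D)) * 0.1    # decoder params
print(stage1_loss(E, w, W))
```

During training, gradients flow only into the encoder-decoder parameters (here `w` and `W`); the prefix embeddings `E` come from the frozen 55k VLA checkpoint.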

Config

  • Config name: pi05_rl_token_build_block_tower
  • Model: Pi0RLConfig (pi05=True, action_horizon=50, rl_vla_loss_weight=0.0)
  • Batch size: 36
  • Learning rate: 5e-5 cosine decay (1k warmup)
  • Optimizer: AdamW (gradient clip norm 1.0)
  • EMA decay: 0.999
  • Delta actions: enabled
  • State/action space: 7D joint-space
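The learning-rate schedule in the config (peak 5e-5, 1k linear warmup, cosine decay over 10k steps) can be sketched as follows; the decay floor of 0.0 is an assumption, as the actual schedule may decay to a nonzero minimum.

```python
import math

def lr_at(step, peak=5e-5, warmup=1_000, total=10_000, floor=0.0):
    """Linear warmup to `peak`, then cosine decay to `floor`.
    `floor=0.0` is an assumption about this run's config."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))
```

For example, `lr_at(0)` is 0.0, `lr_at(1_000)` is the 5e-5 peak, and `lr_at(10_000)` has decayed back to the floor.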

Dataset

  • villekuosmanen/build_block_tower (200 episodes, LeRobot v2.1)

Checkpoint Hashes

Verify integrity with `find params -type f | sort | xargs cat | sha256sum`.

Step   Loss  SHA-256
9,999  ~218  214f3473fba0339779276528ff618b3a88cd7df5bdb4c1560bf0c13459fe3454
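If you prefer to verify from Python, the following is a sketch of an equivalent of the shell pipeline: hash the byte-concatenation of every file under params/, visited in sorted path order. The ordering is assumed to match `sort`'s byte order on your filesystem; the function name is hypothetical.

```python
import hashlib
from pathlib import Path

def params_sha256(params_dir):
    """Hash all regular files under params_dir, concatenated in sorted
    path order -- mirrors `find ... | sort | xargs cat | sha256sum`."""
    h = hashlib.sha256()
    for path in sorted(Path(params_dir).rglob("*")):
        if path.is_file():
            h.update(path.read_bytes())
    return h.hexdigest()

# e.g. params_sha256("checkpoints/9999/params") should match the
# hash in the table above for the 9,999-step checkpoint
```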

W&B

Repo Structure

assets/                      # Norm stats for inference (from baseline)
checkpoints/9999/params/     # Model weights (params only)
README.md                    # This file
TRAINING_LOG.md              # Training log