UnitreeG1_putawaytoolsV2_minmax_500step: LingBot-VA G1 post-trained transformer

Fine-tuned transformer for LingBot-VA on Unitree G1 (Dex1), task XiaoweiLinXL/pi05-unitree-g1-put-away-tools-v2.1: "Put the battery on the shelf labeled 'battery' and put the screwdriver on the shelf labeled 'Philips'."

Same data and same recipe as the rndchnk series; the only difference is that action normalization is MIN/MAX (not q01/q99 quantile). See "Why min/max" below.

  • Base: robbyant/lingbot-va-base
  • Post-training: 70 demos (43,851 frames), lr 1e-5, FDM v2 recipe with a mutually-exclusive per-microstep regime (fdm_prob=0.5, lambda_fdm=1.0). Per-step randomized chunk_size ∈ {1..4} and window_size ∈ {4..64}.
  • 4 GPUs × grad_accum=4 = effective batch 16, optimizer step 500 of a 5000-step schedule (very early; uploaded specifically for A/B testing against the matching rndchnk_500step quantile-normalized ckpt).
  • Action normalization: dataset min/max, so every training target is bounded to [-1, +1]. (Codebase variable names are still q01/q99 because that's all the loader supports; the values stored there are min/max, which makes this a drop-in replacement.)
  • This repo contains only transformer/; vae/, text_encoder/, and tokenizer/ are unchanged from robbyant/lingbot-va-base.
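The min/max drop-in described above can be sketched as follows. This is a minimal illustration, not the loader's code: it assumes the loader maps each action dimension linearly from [q01, q99] to [-1, +1] (a common convention that this card does not confirm), and the function names and sample values are hypothetical.

```python
def normalize(x, lo, hi):
    """Map x linearly from [lo, hi] to [-1, +1] (assumed loader convention)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def denormalize(y, lo, hi):
    """Inverse of normalize: map y from [-1, +1] back to [lo, hi]."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


# Hypothetical raw values for one joint (radians), not taken from the dataset.
joint = [-0.30, -0.10, 0.00, 0.05, 0.20, 1.40]

# Store the dataset min/max in the slots the loader still calls q01/q99.
q01, q99 = min(joint), max(joint)

normalized = [normalize(v, q01, q99) for v in joint]
absmax = max(abs(v) for v in normalized)   # largest normalized magnitude
restored = [denormalize(v, q01, q99) for v in normalized]
```

Because the bounds are the true extremes of the data, no normalized value can fall outside [-1, +1], and the round trip recovers the raw values up to floating-point error.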

Why min/max (the v21 quantile series underperformed)

The earlier v21 5k training under quantile normalization had its right-arm joint targets overflow the normalized range: R-wrist-roll absmax was 4.11, R-shoulder-roll 3.55, R-wrist-yaw 3.55. The model's bounded prediction range (roughly [-1.5, +1.5]) cannot match those targets → during deployment the model under-predicts the precise reach-extension moments → arm under-extends → misses the shelves. Min/max normalization bounds every target to ±1 (verified absmax = 1.0000 over all 43,851 training rows), eliminating out-of-range targets and restoring deployment quality.
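A synthetic example may make the overflow concrete. The sketch below assumes the same linear [lo, hi] → [-1, +1] mapping as a working hypothesis and uses made-up data (a dense bulk of small motions plus a few rare reach extensions); none of the numbers come from the actual dataset.

```python
def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_vals) - 1)
    return sorted_vals[i] + (pos - i) * (sorted_vals[j] - sorted_vals[i])


def normalize(x, lo, hi):
    """Map x linearly from [lo, hi] to [-1, +1] (assumed convention)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


# Synthetic joint trajectory: 990 bulk samples in [-0.1, 0.1] rad
# plus 5 rare reach-extension samples at 0.35 rad.
bulk = [0.1 * ((i % 21) - 10) / 10 for i in range(990)]
reach = [0.35] * 5
values = sorted(bulk + reach)

# Quantile bounds ignore the rare extremes, so those extremes overflow.
q01, q99 = quantile(values, 0.01), quantile(values, 0.99)
absmax_quantile = max(abs(normalize(v, q01, q99)) for v in values)

# Min/max bounds include the extremes, so nothing overflows.
absmax_minmax = max(abs(normalize(v, values[0], values[-1])) for v in values)

print(f"quantile absmax: {absmax_quantile:.2f}, min/max absmax: {absmax_minmax:.2f}")
```

In this toy setup the rare reach samples make up less than 1% of the data, so q99 sits inside the bulk and the reach targets land well outside a roughly ±1.5 prediction range, while min/max keeps them at the edge of [-1, +1].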

Loss curves under min/max are higher than the quantile run by design: the quantile run's suspiciously low video loss (0.0072 at step 5000) was the signature of fitting a compressed bulk distribution while ignoring unreachable extremes. The min/max run's loss (0.0347 video at step 5000) reflects the model now learning a wider, fully reachable target range.

Assemble an eval-ready checkpoint

hf download robbyant/lingbot-va-base                              --local-dir lingbot-va-base
hf download EmbodyX/UnitreeG1_putawaytoolsV2_minmax_500step        --local-dir g1_pat_v2_mm_500_dl

mkdir -p g1_pat_v2_mm_500
ln -sf $(realpath g1_pat_v2_mm_500_dl/transformer)  g1_pat_v2_mm_500/transformer
ln -sf $(realpath lingbot-va-base/vae)              g1_pat_v2_mm_500/vae
ln -sf $(realpath lingbot-va-base/text_encoder)     g1_pat_v2_mm_500/text_encoder
ln -sf $(realpath lingbot-va-base/tokenizer)        g1_pat_v2_mm_500/tokenizer
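Before serving, it can help to confirm that all four symlinks resolve. The following is a small optional check, not part of the lingbot-va tooling; the helper name missing_components is hypothetical, and the directory names are the ones used in the commands above.

```python
from pathlib import Path

# Subdirectories the assembled checkpoint is expected to contain.
REQUIRED = ("transformer", "vae", "text_encoder", "tokenizer")


def missing_components(ckpt_dir):
    """Return required subdirectories that are absent or dangling symlinks."""
    root = Path(ckpt_dir)
    # Path.is_dir() follows symlinks, so a dangling link counts as missing.
    return [name for name in REQUIRED if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_components("g1_pat_v2_mm_500")
    print("ok" if not missing else f"missing: {missing}")
```

An empty result means every component is reachable through the assembled directory; any name listed usually points to a failed download or a symlink created before its target existed.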

Serve with CONFIG_NAME=g1_putawaytools_v21 MODEL_PATH=g1_pat_v2_mm_500. The shipped transformer/config.json already sets attn_mode: torch, so it is inference-ready.

IMPORTANT: config must match training. The inference config's norm_stat must contain the same MIN/MAX values used during training (NOT the original quantile values). The va_g1_putawaytools_v21_cfg.py in the lingbot-va repo has been updated in lockstep; using the original quantile config at inference with this checkpoint would denormalize actions incorrectly.
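To illustrate why the statistics must match, the sketch below decodes the same normalized prediction with two sets of bounds. It assumes a linear [q01, q99] ↔ [-1, +1] mapping, and every number and name in it is hypothetical; the real values live in the config's norm_stat.

```python
def normalize(x, lo, hi):
    """Map x linearly from [lo, hi] to [-1, +1] (assumed convention)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def denormalize(y, lo, hi):
    """Map y from [-1, +1] back to [lo, hi]."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


# Hypothetical stats for one joint (radians).
train_stats = {"q01": -0.10, "q99": 0.35}   # min/max values used in training
stale_stats = {"q01": -0.10, "q99": 0.10}   # original quantile values

target = 0.30  # raw joint position the model learned to reproduce
pred = normalize(target, train_stats["q01"], train_stats["q99"])

# Decoding with the training stats recovers the target ...
correct = denormalize(pred, train_stats["q01"], train_stats["q99"])
# ... while decoding with stale quantile stats yields a different joint command.
wrong = denormalize(pred, stale_stats["q01"], stale_stats["q99"])

print(f"correct: {correct:.3f} rad, with stale stats: {wrong:.3f} rad")
```

With these made-up bounds the stale statistics shrink the decoded command, which is the kind of silent error the warning above is meant to prevent.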
