UnitreeG1_putawaytoolsV2_minmax_500step — LingBot-VA G1 post-trained transformer

Fine-tuned transformer for LingBot-VA on Unitree G1 (Dex1), task XiaoweiLinXL/pi05-unitree-g1-put-away-tools-v2.1: "Put the battery on the shelf labeled 'battery' and put the screwdriver on the shelf labeled 'Philips'."

Same data and recipe as the rndchnk series. The only difference is action normalization: dataset min/max instead of q01/q99 quantiles. See "Why min/max" below.

Base: robbyant/lingbot-va-base
Post-training: 70 demos (43,851 frames), lr 1e-5, FDM v2 recipe — mutually-exclusive per-microstep regime (fdm_prob=0.5, lambda_fdm=1.0). Per-step randomized chunk_size ∈ {1..4} and window_size ∈ {4..64}.
4 GPUs × grad_accum=4 = effective batch 16. This is optimizer step 500 of a 5000-step schedule: a very early checkpoint, uploaded specifically for A/B testing against the matching quantile-normalized rndchnk_500step checkpoint.
Action normalization: dataset min/max, so every training target lies within [-1, +1]. (The codebase variable names are still q01/q99 because that is all the loader supports; the values stored there are the dataset min/max, as a drop-in replacement.)
This repo contains only transformer/ — vae/, text_encoder/, tokenizer/ are unchanged from robbyant/lingbot-va-base.
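
As a rough illustration of the normalization described above, the following sketch maps each action dimension linearly from its dataset [min, max] onto [-1, +1] and back. The exact formula used in the lingbot-va loader is an assumption here, and the 16-dimensional synthetic action array is purely hypothetical.

```python
import numpy as np

def fit_minmax(actions: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-dimension minimum and maximum over all frames (rows)."""
    return actions.min(axis=0), actions.max(axis=0)

def normalize(actions: np.ndarray, lo: np.ndarray, hi: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Map [lo, hi] linearly onto [-1, +1] per dimension (assumed convention)."""
    return 2.0 * (actions - lo) / np.maximum(hi - lo, eps) - 1.0

def denormalize(normed: np.ndarray, lo: np.ndarray, hi: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Inverse of normalize: map [-1, +1] back to [lo, hi]."""
    return (normed + 1.0) / 2.0 * np.maximum(hi - lo, eps) + lo

# Synthetic stand-in for the action table; the shape is hypothetical.
rng = np.random.default_rng(0)
actions = rng.normal(size=(1000, 16))

lo, hi = fit_minmax(actions)
normed = normalize(actions, lo, hi)
restored = denormalize(normed, lo, hi)
print("absmax after min/max normalization:", np.abs(normed).max())
```

Because the same two arrays drive both directions, storing the min/max under the existing q01/q99 names changes the statistics but not the code path.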

Why min/max (the v21 quantile series underperformed)

In the earlier v21 5k training run under quantile normalization, the right-arm joint targets overflowed the normalized range: R-wrist-roll absmax was 4.11, R-shoulder-roll 3.55, and R-wrist-yaw 3.55. The model's bounded prediction range (roughly [-1.5, +1.5]) cannot reach those targets. During deployment the model therefore under-predicts at the precise reach-extension moments, the arm under-extends, and it misses the shelves. Min/max normalization bounds every target to ±1 (verified absmax = 1.0000 over all 43,851 training rows), eliminating out-of-range targets and restoring deployment quality.
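
The overflow effect can be reproduced on synthetic data. The sketch below is an illustration only: the joint trajectory is made up (mostly small motion plus rare large excursions standing in for reach-extension frames), and the [-1, +1] mapping is the same assumed convention as elsewhere in this card. It compares the largest normalized target under q01/q99 statistics with the largest under min/max statistics.

```python
import numpy as np

def normalize(x: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Map [lo, hi] linearly onto [-1, +1] (assumed convention)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

rng = np.random.default_rng(0)
# Hypothetical single joint: 99.5% small motion, 0.5% large excursions.
bulk = rng.normal(0.0, 0.1, size=9950)
reach = rng.uniform(0.8, 1.2, size=50)
joint = np.concatenate([bulk, reach])

q01, q99 = np.quantile(joint, [0.01, 0.99])
quantile_normed = normalize(joint, q01, q99)
minmax_normed = normalize(joint, joint.min(), joint.max())

# Fraction of targets a model bounded to about [-1.5, +1.5] could not reach.
unreachable = float(np.mean(np.abs(quantile_normed) > 1.5))

print("quantile absmax:", np.abs(quantile_normed).max())
print("min/max absmax:", np.abs(minmax_normed).max())
print("fraction beyond ±1.5 under quantiles:", unreachable)
```

The rare excursions are exactly the frames that fall outside the quantile range, which is why a small fraction of rows can dominate deployment behavior.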

Loss curves under min/max are higher than in the quantile run by design. The quantile run's suspiciously low video loss (0.0072 at step 5000) was the signature of fitting a compressed bulk distribution while ignoring unreachable extremes. The min/max run's loss (0.0347 video at step 5000) reflects the model now learning a wider, fully reachable target range.

Assemble an eval-ready checkpoint

hf download robbyant/lingbot-va-base                              --local-dir lingbot-va-base
hf download EmbodyX/UnitreeG1_putawaytoolsV2_minmax_500step        --local-dir g1_pat_v2_mm_500_dl

mkdir -p g1_pat_v2_mm_500
ln -sfn "$(realpath g1_pat_v2_mm_500_dl/transformer)"  g1_pat_v2_mm_500/transformer
ln -sfn "$(realpath lingbot-va-base/vae)"              g1_pat_v2_mm_500/vae
ln -sfn "$(realpath lingbot-va-base/text_encoder)"     g1_pat_v2_mm_500/text_encoder
ln -sfn "$(realpath lingbot-va-base/tokenizer)"        g1_pat_v2_mm_500/tokenizer
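
Before serving, it may be worth confirming that all four components resolve. The helper below is a hypothetical convenience, not part of the lingbot-va repo; it only checks that each expected subdirectory exists and is not a dangling symlink.

```python
from pathlib import Path

# The four components the assembled checkpoint directory is expected to hold.
REQUIRED = ("transformer", "vae", "text_encoder", "tokenizer")

def missing_components(model_path: str) -> list[str]:
    """Return the required subdirectories that are absent or dangling symlinks."""
    root = Path(model_path)
    # Path.is_dir() follows symlinks, so a link to a missing target counts as missing.
    return [name for name in REQUIRED if not (root / name).is_dir()]

missing = missing_components("g1_pat_v2_mm_500")
print("missing components:", missing if missing else "none")
```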

Serve with CONFIG_NAME=g1_putawaytools_v21 MODEL_PATH=g1_pat_v2_mm_500. The shipped transformer/config.json already has attn_mode: torch, so it is inference-ready.

IMPORTANT: the inference config must match training. The inference config's norm_stat must contain the same min/max values used during training, not the original quantile values. The va_g1_putawaytools_v21_cfg.py in the lingbot-va repo has been updated in lockstep; using the original quantile config with this checkpoint would denormalize the predicted actions incorrectly.
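
To make the failure mode concrete, the sketch below denormalizes the same model outputs with two different sets of statistics. Everything numeric is hypothetical: the single-joint values are invented for illustration, the dict layout with q01/q99 keys is an assumption based on the variable names mentioned above, and the linear [-1, +1] mapping is the same assumed convention.

```python
import numpy as np

def denormalize(normed: np.ndarray, lo: float, hi: float) -> np.ndarray:
    """Map [-1, +1] back to [lo, hi] (assumed convention)."""
    return (normed + 1.0) / 2.0 * (hi - lo) + lo

# Hypothetical single-joint statistics in radians, for illustration only.
train_stat = {"q01": -1.20, "q99": 1.60}  # min/max stored under the legacy key names
stale_stat = {"q01": -0.30, "q99": 0.45}  # original quantile values

# Normalized model outputs at the lower bound, midpoint, and upper bound.
pred = np.array([-1.0, 0.0, 1.0])

right = denormalize(pred, train_stat["q01"], train_stat["q99"])
wrong = denormalize(pred, stale_stat["q01"], stale_stat["q99"])

print("with training min/max:", right)
print("with stale quantiles: ", wrong)
```

With the stale statistics, a full-range prediction of +1 is mapped to the 99th-percentile joint angle rather than the true maximum, so the commanded motion is compressed.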
