UnitreeG1_putawaytoolsV2_minmax_2000step: LingBot-VA G1 post-trained transformer

Fine-tuned transformer for LingBot-VA on Unitree G1 (Dex1), task XiaoweiLinXL/pi05-unitree-g1-put-away-tools-v2.1: "Put the battery on the shelf labeled 'battery' and put the screwdriver on the shelf labeled 'Philips'."

Same data and same recipe as the rndchnk series. The only difference is that action normalization is MIN/MAX (not q01/q99 quantile). See "Why min/max" below.

  • Base: robbyant/lingbot-va-base
  • Post-training: 70 demos (43,851 frames), lr 1e-5, FDM v2 recipe with a mutually-exclusive per-microstep regime (fdm_prob=0.5, lambda_fdm=1.0). Per-step randomized chunk_size ∈ {1..4} and window_size ∈ {4..64}.
  • 4 GPUs × grad_accum=4 = effective batch 16, optimizer step 2000 of a 5000-step schedule (mid-training; the _500step checkpoint performed weakly in deployment, so this checkpoint exists for the next deployment test).
  • Action normalization: dataset min/max, so every training target is bounded strictly to [-1, +1]. (Codebase variable names are still q01/q99 because that is all the loader supports; the values stored there are min/max, a drop-in replacement.)
  • This repo contains only transformer/; vae/, text_encoder/, and tokenizer/ are unchanged from robbyant/lingbot-va-base.
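
The per-step randomization listed above can be pictured with a small sampler. This is only a sketch of one plausible reading: sample_step_config is a hypothetical helper, not a function from the lingbot-va codebase, and uniform sampling over the stated ranges is an assumption (the card gives the ranges and fdm_prob, not the distribution).

```python
import random

def sample_step_config(rng, fdm_prob=0.5):
    """Draw one step's settings (hypothetical helper, not from lingbot-va)."""
    return {
        # Mutually exclusive regimes: a step is either an FDM step or not.
        "use_fdm": rng.random() < fdm_prob,
        # Uniform sampling is an assumption; only the ranges are documented.
        "chunk_size": rng.randint(1, 4),    # randint is inclusive on both ends
        "window_size": rng.randint(4, 64),
    }

rng = random.Random(0)
configs = [sample_step_config(rng) for _ in range(1000)]
```

Each draw yields a chunk_size in {1..4}, a window_size in {4..64}, and a single boolean regime flag, so the two regimes never apply to the same step.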

Why min/max (the v21 quantile series underperformed)

In the earlier v21 5k training under quantile normalization, the normalized targets of the right-arm joints overflowed: R-wrist-roll absmax was 4.11, R-shoulder-roll 3.55, R-wrist-yaw 3.55. The model's bounded prediction range (roughly [-1.5, +1.5]) cannot reach those targets, so during deployment the model under-predicts the precise reach-extension moments, the arm under-extends, and it misses the shelves. Min/max normalization bounds every target to ±1 (verified absmax = 1.0000 over all 43,851 training rows), eliminating out-of-range targets and restoring deployment quality.
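
The overflow can be reproduced on synthetic data. The sketch below assumes the usual linear mapping of [lo, hi] onto [-1, +1]; the exact formula in the lingbot-va loader is not shown in this card, and the trajectory here is made up for illustration (mostly small motions plus a few large reach extensions).

```python
import numpy as np

def normalize(x, lo, hi):
    """Map [lo, hi] linearly onto [-1, +1]; values outside land outside."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

rng = np.random.default_rng(0)
# Synthetic joint values: 990 near-rest samples and 10 rare large extensions.
actions = np.concatenate([rng.normal(0.0, 0.1, 990), np.linspace(1.0, 2.0, 10)])

# Quantile bounds ignore the rare extremes, so those targets overflow.
q01, q99 = np.quantile(actions, [0.01, 0.99])
quantile_absmax = np.abs(normalize(actions, q01, q99)).max()

# Min/max bounds cover every sample, so the absmax is exactly 1.
minmax_absmax = np.abs(normalize(actions, actions.min(), actions.max())).max()

print(f"quantile absmax: {quantile_absmax:.2f}, min/max absmax: {minmax_absmax:.4f}")
```

The trade-off is that min/max compresses the bulk of the data into a narrower band, which is the price paid for keeping the rare reach-extension targets inside the model's output range.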

Assemble an eval-ready checkpoint

hf download robbyant/lingbot-va-base                              --local-dir lingbot-va-base
hf download EmbodyX/UnitreeG1_putawaytoolsV2_minmax_2000step       --local-dir g1_pat_v2_mm_2000_dl

mkdir -p g1_pat_v2_mm_2000
ln -sf $(realpath g1_pat_v2_mm_2000_dl/transformer)  g1_pat_v2_mm_2000/transformer
ln -sf $(realpath lingbot-va-base/vae)               g1_pat_v2_mm_2000/vae
ln -sf $(realpath lingbot-va-base/text_encoder)      g1_pat_v2_mm_2000/text_encoder
ln -sf $(realpath lingbot-va-base/tokenizer)         g1_pat_v2_mm_2000/tokenizer
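
Before serving, it can help to confirm that all four symlinks resolve. The following is a minimal sketch; missing_components is a hypothetical helper, and the directory names are the ones used in the commands above.

```python
from pathlib import Path

REQUIRED = ("transformer", "vae", "text_encoder", "tokenizer")

def missing_components(root):
    """Return required subdirectories that are absent or are broken symlinks."""
    root = Path(root)
    # Path.is_dir() follows symlinks, so a dangling link counts as missing.
    return [name for name in REQUIRED if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_components("g1_pat_v2_mm_2000")
    print("OK" if not missing else f"missing: {missing}")
```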

Serve with CONFIG_NAME=g1_putawaytools_v21 MODEL_PATH=g1_pat_v2_mm_2000. transformer/config.json has attn_mode: torch (inference-ready).

IMPORTANT: the config must match training. The inference config's norm_stat must contain the same MIN/MAX values used during training (NOT the original quantile values). The va_g1_putawaytools_v21_cfg.py in the lingbot-va repo has been updated in lockstep; using the original quantile config at inference with this checkpoint would denormalize actions incorrectly. Quick check: grep "1.178246855736" wan_va/configs/va_g1_putawaytools_v21_cfg.py should return a hit.
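
To see why mismatched statistics matter, here is a small sketch of the failure mode together with a Python equivalent of the grep check. The denormalization formula is the assumed inverse of a linear [-1, +1] mapping, the bounds are illustrative numbers rather than values from the real config, and both function names are hypothetical.

```python
from pathlib import Path

def denormalize(x_norm, lo, hi):
    """Assumed inverse of a linear [lo, hi] -> [-1, +1] mapping."""
    return (x_norm + 1.0) / 2.0 * (hi - lo) + lo

def config_mentions(path, marker):
    """Python equivalent of `grep marker path`: True if the text occurs."""
    return marker in Path(path).read_text(encoding="utf-8")

# Illustrative bounds only, not taken from va_g1_putawaytools_v21_cfg.py.
train_lo, train_hi = -2.0, 2.0   # min/max stats the model was trained with
wrong_lo, wrong_hi = -0.5, 0.5   # narrower quantile-style stats

predicted = 1.0  # model asks for the joint's full extension
right = denormalize(predicted, train_lo, train_hi)  # 2.0
wrong = denormalize(predicted, wrong_lo, wrong_hi)  # 0.5, arm under-extends
```

With the real repo checked out, config_mentions("wan_va/configs/va_g1_putawaytools_v21_cfg.py", "1.178246855736") should return True for the updated config.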
