g1 put_away_tools v2.1 FDM-v2 transformer @ step 5000 (final, MIN/MAX norm)


Files changed (3)

README.md +70 -0
transformer/config.json +23 -0
transformer/diffusion_pytorch_model.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,70 @@

+---
+license: apache-2.0
+tags:
+- robotics
+- lingbot-va
+- unitree-g1
+- world-model
+---
+# UnitreeG1_putawaytoolsV2_minmax_5000step — LingBot-VA G1 post-trained transformer
+Fine-tuned `transformer` for LingBot-VA on Unitree G1 (Dex1), task
+`XiaoweiLinXL/pi05-unitree-g1-put-away-tools-v2.1`:
+*"Put the battery on the shelf labeled 'battery' and put the screwdriver
+on the shelf labeled 'Philips'."*
+**Same data, same recipe as the `rndchnk` series — only difference: action
+normalization is MIN/MAX (not q01/q99 quantile).** See "Why min/max" below.
+- Base: `robbyant/lingbot-va-base`
+- Post-training: 70 demos (43,851 frames), lr 1e-5, **FDM v2 recipe** —
+  mutually-exclusive per-microstep regime (`fdm_prob=0.5`, `lambda_fdm=1.0`).
+  Per-step randomized chunk_size ∈ {1..4} and window_size ∈ {4..64}.
+- 4 GPUs × `grad_accum=4` = effective batch 16, optimizer **step 5000** of a
+  5000-step schedule (final, fully trained).
+- Final training losses: video=0.0347, action=0.00082, fdm=0.0347. The video
+  loss is higher than the quantile run's video=0.0072, but that run was
+  overfit on a compressed bulk distribution; the level here is the *healthy*
+  one for a bounded target range.
+- **Action normalization: dataset min/max** — every training target bounded
+  strictly to [-1, +1]. (Codebase variable names are still `q01`/`q99`
+  because that's all the loader supports; the values stored there are
+  min/max — drop-in replacement.)
+- This repo contains **only `transformer/`** — `vae/`, `text_encoder/`,
+  `tokenizer/` are unchanged from `robbyant/lingbot-va-base`.
+## Why min/max (the v21 quantile series underperformed)
+The earlier v21 5k training run used quantile normalization, under which the
+normalized right-arm joint targets overflowed far outside ±1: absmax was
+**4.11** for R-wrist-roll, 3.55 for R-shoulder-roll, and 3.55 for R-wrist-yaw.
+The model's bounded prediction range (`[~-1.5, ~+1.5]`) cannot reach those
+targets, so during deployment it under-predicts the precise reach-extension
+moments, the arm under-extends, and it misses the shelves. Min/max
+normalization bounds every target to ±1 (verified absmax = 1.0000 over all
+43,851 training rows), eliminating out-of-range targets and restoring
+deployment quality.
+## Assemble an eval-ready checkpoint
+```bash
+hf download robbyant/lingbot-va-base                              --local-dir lingbot-va-base
+hf download EmbodyX/UnitreeG1_putawaytoolsV2_minmax_5000step       --local-dir g1_pat_v2_mm_5000_dl
+mkdir -p g1_pat_v2_mm_5000
+ln -sfn "$(realpath g1_pat_v2_mm_5000_dl/transformer)"  g1_pat_v2_mm_5000/transformer
+ln -sfn "$(realpath lingbot-va-base/vae)"               g1_pat_v2_mm_5000/vae
+ln -sfn "$(realpath lingbot-va-base/text_encoder)"      g1_pat_v2_mm_5000/text_encoder
+ln -sfn "$(realpath lingbot-va-base/tokenizer)"         g1_pat_v2_mm_5000/tokenizer
+```
+Serve with `CONFIG_NAME=g1_putawaytools_v21 MODEL_PATH=g1_pat_v2_mm_5000`.
+`transformer/config.json` has `attn_mode: torch` (inference-ready).
+**IMPORTANT: the inference config must match training.** The inference
+config's `norm_stat` must contain the same MIN/MAX values used during
+training (NOT the original quantile values). The
+`va_g1_putawaytools_v21_cfg.py` in the lingbot-va repo has been updated in
+lockstep; using the original quantile config at inference with this
+checkpoint would denormalize the predicted actions incorrectly.
+Quick check: `grep "1.178246855736" wan_va/configs/va_g1_putawaytools_v21_cfg.py`
+should return a hit.
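
The README's argument about quantile versus min/max statistics can be illustrated with a small sketch. Everything below is an assumption made for illustration: the linear map onto [-1, +1], the nearest-rank quantile helper, and the synthetic joint trajectory are hypothetical and are not taken from the lingbot-va loader or dataset. The sketch only shows why rare large reach values overflow ±1 under q01/q99 statistics, and why denormalizing with the wrong statistics shrinks the commanded reach.

```python
def normalize(values, lo, hi):
    """Linearly map [lo, hi] onto [-1, +1] (assumed form of the normalization)."""
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def denormalize(values, lo, hi):
    """Inverse of normalize(): map [-1, +1] back onto [lo, hi]."""
    return [(v + 1.0) / 2.0 * (hi - lo) + lo for v in values]


def quantile(sorted_vals, q):
    """Nearest-rank quantile on a pre-sorted list (illustrative helper only)."""
    return sorted_vals[round(q * (len(sorted_vals) - 1))]


# Synthetic joint trajectory: mostly small motion plus a few large reach extensions.
joint = [0.01 * (i % 20) for i in range(995)] + [1.0, 1.1, 1.2, 1.3, 1.4]
ordered = sorted(joint)

# Quantile statistics ignore the rare extensions, so those targets overflow +/-1.
q_lo, q_hi = quantile(ordered, 0.01), quantile(ordered, 0.99)
quantile_norm = normalize(joint, q_lo, q_hi)

# Min/max statistics cover every sample, so all targets stay inside [-1, +1].
mm_lo, mm_hi = ordered[0], ordered[-1]
minmax_norm = normalize(joint, mm_lo, mm_hi)

print("quantile absmax:", max(abs(v) for v in quantile_norm))
print("min/max  absmax:", max(abs(v) for v in minmax_norm))

# A model trained on min/max targets emits +1.0 for the full reach.
# Denormalizing with the matching stats recovers it; quantile stats do not.
right = denormalize([1.0], mm_lo, mm_hi)[0]
wrong = denormalize([1.0], q_lo, q_hi)[0]
print("matching stats:", right, "| mismatched stats:", wrong)
```

The last two lines mirror the "config must match training" warning: the same network output maps to a much smaller joint value when the inference statistics are the quantile ones.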

transformer/config.json ADDED

	@@ -0,0 +1,23 @@

+{
+  "patch_size": [
+    1,
+    2,
+    2
+  ],
+  "num_attention_heads": 24,
+  "attention_head_dim": 128,
+  "in_channels": 48,
+  "out_channels": 48,
+  "action_dim": 30,
+  "text_dim": 4096,
+  "freq_dim": 256,
+  "ffn_dim": 14336,
+  "num_layers": 30,
+  "cross_attn_norm": true,
+  "eps": 1e-06,
+  "rope_max_seq_len": 1024,
+  "pos_embed_seq_len": null,
+  "attn_mode": "torch",
+  "_class_name": "WanTransformer3DModel",
+  "_diffusers_version": "0.35.0.dev0"
+}
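
As a quick sanity check on the config above, the following sketch derives a few sizes from its fields. It assumes the usual transformer convention that the hidden width equals `num_attention_heads * attention_head_dim`, and that each patch flattens `in_channels * prod(patch_size)` latent values; neither is confirmed by this commit for the LingBot-VA variant of `WanTransformer3DModel`. The JSON is inlined here only so the example runs on its own.

```python
import json
import math

# Subset of transformer/config.json from this commit, inlined for a standalone run.
CONFIG_TEXT = """
{
  "patch_size": [1, 2, 2],
  "num_attention_heads": 24,
  "attention_head_dim": 128,
  "in_channels": 48,
  "out_channels": 48,
  "action_dim": 30,
  "num_layers": 30,
  "attn_mode": "torch"
}
"""

cfg = json.loads(CONFIG_TEXT)

# Assumed convention: hidden width = heads * per-head dimension.
hidden_dim = cfg["num_attention_heads"] * cfg["attention_head_dim"]

# Assumed convention: one patch flattens in_channels * (t * h * w) latent values.
patch_features = cfg["in_channels"] * math.prod(cfg["patch_size"])

print("hidden_dim:", hidden_dim)
print("patch_features:", patch_features)

# The README states that attn_mode "torch" is the inference-ready setting.
if cfg["attn_mode"] != "torch":
    raise ValueError("config is not in the inference-ready attention mode")
```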

transformer/diffusion_pytorch_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ae61c1ca6d3532dae2c285bf60dc4ef13793fcc5271bad4bbaf4ffb8d551cc32
+size 10177831668
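
The pointer above records the SHA-256 digest and byte size of the real weights file, so a downloaded copy can be checked against it. The sketch below is one way to do that with the Python standard library; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and are not part of any Hugging Face or Git LFS tooling.

```python
import hashlib
import os

# Git LFS pointer text as committed for the weights file.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:ae61c1ca6d3532dae2c285bf60dc4ef13793fcc5271bad4bbaf4ffb8d551cc32
size 10177831668
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of an LFS pointer into algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_bytes=1 << 20):
    """Return True if the file at `path` has the size and digest the pointer records."""
    if os.path.getsize(path) != pointer["size"]:
        return False  # cheap size check first, before hashing roughly 10 GB
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_bytes), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


POINTER = parse_lfs_pointer(POINTER_TEXT)
print(POINTER["algo"], POINTER["size"])
```

With the download layout from the README, the check would presumably be `matches_pointer("g1_pat_v2_mm_5000_dl/transformer/diffusion_pytorch_model.safetensors", POINTER)`.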