Commit 03c2fa0 (verified), committed by armanakbari4
1 Parent(s): 4fa9e3c

g1 put_away_tools v2.1 FDM-v2 transformer @ step 5000 (final, MIN/MAX norm)

README.md ADDED
@@ -0,0 +1,70 @@
+ ---
+ license: apache-2.0
+ tags:
+ - robotics
+ - lingbot-va
+ - unitree-g1
+ - world-model
+ ---
+
+ # UnitreeG1_putawaytoolsV2_minmax_5000step: LingBot-VA G1 post-trained transformer
+
+ Fine-tuned `transformer` for LingBot-VA on Unitree G1 (Dex1), task
+ `XiaoweiLinXL/pi05-unitree-g1-put-away-tools-v2.1`:
+ *"Put the battery on the shelf labeled 'battery' and put the screwdriver
+ on the shelf labeled 'Philips'."*
+
+ **Same data, same recipe as the `rndchnk` series; the only difference is that
+ action normalization is MIN/MAX (not q01/q99 quantile).** See "Why min/max" below.
+
+ - Base: `robbyant/lingbot-va-base`
+ - Post-training: 70 demos (43,851 frames), lr 1e-5, **FDM v2 recipe**:
+ mutually exclusive per-microstep regime (`fdm_prob=0.5`, `lambda_fdm=1.0`).
+ Per-step randomized `chunk_size` ∈ {1..4} and `window_size` ∈ {4..64}.
+ - 4 GPUs × `grad_accum=4` = effective batch 16, optimizer **step 5000** of a
+ 5000-step schedule (final, fully trained).
+ - Final training losses: video=0.0347, action=0.00082, fdm=0.0347. The video
+ loss is higher than the quantile run's video=0.0072 (which was overfit on a
+ compressed bulk distribution); this is the *healthy* loss level for a
+ bounded target range.
+ - **Action normalization: dataset min/max.** Every training target is bounded
+ to [-1, +1]. (Codebase variable names are still `q01`/`q99`
+ because that's all the loader supports; the values stored there are
+ min/max, so this is a drop-in replacement.)
+ - This repo contains **only `transformer/`**; `vae/`, `text_encoder/`, and
+ `tokenizer/` are unchanged from `robbyant/lingbot-va-base`.
+
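The per-step randomization in the recipe above could be sketched roughly as follows. This is a hypothetical illustration, not the LingBot-VA training loop: the function names, the uniform integer sampling, and the reading of "mutually exclusive" as "each microstep runs either the FDM objective or the standard objective, never both" are assumptions. The effective-batch arithmetic also assumes one sample per GPU per microstep.

```python
import random

NUM_GPUS = 4     # from the recipe above
GRAD_ACCUM = 4   # microsteps per optimizer step, from the recipe above
FDM_PROB = 0.5   # from the recipe above


def sample_step_shape(rng):
    """Draw one (chunk_size, window_size) pair per optimizer step.

    Assumption: both are uniform integers over the stated inclusive ranges.
    """
    return rng.randint(1, 4), rng.randint(4, 64)


def sample_regime(rng, fdm_prob=FDM_PROB):
    """Pick exactly one regime for a microstep (assumed meaning of 'mutually exclusive')."""
    return "fdm" if rng.random() < fdm_prob else "standard"


def plan_optimizer_step(rng):
    """Return the sampled shape and the per-microstep regimes for one optimizer step."""
    chunk_size, window_size = sample_step_shape(rng)
    regimes = [sample_regime(rng) for _ in range(GRAD_ACCUM)]
    return {"chunk_size": chunk_size, "window_size": window_size, "regimes": regimes}


rng = random.Random(0)
plan = plan_optimizer_step(rng)
effective_batch = NUM_GPUS * GRAD_ACCUM  # assumes one sample per GPU per microstep
print(plan, effective_batch)
```

With `fdm_prob=0.5`, about half of the microsteps would fall into each regime over a long run under this reading.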
+ ## Why min/max (the v21 quantile series underperformed)
+
+ In the earlier v21 5k training run, quantile normalization let the right-arm
+ joint targets overflow: R-wrist-roll absmax was **4.11**, R-shoulder-roll 3.55,
+ R-wrist-yaw 3.55. The model's bounded prediction range
+ (`[~-1.5, ~+1.5]`) cannot reach those targets, so during deployment the model
+ under-predicts the precise reach-extension moments, the arm under-extends, and
+ it misses the shelves. Min/max normalization bounds every target to ±1
+ (verified absmax = 1.0000 over all 43,851 training rows), eliminating
+ out-of-range targets and restoring deployment quality.
+
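The overflow described above can be reproduced on synthetic data. The sketch below assumes the usual affine mapping of `[lo, hi]` onto `[-1, +1]`; the actual loader may differ in details. The helper names and the toy trajectory are illustrative only.

```python
def normalize(x, lo, hi):
    """Affine map of [lo, hi] onto [-1, +1] (assumed form of the normalizer)."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def denormalize(y, lo, hi):
    """Inverse of normalize()."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    j = min(i + 1, len(sorted_vals) - 1)
    frac = pos - i
    return sorted_vals[i] * (1.0 - frac) + sorted_vals[j] * frac


# Toy joint trajectory: a compressed bulk plus a few large reach extensions.
values = [0.01 * (i % 20) for i in range(1000)] + [1.5, 1.8, 2.0]
s = sorted(values)

q_lo, q_hi = quantile(s, 0.01), quantile(s, 0.99)  # quantile statistics
m_lo, m_hi = s[0], s[-1]                           # min/max statistics

absmax_quantile = max(abs(normalize(v, q_lo, q_hi)) for v in values)
absmax_minmax = max(abs(normalize(v, m_lo, m_hi)) for v in values)
print(absmax_quantile, absmax_minmax)
```

Under quantile statistics the rare extensions land far outside ±1, while min/max statistics keep every target inside the range at the cost of compressing the bulk.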
+ ## Assemble an eval-ready checkpoint
+
+ ```bash
+ hf download robbyant/lingbot-va-base --local-dir lingbot-va-base
+ hf download EmbodyX/UnitreeG1_putawaytoolsV2_minmax_5000step --local-dir g1_pat_v2_mm_5000_dl
+
+ mkdir -p g1_pat_v2_mm_5000
+ ln -sf "$(realpath g1_pat_v2_mm_5000_dl/transformer)" g1_pat_v2_mm_5000/transformer
+ ln -sf "$(realpath lingbot-va-base/vae)" g1_pat_v2_mm_5000/vae
+ ln -sf "$(realpath lingbot-va-base/text_encoder)" g1_pat_v2_mm_5000/text_encoder
+ ln -sf "$(realpath lingbot-va-base/tokenizer)" g1_pat_v2_mm_5000/tokenizer
+ ```
+
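After assembling, it may be worth confirming that all four components resolve before serving. The following is a minimal sketch; `missing_components` is a hypothetical helper, not part of the lingbot-va repo, and it checks only that each sub-directory exists and is not a dangling symlink.

```python
from pathlib import Path

REQUIRED = ("transformer", "vae", "text_encoder", "tokenizer")


def missing_components(root):
    """Return the required sub-directories that are absent or dangling symlinks."""
    root = Path(root)
    # Path.is_dir() follows symlinks, so a broken link counts as missing.
    return [name for name in REQUIRED if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_components("g1_pat_v2_mm_5000")
    print("ok" if not missing else f"missing: {missing}")
```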
+ Serve with `CONFIG_NAME=g1_putawaytools_v21 MODEL_PATH=g1_pat_v2_mm_5000`.
+ `transformer/config.json` has `attn_mode: torch` (inference-ready).
+
+ **IMPORTANT: the config must match training.** The inference config's
+ `norm_stat` must contain the same MIN/MAX values used during training
+ (NOT the original quantile values). The `va_g1_putawaytools_v21_cfg.py`
+ in the lingbot-va repo has been updated in lockstep; using the original
+ quantile config with this checkpoint would denormalize actions incorrectly.
+ Quick check: `grep "1.178246855736" wan_va/configs/va_g1_putawaytools_v21_cfg.py`
+ should return a hit.
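The quick check can also be scripted, and a toy calculation shows why mismatched statistics matter. In the sketch below, `config_mentions` and `denormalize` are hypothetical helpers, the affine denormalization form is an assumption, and the two statistic pairs are made-up numbers rather than values from the real config.

```python
from pathlib import Path


def config_mentions(path, marker):
    """Return True if the file at `path` contains the literal string `marker`."""
    return marker in Path(path).read_text(encoding="utf-8")


def denormalize(y, lo, hi):
    """Map a normalized value in [-1, +1] back to [lo, hi] (assumed affine form)."""
    return (y + 1.0) / 2.0 * (hi - lo) + lo


# Made-up statistics for one joint, for illustration only.
MINMAX_STATS = (-2.0, 2.0)     # what the checkpoint was trained against
QUANTILE_STATS = (-0.5, 0.5)   # what a stale quantile config would hold

predicted = 0.8  # a normalized action emitted by the model
right = denormalize(predicted, *MINMAX_STATS)
wrong = denormalize(predicted, *QUANTILE_STATS)
print(right, wrong)

cfg_path = Path("wan_va/configs/va_g1_putawaytools_v21_cfg.py")
if cfg_path.exists():
    print(config_mentions(cfg_path, "1.178246855736"))
```

In this toy case the stale statistics shrink the commanded joint value to a quarter of the intended one, which is the kind of silent error the check is meant to catch.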
transformer/config.json ADDED
@@ -0,0 +1,23 @@
+ {
+   "patch_size": [
+     1,
+     2,
+     2
+   ],
+   "num_attention_heads": 24,
+   "attention_head_dim": 128,
+   "in_channels": 48,
+   "out_channels": 48,
+   "action_dim": 30,
+   "text_dim": 4096,
+   "freq_dim": 256,
+   "ffn_dim": 14336,
+   "num_layers": 30,
+   "cross_attn_norm": true,
+   "eps": 1e-06,
+   "rope_max_seq_len": 1024,
+   "pos_embed_seq_len": null,
+   "attn_mode": "torch",
+   "_class_name": "WanTransformer3DModel",
+   "_diffusers_version": "0.35.0.dev0"
+ }
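A few derived quantities can be read off this config. The sketch below assumes the hidden width is `num_attention_heads * attention_head_dim`, as in the diffusers Wan transformer; whether the LingBot-VA variant computes it the same way is not confirmed by this commit. Only a subset of the fields is reproduced.

```python
import json

# Subset of the fields from transformer/config.json above.
CONFIG_TEXT = """
{
  "patch_size": [1, 2, 2],
  "num_attention_heads": 24,
  "attention_head_dim": 128,
  "in_channels": 48,
  "action_dim": 30,
  "num_layers": 30
}
"""

cfg = json.loads(CONFIG_TEXT)

# Assumed: model width equals heads times per-head dimension.
hidden_dim = cfg["num_attention_heads"] * cfg["attention_head_dim"]

# Number of latent values covered by one patch (channels x t x h x w).
pt, ph, pw = cfg["patch_size"]
values_per_patch = cfg["in_channels"] * pt * ph * pw

print(hidden_dim, values_per_patch)
```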
transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae61c1ca6d3532dae2c285bf60dc4ef13793fcc5271bad4bbaf4ffb8d551cc32
+ size 10177831668
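The pointer above records the SHA-256 digest and byte size of the real weights file, so a download can be verified against it. The helpers below are a hypothetical sketch using only the standard library; they are not part of any Hugging Face tooling.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a small dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check a local file against the pointer's size and digest."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    h = hashlib.new(pointer["algo"])
    with open(path, "rb") as f:
        # Stream in chunks so a multi-gigabyte file is never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:ae61c1ca6d3532dae2c285bf60dc4ef13793fcc5271bad4bbaf4ffb8d551cc32
size 10177831668
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("g1_pat_v2_mm_5000_dl/transformer/diffusion_pytorch_model.safetensors", pointer)` after the download would confirm the roughly 10.18 GB file arrived intact.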