BorisGuo committed on
Commit 3a044f8 · verified · 1 parent: b21935f

Replace iter 20000 with iter 30000 (eval L1 0.6654 → 0.6545)
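The L1 values in this commit message come from the offline-evaluation table in the README diff. The Python sketch below recomputes the change between consecutive checkpoints from that table, which makes the plateau claim easy to check. The values are copied verbatim, and the helper name `interval_deltas` is illustrative.

```python
# Offline-eval mean L1 per checkpoint, copied from the README table in this commit.
EVAL_L1 = {
    6_500: 0.7575,
    7_000: 0.7333,
    10_000: 0.6975,
    15_000: 0.6830,
    20_000: 0.6654,
    25_000: 0.6563,
    30_000: 0.6545,
}


def interval_deltas(table: dict[int, float]) -> list[tuple[int, int, float]]:
    """Return (start_iter, end_iter, change in mean L1) for consecutive checkpoints."""
    iters = sorted(table)
    return [(a, b, round(table[b] - table[a], 4)) for a, b in zip(iters, iters[1:])]


if __name__ == "__main__":
    for start, end, delta in interval_deltas(EVAL_L1):
        print(f"iter {start:>6,} -> {end:>6,}: dL1 = {delta:+.4f}")
    # Headline change quoted in the commit message (iter 20,000 -> 30,000).
    total = EVAL_L1[30_000] - EVAL_L1[20_000]
    print(f"20k -> 30k: dL1 = {total:+.4f} ({total / EVAL_L1[20_000]:+.1%})")
```

Using these numbers, the last interval (iter 25,000 → 30,000) lowers mean L1 by only 0.0018. The previous 5,000 iterations lowered it by 0.0091. The intervals differ in length (the first spans just 500 iterations), so these deltas are not per-iteration rates.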

Files changed (2):
  1. README.md +20 -16
  2. model.pt +1 -1
README.md CHANGED
@@ -16,16 +16,16 @@ library_name: cosmos-policy
 # Cosmos-Policy 2B 480p — Romoya Bimanual Crack-Egg
 
 Single-task fine-tune of [`nvidia/Cosmos-Predict2-2B-Video2World`](https://huggingface.co/nvidia/Cosmos-Predict2-2B-Video2World)
-(model-480p-16fps.pt) on the **romoya** bimanual lebai-follower **crack-egg** dataset
+(`model-480p-16fps.pt`) on the **romoya** bimanual lebai-follower **crack-egg** dataset
 (`romoya/B3_Station_crack_egg`, 55 episodes / 118,224 frames).
 
 Task language: `pick-up an egg and crack into the bowl`.
 
 ## Training
 
-- Checkpoint exported at iteration **20,000** (best of {6500, 7000, 10000, 15000, 20000} on offline eval).
-- Recipe inherits the ALOHA bimanual ALOHA-Cosmos-Policy schedule (`state_t=11`, `chunk_duration=41`, 3 cameras).
-- Batch size 4, num_workers 4, 1× A100 80 GB, ~1.4 s/iter steady-state.
+- Checkpoint exported at iteration **30,000** (current `model.pt`; this is the plateau, see the eval table below).
+- Recipe inherits the ALOHA bimanual ALOHA-Cosmos-Policy schedule (`state_t=11`, `chunk_duration=41`, 3 cameras: 1 third-person `base` + 2 wrist).
+- Batch size 4, num_workers 4, 1× A100 80 GB, ~1.2 s/iter steady-state with a 56 GB/worker decoded-video cache.
 - Trained on the LeRobot v2.1 conversion of the source v3 dataset.
 
 ## Files
@@ -33,23 +33,27 @@ Task language: `pick-up an egg and crack into the bowl`.
 | file | purpose |
 |---|---|
 | `model.pt` | consolidated PyTorch checkpoint (~3.91 GB; converted from FSDP/DCP shards via `torch.distributed.checkpoint.format_utils.dcp_to_torch_save`) |
-| `dataset_statistics.json` | action/proprio normalisation stats used at training |
-| `dataset_statistics_post_norm.json` | post-normalisation stats (auxiliary) |
-| `t5_embeddings.pkl` | precomputed T5 embeddings for the 4 romoya task commands (only `pick-up an egg and crack into the bowl` is used here) |
+| `dataset_statistics.json` | action / proprio normalization stats used at training time |
+| `dataset_statistics_post_norm.json` | post-normalization stats (auxiliary) |
+| `t5_embeddings.pkl` | precomputed T5 embeddings for the 4 romoya task commands; only `pick-up an egg and crack into the bowl` is used here |
 
 ## Offline evaluation
 
-On `romoya/eval_pi05_bimanual_crack_egg` (5 episodes × 5 query points, 5 denoising steps):
+On `romoya/eval_pi05_bimanual_crack_egg` (5 episodes × 5 query points each, 5 denoising steps, action-chunk L1 in unnormalized units):
 
-| Checkpoint | Mean L1 (action units) ↓ | Cross-step Corr ↑ |
+| Checkpoint | Mean L1 ↓ | Cross-step Corr ↑ |
 |---|---|---|
-| iter 6500 | 0.7575 | 0.130 |
-| iter 7000 | 0.7333 | 0.114 |
-| iter 10000 | 0.6975 | 0.111 |
-| iter 15000 | 0.6830 | 0.154 |
-| **iter 20000** | **0.6654** | **0.144** |
+| iter 6,500 | 0.7575 | 0.130 |
+| iter 7,000 | 0.7333 | 0.114 |
+| iter 10,000 | 0.6975 | 0.111 |
+| iter 15,000 | 0.6830 | 0.154 |
+| iter 20,000 | 0.6654 | 0.144 |
+| iter 25,000 | 0.6563 | 0.152 |
+| **iter 30,000** | **0.6545** | 0.142 |
 
-Action L1 is computed in the unnormalised space — i.e., raw joint-pos / effort / velocity / DO units of the romoya bi-lebai-follower (94-dim action, 166-dim proprio).
+Action L1 is computed in the unnormalized space — i.e., raw joint-pos / effort / velocity / DO units of the romoya bi-lebai-follower. The action vector is 94-D (12 joint pos + 12 effort + 12 vel + 4 DO + ... per arm pair); the proprioceptive state is 166-D.
+
+The improvement curve flattens after iter 25,000 (Δ = −0.002 over the last 5,000 iters) — i.e., 55 demos hit a plateau at this resolution. Bigger gains likely require either more demonstrations or task-mixture training.
 
 ## Usage
 
@@ -60,7 +64,7 @@ The model expects the ALOHA-style `obs` dict with keys `primary_image`, `left_wr
 See `cosmos_policy/experiments/robot/cosmos_utils.py:get_action` (suite="aloha" branch) for the full contract.
 
 Action / proprio dimensions deviate from ALOHA defaults (ACTION_DIM=94, PROPRIO_DIM=166, NUM_ACTIONS_CHUNK=25);
-patch `cosmos_policy.constants` at runtime before importing `cosmos_utils`.
+patch `cosmos_policy.constants` at runtime **before** importing `cosmos_utils`:
 
 ```python
 import cosmos_policy.constants as _C
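The README says to patch the constants **before** importing `cosmos_utils`. One plausible reason is that consumer modules copy constants into their own namespace with `from ... import ...` at import time. That reason is an assumption and has not been confirmed against the `cosmos_policy` source. The self-contained Python sketch reproduces this general Python behavior using hypothetical stand-in modules (`demo_constants`, `demo_consumer_late`, `demo_consumer_early`). It does not import `cosmos_policy`, and the value 14 is only a placeholder default.

```python
import sys
import types

# Hypothetical stand-in for a constants module (NOT cosmos_policy.constants).
constants = types.ModuleType("demo_constants")
constants.ACTION_DIM = 14  # placeholder default, illustrative only
sys.modules["demo_constants"] = constants

# Hypothetical consumer source: it copies the constant into its own namespace
# at import time, which is what `from pkg import NAME` does.
CONSUMER_SRC = "from demo_constants import ACTION_DIM\n"


def import_consumer(name: str) -> types.ModuleType:
    """Build and execute a fresh module from CONSUMER_SRC, mimicking a first import."""
    module = types.ModuleType(name)
    exec(CONSUMER_SRC, module.__dict__)
    sys.modules[name] = module
    return module


# Wrong order: import the consumer first, then patch the constants module.
late = import_consumer("demo_consumer_late")
constants.ACTION_DIM = 94
print(late.ACTION_DIM, constants.ACTION_DIM)  # 14 94: the consumer kept the stale copy

# Right order: the constants module already holds 94 when this consumer is imported.
early = import_consumer("demo_consumer_early")
print(early.ACTION_DIM)  # 94
```

Reading the value through the module object (`_C.ACTION_DIM`) always returns the current value. Only names copied at import time go stale, so the ordering matters only if `cosmos_utils` copies them.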
model.pt CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5c0a470f4a54d6479c51db66552cabd11327b21aab5e18151e8c6bb0cc42c3d3
+oid sha256:b1624cf4ed9208527f4e28fb1fadb1fe54bdb0caf822fbc71db63e2c01c0c0ad
 size 3913008759
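`model.pt` is tracked as a Git LFS pointer. In this commit only the `oid` (the SHA-256 of the file contents) changed; the size stayed at 3,913,008,759 bytes. A size check therefore cannot distinguish the iter-20,000 checkpoint from the iter-30,000 one. The minimal Python sketch below checks a downloaded copy against the new pointer. It assumes the file sits at `./model.pt`, and the helper names are illustrative.

```python
import hashlib
import os

# New Git LFS pointer for model.pt, copied from this commit's diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b1624cf4ed9208527f4e28fb1fadb1fe54bdb0caf822fbc71db63e2c01c0c0ad
size 3913008759
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields: dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks (1 MiB by default) so a ~3.9 GB file never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path: str, oid: str, size: int) -> bool:
    """Cheap size check first; hash the file only if the size already matches."""
    return os.path.getsize(path) == size and sha256_of_file(path) == oid


if __name__ == "__main__":
    fields = parse_lfs_pointer(POINTER_TEXT)
    expected_oid = fields["oid"].removeprefix("sha256:")
    expected_size = int(fields["size"])
    # Assumption: the checkpoint was downloaded to ./model.pt.
    if os.path.exists("model.pt"):
        print("model.pt matches pointer:",
              matches_pointer("model.pt", expected_oid, expected_size))
```

Both the old and the new checkpoint pass the size check, so the SHA-256 comparison decides whether you have the iter-30,000 file.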