HeewonLee7 committed d736096 (verified) · 1 parent: fd0795b

Upload folder using huggingface_hub

checkpoint-10000/config.json ADDED
@@ -0,0 +1,65 @@
+ {
+   "action_dim": 66,
+   "action_head_cfg": {
+     "action_dim": 66,
+     "action_horizon": 40,
+     "add_pos_embed": true,
+     "backbone_embedding_dim": 2048,
+     "diffusion_model_cfg": {
+       "attention_head_dim": 48,
+       "cross_attention_dim": 2048,
+       "dropout": 0.2,
+       "final_dropout": true,
+       "interleave_self_attention": true,
+       "norm_type": "ada_norm",
+       "num_attention_heads": 32,
+       "num_layers": 16,
+       "output_dim": 1024,
+       "positional_embeddings": null
+     },
+     "hidden_size": 1024,
+     "input_embedding_dim": 1536,
+     "max_action_dim": 32,
+     "max_state_dim": 66,
+     "model_dtype": "float32",
+     "noise_beta_alpha": 1.5,
+     "noise_beta_beta": 1.0,
+     "noise_s": 0.999,
+     "num_inference_timesteps": 4,
+     "num_target_vision_tokens": 32,
+     "num_timestep_buckets": 1000,
+     "training_rtc_max_overlap": -1,
+     "tune_diffusion_model": true,
+     "tune_projector": true,
+     "use_vlln": true,
+     "vl_self_attention_cfg": {
+       "attention_head_dim": 64,
+       "dropout": 0.2,
+       "final_dropout": true,
+       "num_attention_heads": 32,
+       "num_layers": 4,
+       "positional_embeddings": null
+     }
+   },
+   "action_horizon": 40,
+   "architectures": [
+     "GR00T_N1_5"
+   ],
+   "attn_implementation": null,
+   "backbone_cfg": {
+     "eagle_path": "NVEagle/eagle_er-qwen3_1_7B-Siglip2_400M_stage1_5_128gpu_er_v7_1mlp_nops",
+     "load_bf16": false,
+     "project_to_dim": null,
+     "reproject_vision": false,
+     "select_layer": 12,
+     "tune_llm": false,
+     "tune_visual": true,
+     "use_flash_attention": true
+   },
+   "compute_dtype": "bfloat16",
+   "hidden_size": 2048,
+   "model_dtype": "float32",
+   "model_type": "gr00t_n1_5",
+   "torch_dtype": "bfloat16",
+   "transformers_version": "4.51.3"
+ }
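Several action-head widths in this config are tied together by simple products. Common attention implementations (for example diffusers-style transformer blocks) compute an attention block's inner width as num_attention_heads × attention_head_dim. This sketch assumes that convention; the GR00T model source is not part of this diff. Under that assumption, the DiT runs at 32 × 48 = 1536, which matches input_embedding_dim. The VL self-attention runs at 32 × 64 = 2048, which matches backbone_embedding_dim. The sketch re-checks these relations on an excerpt of the values above. The helpers inner_dim and check_action_head are hypothetical and are not part of the GR00T codebase.

```python
import json

# Excerpt of checkpoint-10000/config.json; values copied from the diff above.
CONFIG_JSON = """
{
  "action_head_cfg": {
    "backbone_embedding_dim": 2048,
    "hidden_size": 1024,
    "input_embedding_dim": 1536,
    "diffusion_model_cfg": {
      "attention_head_dim": 48,
      "cross_attention_dim": 2048,
      "num_attention_heads": 32,
      "output_dim": 1024
    },
    "vl_self_attention_cfg": {
      "attention_head_dim": 64,
      "num_attention_heads": 32
    }
  },
  "hidden_size": 2048
}
"""


def inner_dim(attn_cfg: dict) -> int:
    """Attention width, ASSUMING inner_dim = num_attention_heads * attention_head_dim."""
    return attn_cfg["num_attention_heads"] * attn_cfg["attention_head_dim"]


def check_action_head(cfg: dict) -> dict:
    """Hypothetical helper: compare related width fields of the action head."""
    head = cfg["action_head_cfg"]
    dit = head["diffusion_model_cfg"]
    vl = head["vl_self_attention_cfg"]
    return {
        # 32 heads * 48 = 1536 vs. input_embedding_dim
        "dit_width == input_embedding_dim": inner_dim(dit) == head["input_embedding_dim"],
        # 32 heads * 64 = 2048 vs. backbone_embedding_dim
        "vl_width == backbone_embedding_dim": inner_dim(vl) == head["backbone_embedding_dim"],
        # DiT cross-attention width vs. backbone_embedding_dim
        "cross_attention_dim == backbone_embedding_dim": (
            dit["cross_attention_dim"] == head["backbone_embedding_dim"]
        ),
        # DiT output width vs. the action head's hidden_size
        "output_dim == hidden_size": dit["output_dim"] == head["hidden_size"],
        # top-level hidden_size vs. backbone_embedding_dim
        "top-level hidden_size == backbone_embedding_dim": (
            cfg["hidden_size"] == head["backbone_embedding_dim"]
        ),
    }


if __name__ == "__main__":
    for name, ok in check_action_head(json.loads(CONFIG_JSON)).items():
        print(f"{'OK ' if ok else 'BAD'} {name}")
```

For the values in this checkpoint, every entry evaluates to True.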
checkpoint-10000/experiment_cfg/data_config.py ADDED
@@ -0,0 +1,1981 @@
+ # SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ from abc import ABC, abstractmethod
+ from dataclasses import dataclass
+ from typing import Optional
+
+ import torch
+
+ from gr00t.data.dataset import ModalityConfig
+ from gr00t.data.transform.base import ComposedModalityTransform, ModalityTransform
+ from gr00t.data.transform.concat import ConcatTransform
+ from gr00t.data.transform.state_action import (
+     StateActionSinCosTransform,
+     StateActionToTensor,
+     StateActionTransform,
+ )
+ from gr00t.data.transform.video import (
+     VideoColorJitter,
+     VideoCrop,
+     VideoPerspective,
+     VideoResize,
+     VideoToNumpy,
+     VideoToTensor,
+ )
+ from gr00t.model.transforms import GR00TTransform
+
+
+ class BaseDataConfig(ABC):
+     vlash_offset: int = 0
+     training_rtc_max_overlap: int = 0
+
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+         return {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+     @abstractmethod
+     def transform(self) -> ModalityTransform:
+         pass
+
+
+ #####################################################################################
+ # helper functions
+ #####################################################################################
+
+
+ def import_external_data_config(data_config_str: str) -> Optional[BaseDataConfig]:
+     """
+     Import and instantiate an external data configuration class.
+
+     Format: "module_path:ClassName" (e.g., "my_configs:RobotConfig")
+     Supports nested modules like "package.submodule:ClassName"
+     """
+     if ":" not in data_config_str:
+         return None
+
+     import importlib
+     import os
+     import sys
+     from pathlib import Path
+
+     # Add current working directory to Python path
+     current_dir = str(Path(os.getcwd()).absolute())
+     if current_dir not in sys.path:
+         sys.path.insert(0, current_dir)
+
+     try:
+         module_path, class_name = data_config_str.split(":", 1)
+         if not module_path or not class_name:
+             raise ValueError(f"Invalid format: '{data_config_str}'. Use 'module:ClassName'")
+
+         print(f"Loading external config: {module_path}.{class_name}")
+
+         module = importlib.import_module(module_path)
+         if not hasattr(module, class_name):
+             available = [
+                 n
+                 for n in dir(module)
+                 if not n.startswith("_") and isinstance(getattr(module, n), type)
+             ]
+             raise AttributeError(
+                 f"Class '{class_name}' not found in '{module_path}'. Available: {available}"
+             )
+
+         # Check that the class defines both 'transform' and 'modality_config' methods
+         if not hasattr(getattr(module, class_name), "transform"):
+             raise AttributeError(f"Class '{class_name}' does not have a 'transform' method")
+         if not hasattr(getattr(module, class_name), "modality_config"):
+             raise AttributeError(f"Class '{class_name}' does not have a 'modality_config' method")
+
+         return getattr(module, class_name)()
+
+     except (ModuleNotFoundError, AttributeError, ValueError) as e:
+         print(f"Config loading failed: {e}")
+         print("Example: my_configs:MyConfig, package.submodule:ClassName")
+         raise
+
+
+ def load_data_config(data_config_str: str) -> BaseDataConfig:
+     """
+     Get a data config instance from a string.
+     >>> load_data_config("so100")
+     >>> load_data_config("dir.subdir.my_configs:RobotConfig")
+     """
+     if data_config_str in DATA_CONFIG_MAP:
+         return DATA_CONFIG_MAP[data_config_str]
+     data_config = import_external_data_config(data_config_str)
+     if data_config is not None:
+         return data_config
+     # Yellow warning color
+     yellow = "\033[93m"
+     reset = "\033[0m"
+     raise ValueError(
+         f"{yellow}Invalid data_config '{data_config_str}'. "
+         f"Available options: {list(DATA_CONFIG_MAP.keys())}, "
+         f"or use 'module:ClassName' for external configs{reset}"
+     )
+
+
+ ###########################################################################################
+
+
+ class FourierGr1ArmsOnlyDataConfig(BaseDataConfig):
+     video_keys = ["video.ego_view"]
+     state_keys = [
+         "state.left_arm",
+         "state.right_arm",
+         "state.left_hand",
+         "state.right_hand",
+     ]
+     action_keys = [
+         "action.left_arm",
+         "action.right_arm",
+         "action.left_hand",
+         "action.right_hand",
+     ]
+     language_keys = ["annotation.human.action.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionSinCosTransform(apply_to=self.state_keys),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "min_max" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ ###########################################################################################
+
+
+ class So100DataConfig(BaseDataConfig):
+     video_keys = ["video.webcam"]
+     state_keys = ["state.single_arm", "state.gripper"]
+     action_keys = ["action.single_arm", "action.gripper"]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "min_max" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "min_max" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ ###########################################################################################
+
+
+ class So100DualCamDataConfig(So100DataConfig):
+     video_keys = ["video.front", "video.wrist"]
+     state_keys = ["state.single_arm", "state.gripper"]
+     action_keys = ["action.single_arm", "action.gripper"]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+
+ ###########################################################################################
+
+
+ class UnitreeG1DataConfig(BaseDataConfig):
+     video_keys = ["video.rs_view"]
+     state_keys = ["state.left_arm", "state.right_arm", "state.left_hand", "state.right_hand"]
+     action_keys = ["action.left_arm", "action.right_arm", "action.left_hand", "action.right_hand"]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "min_max" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "min_max" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ class UnitreeG1FullBodyDataConfig(UnitreeG1DataConfig):
+     video_keys = ["video.rs_view"]
+     state_keys = [
+         "state.left_leg",
+         "state.right_leg",
+         "state.waist",
+         "state.left_arm",
+         "state.right_arm",
+         "state.left_hand",
+         "state.right_hand",
+     ]
+     action_keys = ["action.left_arm", "action.right_arm", "action.left_hand", "action.right_hand"]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+
+ ###########################################################################################
+
+
+ class FourierGr1FullUpperBodyDataConfig(BaseDataConfig):
+     video_keys = ["video.front_view"]
+     state_keys = [
+         "state.left_arm",
+         "state.right_arm",
+         "state.left_hand",
+         "state.right_hand",
+         "state.waist",
+         "state.neck",
+     ]
+     action_keys = [
+         "action.left_arm",
+         "action.right_arm",
+         "action.left_hand",
+         "action.right_hand",
+         "action.waist",
+         "action.neck",
+     ]
+     language_keys = ["annotation.human.action.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def transform(self):
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "min_max" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "min_max" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ ###########################################################################################
+
+
+ class BimanualPandaGripperDataConfig(BaseDataConfig):
+     video_keys = [
+         "video.right_wrist_view",
+         "video.left_wrist_view",
+         "video.front_view",
+     ]
+     state_keys = [
+         "state.right_arm_eef_pos",
+         "state.right_arm_eef_quat",
+         "state.right_gripper_qpos",
+         "state.left_arm_eef_pos",
+         "state.left_arm_eef_quat",
+         "state.left_gripper_qpos",
+     ]
+     action_keys = [
+         "action.right_arm_eef_pos",
+         "action.right_arm_eef_rot",
+         "action.right_gripper_close",
+         "action.left_arm_eef_pos",
+         "action.left_arm_eef_rot",
+         "action.left_gripper_close",
+     ]
+
+     language_keys = ["annotation.human.action.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     # Used in StateActionTransform for normalization and target rotations
+     state_normalization_modes = {
+         "state.right_arm_eef_pos": "min_max",
+         "state.right_gripper_qpos": "min_max",
+         "state.left_arm_eef_pos": "min_max",
+         "state.left_gripper_qpos": "min_max",
+     }
+     state_target_rotations = {
+         "state.right_arm_eef_quat": "rotation_6d",
+         "state.left_arm_eef_quat": "rotation_6d",
+     }
+     action_normalization_modes = {
+         "action.right_gripper_close": "binary",
+         "action.left_gripper_close": "binary",
+     }
+
+     def transform(self):
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes=self.state_normalization_modes,
+                 target_rotations=self.state_target_rotations,
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes=self.action_normalization_modes,
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ ###########################################################################################
+
+
+ class BimanualPandaHandDataConfig(BimanualPandaGripperDataConfig):
+     video_keys = [
+         "video.right_wrist_view",
+         "video.left_wrist_view",
+         "video.ego_view",
+     ]
+     state_keys = [
+         "state.right_arm_eef_pos",
+         "state.right_arm_eef_quat",
+         "state.right_hand",
+         "state.left_arm_eef_pos",
+         "state.left_arm_eef_quat",
+         "state.left_hand",
+     ]
+     action_keys = [
+         "action.right_arm_eef_pos",
+         "action.right_arm_eef_rot",
+         "action.right_hand",
+         "action.left_arm_eef_pos",
+         "action.left_arm_eef_rot",
+         "action.left_hand",
+     ]
+     language_keys = ["annotation.human.action.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     # Used in StateActionTransform for normalization and target rotations
+     state_normalization_modes = {
+         "state.right_arm_eef_pos": "min_max",
+         "state.right_hand": "min_max",
+         "state.left_arm_eef_pos": "min_max",
+         "state.left_hand": "min_max",
+     }
+     action_normalization_modes = {
+         "action.right_hand": "min_max",
+         "action.left_hand": "min_max",
+     }
+     state_target_rotations = {
+         "state.right_arm_eef_quat": "rotation_6d",
+         "state.left_arm_eef_quat": "rotation_6d",
+     }
+
+
+ ###########################################################################################
+
+
+ class SinglePandaGripperDataConfig(BimanualPandaGripperDataConfig):
+     video_keys = [
+         "video.left_view",
+         "video.right_view",
+         "video.wrist_view",
+     ]
+     state_keys = [
+         "state.end_effector_position_relative",
+         "state.end_effector_rotation_relative",
+         "state.gripper_qpos",
+         "state.base_position",
+         "state.base_rotation",
+     ]
+     action_keys = [
+         "action.end_effector_position",
+         "action.end_effector_rotation",
+         "action.gripper_close",
+         "action.base_motion",
+         "action.control_mode",
+     ]
+
+     language_keys = ["annotation.human.action.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     # Used in StateActionTransform for normalization and target rotations
+     state_normalization_modes = {
+         "state.end_effector_position_relative": "min_max",
+         "state.end_effector_rotation_relative": "min_max",
+         "state.gripper_qpos": "min_max",
+         "state.base_position": "min_max",
+         "state.base_rotation": "min_max",
+     }
+     state_target_rotations = {
+         "state.end_effector_rotation_relative": "rotation_6d",
+         "state.base_rotation": "rotation_6d",
+     }
+     action_normalization_modes = {
+         "action.end_effector_position": "min_max",
+         "action.end_effector_rotation": "min_max",
+         "action.gripper_close": "binary",
+         "action.base_motion": "min_max",
+         "action.control_mode": "binary",
+     }
+
+
+ ###########################################################################################
+
+
+ class FourierGr1ArmsWaistDataConfig(FourierGr1ArmsOnlyDataConfig):
+     video_keys = ["video.ego_view"]
+     state_keys = [
+         "state.left_arm",
+         "state.right_arm",
+         "state.left_hand",
+         "state.right_hand",
+         "state.waist",
+     ]
+     action_keys = [
+         "action.left_arm",
+         "action.right_arm",
+         "action.left_hand",
+         "action.right_hand",
+         "action.waist",
+     ]
+     language_keys = ["annotation.human.coarse_action"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def transform(self):
+         return super().transform()
+
+
+ class FourierGr1ArmsWaistWithMANODataConfig(FourierGr1ArmsOnlyDataConfig):
+     video_keys = ["video.ego_view"]
+     state_keys = [
+         "state.left_arm",
+         "state.right_arm",
+         "state.left_hand",
+         "state.right_hand",
+         "state.waist",
+     ]
+     action_keys = [
+         "action.left_arm",
+         "action.right_arm",
+         "action.left_hand",
+         "action.right_hand",
+         "action.waist",
+     ]
+     language_keys = ["annotation.human.coarse_action"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+     action_dim = 144
+
+     def transform(self):
+         # Get parent transform
+         parent_transform = super().transform()
+
+         # Extract the transforms list
+         transforms = parent_transform.transforms
+
+         # Find ConcatTransform index and insert tensor-to-numpy conversion after it
+         concat_idx = None
+         for i, t in enumerate(transforms):
+             if isinstance(t, ConcatTransform):
+                 concat_idx = i
+                 break
+
+         if concat_idx is not None:
+             # Create a simple transform to convert torch tensors to numpy for state/action
+             class TensorToNumpyTransform(ModalityTransform):
+                 def apply(self, data: dict) -> dict:
+                     for key in ["state", "action"]:
+                         if key in data:
+                             value = data[key]
+                             # Convert torch tensors (possibly on GPU) to NumPy arrays
+                             if isinstance(value, torch.Tensor):
+                                 data[key] = value.detach().cpu().numpy()
+                     return data
+
+             # Insert after ConcatTransform, before GR00TTransform
+             transforms.insert(concat_idx + 1, TensorToNumpyTransform(apply_to=[]))
+
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ ###########################################################################################
+
+
+ class OxeDroidDataConfig(BaseDataConfig):
+     video_keys = [
+         "video.exterior_image_1",
+         "video.exterior_image_2",
+         "video.wrist_image",
+     ]
+     state_keys = [
+         "state.eef_position",
+         "state.eef_rotation",
+         "state.gripper_position",
+     ]
+     action_keys = [
+         "action.eef_position_delta",
+         "action.eef_rotation_delta",
+         "action.gripper_position",
+     ]
+     language_keys = ["annotation.language.language_instruction"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def transform(self):
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={
+                     "state.eef_position": "min_max",
+                     "state.gripper_position": "min_max",
+                 },
+                 target_rotations={
+                     "state.eef_rotation": "rotation_6d",
+                 },
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={
+                     "action.gripper_position": "binary",
+                 },
+                 target_rotations={"action.eef_rotation_delta": "axis_angle"},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+
+         return ComposedModalityTransform(transforms=transforms)
+
+
763
+ ###########################################################################################
764
+
765
+
766
+ class AgibotGenie1DataConfig(BaseDataConfig):
767
+ video_keys = [
768
+ "video.top_head",
769
+ "video.hand_left",
770
+ "video.hand_right",
771
+ ]
772
+ state_keys = [
773
+ "state.left_arm_joint_position",
774
+ "state.right_arm_joint_position",
775
+ "state.left_effector_position",
776
+ "state.right_effector_position",
777
+ "state.head_position",
778
+ "state.waist_position",
779
+ ]
780
+ action_keys = [
781
+ "action.left_arm_joint_position",
782
+ "action.right_arm_joint_position",
783
+ "action.left_effector_position",
784
+ "action.right_effector_position",
785
+ "action.head_position",
786
+ "action.waist_position",
787
+ "action.robot_velocity",
788
+ ]
789
+ language_keys = ["annotation.language.action_text"]
790
+ observation_indices = [0]
791
+ action_indices = list(range(16))
792
+
793
+ def transform(self):
794
+ transforms = [
795
+ # video transforms
796
+ VideoToTensor(apply_to=self.video_keys),
797
+ VideoCrop(apply_to=self.video_keys, scale=0.95),
798
+ VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
799
+ VideoColorJitter(
800
+ apply_to=self.video_keys,
801
+ brightness=0.3,
802
+ contrast=0.4,
803
+ saturation=0.5,
804
+ hue=0.08,
805
+ ),
806
+ VideoToNumpy(apply_to=self.video_keys),
807
+ # state transforms
808
+ StateActionToTensor(apply_to=self.state_keys),
809
+ StateActionTransform(
810
+ apply_to=self.state_keys,
811
+ normalization_modes={key: "min_max" for key in self.state_keys},
812
+ ),
813
+ # action transforms
814
+ StateActionToTensor(apply_to=self.action_keys),
815
+ StateActionTransform(
816
+ apply_to=self.action_keys,
817
+ normalization_modes={key: "min_max" for key in self.action_keys},
818
+ ),
819
+ # concat transforms
820
+ ConcatTransform(
821
+ video_concat_order=self.video_keys,
822
+ state_concat_order=self.state_keys,
823
+ action_concat_order=self.action_keys,
824
+ ),
825
+ GR00TTransform(
826
+ state_horizon=len(self.observation_indices),
827
+ action_horizon=len(self.action_indices),
828
+ max_state_dim=64,
829
+ max_action_dim=32,
830
+ ),
831
+ ]
832
+
833
+ return ComposedModalityTransform(transforms=transforms)
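`AgibotGenie1DataConfig` normalizes every state and action key with `"min_max"`. The sketch below shows what such a mode typically does. It assumes each dimension is rescaled from the dataset's [min, max] range to [-1, 1], and that constant dimensions map to 0.

Both the formula and the handling of max == min inside `StateActionTransform` are assumptions. The function names are hypothetical.

```python
def min_max_normalize(x: list[float], lo: list[float], hi: list[float]) -> list[float]:
    """Map each dimension from [lo, hi] to [-1, 1] (assumed convention)."""
    out = []
    for v, a, b in zip(x, lo, hi):
        if b == a:
            out.append(0.0)  # constant dimension: nothing to scale
        else:
            out.append(2.0 * (v - a) / (b - a) - 1.0)
    return out


def min_max_unnormalize(y: list[float], lo: list[float], hi: list[float]) -> list[float]:
    """Inverse map, e.g. to decode predicted actions back to robot units."""
    return [(v + 1.0) / 2.0 * (b - a) + a for v, a, b in zip(y, lo, hi)]
```

For example, `min_max_normalize([0.0, 5.0, 1.0], [-1.0, 0.0, 1.0], [1.0, 10.0, 1.0])` maps the midpoints of the first two ranges and the constant third dimension all to 0.0.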
+
+
+ ###########################################################################################
+
+
+ class Gr1DataConfig(BaseDataConfig):
+     video_keys = ["video.camera_ego", "video.camera_ext"]
+     state_keys = [
+         "state.torso_joints",
+         "state.head_joints",
+         "state.right_arm_joints",
+         "state.left_arm_joints",
+         "state.right_hand_joints",
+         "state.left_hand_joints",
+     ]
+     action_keys = [
+         "action.right_arm_eef_pos",
+         "action.left_arm_eef_pos",
+         "action.right_finger_joints",
+         "action.left_finger_joints",
+     ]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+
+         modality_configs = {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+         return modality_configs
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "min_max" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "min_max" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
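In `Gr1DataConfig.modality_config`, `delta_indices` are offsets relative to the current step:

- `observation_indices = [0]` reads only the current frame.
- `action_indices = list(range(16))` reads a 16-step action chunk starting at the current step.

The sketch below shows that lookup on a toy trajectory. It assumes that indices running past the end of the episode are clamped to the last frame; how the dataset class actually pads is not shown here. `gather_by_delta` is a hypothetical helper.

```python
def gather_by_delta(trajectory: list, step: int, delta_indices: list[int]) -> list:
    """Return trajectory[step + d] for each offset d, clamped to the episode bounds."""
    last = len(trajectory) - 1
    return [trajectory[min(max(step + d, 0), last)] for d in delta_indices]
```

On a 20-step episode at step 10, the 16-step chunk covers frames 10 to 19. Under the clamping assumption, it then repeats frame 19 six more times.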
+
+
+ ###########################################################################################
+
+
+ class Gr1NoImageDataConfig(BaseDataConfig):
+     video_keys = []
+     state_keys = [
+         "state.torso_joints",
+         "state.head_joints",
+         "state.right_arm_joints",
+         "state.left_arm_joints",
+         "state.right_hand_joints",
+         "state.left_hand_joints",
+     ]
+     action_keys = [
+         "action.right_arm_eef_pos",
+         "action.left_arm_eef_pos",
+         "action.right_finger_joints",
+         "action.left_finger_joints",
+     ]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+
+         modality_configs = {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+         return modality_configs
+
+     def transform(self) -> ModalityTransform:
+         transforms = []
+
+         # video transforms - only add if video_keys is not empty
+         if self.video_keys:
+             transforms.extend(
+                 [
+                     VideoToTensor(apply_to=self.video_keys),
+                     VideoCrop(apply_to=self.video_keys, scale=0.95),
+                     VideoResize(
+                         apply_to=self.video_keys, height=224, width=224, interpolation="linear"
+                     ),
+                     VideoColorJitter(
+                         apply_to=self.video_keys,
+                         brightness=0.3,
+                         contrast=0.4,
+                         saturation=0.5,
+                         hue=0.08,
+                     ),
+                     VideoToNumpy(apply_to=self.video_keys),
+                 ]
+             )
+
+         # state transforms
+         transforms.extend(
+             [
+                 StateActionToTensor(apply_to=self.state_keys),
+                 StateActionTransform(
+                     apply_to=self.state_keys,
+                     normalization_modes={key: "min_max" for key in self.state_keys},
+                 ),
+             ]
+         )
+
+         # action transforms
+         transforms.extend(
+             [
+                 StateActionToTensor(apply_to=self.action_keys),
+                 StateActionTransform(
+                     apply_to=self.action_keys,
+                     normalization_modes={key: "min_max" for key in self.action_keys},
+                 ),
+             ]
+         )
+
+         # concat transforms
+         transforms.append(
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             )
+         )
+
+         # model-specific transform
+         transforms.append(
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             )
+         )
+
+         return ComposedModalityTransform(transforms=transforms)
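`Gr1NoImageDataConfig` ends, like every config here, with `ConcatTransform` followed by `GR00TTransform`:

1. The per-key state and action vectors are joined in the order given by `state_concat_order` and `action_concat_order`.
2. The result is brought to a fixed width (`max_state_dim=64`, `max_action_dim=32` here).

The sketch below illustrates that idea. It assumes zero padding on the feature axis plus a boolean mask marking the real dimensions. The actual padding value and mask layout inside `GR00TTransform` are not confirmed by this file, and `concat_and_pad` is hypothetical.

```python
def concat_and_pad(
    features: dict[str, list[float]], order: list[str], max_dim: int
) -> tuple[list[float], list[bool]]:
    """Concatenate per-key vectors in `order`, then zero-pad to `max_dim`."""
    flat: list[float] = []
    for key in order:
        flat.extend(features[key])
    if len(flat) > max_dim:
        # Fail loudly instead of silently truncating a misconfigured width
        raise ValueError(f"concatenated dim {len(flat)} exceeds max_dim {max_dim}")
    pad = max_dim - len(flat)
    return flat + [0.0] * pad, [True] * len(flat) + [False] * pad
```

Raising on overflow is a design choice of this sketch. It surfaces a too-small `max_action_dim` early, rather than letting a model silently see truncated actions.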
+
+
+ ###########################################################################################
+
+
+ class egodex_naive_config(BaseDataConfig):
+     video_keys = ["video.camera"]
+     state_keys = [
+         "state.left_hand",
+         "state.left_hand_rotation",
+         "state.left_hand_fingertips",
+         "state.right_hand",
+         "state.right_hand_rotation",
+         "state.right_hand_fingertips",
+     ]
+     action_keys = [
+         "action.left_hand",
+         "action.left_hand_rotation",
+         "action.left_hand_fingertips",
+         "action.right_hand",
+         "action.right_hand_rotation",
+         "action.right_hand_fingertips",
+     ]
+     language_keys = ["annotation.language_instruction"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+     action_dim = 48
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+
+         modality_configs = {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+         return modality_configs
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.2,
+                 contrast=0.2,
+                 saturation=0.1,
+                 hue=0.0,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "q99" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "q99" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=self.action_dim,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
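The egodex configs switch from `"min_max"` to `"q99"` normalization. A plausible reading, stated here as an assumption: the bounds are the 1st and 99th percentiles of each dimension rather than the absolute min and max. Values are mapped to [-1, 1] relative to those bounds and clipped, so a few outlier frames in hand-tracking data do not compress the useful range.

The percentile method (linear interpolation) and the clipping are choices of this sketch, not confirmed behavior of `StateActionTransform`.

```python
def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def q99_normalize(x: float, samples: list[float]) -> float:
    """Scale x to [-1, 1] using the 1st/99th percentiles of `samples`, then clip."""
    q01, q99 = percentile(samples, 1.0), percentile(samples, 99.0)
    if q99 == q01:
        return 0.0  # degenerate dimension
    y = 2.0 * (x - q01) / (q99 - q01) - 1.0
    return max(-1.0, min(1.0, y))
```

With samples 0..100, the bounds become 1 and 99. A value of 50 maps to 0.0, while an outlier such as 1000 is clipped to 1.0 instead of stretching the scale.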
+
+
+ class egodex_mano_config(BaseDataConfig):
+     video_keys = ["video.camera"]
+     state_keys = [
+         "state.left_hand",
+         "state.left_hand_rotation",
+         "state.left_hand_mano21_joints",
+         "state.right_hand",
+         "state.right_hand_rotation",
+         "state.right_hand_mano21_joints",
+     ]
+     action_keys = [
+         "action.left_hand",
+         "action.left_hand_rotation",
+         "action.left_hand_mano21_joints",
+         "action.right_hand",
+         "action.right_hand_rotation",
+         "action.right_hand_mano21_joints",
+     ]
+     language_keys = ["annotation.language_instruction"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+     action_dim = 144
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+
+         modality_configs = {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+         return modality_configs
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.2,
+                 contrast=0.2,
+                 saturation=0.1,
+                 hue=0.0,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "q99" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "q99" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=self.action_dim,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
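`action_dim` (48 for `egodex_naive_config`, 144 for `egodex_mano_config`) is passed as `max_action_dim`, so it must be at least the total width of the concatenated action keys. The file does not state the per-key widths. The breakdown below is only one assumption consistent with both numbers, per hand:

- a 3-D wrist position
- a 6-D rotation
- either five 3-D fingertips or 21 3-D MANO joints

`total_dim` is a hypothetical helper for checking a config against the real widths in a dataset's metadata.

```python
def total_dim(key_dims: dict[str, int], keys: list[str]) -> int:
    """Sum the widths of `keys`, failing loudly if any width is unknown."""
    missing = [k for k in keys if k not in key_dims]
    if missing:
        raise KeyError(f"no width known for: {missing}")
    return sum(key_dims[k] for k in keys)


# Hypothetical per-key widths (assumptions, not read from any dataset)
ASSUMED_DIMS: dict[str, int] = {}
for side in ("left", "right"):
    ASSUMED_DIMS[f"action.{side}_hand"] = 3                    # xyz position
    ASSUMED_DIMS[f"action.{side}_hand_rotation"] = 6           # 6-D rotation
    ASSUMED_DIMS[f"action.{side}_hand_fingertips"] = 5 * 3     # 5 xyz points
    ASSUMED_DIMS[f"action.{side}_hand_mano21_joints"] = 21 * 3  # 21 xyz joints

# Same key order as the action_keys lists of the two configs
NAIVE_KEYS = [f"action.{s}_hand{suffix}" for s in ("left", "right")
              for suffix in ("", "_rotation", "_fingertips")]
MANO_KEYS = [f"action.{s}_hand{suffix}" for s in ("left", "right")
             for suffix in ("", "_rotation", "_mano21_joints")]
```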
+
+
+ class agibot_naive_config(BaseDataConfig):
+     video_keys = ["video.top_head", "video.hand_left", "video.hand_right"]
+     state_keys = ["state.observation_state"]
+     action_keys = ["action.action"]
+     language_keys = ["annotation.language_instruction"]
+     observation_indices = [0]
+     action_indices = list(range(22))  # 22-step action horizon; the per-step width is action_dim
+     action_dim = 22
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+
+         modality_configs = {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+         return modality_configs
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.2,
+                 contrast=0.2,
+                 saturation=0.1,
+                 hue=0.0,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "q99" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "q99" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=self.action_dim,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
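In `agibot_naive_config` the number 22 plays two unrelated roles:

- `len(action_indices)` is the action horizon passed to `GR00TTransform` as `action_horizon`.
- `action_dim` is the width of each action step.

The sketch below makes the distinction explicit. It assumes one chunk row per delta index, with each row zero-padded to `max_action_dim`. `make_action_chunk` is hypothetical.

```python
def make_action_chunk(
    per_step: list[list[float]], action_indices: list[int], max_action_dim: int
) -> list[list[float]]:
    """Stack one zero-padded row per delta index -> (len(action_indices), max_action_dim)."""
    if len(per_step) != len(action_indices):
        raise ValueError("expected exactly one action vector per delta index")
    chunk = []
    for vec in per_step:
        if len(vec) > max_action_dim:
            raise ValueError(f"action width {len(vec)} exceeds max_action_dim {max_action_dim}")
        chunk.append(list(vec) + [0.0] * (max_action_dim - len(vec)))
    return chunk
```

Shortening `action_indices` to `list(range(16))` would change only the number of rows. The 22-wide action vectors would be unaffected.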
+
+
+ class allex_thetwo_ck40_egostereo_config(BaseDataConfig):
+     video_keys = ["video.camera_ego_left", "video.camera_ego_right"]
+     state_keys = [
+         "state.right_arm_joints",
+         "state.left_arm_joints",
+         "state.right_hand_joints",
+         "state.left_hand_joints",
+         "state.neck_joints",
+         "state.waist_joints",
+     ]
+     action_keys = [
+         "action.right_arm_joints",
+         "action.left_arm_joints",
+         "action.right_hand_joints",
+         "action.left_hand_joints",
+         "action.neck_joints",
+         "action.waist_joints",
+     ]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(40))
+     action_dim = 48
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+
+         modality_configs = {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+         return modality_configs
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.2,
+                 contrast=0.2,
+                 saturation=0.2,
+                 hue=0.1,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "q99" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "q99" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=self.action_dim,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
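The ALLEX and OpenArm configs use a different jitter setting (brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1) from the GR-1 configs. The sketch below assumes `VideoColorJitter` follows torchvision's `ColorJitter` convention for scalar arguments:

- a value b draws a brightness, contrast or saturation factor uniformly from [max(0, 1 - b), 1 + b];
- a value h draws a hue shift from [-h, h], with 0 <= h <= 0.5.

The sketch samples such factors with the standard library only. It does not apply them to pixels, and `sample_jitter_params` is hypothetical.

```python
import random


def _factor_range(value: float) -> tuple[float, float]:
    """Multiplicative range for a scalar jitter strength (torchvision-style, assumed)."""
    if value < 0:
        raise ValueError("jitter strength must be non-negative")
    return max(0.0, 1.0 - value), 1.0 + value


def sample_jitter_params(
    brightness: float, contrast: float, saturation: float, hue: float, rng: random.Random
) -> dict[str, float]:
    """Draw one set of color-jitter factors for a clip."""
    if not 0.0 <= hue <= 0.5:
        raise ValueError("hue must be in [0, 0.5]")
    params = {}
    for name, value in (("brightness", brightness), ("contrast", contrast), ("saturation", saturation)):
        lo, hi = _factor_range(value)
        params[name] = rng.uniform(lo, hi)
    params["hue"] = rng.uniform(-hue, hue)  # additive shift on the hue channel
    return params
```

Passing an explicit `random.Random` keeps the draws reproducible in tests. Sampling once per clip rather than per frame keeps the color consistent across the frames of one sample; whether the real transform does this is not shown here.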
+
+
+ ###########################################################################################
+
+
+ class openarm_ck40_egostereo_config(BaseDataConfig):
+     video_keys = ["video.camera_ego_left", "video.camera_ego_right"]
+     state_keys = [
+         "state.right_arm_joints",
+         "state.left_arm_joints",
+         "state.right_hand_joints",
+         "state.left_hand_joints",
+         "state.neck_joints",
+     ]
+     action_keys = [
+         "action.right_arm_joints",
+         "action.left_arm_joints",
+         "action.right_hand_joints",
+         "action.left_hand_joints",
+         "action.neck_joints",
+     ]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(40))
+     action_dim = 28
+
+     def modality_config(self) -> dict[str, ModalityConfig]:
+         video_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.video_keys,
+         )
+
+         state_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.state_keys,
+         )
+
+         action_modality = ModalityConfig(
+             delta_indices=self.action_indices,
+             modality_keys=self.action_keys,
+         )
+
+         language_modality = ModalityConfig(
+             delta_indices=self.observation_indices,
+             modality_keys=self.language_keys,
+         )
+
+         modality_configs = {
+             "video": video_modality,
+             "state": state_modality,
+             "action": action_modality,
+             "language": language_modality,
+         }
+
+         return modality_configs
+
+     def transform(self) -> ModalityTransform:
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.2,
+                 contrast=0.2,
+                 saturation=0.2,
+                 hue=0.1,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "q99" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "q99" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             # model-specific transform
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=self.action_dim,
+             ),
+         ]
+         return ComposedModalityTransform(transforms=transforms)
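Every RGB pipeline here crops with `scale=0.95` before resizing to 224x224. The sketch below assumes `VideoCrop` keeps a window whose height and width are `scale` times the original, centered at evaluation time. It computes that centered box and the factors of a direct resize.

`center_crop_box` and `resize_factors` are hypothetical, and rounding to the nearest pixel is a choice of this sketch.

```python
def center_crop_box(height: int, width: int, scale: float) -> tuple[int, int, int, int]:
    """(top, left, crop_h, crop_w) of a centered crop keeping `scale` of each side."""
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    crop_h, crop_w = round(height * scale), round(width * scale)
    top, left = (height - crop_h) // 2, (width - crop_w) // 2
    return top, left, crop_h, crop_w


def resize_factors(
    crop_h: int, crop_w: int, out_h: int = 224, out_w: int = 224
) -> tuple[float, float]:
    """Per-axis scale factors of a direct resize to (out_h, out_w)."""
    return out_h / crop_h, out_w / crop_w
```

For a 480x640 frame this keeps a 456x608 window offset by (12, 16). Resizing that window straight to 224x224 scales the two axes differently. So unless `VideoResize` pads or letterboxes first, which this file does not show, the aspect ratio changes.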
+
+
+ class AgibotBetaDataConfig(BaseDataConfig):
+     video_keys = [
+         "video.top_head",
+     ]
+     state_keys = [
+         "state.left_arm_joint_position",
+         "state.right_arm_joint_position",
+         "state.left_effector_position",
+         "state.right_effector_position",
+         "state.head_position",
+         "state.waist_position",
+     ]
+     action_keys = [
+         "action.left_arm_joint_position",
+         "action.right_arm_joint_position",
+         "action.left_effector_position",
+         "action.right_effector_position",
+         "action.head_position",
+         "action.waist_position",
+     ]
+     language_keys = ["annotation.language.action_text"]
+     observation_indices = [0]
+     action_indices = list(range(16))
+
+     def transform(self):
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(apply_to=self.video_keys, height=224, width=224, interpolation="linear"),
+             VideoColorJitter(
+                 apply_to=self.video_keys,
+                 brightness=0.3,
+                 contrast=0.4,
+                 saturation=0.5,
+                 hue=0.08,
+             ),
+             VideoToNumpy(apply_to=self.video_keys),
+             # state transforms
+             StateActionToTensor(apply_to=self.state_keys),
+             StateActionTransform(
+                 apply_to=self.state_keys,
+                 normalization_modes={key: "min_max" for key in self.state_keys},
+             ),
+             # action transforms
+             StateActionToTensor(apply_to=self.action_keys),
+             StateActionTransform(
+                 apply_to=self.action_keys,
+                 normalization_modes={key: "min_max" for key in self.action_keys},
+             ),
+             # concat transforms
+             ConcatTransform(
+                 video_concat_order=self.video_keys,
+                 state_concat_order=self.state_keys,
+                 action_concat_order=self.action_keys,
+             ),
+             GR00TTransform(
+                 state_horizon=len(self.observation_indices),
+                 action_horizon=len(self.action_indices),
+                 max_state_dim=64,
+                 max_action_dim=32,
+             ),
+         ]
+
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ ###########################################################################################
+
+
+ @dataclass
+ class AllexSimDataConfig(BaseDataConfig):
+     video_keys = ["video.robot_pov_left_cam"]
+     state_keys = [
+         "state.left_eef_pos",
+         "state.left_eef_quat",
+         "state.right_eef_pos",
+         "state.right_eef_quat",
+         "state.hand_joint_state",
+         "state.head_joint_state",
+     ]
+     action_keys = [
+         "action.left_eef_pos",
+         "action.left_eef_quat",
+         "action.right_eef_pos",
+         "action.right_eef_quat",
+         "action.hand_joint_state",
+         "action.head_joint_state",
+     ]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(32))
+     action_dim = 46
+
+     # Set is_remote=True for a remote client whose environment already resizes images to 224x224;
+     # transform() then skips the resolution check, the crop, and the resize.
+     is_remote: bool = False
+
+     def transform(self):
+         transforms: list[ModalityTransform] = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys, check_resolution=not self.is_remote),
+         ]
+         if not self.is_remote:
+             transforms.extend(
+                 [
+                     VideoCrop(apply_to=self.video_keys, scale=0.95),
+                     VideoResize(
+                         apply_to=self.video_keys, height=224, width=224, interpolation="linear"
+                     ),
+                 ]
+             )
+         transforms.extend(
+             [
+                 VideoColorJitter(
+                     apply_to=self.video_keys,
+                     brightness=0.3,
+                     contrast=0.4,
+                     saturation=0.5,
+                     hue=0.08,
+                 ),
+                 VideoToNumpy(apply_to=self.video_keys),
+                 # state transforms
+                 StateActionToTensor(apply_to=self.state_keys),
+                 StateActionTransform(
+                     apply_to=self.state_keys,
+                     normalization_modes={key: "min_max" for key in self.state_keys},
+                 ),
+                 # action transforms
+                 StateActionToTensor(apply_to=self.action_keys),
+                 StateActionTransform(
+                     apply_to=self.action_keys,
+                     normalization_modes={key: "min_max" for key in self.action_keys},
+                 ),
+                 # concat transforms
+                 ConcatTransform(
+                     video_concat_order=self.video_keys,
+                     state_concat_order=self.state_keys,
+                     action_concat_order=self.action_keys,
+                 ),
+                 GR00TTransform(
+                     state_horizon=len(self.observation_indices),
+                     action_horizon=len(self.action_indices),
+                     max_state_dim=64,
+                     max_action_dim=self.action_dim,
+                 ),
+             ]
+         )
+
+         return ComposedModalityTransform(transforms=transforms)
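`AllexSimDataConfig.transform` builds its pipeline conditionally. With `is_remote=True`, the resolution check is disabled and the crop and resize steps are dropped, while color jitter and everything after it still run. The stand-in below reproduces only that control flow, using step labels instead of the real transform objects, so it is easy to check which steps run in each mode. `PipelineSketch` and its string labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipelineSketch:
    is_remote: bool = False

    def steps(self) -> list[str]:
        """Labels of the steps AllexSimDataConfig.transform() would chain, in order."""
        steps = [f"VideoToTensor(check_resolution={not self.is_remote})"]
        if not self.is_remote:
            # Local data: crop, then resize to the 224x224 model input
            steps += ["VideoCrop(scale=0.95)", "VideoResize(224x224)"]
        steps += [
            "VideoColorJitter",
            "VideoToNumpy",
            "StateActionToTensor(state)",
            "StateActionTransform(state, min_max)",
            "StateActionToTensor(action)",
            "StateActionTransform(action, min_max)",
            "ConcatTransform",
            "GR00TTransform",
        ]
        return steps
```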
+
+
+ @dataclass
+ class AllexSimWithoutHeadDataConfig(AllexSimDataConfig):
+     state_keys = [
+         "state.left_eef_pos",
+         "state.left_eef_quat",
+         "state.right_eef_pos",
+         "state.right_eef_quat",
+         "state.hand_joint_state",
+     ]
+     action_keys = [
+         "action.left_eef_pos",
+         "action.left_eef_quat",
+         "action.right_eef_pos",
+         "action.right_eef_quat",
+         "action.hand_joint_state",
+     ]
+     action_dim = 44
+
+
+ ## Allex Real Data Configs
+ # 1. Mono vs Stereo
+ @dataclass
+ class AllexRealMonoConfig(BaseDataConfig):
+     video_keys = ["video.camera_ego_left"]
+     state_keys = [
+         "state.right_arm_joints",
+         "state.left_arm_joints",
+         "state.right_hand_joints",
+         "state.left_hand_joints",
+         "state.neck_joints",
+         "state.waist_joints",
+     ]
+     action_keys = [
+         "action.right_arm_joints",
+         "action.left_arm_joints",
+         "action.right_hand_joints",
+         "action.left_hand_joints",
+         "action.neck_joints",
+         "action.waist_joints",
+     ]
+     language_keys = ["annotation.human.task_description"]
+
+     observation_indices = [0]
+     action_indices = list(range(16))
+     action_dim = 48
+
+     is_remote: bool = False
+
+     def transform(self):
+         transforms = [
+             # video transforms
+             VideoToTensor(apply_to=self.video_keys, check_resolution=not self.is_remote),
+         ]
+         if not self.is_remote:
+             transforms.extend(
+                 [
+                     VideoCrop(apply_to=self.video_keys, scale=0.95),
+                     VideoResize(
+                         apply_to=self.video_keys, height=224, width=224, interpolation="linear"
+                     ),
+                 ]
+             )
+         transforms.extend(
+             [
+                 VideoColorJitter(
+                     apply_to=self.video_keys,
+                     brightness=0.3,
+                     contrast=0.4,
+                     saturation=0.5,
+                     hue=0.08,
+                 ),
+                 VideoToNumpy(apply_to=self.video_keys),
+                 # state transforms
+                 StateActionToTensor(apply_to=self.state_keys),
+                 StateActionTransform(
+                     apply_to=self.state_keys,
+                     normalization_modes={key: "min_max" for key in self.state_keys},
+                 ),
+                 # action transforms
+                 StateActionToTensor(apply_to=self.action_keys),
+                 StateActionTransform(
+                     apply_to=self.action_keys,
+                     normalization_modes={key: "min_max" for key in self.action_keys},
+                 ),
+                 # concat transforms
+                 ConcatTransform(
+                     video_concat_order=self.video_keys,
+                     state_concat_order=self.state_keys,
+                     action_concat_order=self.action_keys,
+                 ),
+                 GR00TTransform(
+                     state_horizon=len(self.observation_indices),
+                     action_horizon=len(self.action_indices),
+                     max_state_dim=66,
+                     max_action_dim=self.action_dim,
+                 ),
+             ]
+         )
+
+         return ComposedModalityTransform(transforms=transforms)
1887
+
+ class AllexRealStereoConfig(AllexRealMonoConfig):
+     video_keys = ["video.camera_ego_left", "video.camera_ego_right"]
+
+
+ @dataclass
+ class Rby1WujiDataConfig(BaseDataConfig):
+     video_keys = ["video.zed_left", "video.zed_right"]
+     state_keys = ["state.joint_position"]
+     action_keys = ["action.joint_position"]
+     language_keys = ["annotation.human.task_description"]
+     observation_indices = [0]
+     action_indices = list(range(40))
+     action_dim = 66
+
+     def transform(self):
+         transforms = [
+             VideoToTensor(apply_to=self.video_keys),
+             VideoCrop(apply_to=self.video_keys, scale=0.95),
+             VideoResize(
+                 apply_to=self.video_keys, height=224, width=224, interpolation="linear"
+             ),
+         ]
+         transforms.extend(
+             [
+                 VideoColorJitter(
+                     apply_to=self.video_keys,
+                     brightness=0.5,
+                     contrast=0.5,
+                     saturation=0.5,
+                     hue=0.2,
+                 ),
+                 VideoToNumpy(apply_to=self.video_keys),
+                 StateActionToTensor(apply_to=self.state_keys),
+                 StateActionTransform(
+                     apply_to=self.state_keys,
+                     normalization_modes={key: "min_max" for key in self.state_keys},
+                 ),
+                 StateActionToTensor(apply_to=self.action_keys),
+                 StateActionTransform(
+                     apply_to=self.action_keys,
+                     normalization_modes={key: "min_max" for key in self.action_keys},
+                 ),
+                 ConcatTransform(
+                     video_concat_order=self.video_keys,
+                     state_concat_order=self.state_keys,
+                     action_concat_order=self.action_keys,
+                 ),
+                 GR00TTransform(
+                     state_horizon=len(self.observation_indices),
+                     action_horizon=len(self.action_indices),
+                     max_state_dim=66,
+                     max_action_dim=self.action_dim,
+                 ),
+             ]
+         )
+         return ComposedModalityTransform(transforms=transforms)
+
+
+ ###########################################################################################
+
+
+ DATA_CONFIG_MAP = {
+     "fourier_gr1_arms_waist": FourierGr1ArmsWaistDataConfig(),
+     "fourier_gr1_arms_waist_with_mano": FourierGr1ArmsWaistWithMANODataConfig(),
+     "fourier_gr1_arms_only": FourierGr1ArmsOnlyDataConfig(),
+     "fourier_gr1_full_upper_body": FourierGr1FullUpperBodyDataConfig(),
+     "bimanual_panda_gripper": BimanualPandaGripperDataConfig(),
+     "bimanual_panda_hand": BimanualPandaHandDataConfig(),
+     "single_panda_gripper": SinglePandaGripperDataConfig(),
+     "so100": So100DataConfig(),
+     "so100_dualcam": So100DualCamDataConfig(),
+     "unitree_g1": UnitreeG1DataConfig(),
+     "unitree_g1_full_body": UnitreeG1FullBodyDataConfig(),
+     "oxe_droid": OxeDroidDataConfig(),
+     "agibot_genie1": AgibotGenie1DataConfig(),
+     "gr1": Gr1DataConfig(),
+     "gr1_no_image": Gr1NoImageDataConfig(),
+
+     "allex_thetwo_ck40_egostereo": allex_thetwo_ck40_egostereo_config(),
+     "openarm_ck40_egostereo": openarm_ck40_egostereo_config(),
+     "egodex_naive": egodex_naive_config(),
+     "egodex_mano": egodex_mano_config(),
+     "agibot_naive": agibot_naive_config(),
+     "agibot_beta1": AgibotBetaDataConfig(),
+     "allex_sim": AllexSimDataConfig(),
+     "allex_sim_remote": AllexSimDataConfig(is_remote=True),
+     "allex_real_mono": AllexRealMonoConfig(),
+     "allex_real_stereo": AllexRealStereoConfig(),
+     "allex_sim_mono": AllexRealMonoConfig(is_remote=True),
+     "allex_sim_stereo": AllexRealStereoConfig(is_remote=True),
+     "allex_sim_without_head": AllexSimWithoutHeadDataConfig(),
+     "allex_sim_without_head_remote": AllexSimWithoutHeadDataConfig(is_remote=True),
+     "rby1_wuji": Rby1WujiDataConfig(),
+ }
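Every config above passes `max_state_dim=66` and `max_action_dim=self.action_dim` to `GR00TTransform`, which suggests that concatenated state and action vectors are brought to a fixed width before they reach the model. The snippet below is a minimal sketch of that idea, assuming zero-padding on the last axis plus a boolean mask that marks the real dimensions; `pad_to_max_dim` is a hypothetical helper, not the `GR00TTransform` implementation.

```python
import numpy as np


def pad_to_max_dim(x, max_dim):
    """Zero-pad the last axis of x up to max_dim; return (padded, mask).

    Hypothetical helper (an assumption, not GR00T code): mask[i] is True
    for the original dimensions and False for the zero padding.
    """
    x = np.asarray(x, dtype=np.float32)
    d = x.shape[-1]
    if d > max_dim:
        raise ValueError(f"dimension {d} exceeds max_dim {max_dim}")
    # Pad only the trailing (feature) axis; leading axes such as the horizon are untouched.
    pad_width = [(0, 0)] * (x.ndim - 1) + [(0, max_dim - d)]
    padded = np.pad(x, pad_width)  # default mode="constant" pads with zeros
    mask = np.zeros(max_dim, dtype=bool)
    mask[:d] = True
    return padded, mask


# Example: a hypothetical 48-dim concatenated state with state_horizon=1.
state = np.ones((1, 48), dtype=np.float32)
padded_state, state_mask = pad_to_max_dim(state, max_dim=66)
print(padded_state.shape, int(state_mask.sum()))
```

Under this reading, a 48-dimensional state becomes a `(1, 66)` array whose last 18 columns are zero, while `Rby1WujiDataConfig` would need no padding at all, since its `joint_position` state and action are both 66-dimensional according to the accompanying `metadata.json`.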
checkpoint-10000/experiment_cfg/metadata.json ADDED
@@ -0,0 +1,871 @@
+ {
+   "new_embodiment": {
+     "statistics": {
+       "state": {
+         "joint_position": {
+           "max": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.0540158674120903,
+             0.008619149215519428,
+             -0.01737499050796032,
+             0.011375758796930313,
+             0.009750650264322758,
+             0.24627606570720673,
+             1.0723133087158203,
+             -0.03839508444070816,
+             0.9127033948898315,
+             0.00562712736427784,
+             1.8438678979873657,
+             0.9284341931343079,
+             0.01166592352092266,
+             1.668798804283142,
+             0.44089144468307495,
+             0.9650468230247498,
+             1.6112595796585083,
+             1.2180935144424438,
+             0.2822831869125366,
+             1.6737557649612427,
+             1.6050000190734863,
+             1.4051698446273804,
+             0.08793547004461288,
+             1.6089346408843994,
+             1.5881896018981934,
+             1.6058990955352783,
+             0.04430322349071503,
+             1.642262578010559,
+             1.6696302890777588,
+             1.5714726448059082,
+             0.15678425133228302,
+             1.597379446029663,
+             1.6042011976242065,
+             0.5594549179077148,
+             0.6180081963539124,
+             0.7730085849761963,
+             -0.004590551368892193,
+             1.4090687036514282,
+             0.8553937077522278,
+             2.8069164752960205,
+             1.5063496828079224,
+             0.3504810333251953,
+             0.3728719651699066,
+             1.5741111040115356,
+             0.8213971853256226,
+             0.04377385973930359,
+             0.019891871139407158,
+             0.7548595666885376,
+             0.7196071743965149,
+             0.10614115744829178,
+             0.03523240610957146,
+             0.7286884784698486,
+             0.7055163383483887,
+             0.1200566291809082,
+             0.015419094823300838,
+             0.32709723711013794,
+             0.36708328127861023,
+             0.03174339234828949,
+             0.2119518369436264,
+             1.3741862773895264,
+             3.834952167380834e-06,
+             -0.5998210310935974
+           ],
+           "min": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             -0.01123147364705801,
+             -0.07895824313163757,
+             -0.05312357842922211,
+             -0.011371961794793606,
+             -0.009332981891930103,
+             -1.4737378358840942,
+             -0.7509519457817078,
+             -0.8685862421989441,
+             -0.7078827619552612,
+             -2.4324493408203125,
+             -1.2745846509933472,
+             -1.5293941497802734,
+             -2.6313905715942383,
+             -0.006295211613178253,
+             -0.21899408102035522,
+             -0.00438659219071269,
+             -0.06925009936094284,
+             -0.15651625394821167,
+             -0.10530298203229904,
+             -0.025816796347498894,
+             -0.15664206445217133,
+             -0.1108212023973465,
+             -0.3239299952983856,
+             -0.12454989552497864,
+             -0.013239393942058086,
+             -0.13757586479187012,
+             -0.21419131755828857,
+             -0.029919717460870743,
+             -0.16278579831123352,
+             -0.24725651741027832,
+             -0.20329144597053528,
+             -0.040407828986644745,
+             -0.45288756489753723,
+             -0.5406219363212585,
+             0.05294891446828842,
+             -1.610653281211853,
+             -1.3624293804168701,
+             -1.4776990413665771,
+             -1.1493887901306152,
+             0.5057151317596436,
+             -0.23366180062294006,
+             -0.005929233506321907,
+             -0.01660371571779251,
+             -0.01076052337884903,
+             -0.004445623606443405,
+             -0.0933440625667572,
+             -0.00807812251150608,
+             -0.0057681952603161335,
+             -0.006203831639140844,
+             -0.06876560300588608,
+             -0.030899088829755783,
+             -0.004242096561938524,
+             -0.005740335676819086,
+             -0.02193913422524929,
+             -0.01582074724137783,
+             -0.1694127470254898,
+             -0.13835637271404266,
+             -0.16616317629814148,
+             -0.11304554343223572,
+             -0.020096570253372192,
+             -0.00011888350854860619,
+             -0.6000050902366638
+           ],
+           "mean": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.028436615020000926,
+             -0.03310545571542166,
+             -0.02831603293043386,
+             -0.002489809312055913,
+             0.006085372967137075,
+             -0.3857839695392187,
+             -0.014985148514616058,
+             -0.20059175430304702,
+             0.020276488908730312,
+             -1.0443872404524919,
+             0.023258577339383097,
+             -0.1142688779204578,
+             -1.3937374825429794,
+             0.6558063841791327,
+             -0.08927425675725988,
+             0.5556008032503482,
+             0.3315836355578498,
+             0.46150773638786774,
+             0.07591597844533991,
+             0.14104579194652192,
+             0.5127873260199133,
+             0.4920027032391028,
+             -0.10307410965098776,
+             0.07758055805280992,
+             0.7382627254586968,
+             0.47719562014059813,
+             -0.07254886993992268,
+             0.030039534178359563,
+             0.5418144791439871,
+             0.4868134670421312,
+             -0.0349460562227702,
+             0.050387433762820816,
+             0.7191163128647389,
+             0.12979891806355587,
+             0.19467059752776283,
+             -0.32792739139429206,
+             -0.5835160602302355,
+             -0.08403944264891663,
+             0.1390610603205583,
+             1.8909284911263653,
+             0.4950957846158421,
+             0.2381789575086811,
+             0.22612730576569545,
+             0.18883140589784606,
+             0.3740392055027783,
+             -0.04032462804929162,
+             0.008036058131466807,
+             0.37591508523796946,
+             0.32645487851920024,
+             0.03241679756760204,
+             0.004207118530478635,
+             0.2470434163746666,
+             0.3192236355839802,
+             0.04590547429864797,
+             0.0011438596929481235,
+             0.026781440791798408,
+             0.15448183264024357,
+             -0.07243912484280461,
+             -0.07909043829547209,
+             0.8210403514325914,
+             -6.785587383775982e-07,
+             -0.5999860812434155
+           ],
+           "std": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.009296067132398588,
+             0.01086359387947464,
+             0.006314352853837057,
+             0.002925380853794191,
+             0.0020385652110806902,
+             0.5772557183512236,
+             0.2241035469436585,
+             0.15809088377394892,
+             0.22598680879665564,
+             0.4978266314448067,
+             0.3561884996721128,
+             0.3675321177585772,
+             0.42166286761675736,
+             0.36183275641474494,
+             0.10626282067482194,
+             0.12482011798152456,
+             0.26866527570065346,
+             0.30810642814769373,
+             0.03734363734743939,
+             0.2114097732846818,
+             0.3669871051808872,
+             0.32013936266443427,
+             0.03549505877621235,
+             0.22845467340942432,
+             0.36432746939207455,
+             0.36440028820190384,
+             0.03600196760548027,
+             0.13540730582439536,
+             0.4923920481254162,
+             0.36305092071697675,
+             0.04654938606664865,
+             0.13235952485277672,
+             0.544977801560137,
+             0.19417089262103823,
+             0.15282655265234688,
+             0.39691415272888464,
+             0.36750595534209435,
+             0.2645354107326195,
+             0.29782887812728026,
+             0.3383129775685425,
+             0.24432653890860784,
+             0.08139908079444687,
+             0.07271124468975532,
+             0.10743092529527191,
+             0.14195018048311717,
+             0.020357388766090088,
+             0.003097414753963125,
+             0.13929327335560418,
+             0.13106278870888993,
+             0.026678754705602218,
+             0.00694399976100214,
+             0.10023644500237974,
+             0.12793384947018274,
+             0.02815470361234968,
+             0.003935586811798535,
+             0.0694829081784891,
+             0.07363449848839962,
+             0.03472060782374721,
+             0.026450936545833404,
+             0.2973850084596476,
+             8.890689092689295e-06,
+             5.75834431548448e-05
+           ],
+           "q01": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.021795967778423478,
+             -0.05696901685593958,
+             -0.03935664577863019,
+             -0.006686112227699354,
+             0.0034296106734705363,
+             -1.3782285479010636,
+             -0.4497670105132723,
+             -0.6115826922007739,
+             -0.471112083088,
+             -1.8056593314667733,
+             -0.8440315607185341,
+             -1.1620166715406945,
+             -2.2094820753430424,
+             0.004424640155203718,
+             -0.20477636585097375,
+             0.002928900425599604,
+             0.0015383478993190457,
+             -0.01212095601448354,
+             -0.008005311749701495,
+             0.11194821075318842,
+             0.0034959941221639835,
+             -0.0065810612466863224,
+             -0.16402095284923815,
+             -0.002347375323843718,
+             0.004417142143493669,
+             -0.02706049374707401,
+             -0.15492323031416538,
+             -0.006735840077794811,
+             -0.016644652454015817,
+             -0.05916948924771991,
+             -0.14416287514004364,
+             -0.006907277723886991,
+             -0.14870543771038225,
+             -0.31457522036852636,
+             0.0667847342818645,
+             -1.2423695252790405,
+             -1.2364301253427723,
+             -0.676006123363979,
+             -0.6129555665262777,
+             1.3263865136350677,
+             -0.05522501892142832,
+             0.0018785531286670443,
+             0.005422156973198853,
+             0.0055092103669175715,
+             0.002438621935963896,
+             -0.06136239921327097,
+             -0.0003404852201681719,
+             0.00407590085360591,
+             0.0025376104864900176,
+             -0.02240152342086038,
+             -0.011012942661455737,
+             0.0030068742157323704,
+             0.002395644428253512,
+             -0.0053736540469160685,
+             -0.006330441670826642,
+             -0.0917777787511835,
+             -0.009702130489274563,
+             -0.13127400019864632,
+             -0.09008882077147813,
+             0.010362566773705529,
+             -2.4414687376485298e-06,
+             -0.5999960111901739
+           ],
+           "q99": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.04437459776898255,
+             -0.006718148228764021,
+             -0.018936048997243122,
+             0.004192877528633489,
+             0.007735770262043791,
+             0.12165437145979416,
+             0.4240362063051912,
+             -0.04807744057382159,
+             0.5348454862089645,
+             -0.2242267483064094,
+             0.6785139612746233,
+             0.44063327108356026,
+             -0.5817907597011159,
+             1.4848808947644698,
+             0.23677060359863844,
+             0.8047608976578217,
+             1.2885358066775385,
+             1.121296234630144,
+             0.1530053280244578,
+             0.47159374137911664,
+             1.4497268026096963,
+             1.209493986111197,
+             0.000607743071994785,
+             1.1285918553517904,
+             1.4852443026351887,
+             1.40864529914145,
+             0.0005045250833275262,
+             0.6529258515675681,
+             1.5684983484657407,
+             1.410548214903808,
+             0.04141691976666532,
+             0.4324983713584415,
+             1.581882411962951,
+             0.4360824992849159,
+             0.536577685217296,
+             0.17271069713926454,
+             -0.1923993535500321,
+             0.6603449199506704,
+             0.6146886826649628,
+             2.5299440392122987,
+             0.8734007741005431,
+             0.2945541165562756,
+             0.2660609760662737,
+             0.4397959618181715,
+             0.561333966299661,
+             0.016765962069138527,
+             0.013682085635355636,
+             0.5505121417687537,
+             0.5398925788148042,
+             0.0695841343717276,
+             0.01839643415234168,
+             0.4144001419639858,
+             0.49493986038591653,
+             0.08391836784040793,
+             0.008520232199757685,
+             0.20705310424289292,
+             0.25758608095714447,
+             0.005206322471282864,
+             0.0012976095516473085,
+             1.1263225634065734,
+             7.982910483274839e-07,
+             -0.5999537469885612
+           ]
+         }
+       },
+       "action": {
+         "joint_position": {
+           "max": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.03857817500829697,
+             0.00860599521547556,
+             0.0,
+             0.011358398012816906,
+             0.008795554749667645,
+             0.2635219991207123,
+             1.1436481475830078,
+             -0.03254669904708862,
+             0.9616425037384033,
+             0.005560107994824648,
+             1.9167735576629639,
+             0.9553606510162354,
+             0.14417271316051483,
+             1.6008000373840332,
+             0.42979976534843445,
+             0.9053434133529663,
+             1.5772000551223755,
+             1.234971523284912,
+             0.1844240427017212,
+             1.5772000551223755,
+             1.5772000551223755,
+             1.5085276365280151,
+             0.006402143742889166,
+             1.5772000551223755,
+             1.5772000551223755,
+             1.5859500169754028,
+             0.019420389086008072,
+             1.5772000551223755,
+             1.5772000551223755,
+             1.5859500169754028,
+             0.06968191266059875,
+             1.5772000551223755,
+             1.5772000551223755,
+             0.6093692779541016,
+             0.6748019456863403,
+             0.8214342594146729,
+             -0.004697862546890974,
+             1.4369226694107056,
+             0.9396975636482239,
+             2.7821881771087646,
+             1.6008000373840332,
+             0.3493163585662842,
+             0.3661975860595703,
+             1.5772000551223755,
+             0.8197538256645203,
+             0.026058457791805267,
+             0.015132924541831017,
+             0.7475559711456299,
+             0.7178741693496704,
+             0.10578178614377975,
+             0.015168641693890095,
+             0.7295849919319153,
+             0.6987736821174622,
+             0.14835047721862793,
+             0.009813961572945118,
+             0.32408037781715393,
+             0.3626656234264374,
+             0.026074068620800972,
+             0.2016652673482895,
+             1.3755035400390625,
+             3.834952167380834e-06,
+             -0.5998210310935974
+           ],
+           "min": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             -0.009972152300179005,
+             -0.05403241515159607,
+             -0.04206079617142677,
+             -0.008545337244868279,
+             -0.008585446514189243,
+             -1.4895294904708862,
+             -0.9075872302055359,
+             -0.9320250153541565,
+             -0.7501844167709351,
+             -2.5271456241607666,
+             -1.3021537065505981,
+             -1.559999942779541,
+             -2.700000047683716,
+             -0.004508104640990496,
+             -0.11590000241994858,
+             -0.002204202115535736,
+             -0.06962385773658752,
+             -0.1454038769006729,
+             -0.004906882997602224,
+             -0.0047397250309586525,
+             -0.15006792545318604,
+             -0.10465795546770096,
+             -0.20927758514881134,
+             -0.0025141574442386627,
+             -0.00471277441829443,
+             -0.13493074476718903,
+             -0.21192054450511932,
+             -0.006507838144898415,
+             -0.16269022226333618,
+             -0.24437615275382996,
+             -0.19966383278369904,
+             -0.007696053013205528,
+             -0.4431999921798706,
+             -0.641176164150238,
+             0.03254669904708862,
+             -1.6302410364151,
+             -1.4542460441589355,
+             -1.5116034746170044,
+             -1.1882280111312866,
+             0.38480550050735474,
+             -0.04479999840259552,
+             -0.004017750732600689,
+             -0.007120981812477112,
+             -0.008143257349729538,
+             -0.0033251529093831778,
+             -0.09708409011363983,
+             -0.003473518416285515,
+             -0.0041289618238806725,
+             -0.004991866648197174,
+             -0.06705081462860107,
+             -0.01580159179866314,
+             -0.0027919195126742125,
+             -0.004558212589472532,
+             -0.01956297643482685,
+             -0.00846139620989561,
+             -0.16097694635391235,
+             -0.1373649537563324,
+             -0.21344037353992462,
+             -0.09165584295988083,
+             -0.011517560109496117,
+             -0.00011888350854860619,
+             -0.6000050902366638
+           ],
+           "mean": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.0012301454489393134,
+             -0.0014178637151902394,
+             -0.0011515726267851373,
+             -8.99514191918264e-05,
+             0.0002417336261065251,
+             -0.385079695781938,
+             -0.056512973414585665,
+             -0.205284599628657,
+             0.021670858474119963,
+             -1.090174144042536,
+             0.02189628348629755,
+             -0.11948687818374201,
+             -1.3953129200280388,
+             0.7158406793650635,
+             -0.07399009379515412,
+             0.5646530928317394,
+             0.3322831628858908,
+             0.4907992098567843,
+             0.09187406823457182,
+             0.022187885216056677,
+             0.5144590627367867,
+             0.5269970564099565,
+             -0.10383331986413583,
+             0.07851809721461797,
+             0.7411814242423589,
+             0.5444023280221095,
+             -0.0734774868730444,
+             0.03564721708701069,
+             0.5437700061517406,
+             0.5513370593516884,
+             -0.036397602421378954,
+             0.05125830440624985,
+             0.7203868271344417,
+             0.11542250590215877,
+             0.1909098924988445,
+             -0.32824815124911455,
+             -0.6069722333678652,
+             -0.08471856718111564,
+             0.13363292776968674,
+             1.914587773657288,
+             0.5648819695933174,
+             0.2560727698616553,
+             0.24167851919247058,
+             0.20062104627304525,
+             0.3984462185737324,
+             -0.0685205105848946,
+             0.008445595006489948,
+             0.40068832818889344,
+             0.34675098094776313,
+             0.03585122773362594,
+             0.0048274081093767985,
+             0.2606034069992753,
+             0.33831379298242215,
+             0.05275099718188223,
+             0.0005992490332076127,
+             0.024840247042178354,
+             0.16453840496608627,
+             -0.10364606618086784,
+             -0.08398711226926625,
+             0.8792984481160785,
+             -6.785587383775982e-07,
+             -0.5999860812434155
+           ],
+           "std": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.005364887313739734,
+             0.0061500188820826284,
+             0.00482324811058198,
+             0.0006118624749094115,
+             0.0010698437805450641,
+             0.5759163621187422,
+             0.25314037443712,
+             0.1745444488366029,
+             0.23248598496176517,
+             0.5151875566816371,
+             0.363109875655475,
+             0.37494234688690026,
+             0.45561049100001433,
+             0.44951049081390626,
+             0.09446383905433008,
+             0.12122754061324706,
+             0.2648716126033152,
+             0.3501251981903186,
+             0.03332956813047136,
+             0.10150862839363956,
+             0.3652349936001306,
+             0.37369597719425024,
+             0.03457925261671847,
+             0.23152506479876248,
+             0.36023274522277116,
+             0.4740673099425935,
+             0.035694487186065826,
+             0.15188985568739302,
+             0.4924371132376949,
+             0.45327276640155423,
+             0.046409641970985446,
+             0.13415159186384895,
+             0.543731809265607,
+             0.21463303789677357,
+             0.17222252415980077,
+             0.3952559882935496,
+             0.3814561269812472,
+             0.2757738719846831,
+             0.3065243714559272,
+             0.3545466688724501,
+             0.21844354204798888,
+             0.053350533882705435,
+             0.043587100511912974,
+             0.09799630470643801,
+             0.10512987970663229,
+             0.016380302384406024,
+             0.002169456788049435,
+             0.10070984351883364,
+             0.10163106548753786,
+             0.026399777941356704,
+             0.0020461802224458975,
+             0.08102576186444525,
+             0.09971133875885727,
+             0.028058008122129583,
+             0.0017529504319345533,
+             0.07209880855335549,
+             0.06331904047058992,
+             0.04149379483074872,
+             0.01843194410669704,
+             0.21029348506622267,
+             8.890689092689295e-06,
+             5.758346966517874e-05
+           ],
+           "q01": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             -0.00011170007204678288,
+             -0.0350740840703673,
+             -0.02649646236281921,
+             -0.0028050727380635027,
+             -9.426515632575396e-05,
+             -1.374807900157358,
+             -0.5354302153081253,
+             -0.6473194788886907,
+             -0.49177669041598987,
+             -1.8780234500237012,
+             -0.85355856517312,
+             -1.188833228891968,
+             -2.3189886683615177,
+             0.025163513809203716,
+             -0.11590000241994858,
+             0.027254616334218617,
+             0.006036052553036485,
+             -0.009530988392684247,
+             0.0035405991289481093,
+             0.0007229138612139622,
+             0.01636307029640254,
+             -0.0004343032527643275,
+             -0.16726862426587313,
+             -4.8836113049831653e-05,
+             0.02567418749589847,
+             -0.024016833289696688,
+             -0.15735250287967065,
+             -0.0008375468527870527,
+             -0.009714188193280066,
+             -0.05747116584319238,
+             -0.14910954702118515,
+             -0.001929540405106106,
+             -0.15039529739554428,
+             -0.37968197775437307,
+             0.03254669904708863,
+             -1.2543870126546852,
+             -1.303943062692862,
+             -0.6799536316732782,
+             -0.6303896984746518,
+             1.2865705168038077,
+             0.006401516842819814,
+             0.013067307866088613,
+             0.01262725118611539,
+             0.011174036935239103,
+             0.019966230845197767,
+             -0.08647492251316174,
+             0.0006215869344666161,
+             0.02203495231208817,
+             0.01725570048370192,
+             -0.021439176000207998,
+             -0.0023131370850054977,
+             0.014250255564199055,
+             0.01823869361150577,
+             -0.003908939982068881,
+             -0.0027584949224312687,
+             -0.10074178677307907,
+             -0.004372001140849619,
+             -0.1818862571741591,
+             -0.09023238522192853,
+             0.04855890365244546,
+             -2.4414687376485298e-06,
+             -0.5999960111901739
+           ],
+           "q99": [
+             0.0,
+             0.0,
+             0.0,
+             0.0,
+             0.029449282894238496,
+             0.00016742971014083042,
+             0.0,
+             0.00031566800135744813,
+             0.0057871130284906916,
+             0.12873437985015881,
+             0.4299298342542301,
+             -0.03254669904708863,
+             0.5437001126633425,
+             -0.22140740429409098,
+             0.6898871616651259,
+             0.4431390503500722,
+             -0.5147745608161652,
+             1.5909486264827468,
+             0.23712931532251702,
+             0.8461429033868172,
+             1.297922698244089,
+             1.1621481931028022,
+             0.15062389224667413,
+             0.36973000228336933,
+             1.4482291357628363,
+             1.3591272694418859,
+             -0.004566694395975754,
+             1.1618997356401126,
+             1.4852974791242242,
+             1.5427211015035178,
+             -0.0019208977371253182,
+             0.7870620850309203,
+             1.5673701527367643,
+             1.4620851178571561,
+             0.039924607576713726,
+             0.4326375294264343,
+             1.5772000551223755,
+             0.46008667598219477,
+             0.57853004006268,
+             0.19030685767770902,
+             -0.18930773311367147,
+             0.6756148051964173,
+             0.6256650970198077,
+             2.5507818254221712,
+             0.9547109888525797,
+             0.3131930388847934,
+             0.2795055230941873,
+             0.45624038111699255,
+             0.5930317936822368,
+             0.0029542181982621656,
+             0.012190042270950462,
+             0.581510587321943,
+             0.5697987599313858,
+             0.07413654923497481,
+             0.008606564642661909,
+             0.43301102078462383,
+             0.523493119847704,
+             0.09345735818405843,
+             0.00552060761151478,
+             0.21234308337554603,
+             0.27454811268900675,
+             -0.0007255198397156713,
+             0.0025463204460750163,
+             1.1936579220245007,
+             7.982910483274839e-07,
+             -0.5999537469885612
+           ]
+         }
+       }
+     },
+     "modalities": {
+       "video": {
+         "zed_left": {
+           "resolution": [
+             256,
+             256
+           ],
+           "channels": 3,
+           "fps": 20.0
+         },
+         "zed_right": {
+           "resolution": [
+             256,
+             256
+           ],
+           "channels": 3,
+           "fps": 20.0
+         }
+       },
+       "state": {
+         "joint_position": {
+           "absolute": true,
+           "rotation_type": null,
+           "shape": [
+             66
+           ],
+           "continuous": true
+         }
+       },
+       "action": {
+         "joint_position": {
+           "absolute": true,
+           "rotation_type": null,
+           "shape": [
+             66
+           ],
+           "continuous": true
+         }
+       }
+     },
+     "embodiment_tag": "new_embodiment"
+   }
+ }
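Both `Rby1WujiDataConfig` transforms use `"min_max"` normalization, and in these statistics the first four `joint_position` dimensions have `max == min == 0.0` for both state and action, so a naive min-max formula would divide by zero there. A minimal sketch of a zero-range-safe normalizer, assuming a target range of [-1, 1] and assuming constant dimensions map to 0; how GR00T's `StateActionTransform` actually treats such dimensions is not shown in this commit, and both function names are hypothetical.

```python
import numpy as np


def min_max_normalize(x, lo, hi):
    """Map x into [-1, 1] per dimension using per-dimension (lo, hi).

    Assumption: dimensions with hi == lo are set to 0 instead of
    dividing by zero.
    """
    x = np.asarray(x, dtype=np.float64)
    lo = np.asarray(lo, dtype=np.float64)
    hi = np.asarray(hi, dtype=np.float64)
    span = hi - lo
    valid = span > 0  # boolean mask over the last axis
    out = np.zeros_like(x)
    out[..., valid] = 2.0 * (x[..., valid] - lo[valid]) / span[valid] - 1.0
    return out


def min_max_unnormalize(y, lo, hi):
    """Inverse map from [-1, 1]; zero-range dimensions come back as lo."""
    y = np.asarray(y, dtype=np.float64)
    lo = np.asarray(lo, dtype=np.float64)
    hi = np.asarray(hi, dtype=np.float64)
    return (y + 1.0) / 2.0 * (hi - lo) + lo


# First six entries of statistics.state.joint_position (max and min) above.
STATE_MAX = [0.0, 0.0, 0.0, 0.0, 0.0540158674120903, 0.008619149215519428]
STATE_MIN = [0.0, 0.0, 0.0, 0.0, -0.01123147364705801, -0.07895824313163757]

print(min_max_normalize(STATE_MAX, STATE_MIN, STATE_MAX))
```

The same concern applies in a milder form to the last state dimension, which only spans about 1.8e-4 (from -0.6000050902366638 to -0.5998210310935974); under min-max normalization that tiny range would be stretched to the full [-1, 1] interval.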
checkpoint-10000/experiment_cfg/train_config.yaml ADDED
@@ -0,0 +1,38 @@
+ train:
+   datasets:
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_0"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_1"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_2"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_3"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_4"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_5"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_6"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_7_0"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
+     - path: "/rlwrld1/home/hazel/ms_demo_v3/260323/ms_demo_ver3_8"
+       embodiment_tag: "new_embodiment"
+       data_config: rby1_wuji
+       weight: 1.0
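Each entry above points at one dataset directory, selects the `rby1_wuji` key of `DATA_CONFIG_MAP`, and carries `weight: 1.0`. A minimal sketch, assuming the weights are relative mixture proportions that are normalized into sampling probabilities; whether the trainer additionally rescales them (for example by dataset length) is not visible in this commit, and `sampling_probabilities` is a hypothetical helper.

```python
def sampling_probabilities(datasets):
    """Normalize per-dataset 'weight' values into probabilities summing to 1.

    Assumption: 'weight' is a relative mixture proportion.
    """
    total = sum(d["weight"] for d in datasets)
    if total <= 0:
        raise ValueError("dataset weights must sum to a positive value")
    return {d["path"]: d["weight"] / total for d in datasets}


# The nine dataset entries from train_config.yaml, written as plain dicts.
ROOT = "/rlwrld1/home/hazel/ms_demo_v3/260323"
NAMES = [f"ms_demo_ver3_{i}" for i in range(7)] + ["ms_demo_ver3_7_0", "ms_demo_ver3_8"]
DATASETS = [
    {"path": f"{ROOT}/{name}", "data_config": "rby1_wuji", "weight": 1.0}
    for name in NAMES
]

probs = sampling_probabilities(DATASETS)
for path, p in probs.items():
    print(f"{p:.4f}  {path}")
```

With nine equal weights, each dataset gets probability 1/9 under this reading.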
checkpoint-10000/model-00001-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:14908e04dd5bf0d5746e27c0f81c4cfafa7a89668f998e4bd2b0245c1bde5669
+ size 4999367032
checkpoint-10000/model-00002-of-00002.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bebfebd16daa582597b63d9a789384a360f79a8ac38b68aeb9e41b7f1edc2abc
+ size 2598112928
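The `.safetensors`, `.pt`, and `.pth` entries in this commit are Git LFS pointer files rather than the binaries themselves: each holds a `version` line, a `sha256` object id, and the object `size` in bytes. A small stdlib-only sketch that parses such a pointer and totals the two model shards; `parse_lfs_pointer` is a hypothetical helper written for this illustration.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ("key value" per line) into a dict.

    Splits each line on its first space, splits the oid into hash
    algorithm and hex digest, and converts size to an int.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    fields["oid_algo"] = algo
    fields["oid_digest"] = digest
    fields["size"] = int(fields["size"])
    return fields


# Pointer contents of model-00001-of-00002 and model-00002-of-00002 above.
SHARD_POINTERS = [
    """version https://git-lfs.github.com/spec/v1
oid sha256:14908e04dd5bf0d5746e27c0f81c4cfafa7a89668f998e4bd2b0245c1bde5669
size 4999367032
""",
    """version https://git-lfs.github.com/spec/v1
oid sha256:bebfebd16daa582597b63d9a789384a360f79a8ac38b68aeb9e41b7f1edc2abc
size 2598112928
""",
]

shards = [parse_lfs_pointer(p) for p in SHARD_POINTERS]
total_bytes = sum(s["size"] for s in shards)
print(f"{total_bytes:,} bytes")
```

The two shards add up to 7,597,479,960 bytes (about 7.6 GB).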
checkpoint-10000/model.safetensors.index.json ADDED
The diff for this file is too large to render.
checkpoint-10000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:75ee7106b5ed3e22dc9def8f9f833c64f93e2d886bec64a1dda0471299903608
+ size 10294076454
checkpoint-10000/rng_state_0.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66f51a2fb291871d931fca0284bd275daa49470f11ca93e3e2a180421934eafa
+ size 14512
checkpoint-10000/rng_state_1.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8c27c78c066185cc19cb571917f2bd8114e8b6f2a6173ff0a861184a975c508d
+ size 14512
checkpoint-10000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:69cc8a263533866a30bbd70c695b627ba4dea5733aa097b95dcfa6e7f21407ca
+ size 1064
checkpoint-10000/trainer_state.json ADDED
The diff for this file is too large to render.
runs/Mar24_21-00-46_worker-node1000/events.out.tfevents.1774353663.worker-node1000.426899.0 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:77c5b3d34a3c01bd2c12a993adc91657d234d70204c4a3314c32263cf03db3cc
- size 158231
+ oid sha256:db725c818899f375fd59945196f2c3272772534b755e745620fc0224ccf50184
+ size 229549