Robotics
Safetensors
vision-language-action-model
Jia-Zeng committed · Commit 6fe7dc4 · verified · 1 Parent(s): 84d3781

release InternVLA-A1-3B pretrained on InternData-A1 together with agibot-world

Files changed (5)
  1. README.MD +123 -0
  2. config.json +72 -0
  3. model.safetensors +3 -0
  4. stats.json +1189 -0
  5. train_config.json +270 -0
README.MD ADDED
@@ -0,0 +1,123 @@
1
+ ---
2
+ license: cc-by-nc-sa-4.0
3
+ base_model:
4
+ - Qwen/Qwen3-VL-2B-Instruct
5
+ tags:
6
+ - robotics
7
+ - vision-language-action-model
8
+ library_name: transformers
9
+ ---
10
+
11
+ # InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
12
+
13
+ <div style="display: flex; justify-content: center; align-items: center; margin: 20px 0;">
14
+ <img src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/teaser_InternVLA-A1.jpg" alt="Teaser Image" style="max-width: 100%; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
15
+ </div>
16
+
17
+ [![Paper](https://img.shields.io/badge/Paper-arXiv-red.svg)](https://internrobotics.github.io/internvla-a1.github.io/paper/InternVLA_A1.pdf)
18
+ [![Code](https://img.shields.io/badge/GitHub-Code-800820?logo=github)](https://github.com/InternRobotics/InternVLA-A1)
19
+ [![Data](https://img.shields.io/badge/Data-HuggingFace-blue?logo=huggingface)](https://huggingface.co/datasets/InternRobotics/InternData-A1)
20
+ [![Website](https://img.shields.io/badge/Website-Pages-blue.svg)](https://internrobotics.github.io/internvla-a1.github.io/)
21
+
22
+
23
+ <strong>InternVLA-A1</strong> integrates understanding, generation, and action experts into a unified
24
+ model, which synergizes MLLMs' semantic reasoning with world-model-style dynamics prediction to guide action execution.
25
+
26
+ Building upon InternVL3 and Qwen3-VL, we instantiate InternVLA-A1 at 2B and 3B parameter scales. Covering different model scales and pre-training data configurations, we release the InternVLA-A1 series:
27
+
28
+ - [x] [InternVLA-A1-3B](https://huggingface.co/InternRobotics/InternVLA-A1-3B): pretrained on the large-scale, high-fidelity simulation data [InternData-A1](https://huggingface.co/datasets/InternRobotics/InternData-A1), together with open-source robot data (e.g., Agibot-World)
29
+ - [ ] [InternVLA-A1-3B-Pretrain-InternData-A1](https://huggingface.co/InternRobotics/InternVLA-A1-3B-Pretrain-InternData-A1): pretrained on InternData-A1 only
30
+ - [ ] [InternVLA-A1-2B-Pretrain-InternData-A1](https://huggingface.co/InternRobotics/InternVLA-A1-2B-Pretrain-InternData-A1): pretrained on InternData-A1 only
31
+
32
+ ## 🔑 Key Features
33
+
34
+ Architecturally, InternVLA-A1 employs a Mixture-of-Transformers (MoT) design to unify semantic
35
+ understanding, visual foresight, and action prediction, effectively synergizing high-level reasoning with
36
+ low-level dynamics.
37
+ <div style="display: flex; justify-content: center; align-items: center; margin: 20px 0;">
38
+ <img src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/method_InternVLA-A1.png" alt="Teaser Image" style="max-width: 100%; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
39
+ </div>
40
+
41
+ Our hybrid synthetic-real pre-training strategy combines
42
+ the scene diversity of simulation with the physical fidelity of real-world data.
43
+ <div style="display: flex; justify-content: center; align-items: center; margin: 20px 0;">
44
+ <img src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/data_paramid.jpg" alt="Teaser Image" style="max-width: 100%; border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
45
+ </div>
46
+
47
+ ## Demonstrations
48
+ ### ⚡ Dynamic Manipulation
49
+ <div style="display: flex; flex-direction: column; align-items: center; gap: 10px;">
50
+ <!-- First Row -->
51
+ <div style="display: flex; justify-content: center; align-items: center; gap: 10px;">
52
+ <video controls autoplay loop muted width="250" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
53
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/express_sorting_complete.mp4" type="video/mp4">
54
+ </video>
55
+ <video controls autoplay loop muted width="250" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
56
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/In-motion_Ingredient_Picking_4x.mp4" type="video/mp4">
57
+ </video>
58
+ </div>
59
+ <!-- Second Row -->
60
+ <div style="display: flex; justify-content: center; align-items: center; gap: 10px;">
61
+ <video controls autoplay loop muted width="250" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
62
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/express_sorting_upright_3.mp4" type="video/mp4">
63
+ </video>
64
+ <video controls autoplay loop muted width="250" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
65
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/express_sorting_inverted_2.mp4" type="video/mp4">
66
+ </video>
67
+ </div>
68
+ <!-- Third Row -->
69
+ <div style="display: flex; justify-content: center; align-items: center; gap: 10px;">
70
+ <video controls autoplay loop muted width="250" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
71
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/express_sorting_upright_2.mp4" type="video/mp4">
72
+ </video>
73
+ <video controls autoplay loop muted width="250" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
74
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/express_sorting_upright_1.mp4" type="video/mp4">
75
+ </video>
76
+ </div>
77
+ <p><em>InternVLA-A1 exhibits exceptional robustness in highly dynamic scenarios.</em></p>
78
+ </div>
79
+
80
+
81
+ ### 🤖 Daily Tasks
82
+
83
+ <div style="display: flex; flex-direction: column; align-items: center; gap: 10px;">
84
+ <!-- First Row -->
85
+ <div style="display: flex; justify-content: center; align-items: center; gap: 10px;">
86
+ <video controls autoplay loop muted width="200" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
87
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/zig_bag_4x.mp4" type="video/mp4">
88
+ </video>
89
+ <video controls autoplay loop muted width="200" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
90
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/sort_parts_4x.mp4" type="video/mp4">
91
+ </video>
92
+ <video controls autoplay loop muted width="210" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
93
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/unscrew_cap_4x.mp4" type="video/mp4">
94
+ </video>
95
+ </div>
96
+ <!-- Second Row -->
97
+ <div style="display: flex; justify-content: center; align-items: center; gap: 10px;">
98
+ <video controls autoplay loop muted width="200" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
99
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/wipe_stain_4x.mp4" type="video/mp4">
100
+ </video>
101
+ <video controls autoplay loop muted width="210" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
102
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/place_flower_4x.mp4" type="video/mp4">
103
+ </video>
104
+ <video controls autoplay loop muted width="200" style="border-radius: 10px; box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);">
105
+ <source src="https://huggingface.co/spaces/Jia-Zeng/InternVLA_A1_Media/resolve/main/sweep_trash_4x.mp4" type="video/mp4">
106
+ </video>
107
+ </div>
108
+ <p><em>InternVLA-A1 also demonstrates superior proficiency in dexterous and fine-grained manipulation.</em></p>
109
+ </div>
110
+
111
+ ## Usage
112
+ Please refer to our official repo [InternVLA-A1](https://github.com/InternRobotics/InternVLA-A1).
113
+
114
+ ## License and Citation
115
+ All code in this repo is released under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/). Please consider citing our project if it helps your research.
116
+
117
+ ```BibTeX
118
+ @misc{contributors2026internvla_a1,
119
+ title={InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation},
120
+ author={InternVLA-A1 contributors},
121
+ year={2026}
122
+ }
123
+ ```
config.json ADDED
@@ -0,0 +1,72 @@
1
+ {
2
+ "type": "qwena1",
3
+ "n_obs_steps": 1,
4
+ "input_features": {
5
+ "observation.state": {
6
+ "type": "STATE",
7
+ "shape": [
8
+ 32
9
+ ]
10
+ }
11
+ },
12
+ "output_features": {
13
+ "action": {
14
+ "type": "ACTION",
15
+ "shape": [
16
+ 32
17
+ ]
18
+ }
19
+ },
20
+ "device": "cuda",
21
+ "use_amp": false,
22
+ "push_to_hub": false,
23
+ "repo_id": "jcaiaq/qwena1",
24
+ "private": null,
25
+ "tags": null,
26
+ "license": null,
27
+ "pretrained_path": null,
28
+ "qwen3_vl_variant": "qwen3_vl_28l",
29
+ "action_expert_variant": "qwen3_28l",
30
+ "dtype": "bfloat16",
31
+ "chunk_size": 50,
32
+ "n_action_steps": 50,
33
+ "max_state_dim": 32,
34
+ "max_action_dim": 32,
35
+ "num_inference_steps": 10,
36
+ "time_sampling_beta_alpha": 1.5,
37
+ "time_sampling_beta_beta": 1.0,
38
+ "time_sampling_scale": 0.999,
39
+ "time_sampling_offset": 0.001,
40
+ "min_period": 0.004,
41
+ "max_period": 4.0,
42
+ "image_resolution": [
43
+ 224,
44
+ 224
45
+ ],
46
+ "empty_cameras": 0,
47
+ "normalization_mapping": {
48
+ "VISUAL": "IDENTITY",
49
+ "STATE": "IDENTITY",
50
+ "ACTION": "IDENTITY"
51
+ },
52
+ "gradient_checkpointing": false,
53
+ "compile_model": false,
54
+ "compile_mode": "max-autotune",
55
+ "optimizer_lr": 5e-05,
56
+ "optimizer_betas": [
57
+ 0.9,
58
+ 0.95
59
+ ],
60
+ "optimizer_eps": 1e-08,
61
+ "optimizer_weight_decay": 0.01,
62
+ "optimizer_grad_clip_norm": 1.0,
63
+ "scheduler_warmup_steps": 0,
64
+ "scheduler_decay_steps": 700000,
65
+ "scheduler_decay_lr": 5e-05,
66
+ "tokenizer_max_length": 48,
67
+ "freeze_vision_encoder": false,
68
+ "train_expert_only": false,
69
+ "train_vlm_only": false,
70
+ "scale_factor": 8,
71
+ "lambda_gen": 0.01
72
+ }
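The four `time_sampling_*` fields in config.json look like the parameters of a Beta-distributed timestep sampler of the kind used in flow-matching action heads. The commit does not show how they are consumed, so the sketch below is an assumption: it draws `t = scale * Beta(alpha, beta) + offset`, which keeps `t` inside `[offset, offset + scale]`. The function name `sample_time` is hypothetical and only the Python standard library is used.

```python
import random

# Values copied from config.json in this commit.
ALPHA = 1.5     # time_sampling_beta_alpha
BETA = 1.0      # time_sampling_beta_beta
SCALE = 0.999   # time_sampling_scale
OFFSET = 0.001  # time_sampling_offset


def sample_time(rng):
    """Draw one timestep as SCALE * Beta(ALPHA, BETA) + OFFSET.

    Assumed usage of the config fields; not confirmed by the commit.
    """
    return SCALE * rng.betavariate(ALPHA, BETA) + OFFSET


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_time(rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
print(f"min={min(samples):.4f} max={max(samples):.4f} mean={sample_mean:.4f}")
```

With `alpha > beta`, the Beta distribution puts more mass near 1 than near 0 (its mean is `alpha / (alpha + beta) = 0.6`), and the offset keeps sampled times away from exactly 0.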
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9c4dfbc04b9783c9e27e492c1df3379292e75884eb9b019b66a73872daa6afdb
3
+ size 6721209208
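The three lines added for model.safetensors are a Git LFS pointer, not the weights themselves. As a minimal sketch, the pointer can be parsed to recover the checksum and the real file size. The helper name `parse_lfs_pointer` is hypothetical, and the parameter estimate assumes every tensor is stored as 2-byte `bfloat16` (the `dtype` in config.json) and ignores the safetensors header, so it is only a rough figure.

```python
# Pointer text copied from the model.safetensors diff in this commit.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:9c4dfbc04b9783c9e27e492c1df3379292e75884eb9b019b66a73872daa6afdb
size 6721209208
"""


def parse_lfs_pointer(text):
    """Split 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = pointer["size_bytes"] / 2**30
# Assumption: 2 bytes per parameter (bfloat16), header size ignored.
approx_params = pointer["size_bytes"] // 2
print(f"{size_gib:.2f} GiB, about {approx_params / 1e9:.2f}B parameters at 2 bytes each")
```

Under that assumption the file size corresponds to a bit over three billion parameters, in line with the "3B" in the model name.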
stats.json ADDED
@@ -0,0 +1,1189 @@
1
+ {
2
+ "a2d": {
3
+ "observation.images.head_center_fisheye": {
4
+ "min": [
5
+ [
6
+ [
7
+ 0.0
8
+ ]
9
+ ],
10
+ [
11
+ [
12
+ 0.0
13
+ ]
14
+ ],
15
+ [
16
+ [
17
+ 0.0
18
+ ]
19
+ ]
20
+ ],
21
+ "max": [
22
+ [
23
+ [
24
+ 1.0
25
+ ]
26
+ ],
27
+ [
28
+ [
29
+ 1.0
30
+ ]
31
+ ],
32
+ [
33
+ [
34
+ 1.0
35
+ ]
36
+ ]
37
+ ],
38
+ "mean": [
39
+ [
40
+ [
41
+ 0.2170564526776088
42
+ ]
43
+ ],
44
+ [
45
+ [
46
+ 0.21715563952928688
47
+ ]
48
+ ],
49
+ [
50
+ [
51
+ 0.21329664920334832
52
+ ]
53
+ ]
54
+ ],
55
+ "std": [
56
+ [
57
+ [
58
+ 0.13733952334113325
59
+ ]
60
+ ],
61
+ [
62
+ [
63
+ 0.13933204888565712
64
+ ]
65
+ ],
66
+ [
67
+ [
68
+ 0.13837168358341617
69
+ ]
70
+ ]
71
+ ],
72
+ "count": [
73
+ 377479
74
+ ]
75
+ },
76
+ "observation.images.hand_right": {
77
+ "min": [
78
+ [
79
+ [
80
+ 0.0
81
+ ]
82
+ ],
83
+ [
84
+ [
85
+ 0.0
86
+ ]
87
+ ],
88
+ [
89
+ [
90
+ 0.0
91
+ ]
92
+ ]
93
+ ],
94
+ "max": [
95
+ [
96
+ [
97
+ 1.0
98
+ ]
99
+ ],
100
+ [
101
+ [
102
+ 1.0
103
+ ]
104
+ ],
105
+ [
106
+ [
107
+ 1.0
108
+ ]
109
+ ]
110
+ ],
111
+ "mean": [
112
+ [
113
+ [
114
+ 0.4012361263480749
115
+ ]
116
+ ],
117
+ [
118
+ [
119
+ 0.4128215547931966
120
+ ]
121
+ ],
122
+ [
123
+ [
124
+ 0.41201127556394607
125
+ ]
126
+ ]
127
+ ],
128
+ "std": [
129
+ [
130
+ [
131
+ 0.3006461868623912
132
+ ]
133
+ ],
134
+ [
135
+ [
136
+ 0.2870540652329497
137
+ ]
138
+ ],
139
+ [
140
+ [
141
+ 0.28873655436784607
142
+ ]
143
+ ]
144
+ ],
145
+ "count": [
146
+ 377479
147
+ ]
148
+ },
149
+ "observation.states.end.position": {
150
+ "min": [
151
+ 0.19337546825408936,
152
+ -0.7023321390151978,
153
+ 0.02558809332549572
154
+ ],
155
+ "max": [
156
+ 0.9986844062805176,
157
+ 0.6781377792358398,
158
+ 1.3737616539001465
159
+ ],
160
+ "mean": [
161
+ 0.6642024517059326,
162
+ -0.007329180371016264,
163
+ 0.6810888051986694
164
+ ],
165
+ "std": [
166
+ 0.07928971946239471,
167
+ 0.31414300203323364,
168
+ 0.18238481879234314
169
+ ],
170
+ "count": [
171
+ 412635494
172
+ ]
173
+ },
174
+ "observation.images.back_right_fisheye": {
175
+ "min": [
176
+ [
177
+ [
178
+ 0.0
179
+ ]
180
+ ],
181
+ [
182
+ [
183
+ 0.0
184
+ ]
185
+ ],
186
+ [
187
+ [
188
+ 0.0
189
+ ]
190
+ ]
191
+ ],
192
+ "max": [
193
+ [
194
+ [
195
+ 1.0
196
+ ]
197
+ ],
198
+ [
199
+ [
200
+ 1.0
201
+ ]
202
+ ],
203
+ [
204
+ [
205
+ 1.0
206
+ ]
207
+ ]
208
+ ],
209
+ "mean": [
210
+ [
211
+ [
212
+ 0.24406564106197712
213
+ ]
214
+ ],
215
+ [
216
+ [
217
+ 0.23887354567756203
218
+ ]
219
+ ],
220
+ [
221
+ [
222
+ 0.23895767925496028
223
+ ]
224
+ ]
225
+ ],
226
+ "std": [
227
+ [
228
+ [
229
+ 0.14380076551541654
230
+ ]
231
+ ],
232
+ [
233
+ [
234
+ 0.14275321411059783
235
+ ]
236
+ ],
237
+ [
238
+ [
239
+ 0.1430845503139609
240
+ ]
241
+ ]
242
+ ],
243
+ "count": [
244
+ 377479
245
+ ]
246
+ },
247
+ "observation.states.robot.orientation": {
248
+ "min": [
249
+ 0.0,
250
+ 0.0,
251
+ -0.9999999403953552,
252
+ 0.0
253
+ ],
254
+ "max": [
255
+ 0.0,
256
+ 0.0,
257
+ 0.9999999403953552,
258
+ 1.0
259
+ ],
260
+ "mean": [
261
+ 0.0,
262
+ 0.0,
263
+ -0.0044853053987026215,
264
+ 0.8723011016845703
265
+ ],
266
+ "std": [
267
+ 0.0,
268
+ 0.0,
269
+ 0.17404766380786896,
270
+ 0.3195227086544037
271
+ ],
272
+ "count": [
273
+ 206317747
274
+ ]
275
+ },
276
+ "actions.end.orientation": {
277
+ "min": [
278
+ -0.8659883141517639,
279
+ -0.707099199295044,
280
+ -0.8578947186470032,
281
+ -0.5
282
+ ],
283
+ "max": [
284
+ 0.9999998211860657,
285
+ 0.9999992251396179,
286
+ 0.9986231923103333,
287
+ 0.9996573328971863
288
+ ],
289
+ "mean": [
290
+ -0.0957142785191536,
291
+ 0.33084630966186523,
292
+ -0.16654019057750702,
293
+ 0.21365666389465332
294
+ ],
295
+ "std": [
296
+ 0.5548436045646667,
297
+ 0.47030869126319885,
298
+ 0.3892369866371155,
299
+ 0.35699814558029175
300
+ ],
301
+ "count": [
302
+ 412635494
303
+ ]
304
+ },
305
+ "timestamp": {
306
+ "min": [
307
+ 0.0
308
+ ],
309
+ "max": [
310
+ 371.73333740234375
311
+ ],
312
+ "mean": [
313
+ 37.82756042480469
314
+ ],
315
+ "std": [
316
+ 30.960451126098633
317
+ ],
318
+ "count": [
319
+ 206317747
320
+ ]
321
+ },
322
+ "actions.head.position": {
323
+ "min": [
324
+ -0.3141944110393524,
325
+ -7.989483492565341e-06
326
+ ],
327
+ "max": [
328
+ 0.3491190969944,
329
+ 0.5243231654167175
330
+ ],
331
+ "mean": [
332
+ -0.006390239577740431,
333
+ 0.4214302599430084
334
+ ],
335
+ "std": [
336
+ 0.040385738015174866,
337
+ 0.07512572407722473
338
+ ],
339
+ "count": [
340
+ 206317747
341
+ ]
342
+ },
343
+ "observation.images.head": {
344
+ "min": [
345
+ [
346
+ [
347
+ 0.0
348
+ ]
349
+ ],
350
+ [
351
+ [
352
+ 0.0
353
+ ]
354
+ ],
355
+ [
356
+ [
357
+ 0.0
358
+ ]
359
+ ]
360
+ ],
361
+ "max": [
362
+ [
363
+ [
364
+ 1.0
365
+ ]
366
+ ],
367
+ [
368
+ [
369
+ 1.0
370
+ ]
371
+ ],
372
+ [
373
+ [
374
+ 1.0
375
+ ]
376
+ ]
377
+ ],
378
+ "mean": [
379
+ [
380
+ [
381
+ 0.4342076007237568
382
+ ]
383
+ ],
384
+ [
385
+ [
386
+ 0.43646875224929266
387
+ ]
388
+ ],
389
+ [
390
+ [
391
+ 0.43947909177612193
392
+ ]
393
+ ]
394
+ ],
395
+ "std": [
396
+ [
397
+ [
398
+ 0.238587756768418
399
+ ]
400
+ ],
401
+ [
402
+ [
403
+ 0.23212485938515712
404
+ ]
405
+ ],
406
+ [
407
+ [
408
+ 0.2303160651119787
409
+ ]
410
+ ]
411
+ ],
412
+ "count": [
413
+ 377479
414
+ ]
415
+ },
416
+ "observation.states.head.position": {
417
+ "min": [
418
+ -0.3141944110393524,
419
+ -7.989483492565341e-06
420
+ ],
421
+ "max": [
422
+ 0.3491190969944,
423
+ 0.5243231654167175
424
+ ],
425
+ "mean": [
426
+ -0.006390239577740431,
427
+ 0.4214302599430084
428
+ ],
429
+ "std": [
430
+ 0.040385738015174866,
431
+ 0.07512572407722473
432
+ ],
433
+ "count": [
434
+ 206317747
435
+ ]
436
+ },
437
+ "actions.joint.position": {
438
+ "min": [
439
+ -1.7914215326309204,
440
+ -1.6348680257797241,
441
+ -1.4873474836349487,
442
+ -1.4548759460449219,
443
+ -2.240899085998535,
444
+ -1.8038171529769897,
445
+ -3.4757676124572754,
446
+ -1.6461334228515625,
447
+ -1.3727182149887085,
448
+ -1.444272518157959,
449
+ -1.2959821224212646,
450
+ -1.7712056636810303,
451
+ -1.848980188369751,
452
+ -3.174412727355957
453
+ ],
454
+ "max": [
455
+ 1.6070955991744995,
456
+ 1.1229939460754395,
457
+ 1.2000415325164795,
458
+ 1.2937791347503662,
459
+ 1.663032054901123,
460
+ 1.6665472984313965,
461
+ 3.1848840713500977,
462
+ 1.6074773073196411,
463
+ 1.7747725248336792,
464
+ 1.4743931293487549,
465
+ 1.4139199256896973,
466
+ 1.9228148460388184,
467
+ 1.8182251453399658,
468
+ 2.559840202331543
469
+ ],
470
+ "mean": [
471
+ -0.0019902666099369526,
472
+ 0.0007972356979735196,
473
+ 0.005883503705263138,
474
+ 0.002925095148384571,
475
+ -0.0023837448097765446,
476
+ -0.0032534017227590084,
477
+ 0.0030469002667814493,
478
+ 0.002534338738769293,
479
+ -0.0006564394570887089,
480
+ -0.007297950331121683,
481
+ -0.0034327625762671232,
482
+ 0.003094361163675785,
483
+ 0.0033172282855957747,
484
+ -0.003114244667813182
485
+ ],
486
+ "std": [
487
+ 0.05846413969993591,
488
+ 0.05941038578748703,
489
+ 0.05411318689584732,
490
+ 0.06191148981451988,
491
+ 0.06851422786712646,
492
+ 0.07117217034101486,
493
+ 0.08588860929012299,
494
+ 0.06875154376029968,
495
+ 0.07714387774467468,
496
+ 0.06475178897380829,
497
+ 0.0728577971458435,
498
+ 0.07771088927984238,
499
+ 0.08220776170492172,
500
+ 0.10469666123390198
501
+ ],
502
+ "count": [
503
+ 10039066750
504
+ ]
505
+ },
506
+ "index": {
507
+ "min": [
508
+ 0.0
509
+ ],
510
+ "max": [
511
+ 10437652.0
512
+ ],
513
+ "mean": [
514
+ 1407644.875
515
+ ],
516
+ "std": [
517
+ 1539739.0
518
+ ],
519
+ "count": [
520
+ 206317747
521
+ ]
522
+ },
523
+ "observation.states.robot.position": {
524
+ "min": [
525
+ -10.095999717712402,
526
+ -26.13800048828125,
527
+ 0.0
528
+ ],
529
+ "max": [
530
+ 13.439000129699707,
531
+ 18.18899917602539,
532
+ 0.0
533
+ ],
534
+ "mean": [
535
+ 0.018679631873965263,
536
+ -0.000379027595045045,
537
+ 0.0
538
+ ],
539
+ "std": [
540
+ 0.5366398096084595,
541
+ 0.4929087162017822,
542
+ 0.0
543
+ ],
544
+ "count": [
545
+ 206317747
546
+ ]
547
+ },
548
+ "observation.states.waist.position": {
549
+ "min": [
550
+ -0.00019494339358061552,
551
+ -3.51537273672875e-05
552
+ ],
553
+ "max": [
554
+ 0.8899499773979187,
555
+ 0.6806919574737549
556
+ ],
557
+ "mean": [
558
+ 0.462455153465271,
559
+ 0.2702843248844147
560
+ ],
561
+ "std": [
562
+ 0.13027304410934448,
563
+ 0.1003517434000969
564
+ ],
565
+ "count": [
566
+ 206317747
567
+ ]
568
+ },
569
+ "observation.states.effector.position": {
570
+ "min": [
571
+ 31.68000030517578,
572
+ 32.79999923706055
573
+ ],
574
+ "max": [
575
+ 124.24285888671875,
576
+ 124.28878021240234
577
+ ],
578
+ "mean": [
579
+ 63.92190170288086,
580
+ 67.9267578125
581
+ ],
582
+ "std": [
583
+ 37.45624542236328,
584
+ 38.00613021850586
585
+ ],
586
+ "count": [
587
+ 206317747
588
+ ]
589
+ },
590
+ "observation.states.joint.current_value": {
591
+ "min": [
592
+ -21509.0,
593
+ -21583.0,
594
+ -26576.0,
595
+ -27299.0,
596
+ -14975.0,
597
+ -15064.0,
598
+ -14811.0,
599
+ -21502.0,
600
+ -21507.0,
601
+ -26476.0,
602
+ -26706.0,
603
+ -14990.0,
604
+ -14856.0,
605
+ -14972.0
606
+ ],
607
+ "max": [
608
+ 21311.0,
609
+ 21799.0,
610
+ 27382.0,
611
+ 27585.0,
612
+ 14989.0,
613
+ 15018.0,
614
+ 12276.0,
615
+ 21545.0,
616
+ 21508.0,
617
+ 26707.0,
618
+ 27705.0,
619
+ 15040.0,
620
+ 15010.0,
621
+ 11652.0
622
+ ],
623
+ "mean": [
624
+ -1973.25830078125,
625
+ -594.5382080078125,
626
+ -2118.474853515625,
627
+ -1013.27880859375,
628
+ -1161.8677978515625,
629
+ 125.57772827148438,
630
+ 13.544347763061523,
631
+ 2125.65283203125,
632
+ 572.1343994140625,
633
+ 2160.431640625,
634
+ 930.390625,
635
+ 1165.202880859375,
636
+ -169.16151428222656,
637
+ -20.55805206298828
638
+ ],
639
+ "std": [
640
+ 1171.3841552734375,
641
+ 702.0079956054688,
642
+ 1523.385986328125,
643
+ 1328.933349609375,
644
+ 1126.8544921875,
645
+ 983.2664794921875,
646
+ 243.56626892089844,
647
+ 1217.5579833984375,
648
+ 763.5587158203125,
649
+ 1420.76806640625,
650
+ 1297.6051025390625,
651
+ 944.412109375,
652
+ 955.84423828125,
653
+ 268.05047607421875
654
+ ],
655
+ "count": [
656
+ 206317747
657
+ ]
658
+ },
659
+ "observation.images.back_left_fisheye": {
660
+ "min": [
661
+ [
662
+ [
663
+ 0.0
664
+ ]
665
+ ],
666
+ [
667
+ [
668
+ 0.0
669
+ ]
670
+ ],
671
+ [
672
+ [
673
+ 0.0
674
+ ]
675
+ ]
676
+ ],
677
+ "max": [
678
+ [
679
+ [
680
+ 1.0
681
+ ]
682
+ ],
683
+ [
684
+ [
685
+ 1.0
686
+ ]
687
+ ],
688
+ [
689
+ [
690
+ 1.0
691
+ ]
692
+ ]
693
+ ],
694
+ "mean": [
695
+ [
696
+ [
697
+ 0.2406036947933155
698
+ ]
699
+ ],
700
+ [
701
+ [
702
+ 0.23535502602992275
703
+ ]
704
+ ],
705
+ [
706
+ [
707
+ 0.23411491353305053
708
+ ]
709
+ ]
710
+ ],
711
+ "std": [
712
+ [
713
+ [
714
+ 0.14170669996228788
715
+ ]
716
+ ],
717
+ [
718
+ [
719
+ 0.14011669717959988
720
+ ]
721
+ ],
722
+ [
723
+ [
724
+ 0.1402756774757254
725
+ ]
726
+ ]
727
+ ],
728
+ "count": [
729
+ 377479
730
+ ]
731
+ },
732
+ "actions.effector.position": {
733
+ "min": [
734
+ 0.0,
735
+ 0.0
736
+ ],
737
+ "max": [
738
+ 1.0,
739
+ 1.0
740
+ ],
741
+ "mean": [
742
+ 0.38904863595962524,
743
+ 0.4319799244403839
744
+ ],
745
+ "std": [
746
+ 0.486628919839859,
747
+ 0.4939044713973999
748
+ ],
749
+ "count": [
750
+ 10039066750
751
+ ]
752
+ },
753
+ "actions.end.position": {
754
+ "min": [
755
+ 0.19337546825408936,
756
+ -0.7023321390151978,
757
+ 0.02587788738310337
758
+ ],
759
+ "max": [
760
+ 0.9986844062805176,
761
+ 0.6781377792358398,
762
+ 1.3737616539001465
763
+ ],
764
+ "mean": [
765
+ 0.6642028093338013,
766
+ -0.007328854873776436,
767
+ 0.6810888051986694
768
+ ],
769
+ "std": [
770
+ 0.07928933948278427,
771
+ 0.31414082646369934,
772
+ 0.18238449096679688
773
+ ],
774
+ "count": [
775
+ 412635494
776
+ ]
777
+ },
778
+ "observation.images.head_right_fisheye": {
779
+ "min": [
780
+ [
781
+ [
782
+ 0.0
783
+ ]
784
+ ],
785
+ [
786
+ [
787
+ 0.0
788
+ ]
789
+ ],
790
+ [
791
+ [
792
+ 0.0
793
+ ]
794
+ ]
795
+ ],
796
+ "max": [
797
+ [
798
+ [
799
+ 1.0
800
+ ]
801
+ ],
802
+ [
803
+ [
804
+ 1.0
805
+ ]
806
+ ],
807
+ [
808
+ [
809
+ 1.0
810
+ ]
811
+ ]
812
+ ],
813
+ "mean": [
814
+ [
815
+ [
816
+ 0.2315717310162484
817
+ ]
818
+ ],
819
+ [
820
+ [
821
+ 0.23159259189816572
822
+ ]
823
+ ],
824
+ [
825
+ [
826
+ 0.22936478737541055
827
+ ]
828
+ ]
829
+ ],
830
+ "std": [
831
+ [
832
+ [
833
+ 0.14033539618767438
834
+ ]
835
+ ],
836
+ [
837
+ [
838
+ 0.14262738424953927
839
+ ]
840
+ ],
841
+ [
842
+ [
843
+ 0.14294225543343958
844
+ ]
845
+ ]
846
+ ],
847
+ "count": [
848
+ 377479
849
+ ]
850
+ },
851
+ "actions.robot.velocity": {
852
+ "min": [
853
+ -1.600000023841858,
854
+ -1.0
855
+ ],
856
+ "max": [
857
+ 1.600000023841858,
858
+ 1.0
859
+ ],
860
+ "mean": [
861
+ 0.003452081000432372,
862
+ -0.0004149683518335223
863
+ ],
864
+ "std": [
865
+ 0.1737053096294403,
866
+ 0.06619741767644882
867
+ ],
868
+ "count": [
869
+ 206317747
870
+ ]
871
+ },
872
+ "observation.images.head_left_fisheye": {
873
+ "min": [
874
+ [
875
+ [
876
+ 0.0
877
+ ]
878
+ ],
879
+ [
880
+ [
881
+ 0.0
882
+ ]
883
+ ],
884
+ [
885
+ [
886
+ 0.0
887
+ ]
888
+ ]
889
+ ],
890
+ "max": [
891
+ [
892
+ [
893
+ 1.0
894
+ ]
895
+ ],
896
+ [
897
+ [
898
+ 1.0
899
+ ]
900
+ ],
901
+ [
902
+ [
903
+ 1.0
904
+ ]
905
+ ]
906
+ ],
907
+ "mean": [
908
+ [
909
+ [
910
+ 0.2273635830751999
911
+ ]
912
+ ],
913
+ [
914
+ [
915
+ 0.22639874164807958
916
+ ]
917
+ ],
918
+ [
919
+ [
920
+ 0.22391866267180333
921
+ ]
922
+ ]
923
+ ],
924
+ "std": [
925
+ [
926
+ [
927
+ 0.13988676701500716
928
+ ]
929
+ ],
930
+ [
931
+ [
932
+ 0.1404964710507028
933
+ ]
934
+ ],
935
+ [
936
+ [
937
+ 0.1400171912642899
938
+ ]
939
+ ]
940
+ ],
941
+ "count": [
942
+ 377479
943
+ ]
944
+ },
945
+ "episode_index": {
946
+ "min": [
947
+ 0.0
948
+ ],
949
+ "max": [
950
+ 8232.0
951
+ ],
952
+ "mean": [
953
+ 767.5787963867188
954
+ ],
955
+ "std": [
956
+ 1093.5506591796875
957
+ ],
958
+ "count": [
959
+ 206317747
960
+ ]
961
+ },
962
+ "observation.states.joint.position": {
963
+ "min": [
964
+ -3.1426401138305664,
965
+ -1.8865216970443726,
966
+ -2.9275848865509033,
967
+ -1.5048705339431763,
968
+ -3.08754301071167,
969
+ -1.7474952936172485,
970
+ -3.140000104904175,
971
+ -2.8576574325561523,
972
+ -2.033689260482788,
973
+ -3.140000104904175,
974
+ -1.484890341758728,
975
+ -3.1418726444244385,
976
+ -1.7603733539581299,
977
+ -3.140075206756592
978
+ ],
979
+ "max": [
980
+ 2.8758695125579834,
981
+ 2.0117151737213135,
982
+ 3.138101577758789,
983
+ 1.4812432527542114,
984
+ 3.1430766582489014,
985
+ 1.7610539197921753,
986
+ 3.1404590606689453,
987
+ 3.140000104904175,
988
+ 1.9619559049606323,
989
+ 3.1351795196533203,
990
+ 1.5224602222442627,
991
+ 3.1516618728637695,
992
+ 1.741370439529419,
993
+ 3.140075206756592
994
+ ],
995
+ "mean": [
996
+ -1.2326810359954834,
997
+ 0.7677901387214661,
998
+ 0.8066206574440002,
999
+ -0.8516032695770264,
1000
+ 0.5997862815856934,
1001
+ 1.1834211349487305,
1002
+ 0.06289777904748917,
1003
+ 1.2573060989379883,
1004
+ -0.7288170456886292,
1005
+ -0.9002650380134583,
1006
+ 0.8033254742622375,
1007
+ -0.568104088306427,
1008
+ -1.1562278270721436,
1009
+ 0.0007565367268398404
1010
+ ],
1011
+ "std": [
1012
+ 0.5751113891601562,
1013
+ 0.3921157717704773,
1014
+ 0.6091160178184509,
1015
+ 0.4113500714302063,
1016
+ 0.5058683156967163,
1017
+ 0.3978140950202942,
1018
+ 0.6982151865959167,
1019
+ 0.6329915523529053,
1020
+ 0.44166192412376404,
1021
+ 0.6600342392921448,
1022
+ 0.41723132133483887,
1023
+ 0.4813174903392792,
1024
+ 0.3783360421657562,
1025
+ 0.7129143476486206
1026
+ ],
1027
+ "count": [
1028
+ 206317747
1029
+ ]
1030
+ },
1031
+ "observation.images.hand_left": {
1032
+ "min": [
1033
+ [
1034
+ [
1035
+ 0.0
1036
+ ]
1037
+ ],
1038
+ [
1039
+ [
1040
+ 0.0
1041
+ ]
1042
+ ],
1043
+ [
1044
+ [
1045
+ 0.0
1046
+ ]
1047
+ ]
1048
+ ],
1049
+ "max": [
1050
+ [
1051
+ [
1052
+ 1.0
1053
+ ]
1054
+ ],
1055
+ [
1056
+ [
1057
+ 1.0
1058
+ ]
1059
+ ],
1060
+ [
1061
+ [
1062
+ 1.0
1063
+ ]
1064
+ ]
1065
+ ],
1066
+ "mean": [
1067
+ [
1068
+ [
1069
+ 0.41786597235305717
1070
+ ]
1071
+ ],
1072
+ [
1073
+ [
1074
+ 0.4284313766342145
1075
+ ]
1076
+ ],
1077
+ [
1078
+ [
1079
+ 0.42881415371165427
1080
+ ]
1081
+ ]
1082
+ ],
1083
+ "std": [
1084
+ [
1085
+ [
1086
+ 0.2625953225261703
1087
+ ]
1088
+ ],
1089
+ [
1090
+ [
1091
+ 0.24836191157221135
1092
+ ]
1093
+ ],
1094
+ [
1095
+ [
1096
+ 0.2543553211379638
1097
+ ]
1098
+ ]
1099
+ ],
1100
+ "count": [
1101
+ 377479
1102
+ ]
1103
+ },
1104
+ "observation.states.end.orientation": {
1105
+ "min": [
1106
+ -0.8659883141517639,
1107
+ -0.707099199295044,
1108
+ -0.8578947186470032,
1109
+ -0.5
1110
+ ],
1111
+ "max": [
1112
+ 0.9999998211860657,
1113
+ 0.9999992251396179,
1114
+ 0.9986231923103333,
1115
+ 0.9996573328971863
1116
+ ],
1117
+ "mean": [
1118
+ -0.09570971131324768,
1119
+ 0.33084073662757874,
1120
+ -0.16653458774089813,
1121
+ 0.21365401148796082
1122
+ ],
1123
+ "std": [
1124
+ 0.5548442006111145,
1125
+ 0.4703128933906555,
1126
+ 0.38923993706703186,
1127
+ 0.3569991886615753
1128
+ ],
1129
+ "count": [
1130
+ 412635494
1131
+ ]
1132
+ },
1133
+ "frame_index": {
1134
+ "min": [
1135
+ 0.0
1136
+ ],
1137
+ "max": [
1138
+ 11152.0
1139
+ ],
1140
+ "mean": [
1141
+ 1134.8265380859375
1142
+ ],
1143
+ "std": [
1144
+ 928.8145751953125
1145
+ ],
1146
+ "count": [
1147
+ 206317747
1148
+ ]
1149
+ },
1150
+ "actions.waist.position": {
1151
+ "min": [
1152
+ -0.00019494339358061552,
1153
+ -3.51537273672875e-05
1154
+ ],
1155
+ "max": [
1156
+ 0.8899499773979187,
1157
+ 0.6806919574737549
1158
+ ],
1159
+ "mean": [
1160
+ 0.462455153465271,
1161
+ 0.2702843248844147
1162
+ ],
1163
+ "std": [
1164
+ 0.13027304410934448,
1165
+ 0.1003517434000969
1166
+ ],
1167
+ "count": [
1168
+ 206317747
1169
+ ]
1170
+ },
1171
+ "task_index": {
1172
+ "min": [
1173
+ 0.0
1174
+ ],
1175
+ "max": [
1176
+ 0.0
1177
+ ],
1178
+ "mean": [
1179
+ 0.0
1180
+ ],
1181
+ "std": [
1182
+ 0.0
1183
+ ],
1184
+ "count": [
1185
+ 206317747
1186
+ ]
1187
+ }
1188
+ }
1189
+ }
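stats.json stores per-feature `min`, `max`, `mean`, `std`, and `count` values for the `a2d` embodiment. A common use for such statistics is z-score normalization of states and actions. config.json in this commit lists `IDENTITY` for every entry of `normalization_mapping`, so whether and where these statistics are applied is not shown here. The sketch below only illustrates what mean/std normalization with two of the entries would look like. The helper names and the `EPS` guard are hypothetical; the guard matters because some dimensions (for example the third component of `observation.states.robot.position`) have a standard deviation of 0.0.

```python
# Statistics copied from stats.json ("a2d" embodiment) in this commit.
WAIST_STATS = {
    "mean": [0.462455153465271, 0.2702843248844147],
    "std": [0.13027304410934448, 0.1003517434000969],
}
ROBOT_POSITION_STATS = {
    "mean": [0.018679631873965263, -0.000379027595045045, 0.0],
    "std": [0.5366398096084595, 0.4929087162017822, 0.0],
}

EPS = 1e-8  # hypothetical lower bound on std to avoid division by zero


def normalize(values, stats, eps=EPS):
    """Z-score each dimension: (value - mean) / max(std, eps)."""
    return [
        (v - m) / max(s, eps)
        for v, m, s in zip(values, stats["mean"], stats["std"])
    ]


def denormalize(values, stats, eps=EPS):
    """Invert normalize(): value * max(std, eps) + mean."""
    return [
        v * max(s, eps) + m
        for v, m, s in zip(values, stats["mean"], stats["std"])
    ]


waist_raw = [0.5, 0.3]  # made-up waist reading, for illustration only
waist_norm = normalize(waist_raw, WAIST_STATS)
waist_back = denormalize(waist_norm, WAIST_STATS)
print(waist_norm, waist_back)
```

Clamping the standard deviation instead of adding `eps` to it leaves well-scaled dimensions unchanged and only affects the degenerate constant ones.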
train_config.json ADDED
@@ -0,0 +1,270 @@
1
+ {
2
+ "dataset": {
3
+ "type": "qwena1",
4
+ "repo_id": "agibotworld/task_480 agibotworld/task_544 agibotworld/task_501 agibotworld/task_372 agibotworld/task_422 agibotworld/task_537 agibotworld/task_366 agibotworld/task_561 agibotworld/task_568 agibotworld/task_421 agibotworld/task_361 agibotworld/task_551 agibotworld/task_363 agibotworld/task_582 agibotworld/task_445 agibotworld/task_444 agibotworld/task_373 agibotworld/task_464 agibotworld/task_542 agibotworld/task_470 agibotworld/task_541 agibotworld/task_487 agibotworld/task_454 agibotworld/task_540 agibotworld/task_431 agibotworld/task_596 agibotworld/task_545 agibotworld/task_362 agibotworld/task_587 agibotworld/task_468 agibotworld/task_616 agibotworld/task_566 agibotworld/task_609 agibotworld/task_365 agibotworld/task_455 agibotworld/task_590 agibotworld/task_683 agibotworld/task_600 agibotworld/task_390 agibotworld/task_613 agibotworld/task_563 agibotworld/task_369 agibotworld/task_567 agibotworld/task_573 agibotworld/task_392 agibotworld/task_368 agibotworld/task_360 agibotworld/task_532 agibotworld/task_351 agibotworld/task_491 agibotworld/task_438 agibotworld/task_357 agibotworld/task_528 agibotworld/task_498 agibotworld/task_511 agibotworld/task_604 agibotworld/task_602 agibotworld/task_429 agibotworld/task_509 agibotworld/task_385 agibotworld/task_521 agibotworld/task_619 agibotworld/task_503 agibotworld/task_424 agibotworld/task_398 agibotworld/task_453 agibotworld/task_374 agibotworld/task_486 agibotworld/task_529 agibotworld/task_520 agibotworld/task_471 agibotworld/task_446 agibotworld/task_465 agibotworld/task_485 agibotworld/task_377 agibotworld/task_512 agibotworld/task_440 agibotworld/task_555 agibotworld/task_688 agibotworld/task_352 agibotworld/task_570 agibotworld/task_507 agibotworld/task_575 agibotworld/task_389 agibotworld/task_494 agibotworld/task_356 agibotworld/task_378 agibotworld/task_558 agibotworld/task_376 agibotworld/task_358 agibotworld/task_506 agibotworld/task_451 agibotworld/task_463 agibotworld/task_550 agibotworld/task_589 agibotworld/task_621 agibotworld/task_466 agibotworld/task_574 agibotworld/task_414 agibotworld/task_504 agibotworld/task_692 agibotworld/task_327 agibotworld/task_588 agibotworld/task_533 agibotworld/task_508 agibotworld/task_525 agibotworld/task_515 agibotworld/task_664 agibotworld/task_375 agibotworld/task_388 agibotworld/task_527 agibotworld/task_359 agibotworld/task_593 agibotworld/task_543 agibotworld/task_483 agibotworld/task_434 agibotworld/task_607 agibotworld/task_425 agibotworld/task_695 agibotworld/task_474 agibotworld/task_689 agibotworld/task_433 agibotworld/task_580 agibotworld/task_410 agibotworld/task_478 agibotworld/task_367 agibotworld/task_597 agibotworld/task_534 agibotworld/task_452 agibotworld/task_522 agibotworld/task_535 agibotworld/task_658 agibotworld/task_462 agibotworld/task_492 agibotworld/task_556 agibotworld/task_603 agibotworld/task_682 agibotworld/task_477 agibotworld/task_599 agibotworld/task_681 agibotworld/task_354 agibotworld/task_524 agibotworld/task_497 agibotworld/task_584 agibotworld/task_598",
5
+ "root": null,
6
+ "episodes": null,
7
+ "image_transforms": {
8
+ "enable": false,
9
+ "max_num_transforms": 3,
10
+ "random_order": false,
11
+ "tfs": {
12
+ "brightness": {
13
+ "weight": 1.0,
14
+ "type": "ColorJitter",
15
+ "kwargs": {
16
+ "brightness": [
17
+ 0.8,
18
+ 1.2
19
+ ]
20
+ }
21
+ },
22
+ "contrast": {
23
+ "weight": 1.0,
24
+ "type": "ColorJitter",
25
+ "kwargs": {
26
+ "contrast": [
27
+ 0.8,
28
+ 1.2
29
+ ]
30
+ }
31
+ },
32
+ "saturation": {
33
+ "weight": 1.0,
34
+ "type": "ColorJitter",
35
+ "kwargs": {
36
+ "saturation": [
37
+ 0.5,
38
+ 1.5
39
+ ]
40
+ }
41
+ },
42
+ "hue": {
43
+ "weight": 1.0,
44
+ "type": "ColorJitter",
45
+ "kwargs": {
46
+ "hue": [
47
+ -0.05,
48
+ 0.05
49
+ ]
50
+ }
51
+ },
52
+ "sharpness": {
53
+ "weight": 1.0,
54
+ "type": "SharpnessJitter",
55
+ "kwargs": {
56
+ "sharpness": [
57
+ 0.5,
58
+ 1.5
59
+ ]
60
+ }
61
+ },
62
+ "affine": {
63
+ "weight": 1.0,
64
+ "type": "RandomAffine",
65
+ "kwargs": {
66
+ "degrees": [
67
+ -5.0,
68
+ 5.0
69
+ ],
70
+ "translate": [
71
+ 0.05,
72
+ 0.05
73
+ ]
74
+ }
75
+ }
76
+ }
77
+ },
78
+ "revision": null,
79
+ "use_imagenet_stats": true,
80
+ "use_external_stats": true,
81
+ "video_backend": "torchcodec",
82
+ "streaming": false,
83
+ "dist_loading": true,
84
+ "buffer_size": 1024,
85
+ "action_mode": "delta",
86
+ "repack_transforms": {
87
+ "inputs": [],
88
+ "outputs": []
89
+ },
90
+ "data_transforms": {
91
+ "inputs": [
92
+ {
93
+ "type": "delta_action",
94
+ "mask": null,
95
+ "mapping": {}
96
+ },
97
+ {
98
+ "type": "resize_with_pad",
99
+ "height": 224,
100
+ "width": 224,
101
+ "mode": "bilinear"
102
+ },
103
+ {
104
+ "type": "remap_image_key",
105
+ "mapping": {}
106
+ },
107
+ {
108
+ "type": "qwena1_processor",
109
+ "pretrained_model_name_or_path": "Qwen/Qwen3-VL-2B-Instruct",
110
+ "max_length": 48,
111
+ "task_key": "task",
112
+ "padding_side": "right",
113
+ "padding": "max_length",
114
+ "truncation": true,
115
+ "spatial_merge_size": 2,
116
+ "vision_start_token_id": 151652,
117
+ "vision_end_token_id": 151653,
118
+ "image_token_id": 151655,
119
+ "process": null
120
+ },
121
+ {
122
+ "type": "normalize",
123
+ "selected_keys": null,
124
+ "mode": "mean_std",
125
+ "norm_stats": {}
126
+ },
127
+ {
128
+ "type": "compose_fields",
129
+ "mapping": {}
130
+ },
131
+ {
132
+ "type": "pad_state_and_action",
133
+ "max_state_dim": 32,
134
+ "max_action_dim": 32
135
+ },
136
+ {
137
+ "type": "unify_qwena1_inputs"
138
+ }
139
+ ],
140
+ "outputs": []
141
+ },
142
+ "model_transforms": {
143
+ "inputs": [],
144
+ "outputs": []
145
+ },
146
+ "height": 224,
147
+ "width": 224,
148
+ "max_state_dim": 32,
149
+ "max_action_dim": 32
150
+ },
151
+ "env": null,
152
+ "policy": {
153
+ "type": "qwena1",
154
+ "n_obs_steps": 1,
155
+ "input_features": {
156
+ "observation.state": {
157
+ "type": "STATE",
158
+ "shape": [
159
+ 32
160
+ ]
161
+ }
162
+ },
163
+ "output_features": {
164
+ "action": {
165
+ "type": "ACTION",
166
+ "shape": [
167
+ 32
168
+ ]
169
+ }
170
+ },
171
+ "device": "cuda",
172
+ "use_amp": false,
173
+ "push_to_hub": false,
174
+ "repo_id": "jcaiaq/qwena1",
175
+ "private": null,
176
+ "tags": null,
177
+ "license": null,
178
+ "pretrained_path": null,
179
+ "qwen3_vl_variant": "qwen3_vl_28l",
180
+ "action_expert_variant": "qwen3_28l",
181
+ "dtype": "bfloat16",
182
+ "chunk_size": 50,
183
+ "n_action_steps": 50,
184
+ "max_state_dim": 32,
185
+ "max_action_dim": 32,
186
+ "num_inference_steps": 10,
187
+ "time_sampling_beta_alpha": 1.5,
188
+ "time_sampling_beta_beta": 1.0,
189
+ "time_sampling_scale": 0.999,
190
+ "time_sampling_offset": 0.001,
191
+ "min_period": 0.004,
192
+ "max_period": 4.0,
193
+ "image_resolution": [
194
+ 224,
195
+ 224
196
+ ],
197
+ "empty_cameras": 0,
198
+ "normalization_mapping": {
199
+ "VISUAL": "IDENTITY",
200
+ "STATE": "IDENTITY",
201
+ "ACTION": "IDENTITY"
202
+ },
203
+ "gradient_checkpointing": false,
204
+ "compile_model": false,
205
+ "compile_mode": "max-autotune",
206
+ "optimizer_lr": 5e-05,
207
+ "optimizer_betas": [
208
+ 0.9,
209
+ 0.95
210
+ ],
211
+ "optimizer_eps": 1e-08,
212
+ "optimizer_weight_decay": 0.01,
213
+ "optimizer_grad_clip_norm": 1.0,
214
+ "scheduler_warmup_steps": 0,
215
+ "scheduler_decay_steps": 700000,
216
+ "scheduler_decay_lr": 5e-05,
217
+ "tokenizer_max_length": 48,
218
+ "freeze_vision_encoder": false,
219
+ "train_expert_only": false,
220
+ "train_vlm_only": false,
221
+ "scale_factor": 8,
222
+ "lambda_gen": 0.01
223
+ },
224
+ "output_dir": "/mnt/shared-storage-user/internvla/Users/caijunhao/lerobot/outputs/qwena1/2025_12_19_17_36_47-qwena1-agibotworld-delta-28l-pretrain",
225
+ "job_name": "2025_12_19_17_36_47-qwena1-agibotworld-delta-28l-pretrain",
226
+ "resume": false,
227
+ "seed": 7777,
228
+ "num_workers": 8,
229
+ "batch_size": 16,
230
+ "steps": 700000,
231
+ "eval_freq": 20000,
232
+ "log_freq": 100,
233
+ "save_checkpoint": true,
234
+ "save_freq": 10000,
235
+ "use_policy_training_preset": true,
236
+ "optimizer": {
237
+ "type": "adamw",
238
+ "lr": 5e-05,
239
+ "weight_decay": 0.01,
240
+ "grad_clip_norm": 1.0,
241
+ "betas": [
242
+ 0.9,
243
+ 0.95
244
+ ],
245
+ "eps": 1e-08
246
+ },
247
+ "scheduler": {
248
+ "type": "cosine_decay_with_warmup",
249
+ "num_warmup_steps": 0,
250
+ "num_decay_steps": 700000,
251
+ "peak_lr": 5e-05,
252
+ "decay_lr": 5e-05
253
+ },
254
+ "eval": {
255
+ "n_episodes": 50,
256
+ "batch_size": 50,
257
+ "use_async_envs": false
258
+ },
259
+ "wandb": {
260
+ "enable": true,
261
+ "disable_artifact": false,
262
+ "project": "lerobot_qwena1",
263
+ "entity": null,
264
+ "notes": null,
265
+ "run_id": "oz1n1l3h",
266
+ "mode": "offline"
267
+ },
268
+ "checkpoint_path": null,
269
+ "rename_map": {}
270
+ }
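One detail of the `scheduler` block is easy to miss: `peak_lr` and `decay_lr` are both `5e-05` and `num_warmup_steps` is 0. The sketch below uses a textbook cosine-decay-with-linear-warmup formula, which is an assumption and not necessarily what the training code implements for `cosine_decay_with_warmup`; the function name `cosine_decay_lr` is hypothetical. Under that assumption, these values give a constant learning rate over all 700000 steps.

```python
import math


def cosine_decay_lr(step, peak_lr, decay_lr, num_warmup_steps, num_decay_steps):
    """Textbook cosine decay with linear warmup (assumed form, not the library's code)."""
    if num_warmup_steps > 0 and step < num_warmup_steps:
        # Linear ramp from a small value up to peak_lr.
        return peak_lr * (step + 1) / (num_warmup_steps + 1)
    # Fraction of the decay horizon completed, clamped to [0, 1].
    progress = min(step, num_decay_steps) / num_decay_steps
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    # Interpolate from peak_lr (cosine = 1) down to decay_lr (cosine = 0).
    return decay_lr + (peak_lr - decay_lr) * cosine


# Values copied from the "scheduler" block of train_config.json.
scheduler_cfg = {
    "peak_lr": 5e-05,
    "decay_lr": 5e-05,
    "num_warmup_steps": 0,
    "num_decay_steps": 700000,
}

# Learning rate at the start, midpoint, and end of training.
lrs = [cosine_decay_lr(s, **scheduler_cfg) for s in (0, 350_000, 700_000)]
```

Because the interpolation term `peak_lr - decay_lr` is zero, the cosine factor has no effect and every entry of `lrs` equals `5e-05`.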