Commit ce4ff62 (verified), committed by dmux
1 Parent(s): ceed1af

Update README.md

Files changed (1)
  1. README.md +0 -12
README.md CHANGED
@@ -24,7 +24,6 @@ Official pretrained model weights for **DR.Q**, presented at the **Forty-third I
 
  > **Authors:** Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye
 
- ---
 
  ## Model Description
 
@@ -68,8 +67,6 @@ python main.py --env HBench-h1-run-v0
 
  Pretrained model weights for all reported tasks are hosted here on HuggingFace
 
- ---
-
  ## Training Details
 
  ### Evaluated Benchmark Suites
@@ -90,13 +87,11 @@ Pretrained model weights for all reported tasks are hosted here on HuggingFace
  - **Hardware:** CUDA GPU (CPU also supported)
  - **Seeds:** Results averaged over 10 random seeds with 95% bootstrap confidence intervals
 
- ---
 
  ## Evaluation Results
 
  All results report the **final average return** at the end of training. Aggregate metrics (IQM, Median, Mean) are computed over the task-specific normalized score. Values in [brackets] denote **95% bootstrap confidence intervals**.
 
- ---
 
  ### Gym MuJoCo Tasks (1M environment steps)
 
@@ -113,7 +108,6 @@ Full comparison against domain-specific and general model-free / model-based RL
  | **Median** | 1.550 [1.450, 1.630] | 1.180 [0.830, 1.220] | 1.488 [1.340, 1.623] | 1.261 [1.080, 1.344] | 1.616 [1.490, 1.744] | **1.564** [1.416, 1.806] |
  | **Mean** | 1.570 [1.540, 1.600] | 1.040 [0.920, 1.150] | 1.465 [1.346, 1.585] | 1.196 [1.082, 1.307] | 1.617 [1.513, 1.718] | **1.608** [1.449, 1.759] |
 
- ---
 
  ### DMC-Easy Tasks (500K steps / 1M env steps with action repeat 2)
 
@@ -146,7 +140,6 @@ Aggregate metrics reported in units of 1k.
  | **Median** | 0.876 [0.847, 0.905] | 0.870 [0.841, 0.896] | 0.875 [0.847, 0.905] | 0.874 [0.845, 0.904] | **0.885** [0.863, 0.912] |
  | **Mean** | 0.874 [0.848, 0.898] | 0.864 [0.840, 0.887] | 0.874 [0.849, 0.897] | 0.873 [0.847, 0.897] | **0.886** [0.865, 0.906] |
 
- ---
 
  ### DMC-Hard Tasks (500K steps / 1M env steps with action repeat 2)
 
@@ -165,7 +158,6 @@ Aggregate metrics reported in units of 1k.
  | **Median** | 0.486 [0.265, 0.658] | 0.722 [0.654, 0.797] | 0.706 [0.647, 0.772] | 0.729 [0.655, 0.808] | 0.788 [0.724, 0.855] | **0.844** [0.796, 0.893] |
  | **Mean** | 0.465 [0.329, 0.606] | 0.723 [0.660, 0.781] | 0.706 [0.656, 0.755] | 0.729 [0.664, 0.791] | 0.787 [0.730, 0.840] | **0.842** [0.800, 0.881] |
 
- ---
 
  ### DMC Visual Tasks (500K steps / 1M env steps with action repeat 2)
 
@@ -189,7 +181,6 @@ Pixel-based observations at 84×84 resolution. Aggregate metrics computed over t
  | **Median** | 0.191 [0.172, 0.211] | 0.013 [0.012, 0.013] | 0.295 [0.198, 0.339] | 0.134 [0.124, 0.198] | 0.398 [0.320, 0.466] | **0.500** [0.427, 0.576] |
  | **Mean** | 0.321 [0.303, 0.340] | 0.034 [0.031, 0.037] | 0.269 [0.214, 0.326] | 0.247 [0.231, 0.262] | 0.395 [0.335, 0.457] | **0.501** [0.439, 0.564] |
 
- ---
 
  ### HumanoidBench — Without Dexterous Hands (500K steps / 1M env steps with action repeat 2)
 
@@ -215,7 +206,6 @@ Aggregate metrics computed over the success normalized score.
  | **Median** | 0.598 [0.514, 0.692] | 0.781 [0.693, 0.865] | 0.602 [0.516, 0.687] | 0.794 [0.705, 0.899] | **0.823** [0.733, 0.920] |
  | **Mean** | 0.606 [0.536, 0.678] | 0.776 [0.705, 0.849] | 0.604 [0.531, 0.677] | 0.802 [0.721, 0.883] | **0.825** [0.748, 0.902] |
 
- ---
 
  ### HumanoidBench — With Dexterous Hands (500K steps / 1M env steps with action repeat 2)
 
@@ -241,7 +231,6 @@ Aggregate metrics computed over the success normalized score.
  | **Median** | 0.021 [0.010, 0.030] | 0.298 [0.147, 0.433] | 0.356 [0.269, 0.413] | 0.420 [0.338, 0.491] | 0.388 [0.313, 0.449] | 0.342 [0.268, 0.395] | **0.529** [0.455, 0.607] |
  | **Mean** | 0.020 [0.011, 0.028] | 0.282 [0.169, 0.413] | 0.345 [0.286, 0.406] | 0.417 [0.356, 0.482] | 0.385 [0.329, 0.443] | 0.336 [0.285, 0.393] | **0.534** [0.473, 0.595] |
 
- ---
 
  ## Citation
 
@@ -255,7 +244,6 @@ Aggregate metrics computed over the success normalized score.
  }
  ```
 
- ---
 
  ## Acknowledgements
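
Reviewer note: the README context shown in this diff states that aggregate metrics (IQM, Median, Mean) are computed over normalized scores from 10 seeds and reported with 95% bootstrap confidence intervals, but it does not say how the intervals are computed. The sketch below is only an illustration under assumptions: it uses a plain percentile bootstrap that resamples seeds with replacement, and an IQM that drops 25% of the scores from each end. The names `iqm` and `bootstrap_ci` and all their parameters are hypothetical and are not taken from the DR.Q code.

```python
import random
import statistics


def iqm(scores):
    """Interquartile mean: average of the middle 50% of the sorted scores."""
    ordered = sorted(scores)
    cut = int(len(ordered) * 0.25)  # scores dropped at each end, rounded down
    return statistics.fmean(ordered[cut:len(ordered) - cut])


def bootstrap_ci(scores_by_seed, stat, n_resamples=2000, level=0.95, rng=None):
    """Percentile bootstrap interval for `stat` (assumed variant, see lead-in).

    scores_by_seed: one list of normalized task scores per seed.
    Seeds are resampled with replacement; their scores are pooled before
    `stat` is applied.
    """
    rng = rng or random.Random(0)  # fixed default seed for reproducibility
    n = len(scores_by_seed)
    estimates = []
    for _ in range(n_resamples):
        sample = [scores_by_seed[rng.randrange(n)] for _ in range(n)]
        pooled = [s for seed_scores in sample for s in seed_scores]
        estimates.append(stat(pooled))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


if __name__ == "__main__":
    # Made-up normalized scores for 3 seeds x 4 tasks, for illustration only.
    demo = [[0.8, 0.9, 0.5, 0.7], [0.7, 1.0, 0.4, 0.6], [0.9, 0.8, 0.6, 0.7]]
    pooled = [s for seed in demo for s in seed]
    print("IQM:", iqm(pooled), "CI:", bootstrap_ci(demo, iqm))
```

Resampling whole seeds keeps the scores from one training run together. A stratified bootstrap, which resamples runs separately for each task, is another common choice and would give somewhat different intervals.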
 
 