Update README.md
README.md
CHANGED
@@ -24,7 +24,6 @@ Official pretrained model weights for **DR.Q**, presented at the **Forty-third I
 
 > **Authors:** Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye
 
----
 
 ## Model Description
 
@@ -68,8 +67,6 @@ python main.py --env HBench-h1-run-v0
 
 Pretrained model weights for all reported tasks are hosted here on HuggingFace
 
----
-
 ## Training Details
 
 ### Evaluated Benchmark Suites
@@ -90,13 +87,11 @@ Pretrained model weights for all reported tasks are hosted here on HuggingFace
 - **Hardware:** CUDA GPU (CPU also supported)
 - **Seeds:** Results averaged over 10 random seeds with 95% bootstrap confidence intervals
 
----
 
 ## Evaluation Results
 
 All results report the **final average return** at the end of training. Aggregate metrics (IQM, Median, Mean) are computed over the task-specific normalized score. Values in [brackets] denote **95% bootstrap confidence intervals**.
 
----
 
 ### Gym MuJoCo Tasks (1M environment steps)
 
@@ -113,7 +108,6 @@ Full comparison against domain-specific and general model-free / model-based RL
 | **Median** | 1.550 [1.450, 1.630] | 1.180 [0.830, 1.220] | 1.488 [1.340, 1.623] | 1.261 [1.080, 1.344] | 1.616 [1.490, 1.744] | **1.564** [1.416, 1.806] |
 | **Mean** | 1.570 [1.540, 1.600] | 1.040 [0.920, 1.150] | 1.465 [1.346, 1.585] | 1.196 [1.082, 1.307] | 1.617 [1.513, 1.718] | **1.608** [1.449, 1.759] |
 
----
 
 ### DMC-Easy Tasks (500K steps / 1M env steps with action repeat 2)
 
@@ -146,7 +140,6 @@ Aggregate metrics reported in units of 1k.
 | **Median** | 0.876 [0.847, 0.905] | 0.870 [0.841, 0.896] | 0.875 [0.847, 0.905] | 0.874 [0.845, 0.904] | **0.885** [0.863, 0.912] |
 | **Mean** | 0.874 [0.848, 0.898] | 0.864 [0.840, 0.887] | 0.874 [0.849, 0.897] | 0.873 [0.847, 0.897] | **0.886** [0.865, 0.906] |
 
----
 
 ### DMC-Hard Tasks (500K steps / 1M env steps with action repeat 2)
 
@@ -165,7 +158,6 @@ Aggregate metrics reported in units of 1k.
 | **Median** | 0.486 [0.265, 0.658] | 0.722 [0.654, 0.797] | 0.706 [0.647, 0.772] | 0.729 [0.655, 0.808] | 0.788 [0.724, 0.855] | **0.844** [0.796, 0.893] |
 | **Mean** | 0.465 [0.329, 0.606] | 0.723 [0.660, 0.781] | 0.706 [0.656, 0.755] | 0.729 [0.664, 0.791] | 0.787 [0.730, 0.840] | **0.842** [0.800, 0.881] |
 
----
 
 ### DMC Visual Tasks (500K steps / 1M env steps with action repeat 2)
 
@@ -189,7 +181,6 @@ Pixel-based observations at 84×84 resolution. Aggregate metrics computed over t
 | **Median** | 0.191 [0.172, 0.211] | 0.013 [0.012, 0.013] | 0.295 [0.198, 0.339] | 0.134 [0.124, 0.198] | 0.398 [0.320, 0.466] | **0.500** [0.427, 0.576] |
 | **Mean** | 0.321 [0.303, 0.340] | 0.034 [0.031, 0.037] | 0.269 [0.214, 0.326] | 0.247 [0.231, 0.262] | 0.395 [0.335, 0.457] | **0.501** [0.439, 0.564] |
 
----
 
 ### HumanoidBench — Without Dexterous Hands (500K steps / 1M env steps with action repeat 2)
 
@@ -215,7 +206,6 @@ Aggregate metrics computed over the success normalized score.
 | **Median** | 0.598 [0.514, 0.692] | 0.781 [0.693, 0.865] | 0.602 [0.516, 0.687] | 0.794 [0.705, 0.899] | **0.823** [0.733, 0.920] |
 | **Mean** | 0.606 [0.536, 0.678] | 0.776 [0.705, 0.849] | 0.604 [0.531, 0.677] | 0.802 [0.721, 0.883] | **0.825** [0.748, 0.902] |
 
----
 
 ### HumanoidBench — With Dexterous Hands (500K steps / 1M env steps with action repeat 2)
 
@@ -241,7 +231,6 @@ Aggregate metrics computed over the success normalized score.
 | **Median** | 0.021 [0.010, 0.030] | 0.298 [0.147, 0.433] | 0.356 [0.269, 0.413] | 0.420 [0.338, 0.491] | 0.388 [0.313, 0.449] | 0.342 [0.268, 0.395] | **0.529** [0.455, 0.607] |
 | **Mean** | 0.020 [0.011, 0.028] | 0.282 [0.169, 0.413] | 0.345 [0.286, 0.406] | 0.417 [0.356, 0.482] | 0.385 [0.329, 0.443] | 0.336 [0.285, 0.393] | **0.534** [0.473, 0.595] |
 
----
 
 ## Citation
 
@@ -255,7 +244,6 @@ Aggregate metrics computed over the success normalized score.
 }
 ```
 
----
 
 ## Acknowledgements
 
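Note on the aggregate rows that appear as context in this diff: the README says the IQM, Median, and Mean rows carry 95% bootstrap confidence intervals over 10 seeds, but does not show how they are computed. The sketch below is one plausible way to compute an interquartile mean and a percentile bootstrap interval with NumPy. The function names `iqm` and `bootstrap_ci`, the `(seeds, tasks)` array layout, and the choice to resample whole seeds are assumptions made for illustration. They are not taken from the DR.Q code base, and the authors' procedure may differ (for example, by using stratified resampling per task).

```python
import numpy as np


def iqm(scores):
    """Interquartile mean: mean of the middle 50% of all values.

    Drops floor(n / 4) values from each end of the sorted data,
    i.e. a 25% trimmed mean.
    """
    flat = np.sort(np.asarray(scores, dtype=float).ravel())
    cut = flat.size // 4
    return float(flat[cut:flat.size - cut].mean())


def bootstrap_ci(scores, stat, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for `stat`.

    `scores` is assumed to have shape (n_seeds, n_tasks); whole seeds
    (rows) are resampled with replacement. This layout and resampling
    scheme are illustrative assumptions.
    """
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    n_seeds = scores.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n_seeds, size=n_seeds)
        stats[b] = stat(scores[idx])
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(stats, [alpha, 1.0 - alpha])
    return float(lo), float(hi)


if __name__ == "__main__":
    # Synthetic normalized scores: 10 seeds x 5 tasks (made-up data).
    demo = np.random.default_rng(1).uniform(0.0, 1.0, size=(10, 5))
    point = iqm(demo)
    low, high = bootstrap_ci(demo, iqm)
    print(f"IQM {point:.3f} [{low:.3f}, {high:.3f}]")
```

The same `bootstrap_ci` helper accepts `np.median` or `np.mean` as `stat`, which would correspond to the Median and Mean rows under the same assumptions.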