@@ -24,7 +24,6 @@ Official pretrained model weights for **DR.Q**, presented at the **Forty-third I
 > **Authors:** Jiafei Lyu, Zichuan Lin, Scott Fujimoto, Kai Yang, Yangkun Chen, Saiyong Yang, Zongqing Lu, Deheng Ye
----
 ## Model Description
@@ -68,8 +67,6 @@ python main.py --env HBench-h1-run-v0
 Pretrained model weights for all reported tasks are hosted here on HuggingFace
----
 ## Training Details
 ### Evaluated Benchmark Suites
@@ -90,13 +87,11 @@ Pretrained model weights for all reported tasks are hosted here on HuggingFace
 - **Hardware:** CUDA GPU (CPU also supported)
 - **Seeds:** Results averaged over 10 random seeds with 95% bootstrap confidence intervals
----
 ## Evaluation Results
 All results report the **final average return** at the end of training. Aggregate metrics (IQM, Median, Mean) are computed over the task-specific normalized score. Values in [brackets] denote **95% bootstrap confidence intervals**.
----
 ### Gym MuJoCo Tasks (1M environment steps)
@@ -113,7 +108,6 @@ Full comparison against domain-specific and general model-free / model-based RL
 | **Median** | 1.550 [1.450, 1.630] | 1.180 [0.830, 1.220] | 1.488 [1.340, 1.623] | 1.261 [1.080, 1.344] | 1.616 [1.490, 1.744] | **1.564** [1.416, 1.806] |
 | **Mean** | 1.570 [1.540, 1.600] | 1.040 [0.920, 1.150] | 1.465 [1.346, 1.585] | 1.196 [1.082, 1.307] | 1.617 [1.513, 1.718] | **1.608** [1.449, 1.759] |
----
 ### DMC-Easy Tasks (500K steps / 1M env steps with action repeat 2)
@@ -146,7 +140,6 @@ Aggregate metrics reported in units of 1k.
 | **Median** | 0.876 [0.847, 0.905] | 0.870 [0.841, 0.896] | 0.875 [0.847, 0.905] | 0.874 [0.845, 0.904] | **0.885** [0.863, 0.912] |
 | **Mean** | 0.874 [0.848, 0.898] | 0.864 [0.840, 0.887] | 0.874 [0.849, 0.897] | 0.873 [0.847, 0.897] | **0.886** [0.865, 0.906] |
----
 ### DMC-Hard Tasks (500K steps / 1M env steps with action repeat 2)
@@ -165,7 +158,6 @@ Aggregate metrics reported in units of 1k.
 | **Median** | 0.486 [0.265, 0.658] | 0.722 [0.654, 0.797] | 0.706 [0.647, 0.772] | 0.729 [0.655, 0.808] | 0.788 [0.724, 0.855] | **0.844** [0.796, 0.893] |
 | **Mean** | 0.465 [0.329, 0.606] | 0.723 [0.660, 0.781] | 0.706 [0.656, 0.755] | 0.729 [0.664, 0.791] | 0.787 [0.730, 0.840] | **0.842** [0.800, 0.881] |
----
 ### DMC Visual Tasks (500K steps / 1M env steps with action repeat 2)
@@ -189,7 +181,6 @@ Pixel-based observations at 84×84 resolution. Aggregate metrics computed over t
 | **Median** | 0.191 [0.172, 0.211] | 0.013 [0.012, 0.013] | 0.295 [0.198, 0.339] | 0.134 [0.124, 0.198] | 0.398 [0.320, 0.466] | **0.500** [0.427, 0.576] |
 | **Mean** | 0.321 [0.303, 0.340] | 0.034 [0.031, 0.037] | 0.269 [0.214, 0.326] | 0.247 [0.231, 0.262] | 0.395 [0.335, 0.457] | **0.501** [0.439, 0.564] |
----
 ### HumanoidBench — Without Dexterous Hands (500K steps / 1M env steps with action repeat 2)
@@ -215,7 +206,6 @@ Aggregate metrics computed over the success normalized score.
 | **Median** | 0.598 [0.514, 0.692] | 0.781 [0.693, 0.865] | 0.602 [0.516, 0.687] | 0.794 [0.705, 0.899] | **0.823** [0.733, 0.920] |
 | **Mean** | 0.606 [0.536, 0.678] | 0.776 [0.705, 0.849] | 0.604 [0.531, 0.677] | 0.802 [0.721, 0.883] | **0.825** [0.748, 0.902] |
----
 ### HumanoidBench — With Dexterous Hands (500K steps / 1M env steps with action repeat 2)
@@ -241,7 +231,6 @@ Aggregate metrics computed over the success normalized score.
 | **Median** | 0.021 [0.010, 0.030] | 0.298 [0.147, 0.433] | 0.356 [0.269, 0.413] | 0.420 [0.338, 0.491] | 0.388 [0.313, 0.449] | 0.342 [0.268, 0.395] | **0.529** [0.455, 0.607] |
 | **Mean** | 0.020 [0.011, 0.028] | 0.282 [0.169, 0.413] | 0.345 [0.286, 0.406] | 0.417 [0.356, 0.482] | 0.385 [0.329, 0.443] | 0.336 [0.285, 0.393] | **0.534** [0.473, 0.595] |
----
 ## Citation
@@ -255,7 +244,6 @@ Aggregate metrics computed over the success normalized score.
 }
 ```
----
 ## Acknowledgements
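
The Evaluation Results section in the diff above reports IQM, Median, and Mean over normalized scores, each with a 95% bootstrap confidence interval over 10 seeds. The following is a minimal sketch of how such aggregates are commonly computed, not the authors' evaluation code. It assumes a score matrix of shape `(n_seeds, n_tasks)` that is already normalized, takes IQM as the mean of the middle 50% of all values, and uses a percentile bootstrap that resamples seeds independently per task. The function names `iqm` and `bootstrap_ci` and the synthetic data are hypothetical.

```python
import numpy as np


def iqm(scores: np.ndarray) -> float:
    """Interquartile mean: mean of the middle 50% of all (seed, task) scores."""
    flat = np.sort(scores.ravel())
    cut = int(np.floor(0.25 * flat.size))  # values dropped from each tail
    return float(flat[cut:flat.size - cut].mean())


def bootstrap_ci(scores, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat`, resampling seeds within each task.

    Assumption: `scores` has shape (n_seeds, n_tasks) and is already normalized.
    """
    rng = np.random.default_rng(seed)
    n_seeds, n_tasks = scores.shape
    stats = np.empty(n_boot)
    for b in range(n_boot):
        # Draw seed indices with replacement, independently for every task.
        idx = rng.integers(0, n_seeds, size=(n_seeds, n_tasks))
        stats[b] = stat(np.take_along_axis(scores, idx, axis=0))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


if __name__ == "__main__":
    # Synthetic placeholder data (10 seeds, 5 tasks), NOT results from the paper.
    demo = np.random.default_rng(1).uniform(0.0, 1.0, size=(10, 5))
    point = iqm(demo)
    low, high = bootstrap_ci(demo, iqm)
    print(f"IQM = {point:.3f} [{low:.3f}, {high:.3f}]")
```

Passing `np.mean` as `stat` gives the Mean row in the same way. For the Median row, one common convention is the median over tasks of the per-task mean across seeds, which could be passed as `lambda s: float(np.median(s.mean(axis=0)))`; whether the reported tables use that convention is not stated in the diff.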
