Guowei-Zou committed a381864 (verified; parent c0400c7): Upload README.md with huggingface_hub

---
license: mit
tags:
- robotics
- reinforcement-learning
- imitation-learning
- diffusion-policy
- flow-matching
- robomimic
- mujoco
language:
- en
library_name: pytorch
pipeline_tag: reinforcement-learning
---

# DMPO Pretrained Checkpoints

Pretrained checkpoints for **DMPO: Dispersive MeanFlow Policy Optimization**.

[![Paper](https://img.shields.io/badge/arXiv-2601.20701-B31B1B)](http://arxiv.org/abs/2601.20701)
[![Code](https://img.shields.io/badge/GitHub-dmpo--release-blue)](https://github.com/Guowei-Zou/dmpo-release)
[![Project Page](https://img.shields.io/badge/Project-Page-4285F4)](https://guowei-zou.github.io/dmpo-page/)

## Overview

DMPO enables **true one-step generation** for real-time robotic control via MeanFlow, dispersive regularization, and RL fine-tuning. These checkpoints can be used directly for fine-tuning with PPO.

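To make "one-step generation" concrete, here is a toy sketch (not the DMPO implementation — the network is a random linear stand-in, and the dimensions are illustrative): a MeanFlow policy predicts an *average* velocity over a time interval, so a single query carries a noise sample all the way to an action.

```python
import numpy as np

rng = np.random.default_rng(0)

def tiny_meanflow_velocity(z, obs, r, t, W):
    """Toy linear stand-in for a learned average-velocity net u_theta(z, obs, r, t)."""
    x = np.concatenate([z, obs, r, t], axis=-1)
    return x @ W

def one_step_action(obs, act_dim, W):
    """One-step MeanFlow sampling: one average-velocity query maps noise to an action."""
    batch = obs.shape[0]
    z1 = rng.standard_normal((batch, act_dim))  # noise sample at time t = 1
    r = np.zeros((batch, 1))                    # target time
    t = np.ones((batch, 1))                     # start time
    # Because the net predicts the average velocity over [r, t],
    # the whole interval can be jumped in a single step.
    return z1 - (t - r) * tiny_meanflow_velocity(z1, obs, r, t, W)

obs_dim, act_dim = 11, 3  # hopper-like sizes, purely illustrative
W = rng.standard_normal((act_dim + obs_dim + 2, act_dim)) * 0.1
obs = rng.standard_normal((4, obs_dim))
action = one_step_action(obs, act_dim, W)
print(action.shape)  # (4, 3)
```

A diffusion policy would instead loop this update over many small denoising steps; the one-step jump is what makes real-time control feasible.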
## Checkpoint Structure

```
pretrained_checkpoints/
├── DMPO_pretrained_gym_checkpoints/
│   ├── gym_improved_meanflow/             # MeanFlow without dispersive loss
│   └── gym_improved_meanflow_dispersive/  # MeanFlow with dispersive loss (recommended)
└── DMPO_pretraining_robomimic_checkpoints/
    ├── w_0p1/  # dispersive weight = 0.1
    ├── w_0p5/  # dispersive weight = 0.5 (recommended)
    └── w_0p9/  # dispersive weight = 0.9
```

## Supported Tasks

| Domain | Tasks |
|--------|-------|
| OpenAI Gym | hopper, walker2d, ant, humanoid, kitchen-* |
| Robomimic (RGB) | lift, can, square, transport |

## Usage

Use the `hf://` prefix in config files to auto-download:

```yaml
# Gym tasks
base_policy_path: hf://pretrained_checkpoints/DMPO_pretrained_gym_checkpoints/gym_improved_meanflow_dispersive/hopper-medium-v2_best.pt

# Robomimic tasks
base_policy_path: hf://pretrained_checkpoints/DMPO_pretraining_robomimic_checkpoints/w_0p5/can/can_w0p5_08_meanflow_dispersive.pt
```
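If you want to fetch a checkpoint outside the config machinery, the `hf://` convention can be resolved by hand: strip the scheme to get the repo-relative file path, then hand it to `huggingface_hub.hf_hub_download`. The helper below is a sketch of that convention, not the loader from the DMPO codebase, and the `repo_id` shown in the comment is a placeholder.

```python
def split_hf_uri(path: str) -> tuple[bool, str]:
    """Return (is_hf, repo_relative_path) for a config path using the hf:// convention."""
    prefix = "hf://"
    if path.startswith(prefix):
        return True, path[len(prefix):]
    return False, path

is_hf, rel = split_hf_uri(
    "hf://pretrained_checkpoints/DMPO_pretrained_gym_checkpoints/"
    "gym_improved_meanflow_dispersive/hopper-medium-v2_best.pt"
)
print(is_hf, rel.split("/")[-1])  # True hopper-medium-v2_best.pt

# With the repo-relative path in hand, huggingface_hub can fetch the file
# (repo_id is a placeholder -- substitute this model repo's actual id):
# from huggingface_hub import hf_hub_download
# local_path = hf_hub_download(repo_id="<this-repo-id>", filename=rel)
```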

## Citation

```bibtex
@misc{zou2026stepenoughdispersivemeanflow,
      title={One Step Is Enough: Dispersive MeanFlow Policy Optimization},
      author={Guowei Zou and Haitao Wang and Hejun Wu and Yukun Qian and Yuhang Wang and Weibing Li},
      year={2026},
      eprint={2601.20701},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2601.20701},
}
```

## License

MIT License