---
license: mit
tags:
- robotics
- reinforcement-learning
- imitation-learning
- diffusion-policy
- flow-matching
- robomimic
- mujoco
language:
- en
library_name: pytorch
pipeline_tag: reinforcement-learning
---
DMPO Pretrained Checkpoints
Pretrained checkpoints for DMPO: Dispersive MeanFlow Policy Optimization.
Overview
DMPO achieves true one-step action generation for real-time robotic control by combining MeanFlow, dispersive regularization, and RL fine-tuning. The checkpoints in this repository are pretrained policies that can be used directly as the starting point for PPO fine-tuning.
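The one-step generation idea can be illustrated with a small sketch of MeanFlow sampling. It assumes the MeanFlow convention in which a network predicts the average velocity u(z_t, r, t) over an interval [r, t], z_1 is Gaussian noise and z_0 is the generated sample, so one network call gives z_0 = z_1 - u(z_1, 0, 1). The names `meanflow_one_step`, `toy_u`, `TARGET` and the signature `u(z, r, t, obs)` are hypothetical and are not the DMPO code's API; the toy velocity is exact only for a data distribution concentrated on a single action.

```python
import random
from typing import Callable, List

# Hypothetical signature: u(z, r, t, obs) returns the average velocity over [r, t].
AvgVelocity = Callable[[List[float], float, float, object], List[float]]


def meanflow_one_step(u: AvgVelocity, obs: object, action_dim: int,
                      rng: random.Random) -> List[float]:
    """Map Gaussian noise z_1 to an action z_0 with a single call to u.

    MeanFlow update: z_r = z_t - (t - r) * u(z_t, r, t); one step uses t = 1, r = 0.
    """
    t, r = 1.0, 0.0
    z1 = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # z_1 ~ N(0, I)
    velocity = u(z1, r, t, obs)                              # one network evaluation
    return [z - (t - r) * v for z, v in zip(z1, velocity)]


# Toy check: if every "expert action" equals TARGET and z_t = (1 - t) * x + t * eps,
# the exact average velocity is (z_t - TARGET) / t, so one step recovers TARGET.
TARGET = [0.5, -0.2]


def toy_u(z: List[float], r: float, t: float, obs: object) -> List[float]:
    return [(zi - xi) / t for zi, xi in zip(z, TARGET)]


action = meanflow_one_step(toy_u, obs=None, action_dim=2, rng=random.Random(0))
print(action)  # approximately [0.5, -0.2]
```

In a trained policy, `u` would be the network conditioned on the observation; the sketch only shows why a single evaluation suffices when u is an accurate average velocity.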
Checkpoint Structure
```text
pretrained_checkpoints/
├── DMPO_pretrained_gym_checkpoints/
│   ├── gym_improved_meanflow/             # MeanFlow without dispersive loss
│   └── gym_improved_meanflow_dispersive/  # MeanFlow with dispersive loss (recommended)
└── DMPO_pretraining_robomimic_checkpoints/
    ├── w_0p1/  # dispersive weight = 0.1
    ├── w_0p5/  # dispersive weight = 0.5 (recommended)
    └── w_0p9/  # dispersive weight = 0.9
```
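The `w_*` folder names encode the dispersive weight, with `p` standing for the decimal point. The sketch below parses that weight and shows one plausible way such a weight enters the training objective. The loss form is an assumption: it uses the InfoNCE-style dispersive loss with squared L2 distance proposed for diffusion models by Wang and He (2025), and adds it to a flow-matching loss scaled by w. This card does not state DMPO's exact formulation, and all helper names here are hypothetical.

```python
import math
import re
from typing import Sequence


def weight_from_dirname(name: str) -> float:
    """Parse a folder name such as 'w_0p5' into 0.5 ('p' marks the decimal point)."""
    match = re.fullmatch(r"w_(\d+)p(\d+)/?", name)
    if match is None:
        raise ValueError(f"unrecognized weight folder: {name!r}")
    return float(f"{match.group(1)}.{match.group(2)}")


def dispersive_loss(reprs: Sequence[Sequence[float]], tau: float = 1.0) -> float:
    """Assumed InfoNCE-L2 form: log of the mean over all pairs (i, j) of
    exp(-||h_i - h_j||^2 / tau). Lower values mean more spread-out representations."""
    logits = [
        -sum((a - b) ** 2 for a, b in zip(hi, hj)) / tau
        for hi in reprs
        for hj in reprs
    ]
    peak = max(logits)  # log-mean-exp, shifted for numerical stability
    return peak + math.log(sum(math.exp(x - peak) for x in logits) / len(logits))


def total_loss(flow_loss: float, reprs: Sequence[Sequence[float]], w: float,
               tau: float = 1.0) -> float:
    """Flow-matching loss plus the dispersive term scaled by the folder weight w."""
    return flow_loss + w * dispersive_loss(reprs, tau)


w = weight_from_dirname("w_0p5")  # the recommended Robomimic setting
```

Minimizing the dispersive term pushes intermediate representations in a batch apart, so a larger w trades more representation spread against the plain flow-matching fit.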
Supported Tasks
| Domain | Tasks |
|---|---|
| OpenAI Gym | hopper, walker2d, ant, humanoid, kitchen-* |
| Robomimic (RGB) | lift, can, square, transport |
Usage
Set `base_policy_path` in a config file to an `hf://` path to download the checkpoint automatically, for example:

```yaml
# Gym tasks
base_policy_path: hf://pretrained_checkpoints/DMPO_pretrained_gym_checkpoints/gym_improved_meanflow_dispersive/hopper-medium-v2_best.pt
```

```yaml
# Robomimic tasks
base_policy_path: hf://pretrained_checkpoints/DMPO_pretraining_robomimic_checkpoints/w_0p5/can/can_w0p5_08_meanflow_dispersive.pt
```
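Outside the DMPO configs, the same files can presumably be fetched directly from the Hugging Face Hub. The sketch below assumes that everything after `hf://` is the file path inside the model repository; the DMPO code's own resolver may work differently. The repository id is not stated on this card, so `repo_id` must be supplied by the user. `hf_hub_download(repo_id=..., filename=...)` is the standard `huggingface_hub` download function; the helper names are hypothetical.

```python
HF_PREFIX = "hf://"


def hf_path_to_filename(path: str) -> str:
    """Strip the hf:// prefix, leaving the (assumed) file path inside the repository."""
    if not path.startswith(HF_PREFIX):
        raise ValueError(f"expected an hf:// path, got {path!r}")
    filename = path[len(HF_PREFIX):].lstrip("/")
    if not filename:
        raise ValueError("hf:// path has no file component")
    return filename


def download_checkpoint(path: str, repo_id: str) -> str:
    """Download the checkpoint and return its local file path.

    repo_id is NOT given on this card; pass the Hub repository hosting these files.
    """
    from huggingface_hub import hf_hub_download  # imported lazily: optional dependency
    return hf_hub_download(repo_id=repo_id, filename=hf_path_to_filename(path))


gym_path = ("hf://pretrained_checkpoints/DMPO_pretrained_gym_checkpoints/"
            "gym_improved_meanflow_dispersive/hopper-medium-v2_best.pt")
print(hf_path_to_filename(gym_path))
```

Keeping the path parsing separate from the download means the mapping can be checked without network access or `huggingface_hub` installed.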
Citation
```bibtex
@misc{zou2026stepenoughdispersivemeanflow,
  title={One Step Is Enough: Dispersive MeanFlow Policy Optimization},
  author={Guowei Zou and Haitao Wang and Hejun Wu and Yukun Qian and Yuhang Wang and Weibing Li},
  year={2026},
  eprint={2601.20701},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2601.20701},
}
```
License
MIT License