---
license: mit
tags:
- robotics
- reinforcement-learning
- imitation-learning
- diffusion-policy
- flow-matching
- robomimic
- mujoco
language:
- en
library_name: pytorch
pipeline_tag: reinforcement-learning
---
# DMPO Pretrained Checkpoints
Pretrained checkpoints for **DMPO: Dispersive MeanFlow Policy Optimization**.
[![Paper](https://img.shields.io/badge/arXiv-2601.20701-B31B1B)](http://arxiv.org/abs/2601.20701)
[![Code](https://img.shields.io/badge/GitHub-dmpo--release-blue)](https://github.com/Guowei-Zou/dmpo-release)
[![Project Page](https://img.shields.io/badge/Project-Page-4285F4)](https://guowei-zou.github.io/dmpo-page/)
## Overview
DMPO combines MeanFlow, dispersive regularization, and RL fine-tuning to enable **true one-step generation** for real-time robotic control. The checkpoints in this repository are pretrained policies that can be fine-tuned directly with PPO.
## Checkpoint Structure
```
pretrained_checkpoints/
├── DMPO_pretrained_gym_checkpoints/
│   ├── gym_improved_meanflow/              # MeanFlow without dispersive loss
│   └── gym_improved_meanflow_dispersive/   # MeanFlow with dispersive loss (recommended)
└── DMPO_pretraining_robomimic_checkpoints/
    ├── w_0p1/                              # dispersive weight = 0.1
    ├── w_0p5/                              # dispersive weight = 0.5 (recommended)
    └── w_0p9/                              # dispersive weight = 0.9
```
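The robomimic directory names encode the dispersive weight, with `p` standing in for the decimal point. The following is a minimal Python sketch of that mapping, assuming the layout shown above. `WEIGHT_DIRS` and `robomimic_dir` are hypothetical helper names, not part of the DMPO code, and the per-task subdirectory is an assumption inferred from the `can` path in the Usage section.

```python
# Root of the robomimic checkpoints, as shown in the tree above.
ROBOMIMIC_ROOT = "pretrained_checkpoints/DMPO_pretraining_robomimic_checkpoints"

# Dispersive weights listed in this README and their directory names.
WEIGHT_DIRS = {0.1: "w_0p1", 0.5: "w_0p5", 0.9: "w_0p9"}


def robomimic_dir(task: str, weight: float = 0.5) -> str:
    """Return the directory holding checkpoints for a task and dispersive weight.

    Assumption: every task has its own subdirectory under the weight
    directory, as the `can` example path suggests.
    """
    if weight not in WEIGHT_DIRS:
        raise ValueError(f"no checkpoints listed for dispersive weight {weight}")
    return f"{ROBOMIMIC_ROOT}/{WEIGHT_DIRS[weight]}/{task}"


if __name__ == "__main__":
    print(robomimic_dir("can"))
```

The default of `0.5` mirrors the weight marked as recommended above.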
## Supported Tasks
| Domain | Tasks |
|--------|-------|
| OpenAI Gym | hopper, walker2d, ant, humanoid, kitchen-* |
| Robomimic (RGB) | lift, can, square, transport |
## Usage
In a config file, prefix a checkpoint path with `hf://` to download it automatically:
```yaml
# Gym tasks
base_policy_path: hf://pretrained_checkpoints/DMPO_pretrained_gym_checkpoints/gym_improved_meanflow_dispersive/hopper-medium-v2_best.pt
# Robomimic tasks
base_policy_path: hf://pretrained_checkpoints/DMPO_pretraining_robomimic_checkpoints/w_0p5/can/can_w0p5_08_meanflow_dispersive.pt
```
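To fetch a checkpoint outside the training configs, the same path can be passed to `hf_hub_download` from `huggingface_hub`. This is a sketch under several assumptions rather than the project's own loader: the repo id `Guowei-Zou/DMPO-checkpoints` is inferred from the repository name and owner and should be verified, the text after `hf://` is assumed to be the file path inside that repo, and `split_hf_path` and `fetch_checkpoint` are hypothetical helper names.

```python
HF_PREFIX = "hf://"

# Assumption: repo id inferred from the repository name and owner; verify before use.
ASSUMED_REPO_ID = "Guowei-Zou/DMPO-checkpoints"


def split_hf_path(path: str) -> str:
    """Strip the `hf://` prefix and return the remaining repo-relative path."""
    if not path.startswith(HF_PREFIX):
        raise ValueError(f"expected a path starting with {HF_PREFIX!r}, got {path!r}")
    return path[len(HF_PREFIX):]


def fetch_checkpoint(path: str, repo_id: str = ASSUMED_REPO_ID) -> str:
    """Download one checkpoint into the local Hugging Face cache.

    Returns the local file path. Requires `pip install huggingface_hub`.
    """
    from huggingface_hub import hf_hub_download  # imported lazily

    return hf_hub_download(repo_id=repo_id, filename=split_hf_path(path))


if __name__ == "__main__":
    local_path = fetch_checkpoint(
        "hf://pretrained_checkpoints/DMPO_pretrained_gym_checkpoints/"
        "gym_improved_meanflow_dispersive/hopper-medium-v2_best.pt"
    )
    print(local_path)
```

The returned path points into the local cache, so repeated calls reuse the downloaded file instead of fetching it again.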
## Citation
```bibtex
@misc{zou2026stepenoughdispersivemeanflow,
title={One Step Is Enough: Dispersive MeanFlow Policy Optimization},
author={Guowei Zou and Haitao Wang and Hejun Wu and Yukun Qian and Yuhang Wang and Weibing Li},
year={2026},
eprint={2601.20701},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2601.20701},
}
```
## License
MIT License