From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning (DICE-RL)

This repository contains the checkpoints for DICE-RL, a framework that uses reinforcement learning (RL) as a "distribution contraction" operator to refine pretrained generative robot policies.

Project Website | Paper | GitHub

Introduction

Distribution Contractive Reinforcement Learning (DICE-RL) turns a pretrained behavior prior into a high-performing "pro" policy by amplifying high-success behaviors using online feedback. The framework first pretrains a diffusion- or flow-based policy for broad behavioral coverage, then finetunes it with a stable, sample-efficient residual off-policy RL procedure that combines selective behavior regularization with value-guided action selection. This enables mastery of complex long-horizon manipulation skills directly from high-dimensional pixel inputs.
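As a rough intuition for the value-guided action selection step, one can think of sampling several candidate actions from the pretrained prior and executing the one the learned critic scores highest. The sketch below illustrates this idea only; the `prior` and `critic` callables are toy stand-ins, not the actual DICE-RL models or APIs.

```python
import numpy as np

def value_guided_selection(sample_action, q_value, obs, num_samples=16):
    """Sample candidate actions from a behavior prior and return the one
    with the highest estimated Q-value (hypothetical helper names)."""
    candidates = [sample_action(obs) for _ in range(num_samples)]
    scores = [q_value(obs, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Toy stand-ins for the generative prior and the critic (illustration only).
rng = np.random.default_rng(0)
prior = lambda obs: rng.normal(size=2)            # random 2-D action
critic = lambda obs, a: -np.sum((a - 1.0) ** 2)   # prefers actions near [1, 1]

best = value_guided_selection(prior, critic, obs=None)
```

Here the critic steers selection toward high-value actions while every candidate still comes from the prior's distribution, which is the "contraction" intuition.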

Evaluation

To evaluate both the finetuned RL checkpoints and the pretrained BC checkpoints and report their success rates, use the following command from the official repository:

python script/eval_rl_checkpoint.py --ckpt_path path_to_finetuned_checkpoint --num_eval_episodes 10 --eval_n_envs 10

The output will include the success rates for both the finetuned RL checkpoint and the pretrained BC checkpoint, as well as the gain of the finetuned RL checkpoint over the pretrained BC checkpoint.
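Presumably the reported gain is the difference between the two success rates over the evaluation episodes. A minimal sketch of that computation, with hypothetical per-episode outcomes (1 = success, 0 = failure):

```python
def success_rate(episode_outcomes):
    """Fraction of successful episodes."""
    return sum(episode_outcomes) / len(episode_outcomes)

# Hypothetical outcomes over 10 eval episodes for each checkpoint.
rl_success = success_rate([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])  # 0.8
bc_success = success_rate([1, 0, 0, 1, 0, 1, 0, 0, 1, 0])  # 0.4
gain = rl_success - bc_success                              # 0.4
```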

Citation

If you find this work or the checkpoints useful, please consider citing:

@article{sun2026prior,
  title={From Prior to Pro: Efficient Skill Mastery via Distribution Contractive RL Finetuning},
  author={Sun, Zhanyi and Song, Shuran},
  journal={arXiv preprint arXiv:2603.10263},
  year={2026}
}