Multi-Task DiT (Diffusion) — rm65b-sort-v0

LeRobot Multi-Task DiT policy trained on rm65b-sort-v0 (bimanual RM65 sort task, 3 cameras, 198 episodes / 91.5k frames @ 30 Hz, 1 task).
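For a sense of scale, the dataset figures above can be turned into rough durations. This is a back-of-the-envelope sketch only: the "91.5k" frame count is rounded, so the results are approximate.

```python
# Rough dataset statistics from the figures quoted above.
EPISODES = 198
FRAMES = 91_500  # "91.5k" in the description, so only approximate
FPS = 30

total_minutes = FRAMES / FPS / 60
frames_per_episode = FRAMES / EPISODES
seconds_per_episode = frames_per_episode / FPS

print(f"total footage:   ~{total_minutes:.0f} min")
print(f"average episode: ~{frames_per_episode:.0f} frames (~{seconds_per_episode:.1f} s)")
```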

This repo holds three candidate deployment checkpoints from the same 100k-step run.

Subfolder	Train step	Train loss at checkpoint
`checkpoint-060000`	60,000	0.009
`checkpoint-080000`	80,000	0.008
`checkpoint-100000`	100,000 (last)	0.007

Training summary

Architecture: Multi-Task DiT (multi_task_dit), CLIP ViT-B/16 vision (separate encoder per camera) + CLIP text encoder
Objective: diffusion (DDPM scheduler, prediction_type=epsilon, 100 train timesteps, squaredcos_cap_v2 betas). DDIM-compatible: for fast eval, set noise_scheduler_type to DDIM and num_inference_steps=10
Total steps: 100,000 | batch size 32 | Adam lr=2e-5, cosine decay, AMP, grad-clip 10
Hardware: 1× L40S, ~17 h wall-clock
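To make the objective settings above more concrete, here is a minimal pure-Python sketch of a squared-cosine beta schedule over 100 training timesteps and of the 10-step subset a DDIM sampler might visit. It follows the commonly used formulation that diffusers labels squaredcos_cap_v2. The evenly strided timestep spacing and the helper names are assumptions for illustration, not taken from this run's code.

```python
import math

NUM_TRAIN_TIMESTEPS = 100  # from the training summary
NUM_INFERENCE_STEPS = 10   # suggested DDIM setting for fast eval


def alpha_bar(t: float) -> float:
    """Cumulative signal level of the squared-cosine schedule at t in [0, 1]."""
    return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2


def squaredcos_cap_v2_betas(num_steps: int, max_beta: float = 0.999) -> list[float]:
    """Per-step noise variances, each capped at max_beta."""
    betas = []
    for i in range(num_steps):
        t1, t2 = i / num_steps, (i + 1) / num_steps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


def ddim_timesteps(num_train: int, num_inference: int) -> list[int]:
    """Evenly strided subset of training timesteps, ordered from noisy to clean.

    Assumption: "leading"-style spacing with no offset.
    """
    stride = num_train // num_inference
    return [i * stride for i in range(num_inference)][::-1]


betas = squaredcos_cap_v2_betas(NUM_TRAIN_TIMESTEPS)
timesteps = ddim_timesteps(NUM_TRAIN_TIMESTEPS, NUM_INFERENCE_STEPS)
print(timesteps)  # [90, 80, 70, 60, 50, 40, 30, 20, 10, 0]
```

With these settings the sampler would evaluate the network 10 times per action chunk instead of 100, which is where the eval speed-up comes from.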

See train_config.json in each checkpoint-XXXXXX subfolder for the full reproducible config.
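Since all three checkpoints come from the same run, their configs should match. One way to check that after downloading is sketched below. The helper names load_train_config and diff_configs are hypothetical, and nothing is assumed about the keys inside train_config.json.

```python
import json
from pathlib import Path


def load_train_config(checkpoint_dir: str) -> dict:
    """Read train_config.json from a downloaded checkpoint subfolder."""
    with open(Path(checkpoint_dir) / "train_config.json", encoding="utf-8") as f:
        return json.load(f)


def diff_configs(a: dict, b: dict, prefix: str = "") -> dict:
    """Return {dotted.key: (value_in_a, value_in_b)} for every differing entry."""
    changes = {}
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            # Recurse into nested sections so only leaf differences are reported.
            changes.update(diff_configs(va, vb, prefix=f"{path}."))
        elif va != vb:
            changes[path] = (va, vb)
    return changes
```

An empty result from diff_configs for two checkpoint folders would indicate identical settings.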

Usage

from huggingface_hub import snapshot_download
from lerobot.policies.multi_task_dit import MultiTaskDiTPolicy

# Fetch only the final checkpoint's subfolder rather than all three.
ckpt_dir = snapshot_download(
    repo_id="JayCao99/dit-diffusion-rm65b-sort-v0",
    allow_patterns="checkpoint-100000/*",
)
# Load the policy from the downloaded subfolder.
policy = MultiTaskDiTPolicy.from_pretrained(f"{ckpt_dir}/checkpoint-100000")
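To compare the three candidates, the same pattern can be parameterized by training step. The sketch below wraps the snippet above. The helpers checkpoint_subfolder and load_policy are hypothetical names, and the import path is the one given in this README.

```python
CHECKPOINT_STEPS = (60_000, 80_000, 100_000)  # subfolders listed in this repo


def checkpoint_subfolder(step: int) -> str:
    """Map a training step to its subfolder name, e.g. 60000 -> 'checkpoint-060000'."""
    if step not in CHECKPOINT_STEPS:
        raise ValueError(f"no checkpoint at step {step}; choose from {CHECKPOINT_STEPS}")
    return f"checkpoint-{step:06d}"


def load_policy(step: int = 100_000):
    """Download one checkpoint subfolder and load it (needs network access)."""
    # Imported lazily so checkpoint_subfolder works without these packages installed.
    from huggingface_hub import snapshot_download
    from lerobot.policies.multi_task_dit import MultiTaskDiTPolicy

    subfolder = checkpoint_subfolder(step)
    ckpt_dir = snapshot_download(
        repo_id="JayCao99/dit-diffusion-rm65b-sort-v0",
        allow_patterns=f"{subfolder}/*",
    )
    return MultiTaskDiTPolicy.from_pretrained(f"{ckpt_dir}/{subfolder}")
```

For example, load_policy(60_000) would fetch and load only checkpoint-060000.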
