Multi-Task DiT (Diffusion): rm65b-sort-v0

LeRobot Multi-Task DiT policy trained on rm65b-sort-v0 (bimanual RM65 sort task, 3 cameras, 198 episodes / 91.5k frames @ 30 Hz, 1 task).

This repo holds three candidate deployment checkpoints from the same 100k-step run.

Subfolder            Train step        Final train loss
checkpoint-060000    60,000            0.009
checkpoint-080000    80,000            0.008
checkpoint-100000    100,000 (last)    0.007

Training summary

  • Architecture: Multi-Task DiT (multi_task_dit), CLIP ViT-B/16 vision (separate encoder per camera) + CLIP text encoder
  • Objective: diffusion (DDPM scheduler, prediction_type=epsilon, 100 train timesteps, squaredcos_cap_v2 betas). DDIM-compatible: switch noise_scheduler_type to DDIM and set num_inference_steps=10 for fast eval
  • Total steps: 100,000 | batch size 32 | Adam lr=2e-5, cosine decay, AMP, grad-clip 10
  • Hardware: 1× L40S, ~17 h wall-clock
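
To make the noise-schedule settings above concrete, here is a minimal pure-Python sketch of a squaredcos_cap_v2 (cosine) beta schedule over 100 train timesteps, plus the 10 timesteps a DDIM sampler would visit. The function names are illustrative and not part of LeRobot. The evenly spaced "leading" timestep layout is an assumption; the scheduler used at inference may space its steps differently.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps=100, max_beta=0.999):
    """Cosine beta schedule: betas derived from a squared-cosine alpha_bar curve."""

    def alpha_bar(t):
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        # Cap each beta so the final steps do not become numerically singular.
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


def ddim_timesteps(num_train_timesteps=100, num_inference_steps=10):
    """Evenly spaced subset of train timesteps, highest noise level first.

    Assumption: "leading" spacing (0, step, 2*step, ...) reversed.
    """
    step = num_train_timesteps // num_inference_steps
    return [i * step for i in range(num_inference_steps)][::-1]


betas = squaredcos_cap_v2_betas()

# Cumulative product of (1 - beta): fraction of signal left at each timestep.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1 - beta
    alphas_cumprod.append(running)

timesteps = ddim_timesteps()
print(timesteps)
```

With 10 inference steps instead of 100, each action chunk needs roughly a tenth of the denoising passes, which is why the DDIM setting is suggested for fast eval.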

See train_config.json in each checkpoint-XXXXXX subfolder for the full reproducible config.
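
One way to check which scheduler settings a given checkpoint was saved with is to search its train_config.json for the fields named in the summary. The sketch below uses hypothetical helpers (find_keys, summarize_config) and does not assume any particular nesting of the file; it looks for the given key names at any depth.

```python
import json
from pathlib import Path


def find_keys(node, wanted, prefix=""):
    """Yield (dotted_path, value) for every key in `wanted`, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if key in wanted:
                yield path, value
            yield from find_keys(value, wanted, path)
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_keys(value, wanted, f"{prefix}[{i}]")


def summarize_config(
    config_path,
    wanted=("noise_scheduler_type", "num_inference_steps", "prediction_type"),
):
    """Return {dotted_path: value} for the selected keys found in a JSON config."""
    config = json.loads(Path(config_path).read_text())
    return dict(find_keys(config, set(wanted)))
```

For example, after downloading a checkpoint as in the Usage section, summarize_config(f"{ckpt_dir}/checkpoint-100000/train_config.json") returns the matching entries, or an empty dict if none of the names occur in the file.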

Usage

from huggingface_hub import snapshot_download
from lerobot.policies.multi_task_dit import MultiTaskDiTPolicy

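# Download only the final checkpoint's subfolder from the Hub.
ckpt_dir = snapshot_download(
    repo_id="JayCao99/dit-diffusion-rm65b-sort-v0",
    allow_patterns="checkpoint-100000/*",
)
# Load the policy from that subfolder of the local snapshot.
policy = MultiTaskDiTPolicy.from_pretrained(f"{ckpt_dir}/checkpoint-100000")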