RDT-1B fine-tuned on genesis-hr-bench (step 200,000)

DeepSpeed-ZeRO fine-tune of robotics-diffusion-transformer/rdt-1b on the zhouqh/hrbench genesis-hr-bench dataset, converted to RDT's HDF5 schema (single Franka, 8-D state/action, 2 cameras).

Training

Base model robotics-diffusion-transformer/rdt-1b (RDT-1B, ~1.2B params)
Dataset zhouqh/hrbench → RDT-HDF5 (8-D state, cam_high + cam_right_wrist, instruction.json)
Internal step (rdt) 200,000 (final, on-disk checkpoint-200000/)
Optimizer step (tqdm/wandb) 89,105 (cross-reference for the wandb loss curve)
Per-GPU batch 16
GPUs 8 × H200
Effective batch 128 (no grad accumulation)
Optimizer AdamW via DeepSpeed ZeRO-2, lr 1e-4
Precision bfloat16
EMA enabled (max_value=0.9999, power=0.75)
Hardware 1 node, FAIR Cloud (h200 partition), slurm job 1371522
Wandb run warm-puddle-8 / 1v8j3fur
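
The EMA row gives only max_value and power. As a rough sketch of what those two numbers imply, the snippet below assumes the power-law warmup common in Diffusion Policy style EMA helpers: decay = 1 - (1 + step / inv_gamma) ** -power, clipped to [min_value, max_value]. The formula, inv_gamma=1.0, and min_value=0.0 are assumptions rather than values stated in this card, and ema_decay is a hypothetical name.

```python
def ema_decay(step, power=0.75, max_value=0.9999, inv_gamma=1.0, min_value=0.0):
    """Assumed power-law EMA warmup; only power and max_value come from this card."""
    if step <= 0:
        return 0.0
    # Decay grows toward 1 as training progresses, then is capped at max_value.
    value = 1.0 - (1.0 + step / inv_gamma) ** (-power)
    return max(min_value, min(value, max_value))


# Print the assumed decay at a few step counts, including the final optimizer step.
for step in (1_000, 89_105, 1_000_000):
    print(step, round(ema_decay(step), 6))
```

Under these assumptions the decay reaches the 0.9999 cap only after roughly 215,000 EMA updates, so the cap may not have been active for this run, depending on which counter drives the EMA update.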

Note on step numbers: rdt keeps two counters. The internal counter names the on-disk directory (checkpoint-200000); the tqdm/optimizer step is what the wandb x-axis shows (89,105). They differ because rdt's training loop counts data-iterator iterations rather than optimizer updates. The HF repo is named by the on-disk counter so it matches the artifact you'd see if you replicated locally.
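
To locate another checkpoint on the wandb x-axis, one option is to scale by the ratio of the two final counters. This is only a sketch: it assumes the ratio between the counters stays constant over the run, which this card does not confirm, and approx_optimizer_step is a hypothetical helper.

```python
INTERNAL_FINAL = 200_000   # on-disk counter at the final checkpoint (from this card)
OPTIMIZER_FINAL = 89_105   # tqdm/wandb counter at the same point (from this card)


def approx_optimizer_step(internal_step):
    """Estimate the wandb x-axis position for an on-disk checkpoint step.

    Assumes the two counters advance at a constant ratio (not verified).
    """
    return round(internal_step * OPTIMIZER_FINAL / INTERNAL_FINAL)


print(approx_optimizer_step(200_000))  # reproduces the final optimizer step by construction
```

Treat the result as an approximate cross-reference for reading the loss curve, not as an exact step.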

The full RDT policy config is in config.json (architecture: 28-layer transformer, hidden_size=2048, action_dim=128, pred_horizon=64).

Files

File Purpose
ema/model.safetensors EMA weights (~2.3 GB): the primary inference artifact, and what RDTRunner.from_pretrained() uses by default.
pytorch_model.bin Non-EMA weights (~2.3 GB), for ablation against EMA.
config.json RDT architecture config (depth, hidden_size, action_dim, noise scheduler, etc.).
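
As a quick sanity check on the file sizes, the sketch below estimates the on-disk size of the weights from the parameter count. It assumes the weights are stored at 2 bytes per parameter (bfloat16, matching the training precision); the storage dtype is not stated explicitly in this card, and the function name is hypothetical.

```python
def checkpoint_size_gb(n_params, bytes_per_param=2):
    """Estimated weight-file size in GB (10^9 bytes), ignoring file metadata."""
    return n_params * bytes_per_param / 1e9


# ~1.2B parameters at an assumed 2 bytes each.
print(f"{checkpoint_size_gb(1.2e9):.1f} GB")
```

The estimate lands close to the ~2.3 GB listed for each weight file, which is consistent with the approximate "~1.2B params" figure.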

The DeepSpeed ZeRO optimizer shards (~14 GB) and resume scaffolding (random_states_*.pkl, scheduler.bin, latest, zero_to_fp32.py) were not uploaded; this repo is inference-only.
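
Since only inference files are present, a download can be limited to the ones you need. The sketch below uses huggingface_hub's snapshot_download with allow_patterns; the helper names (inference_patterns, download_inference_files) are hypothetical, and the file names are taken from the table above.

```python
def inference_patterns(include_non_ema=False):
    """File patterns to fetch: EMA weights and config, optionally the non-EMA weights."""
    patterns = ["ema/model.safetensors", "config.json"]
    if include_non_ema:
        patterns.append("pytorch_model.bin")
    return patterns


def download_inference_files(repo_id, include_non_ema=False):
    """Download only the selected files and return the local snapshot directory."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=inference_patterns(include_non_ema),
    )


# Example call (requires network access):
# local_dir = download_inference_files("zimplex/rdt-1b-genesis-hr-bench-step200000")
```

Passing include_non_ema=True also fetches pytorch_model.bin for the EMA ablation.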

Usage

# Standard rdt eval path (EMA weights)
import torch

from scripts.agilex_model import create_model  # in baseline/rdt/

model = create_model(
    args=...,
    dtype=torch.bfloat16,
    pretrained="zimplex/rdt-1b-genesis-hr-bench-step200000",
    pretrained_text_encoder_name_or_path="google/t5-v1_1-xxl",
    pretrained_vision_encoder_name_or_path="google/siglip-so400m-patch14-384",
)

The genesis-hr-bench-specific dataloader (baseline/rdt/data/hdf5_vla_dataset.py) expects 8-D state, cam_high + cam_right_wrist, and instruction.json. See baseline/rdt_overrides/ for the overlay applied to the upstream submodule.
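
Before feeding your own episodes to the dataloader, it can help to check them against the expectations listed above. The following is a minimal sketch on in-memory arrays, not the dataloader's own validation: check_episode and its argument layout (state and action as (T, 8) arrays, images as a dict keyed by camera name) are hypothetical, and the actual HDF5 key names are not given in this card.

```python
import numpy as np

EXPECTED_DIM = 8                                   # 8-D state/action (from this card)
EXPECTED_CAMERAS = {"cam_high", "cam_right_wrist"}  # camera names (from this card)


def check_episode(state, action, images):
    """Return a list of problems; an empty list means the episode looks consistent."""
    problems = []
    if state.ndim != 2 or state.shape[1] != EXPECTED_DIM:
        problems.append(f"state shape {state.shape}, expected (T, {EXPECTED_DIM})")
    if action.ndim != 2 or action.shape[1] != EXPECTED_DIM:
        problems.append(f"action shape {action.shape}, expected (T, {EXPECTED_DIM})")
    missing = EXPECTED_CAMERAS - set(images)
    if missing:
        problems.append(f"missing cameras: {sorted(missing)}")
    return problems


# Synthetic episode with one camera deliberately left out.
T = 5
state = np.zeros((T, 8), dtype=np.float32)
action = np.zeros((T, 8), dtype=np.float32)
images = {"cam_high": np.zeros((T, 4, 4, 3), dtype=np.uint8)}
print(check_episode(state, action, images))
```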

Provenance

Completes the 7-checkpoint genesis-hr-bench finetune sweep; see CHECKPOINT_SUMMARY.md.
