
DiT Block Tower Baseline v1

Diffusion Transformer policy for the build block tower task, trained on 6 datasets (1 base + 5 DAgger rounds, ~341k human-control frames).

Status: Partial run. 35,000 of 50,000 planned steps completed before the job hit its 24h walltime limit; loss was still decreasing at cutoff.

Model

Architecture    Diffusion Transformer (DiT)
Vision encoder  CLIP ViT-B/16 (per-camera, lr_mult=0.1)
Text encoder    CLIP ViT-B/16
Transformer     512 hidden, 6 layers, 8 heads
Diffusion       DDPM, 100 steps, squaredcos_cap_v2
State dim       16 (7 joint pos + 9 eef rot6d)
Action dim      17 (7 joint cmd + 9 eef rot6d + 1 gripper)
Cameras         front (480x640), wrist (480x640)
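
The squaredcos_cap_v2 entry names the capped squared-cosine noise schedule as it is called in the diffusers library. The following is a minimal sketch of how its 100 betas can be computed, assuming the usual defaults (offset 0.008, cap 0.999); the function name cosine_betas is illustrative, and the training repo may configure the schedule differently.

```python
import math


def cosine_betas(num_steps=100, max_beta=0.999):
    """Betas for the capped squared-cosine schedule (assumed default constants)."""

    def alpha_bar(t):
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        # Per-step noise variance, capped to avoid a singular final step.
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = cosine_betas()

# Cumulative product of (1 - beta): how much of the clean action survives at each step.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)
```

The cap only affects the last step of a 100-step schedule; by then almost none of the clean signal remains.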

Training

Parameter      Value
Batch size     64 per GPU (256 global, 4x GH200)
Train steps    50,000 (35,000 completed)
Learning rate  2e-5, cosine schedule
Warmup         500 steps
Horizon        100
Action steps   50
Obs steps      2
AMP            enabled
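
To make the schedule concrete, here is a rough sketch of the learning rate over the run. It assumes linear warmup from zero followed by cosine decay to zero at step 50,000; the exact warmup shape and final learning rate are not stated above, so treat both as assumptions. The helper name lr_at is illustrative.

```python
import math

PEAK_LR = 2e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 50_000
GLOBAL_BATCH = 64 * 4  # 64 per GPU on 4 GPUs


def lr_at(step):
    """Learning rate at a given optimizer step (assumed warmup and decay shape)."""
    if step < WARMUP_STEPS:
        # Assumed: linear ramp from 0 to the peak rate.
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    # Assumed: cosine decay from the peak rate down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 500, 35_000, 50_000):
    print(step, lr_at(step))

# Samples processed by the 35,000-step checkpoint.
samples_seen = 35_000 * GLOBAL_BATCH
```

Under these assumptions the learning rate had not yet decayed to zero at step 35,000, which fits the observation that loss was still decreasing when the run was cut off.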

Datasets

Dataset                                          Role
villekuosmanen/build_block_tower                 Base demonstrations
villekuosmanen/dAgger_build_block_tower_1.0.0    DAgger round 1
villekuosmanen/dAgger_build_block_tower_1.1.0    DAgger round 2
villekuosmanen/dAgger_build_block_tower_1.2.0    DAgger round 3
villekuosmanen/dAgger_build_block_tower_1.3.0    DAgger round 4
villekuosmanen/dAgger_build_block_tower_1.4.0    DAgger round 5

Frames recorded while the DAgger policy was in control are filtered out via ControlModePlugin; only human-control frames are used for training.
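
The filtering idea can be illustrated with a small sketch. This is not the ControlModePlugin implementation: the per-frame control_mode labels, the "human" and "policy" values, and the dataset key are all hypothetical stand-ins chosen for illustration.

```python
import json


def human_control_indices(control_modes, human_label="human"):
    """Return indices of frames recorded under human control.

    control_modes is a hypothetical per-frame list of labels; the real
    dataset field names and values may differ.
    """
    return [i for i, mode in enumerate(control_modes) if mode == human_label]


# Toy episode: the policy drives frames 2 and 3, a human drives the rest.
episode_modes = ["human", "human", "policy", "policy", "human"]

# One list of valid frame indices per dataset, similar in spirit to
# a per-dataset index file such as valid_indices.json.
valid = {"example_dataset": human_control_indices(episode_modes)}
print(json.dumps(valid))
```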

Files

README.md
TRAINING_LOG.md
assets/
  ramen_stats.pt          # Normalization statistics
  valid_indices.json      # Per-dataset valid frame indices after DAgger filtering
checkpoints/
  35000/
    model.safetensors     # Model weights (inference + fine-tuning)
    config.json           # Resolved model config

Checkpoint Integrity

sha256, first 8 hex characters (checkpoint and asset files):
6192188a  config.json
8f00265f  model.safetensors
df43463f  ramen_stats.pt

Full hashes:

6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a  config.json
8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0  model.safetensors
df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d  ramen_stats.pt

Reproduce with:

(cd checkpoints/35000 && sha256sum config.json model.safetensors)
(cd assets && sha256sum ramen_stats.pt)
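
Where sha256sum is unavailable, the same check can be scripted in Python with the standard library. This is a sketch that assumes it is run from the repository root with the file layout listed under Files; the function names are illustrative.

```python
import hashlib
from pathlib import Path

# Expected digests, copied from the full hashes listed above.
EXPECTED = {
    "checkpoints/35000/config.json":
        "6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a",
    "checkpoints/35000/model.safetensors":
        "8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0",
    "assets/ramen_stats.pt":
        "df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root="."):
    """Map each expected file to True (matches), False (missing or mismatch)."""
    results = {}
    for rel_path, expected in EXPECTED.items():
        path = Path(root) / rel_path
        results[rel_path] = path.is_file() and sha256_of(path) == expected
    return results


if __name__ == "__main__":
    for rel_path, ok in verify().items():
        print("OK      " if ok else "MISMATCH", rel_path)
```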

W&B

Training curves: https://wandb.ai/pravsels/dit_block_tower/runs/pv8q64et

Usage

This checkpoint was trained with the multitask_dit_policy repo, branch stage1-multimodal-abstraction; use that branch to load it.
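
The loading code itself lives in that repo and is not reproduced here. To inspect the weights without it, the safetensors header can be read with the standard library alone: the file starts with an 8-byte little-endian length, followed by a JSON header describing each tensor. The sketch below relies only on that file format; the function names are illustrative.

```python
import json
import struct


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata is not a tensor entry.
    header.pop("__metadata__", None)
    return header


def count_parameters(header):
    """Total number of elements across all tensors in the header."""
    total = 0
    for info in header.values():
        n = 1
        for dim in info["shape"]:
            n *= dim
        total += n
    return total


# Example, assuming the file layout listed under Files:
# header = read_safetensors_header("checkpoints/35000/model.safetensors")
# print(len(header), "tensors,", count_parameters(header), "parameters")
```

This reads only the header, so it is fast even for large checkpoints and does not require torch or the training repo.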