# DiT Block Tower Baseline v1

Diffusion Transformer policy for the **build block tower** task, trained on 6 datasets (1 base + 5 DAgger rounds, ~341k human-control frames).

**Status:** Partial run, 35,000 / 50,000 steps completed (hit the 24h walltime limit). Loss was still decreasing at cutoff.

## Model

| Component | Value |
|---|---|
| Architecture | Diffusion Transformer (DiT) |
| Vision encoder | CLIP ViT-B/16 (per-camera, lr_mult=0.1) |
| Text encoder | CLIP ViT-B/16 |
| Transformer | 512 hidden, 6 layers, 8 heads |
| Diffusion | DDPM, 100 steps, squaredcos_cap_v2 |
| State dim | 16 (7 joint pos + 9 eef rot6d) |
| Action dim | 17 (7 joint cmd + 9 eef rot6d + 1 gripper) |
| Cameras | front (480x640), wrist (480x640) |

## Training

| Parameter | Value |
|-----------|-------|
| Batch size | 64 per GPU (256 global, 4x GH200) |
| Train steps | 50,000 (35,000 completed) |
| Learning rate | 2e-5, cosine schedule |
| Warmup | 500 steps |
| Horizon | 100 |
| Action steps | 50 |
| Obs steps | 2 |
| AMP | enabled |

## Datasets

| Dataset | Role |
|---------|------|
| `villekuosmanen/build_block_tower` | Base demonstrations |
| `villekuosmanen/dAgger_build_block_tower_1.0.0` | DAgger round 1 |
| `villekuosmanen/dAgger_build_block_tower_1.1.0` | DAgger round 2 |
| `villekuosmanen/dAgger_build_block_tower_1.2.0` | DAgger round 3 |
| `villekuosmanen/dAgger_build_block_tower_1.3.0` | DAgger round 4 |
| `villekuosmanen/dAgger_build_block_tower_1.4.0` | DAgger round 5 |

DAgger policy frames are filtered out via `ControlModePlugin`, so only human-control frames are used for training.
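The filtering step can be pictured with a minimal sketch. This is not the `ControlModePlugin` implementation, whose API is not shown here; the `"human"` / `"policy"` labels, the function names, and the example dataset key are all hypothetical. It only illustrates the idea of keeping the indices of frames recorded under human control, per dataset.

```python
from typing import Dict, List, Sequence

# Hypothetical per-frame labels; the real datasets may encode control mode differently.
HUMAN = "human"
POLICY = "policy"


def human_control_indices(control_modes: Sequence[str]) -> List[int]:
    """Return the indices of frames recorded while a human was in control."""
    return [i for i, mode in enumerate(control_modes) if mode == HUMAN]


def build_valid_indices(per_dataset_modes: Dict[str, Sequence[str]]) -> Dict[str, List[int]]:
    """Apply the filter to each dataset independently."""
    return {name: human_control_indices(modes) for name, modes in per_dataset_modes.items()}


# Toy example: a DAgger episode where the human intervenes twice.
example = {
    "dagger_round_example": [POLICY, POLICY, HUMAN, HUMAN, POLICY, HUMAN],
}
valid = build_valid_indices(example)
print(valid)  # {'dagger_round_example': [2, 3, 5]}
```

A per-dataset mapping like this is one plausible shape for a file of valid frame indices, but the actual layout of any such file in this repository is not documented here and should be checked before relying on it.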
## Files

```
README.md
TRAINING_LOG.md
assets/
  ramen_stats.pt          # Normalization statistics
  valid_indices.json      # Per-dataset valid frame indices after DAgger filtering
checkpoints/
  35000/
    model.safetensors     # Model weights (inference + fine-tuning)
    config.json           # Resolved model config
```

## Checkpoint Integrity

```
sha256 (checkpoint files):
  6192188a  config.json
  8f00265f  model.safetensors
  df43463f  ramen_stats.pt
```

Full hashes:

```
6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a  config.json
8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0  model.safetensors
df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d  ramen_stats.pt
```

Reproduce from the repository root with (each command runs in a subshell, so the second `cd` is not affected by the first):

```bash
(cd checkpoints/35000 && sha256sum config.json model.safetensors)
(cd assets && sha256sum ramen_stats.pt)
```

## W&B

Training curves: https://wandb.ai/pravsels/dit_block_tower/runs/pv8q64et

## Usage

This checkpoint is from the [multitask_dit_policy](https://github.com/pravsels/multitask_dit_policy) repo, branch `stage1-multimodal-abstraction`.
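Before loading the checkpoint, the hashes listed under Checkpoint Integrity can also be checked from Python. The sketch below uses only the standard library. The relative paths assume that `assets/` and `checkpoints/35000/` sit at the repository root, as the file listing suggests, and the helper names `sha256_of` and `verify` are illustrative rather than part of the `multitask_dit_policy` repo.

```python
import hashlib
from pathlib import Path

# Full SHA-256 digests copied from the Checkpoint Integrity section.
# Paths are assumed to be relative to the repository root.
EXPECTED = {
    "checkpoints/35000/config.json":
        "6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a",
    "checkpoints/35000/model.safetensors":
        "8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0",
    "assets/ramen_stats.pt":
        "df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root=".", expected=EXPECTED):
    """Return 'ok', 'mismatch', or 'missing' for each expected file under root."""
    results = {}
    for rel, want in expected.items():
        path = Path(root) / rel
        if not path.is_file():
            results[rel] = "missing"
        else:
            results[rel] = "ok" if sha256_of(path) == want else "mismatch"
    return results


if __name__ == "__main__":
    for rel, status in verify().items():
        print(f"{status:8s} {rel}")
```

Any `mismatch` or `missing` entry suggests an incomplete download or a different checkpoint, and is worth resolving before using the weights for inference or fine-tuning.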