# DiT Block Tower Baseline v1
Diffusion Transformer policy for the **build block tower** task, trained on 6 datasets (1 base + 5 DAgger rounds, ~341k human-control frames).
**Status:** Partial run: 35,000 of 50,000 steps completed before the job hit its 24 h walltime limit. Loss was still decreasing at the cutoff.
## Model
| | |
|---|---|
| Architecture | Diffusion Transformer (DiT) |
| Vision encoder | CLIP ViT-B/16 (per-camera, lr_mult=0.1) |
| Text encoder | CLIP ViT-B/16 |
| Transformer | 512 hidden, 6 layers, 8 heads |
| Diffusion | DDPM, 100 train timesteps, `squaredcos_cap_v2` beta schedule |
| State dim | 16 (7 joint pos + 9 eef rot6d) |
| Action dim | 17 (7 joint cmd + 9 eef rot6d + 1 gripper) |
| Cameras | front (480x640), wrist (480x640) |
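`squaredcos_cap_v2` is the name Hugging Face diffusers' `DDPMScheduler` uses for the cosine noise schedule from Improved DDPM. The sketch below reproduces that schedule in plain Python for the 100 timesteps listed above, assuming diffusers' default cap of `max_beta=0.999`. It is an illustration of the schedule, not code taken from the training repo.

```python
import math


def alpha_bar(t: float) -> float:
    """Cumulative signal fraction at normalized time t in [0, 1] (cosine schedule)."""
    return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2


def squaredcos_cap_v2_betas(num_train_timesteps: int = 100, max_beta: float = 0.999) -> list[float]:
    """Per-step betas derived from alpha_bar, capped at max_beta to avoid a singular last step."""
    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


if __name__ == "__main__":
    betas = squaredcos_cap_v2_betas(100)
    # A product near 0 means the final timestep is almost pure noise.
    alphas_cumprod = math.prod(1.0 - b for b in betas)
    print(f"beta[0]={betas[0]:.2e}, beta[-1]={betas[-1]:.3f}, alpha_bar_T={alphas_cumprod:.2e}")
```

Betas start very small and grow monotonically, so most of the 100 steps remove only a little noise and the cap only binds at the final step.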
## Training
| Parameter | Value |
|-----------|-------|
| Batch size | 64 per GPU (256 global, 4x GH200) |
| Train steps | 50,000 (35,000 completed) |
| Learning rate | 2e-5, cosine schedule |
| Warmup | 500 steps |
| Horizon (actions predicted per sample) | 100 |
| Action steps (actions executed per inference) | 50 |
| Obs steps (observation history length) | 2 |
| AMP | enabled |
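The table lists a 2e-5 peak learning rate, 500 warmup steps and a cosine schedule over 50,000 steps. A minimal sketch of that schedule follows, assuming linear warmup from zero and a half-cosine decay to zero at step 50,000 (the shape used by common helpers such as diffusers' `get_cosine_schedule_with_warmup`; the training repo's exact implementation is not confirmed here). Under these assumptions the 35,000-step checkpoint was saved at roughly a fifth of the peak rate.

```python
import math

PEAK_LR = 2e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 50_000


def lr_at(step: int, peak_lr: float = PEAK_LR, warmup: int = WARMUP_STEPS, total: int = TOTAL_STEPS) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to 0 at `total` (assumed shape)."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = min((step - warmup) / max(1, total - warmup), 1.0)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 250, 500, 35_000, 50_000):
        print(f"step {s:>6}: lr = {lr_at(s):.2e}")
```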
## Datasets
| Dataset | Role |
|---------|------|
| `villekuosmanen/build_block_tower` | Base demonstrations |
| `villekuosmanen/dAgger_build_block_tower_1.0.0` | DAgger round 1 |
| `villekuosmanen/dAgger_build_block_tower_1.1.0` | DAgger round 2 |
| `villekuosmanen/dAgger_build_block_tower_1.2.0` | DAgger round 3 |
| `villekuosmanen/dAgger_build_block_tower_1.3.0` | DAgger round 4 |
| `villekuosmanen/dAgger_build_block_tower_1.4.0` | DAgger round 5 |
In the DAgger datasets, frames recorded under policy control are filtered out via `ControlModePlugin`, so training uses only human-control frames.
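Conceptually, the filtering keeps, per dataset, only the frame indices whose control-mode label is human, and stores the result as a JSON map in the spirit of `assets/valid_indices.json`. The sketch below is hypothetical: the per-frame `"human"`/`"policy"` labels and the function names are assumptions for illustration, not the `ControlModePlugin` API, and the real JSON schema may differ.

```python
import json
from typing import Iterable, Mapping


def human_frame_indices(control_modes: Iterable[str]) -> list[int]:
    """Indices of frames labeled as human control (hypothetical 'human' label)."""
    return [i for i, mode in enumerate(control_modes) if mode == "human"]


def build_valid_indices(datasets: Mapping[str, list[str]]) -> dict[str, list[int]]:
    """Map each dataset name to its human-control frame indices."""
    return {name: human_frame_indices(modes) for name, modes in datasets.items()}


if __name__ == "__main__":
    # Toy per-frame labels; real labels would come from the DAgger datasets.
    toy = {
        "build_block_tower": ["human"] * 4,  # base demonstrations: all human
        "dAgger_build_block_tower_1.0.0": ["policy", "policy", "human", "human", "policy"],
    }
    valid = build_valid_indices(toy)
    print(json.dumps(valid))
    print("kept frames:", sum(len(v) for v in valid.values()))
```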
## Files
```
README.md
TRAINING_LOG.md
assets/
ramen_stats.pt # Normalization statistics
valid_indices.json # Per-dataset valid frame indices after DAgger filtering
checkpoints/
35000/
model.safetensors # Model weights (inference + fine-tuning)
config.json # Resolved model config
```
## Checkpoint Integrity
```
sha256 prefixes (first 8 hex digits):
6192188a config.json
8f00265f model.safetensors
df43463f ramen_stats.pt
```
Full hashes:
```
6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a config.json
8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0 model.safetensors
df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d ramen_stats.pt
```
Reproduce from the repository root with:
```bash
sha256sum checkpoints/35000/config.json checkpoints/35000/model.safetensors assets/ramen_stats.pt
```
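The same check can be scripted, for example before fine-tuning. A minimal sketch using Python's standard `hashlib`, run from the repository root and comparing each file against the full hashes listed above; the relative paths follow the layout in the **Files** section.

```python
import hashlib
from pathlib import Path

# Full SHA-256 digests from the "Checkpoint Integrity" section.
EXPECTED = {
    "checkpoints/35000/config.json": "6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a",
    "checkpoints/35000/model.safetensors": "8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0",
    "assets/ramen_stats.pt": "df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large weight files are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root: Path, expected: dict[str, str] = EXPECTED) -> dict[str, bool]:
    """Return {relative_path: matches}; missing files count as a mismatch."""
    results = {}
    for rel, want in expected.items():
        path = root / rel
        results[rel] = path.is_file() and sha256_of(path) == want
    return results


if __name__ == "__main__":
    for rel, ok in verify(Path(".")).items():
        print(f"{'OK ' if ok else 'BAD'} {rel}")
```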
## W&B
Training curves: https://wandb.ai/pravsels/dit_block_tower/runs/pv8q64et
## Usage
This checkpoint was trained with the [multitask_dit_policy](https://github.com/pravsels/multitask_dit_policy) repo, on branch `stage1-multimodal-abstraction`.
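With `Obs steps` = 2, `Horizon` = 100 and `Action steps` = 50, a diffusion-policy controller typically conditions on the two most recent observations, samples a 100-step action chunk, executes the first 50 actions, then replans. The sketch below illustrates that receding-horizon loop with hypothetical stand-ins (`sample_actions`, `reset`, `step`) for the model and robot; it does not use the multitask_dit_policy API. Executing the chunk from index 0, rather than offsetting by the observation window, is also an assumption.

```python
from collections import deque
from typing import Callable, Sequence

OBS_STEPS = 2      # observation history length (Training table)
HORIZON = 100      # actions predicted per sample
ACTION_STEPS = 50  # actions executed before replanning
ACTION_DIM = 17    # 7 joint cmd + 9 eef rot6d + 1 gripper


def run_episode(
    sample_actions: Callable[[Sequence[list[float]]], list[list[float]]],
    reset: Callable[[], list[float]],
    step: Callable[[list[float]], list[float]],
    max_steps: int,
) -> int:
    """Receding-horizon control: sample HORIZON actions, execute the first ACTION_STEPS, repeat."""
    obs = reset()
    history = deque([obs] * OBS_STEPS, maxlen=OBS_STEPS)  # pad the window with the first frame
    executed = 0
    while executed < max_steps:
        chunk = sample_actions(list(history))
        if len(chunk) != HORIZON or len(chunk[0]) != ACTION_DIM:
            raise ValueError("unexpected action chunk shape")
        for action in chunk[:ACTION_STEPS]:
            obs = step(action)
            history.append(obs)  # the oldest observation drops out automatically
            executed += 1
            if executed >= max_steps:
                break
    return executed


if __name__ == "__main__":
    # Hypothetical stand-ins; toy observations are 16-dim state vectors without images.
    calls = []

    def fake_policy(obs_window):
        calls.append(len(obs_window))
        return [[0.0] * ACTION_DIM for _ in range(HORIZON)]

    n = run_episode(fake_policy, reset=lambda: [0.0] * 16, step=lambda a: [0.0] * 16, max_steps=120)
    print(f"executed {n} actions with {len(calls)} policy calls")
```

Replanning every 50 steps instead of every step trades some reactivity for fewer expensive 100-step DDPM sampling passes.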