	# DiT Block Tower Baseline v1

	Diffusion Transformer policy for the build block tower task, trained on 6 datasets (1 base + 5 DAgger rounds, ~341k human-control frames).

	Status: partial run. 35,000 of 50,000 steps completed before the job hit the 24h walltime limit; loss was still decreasing at cutoff.

	## Model

	| Component | Value |
	|---|---|
	| Architecture | Diffusion Transformer (DiT) |
	| Vision encoder | CLIP ViT-B/16 (per-camera, lr_mult=0.1) |
	| Text encoder | CLIP ViT-B/16 |
	| Transformer | 512 hidden, 6 layers, 8 heads |
	| Diffusion | DDPM, 100 steps, squaredcos_cap_v2 |
	| State dim | 16 (7 joint pos + 9 eef rot6d) |
	| Action dim | 17 (7 joint cmd + 9 eef rot6d + 1 gripper) |
	| Cameras | front (480x640), wrist (480x640) |
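
For readers unfamiliar with the `squaredcos_cap_v2` label: it is the name the `diffusers` library gives to the cosine noise schedule with betas capped at 0.999. The sketch below recomputes that schedule for 100 steps in plain Python. It assumes the model uses the standard definition; the function name `squaredcos_cap_v2_betas` is illustrative and not taken from the training repo.

```python
import math


def squaredcos_cap_v2_betas(num_steps: int = 100, max_beta: float = 0.999) -> list[float]:
    """Cosine beta schedule with a cap (assumed to match the card's setting)."""

    def alpha_bar(t: float) -> float:
        # Cumulative signal level at continuous time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        # beta_i = 1 - alpha_bar(t2) / alpha_bar(t1), capped to avoid beta -> 1.
        betas.append(min(1.0 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_cap_v2_betas(100)

# Cumulative product of (1 - beta): how much of the clean signal survives at each step.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alphas_cumprod.append(running)

print(f"first beta: {betas[0]:.6f}, last beta: {betas[-1]:.3f}")
print(f"signal remaining after the final step: {alphas_cumprod[-1]:.2e}")
```

The cap only affects the last step or so, where the uncapped ratio would otherwise approach 1.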

	## Training

	| Parameter | Value |
	|-----------|-------|
	| Batch size | 64 per GPU (256 global, 4x GH200) |
	| Train steps | 50,000 (35,000 completed) |
	| Learning rate | 2e-5, cosine schedule |
	| Warmup | 500 steps |
	| Horizon | 100 |
	| Action steps | 50 |
	| Obs steps | 2 |
	| AMP | enabled |
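
Because the run stopped at step 35,000, the learning rate had not finished decaying. The sketch below estimates it, assuming linear warmup followed by cosine decay to zero over the full 50,000 steps. The table does not state the warmup shape or the final learning rate, so both are assumptions, and `lr_at` is an illustrative helper rather than the repo's scheduler.

```python
import math

BASE_LR = 2e-5
WARMUP_STEPS = 500
TOTAL_STEPS = 50_000
PER_GPU_BATCH = 64
NUM_GPUS = 4

# Global batch size is the per-GPU batch times the number of GPUs.
GLOBAL_BATCH = PER_GPU_BATCH * NUM_GPUS


def lr_at(step: int) -> float:
    """Learning rate at a given step (assumed: linear warmup, cosine decay to 0)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"global batch size: {GLOBAL_BATCH}")
for step in (0, 500, 25_000, 35_000, 50_000):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate at the cutoff is roughly one fifth of its peak value, which is worth keeping in mind when resuming or fine-tuning from this checkpoint.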

	## Datasets

	| Dataset | Role |
	|---------|------|
	| `villekuosmanen/build_block_tower` | Base demonstrations |
	| `villekuosmanen/dAgger_build_block_tower_1.0.0` | DAgger round 1 |
	| `villekuosmanen/dAgger_build_block_tower_1.1.0` | DAgger round 2 |
	| `villekuosmanen/dAgger_build_block_tower_1.2.0` | DAgger round 3 |
	| `villekuosmanen/dAgger_build_block_tower_1.3.0` | DAgger round 4 |
	| `villekuosmanen/dAgger_build_block_tower_1.4.0` | DAgger round 5 |

	Frames recorded under policy control during the DAgger rounds are filtered out via `ControlModePlugin`; only human-control frames are used for training.
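
The filtering idea can be sketched as follows. This is not the `ControlModePlugin` implementation: the per-frame labels (`"human"` / `"policy"`), the helper `human_control_indices`, and the JSON layout are all hypothetical, chosen only to show how a per-dataset list of valid frame indices could be derived.

```python
import json


def human_control_indices(control_modes: list[str], human_label: str = "human") -> list[int]:
    """Return indices of frames recorded while a human was in control.

    `control_modes` holds one (hypothetical) label per frame.
    """
    return [i for i, mode in enumerate(control_modes) if mode == human_label]


# Toy per-frame labels; real datasets would supply these from their own metadata.
toy_control_modes = {
    "villekuosmanen/build_block_tower": ["human", "human", "human"],
    "villekuosmanen/dAgger_build_block_tower_1.0.0": [
        "policy", "policy", "human", "human", "policy",
    ],
}

# One list of kept frame indices per dataset.
valid_indices = {
    name: human_control_indices(modes) for name, modes in toy_control_modes.items()
}

print(json.dumps(valid_indices, indent=2))
```

In this toy example the base dataset keeps every frame, while the DAgger round keeps only the frames where the human intervened.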

	## Files

	```
	README.md
	TRAINING_LOG.md
	assets/
	  ramen_stats.pt          # Normalization statistics
	  valid_indices.json      # Per-dataset valid frame indices after DAgger filtering
	checkpoints/
	  35000/
	    model.safetensors     # Model weights (inference + fine-tuning)
	    config.json           # Resolved model config
	```

	## Checkpoint Integrity

	```
	sha256 (checkpoint files):
	6192188a config.json
	8f00265f model.safetensors
	df43463f ramen_stats.pt
	```

	Full hashes:
	```
	6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a config.json
	8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0 model.safetensors
	df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d ramen_stats.pt
	```

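	Reproduce from the repository root with:
	```bash
	# Subshells keep each cd local, so the second command still starts from the root.
	(cd checkpoints/35000 && sha256sum config.json model.safetensors)
	(cd assets && sha256sum ramen_stats.pt)
	```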
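
Where `sha256sum` is unavailable, the same check can be done with Python's standard library. The sketch below compares local files against the full hashes listed above; the helper names `sha256_of` and `verify` are illustrative, and the relative paths assume the file layout shown in the Files section.

```python
import hashlib
from pathlib import Path

# Full SHA-256 hashes copied from this card, keyed by repo-relative path.
EXPECTED = {
    "checkpoints/35000/config.json":
        "6192188a6a705cb6ab1632234a1b4724935d42b311c1d01fff16b0eee5c00e4a",
    "checkpoints/35000/model.safetensors":
        "8f00265f043db4bf520441bf8eec07b6ccdcbff41f6db7a4852dea25218d2ac0",
    "assets/ramen_stats.pt":
        "df43463ff96e90b952fb3e7bc971cd7c584308acfab82ba29d0560318e2b9d2d",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(root, expected=EXPECTED) -> dict[str, bool]:
    """Map each expected file to True if it exists under `root` and its hash matches."""
    root = Path(root)
    results = {}
    for rel_path, want in expected.items():
        candidate = root / rel_path
        results[rel_path] = candidate.is_file() and sha256_of(candidate) == want
    return results


if __name__ == "__main__":
    # Run from the repository root.
    for rel_path, ok in verify(".").items():
        print(f"{'OK      ' if ok else 'MISMATCH'}  {rel_path}")
```

A missing file is reported the same way as a mismatched one, so a partial download is not mistaken for a verified checkpoint.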

	## W&B

	Training curves: https://wandb.ai/pravsels/dit_block_tower/runs/pv8q64et

	## Usage

	This checkpoint was produced with the [multitask_dit_policy](https://github.com/pravsels/multitask_dit_policy) repo, on the `stage1-multimodal-abstraction` branch.
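
A minimal sketch for fetching the checkpoint files with `huggingface_hub`. It assumes the Hub repo id is `pravsels/dit_block_tower_baseline` (the repo hosting this card) and that the paths match the Files section. `checkpoint_files` and `download_checkpoint` are illustrative helpers, not part of the training repo.

```python
REPO_ID = "pravsels/dit_block_tower_baseline"  # assumed Hub repo id for this card


def checkpoint_files(step: int = 35000) -> list[str]:
    """Repo-relative paths needed for inference at a given checkpoint step."""
    return [
        f"checkpoints/{step}/model.safetensors",
        f"checkpoints/{step}/config.json",
        "assets/ramen_stats.pt",
    ]


def download_checkpoint(step: int = 35000) -> dict[str, str]:
    """Download each file to the local Hub cache and return its local path."""
    # Imported lazily so the path helper works without the package installed.
    from huggingface_hub import hf_hub_download

    return {
        name: hf_hub_download(repo_id=REPO_ID, filename=name)
        for name in checkpoint_files(step)
    }


if __name__ == "__main__":
    for name in checkpoint_files():
        print(name)
```

Calling `download_checkpoint()` requires network access and the `huggingface_hub` package. Building the model and loading the weights is specific to the training codebase, so refer to the branch named above for that step.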