CAST: Connectivity-Aware Sampling for Topology
Training-free sampling-time guidance for 3D voxel diffusion models. Improves trunk connectivity success rate from 6.5% to 30% (β5Γ baseline) on Minecraft-style tree voxel generation β no retraining of the base denoiser, no architectural change.
Sampling progression: baseline DDPM (left) vs CAST with connectivity-aware guidance (right).
- π Project page: https://drinkai.notion.site/Connectivity-Aware-Sampling-for-Topology-CAST-report-v2-36daa15702ab801f8291d71c2bc043c5
- π» Code: https://github.com/KyleLin0927/cast-gen3d-tree
Method (TL;DR)
At each denoising step within the window t = 800β300, add β C(x_t) from a separately trained connectivity scorer to the standard DDPM update. The base denoiser is frozen; the scorer (346K parameters, trained in 145 seconds) acts as a lightweight, on-demand consistency oracle.
The key empirical finding behind this design: connectivity failures form mid-sampling (t β 800β600) and rarely self-recover. Guidance applied only within an intermediate window outperforms either always-on or late-stage-only intervention.
Files
ckpt_0025_033_baseline_e0400.ptβ Baseline 3D voxel DDPM (16Β³ resolution, ~313 MB)ckpt_0026_001_scorer_e100.ptβ Connectivity scorer used for sampling-time guidance (~1.4 MB)
How to Use
from huggingface_hub import hf_hub_download
diffusion_ckpt = hf_hub_download(
repo_id="jenkai-lin/cast-tree-voxel-diffusion",
filename="ckpt_0025_033_baseline_e0400.pt"
)
scorer_ckpt = hf_hub_download(
repo_id="jenkai-lin/cast-tree-voxel-diffusion",
filename="ckpt_0026_001_scorer_e100.pt"
)
Full sampling pipeline and reproduction scripts: GitHub repository.
Training Details
Baseline diffusion
- Architecture: 3D U-Net
- Resolution: 16Β³ voxels
- Diffusion: T=1000 timesteps, cosine noise schedule (Ξ²: 0.0001 β 0.02)
- Training data: 1,286 Minecraft tree voxel volumes
- Training: 400 epochs, batch size 64, AdamW (lr 1e-4)
Connectivity scorer
- Architecture: 3D CNN (346K parameters)
- Task: 4-class classification (positive / floating / disconnected / fragmented)
- Training data: 5,000 samples (20% real + 80% baseline-generated, with hard-negative mining)
- Loss: BCEWithLogitsLoss
- Training: 100 epochs on top of baseline checkpoint, completes in ~145 seconds on a single GPU
- No human annotation required β labels are derived automatically from base-model samples
Examples of Minecraft-style tree voxel volumes used for training.
Limitations
- Trained on 16Β³ resolution only; performance at higher resolutions not validated
- Tree-specific; generalization to other voxel-based topological objects untested
- Guidance produces a quantifiable trade-off between connectivity and structure β see project page for full ablation grid

