CAST: Connectivity-Aware Sampling for Topology

Training-free, sampling-time guidance for 3D voxel diffusion models. On Minecraft-style tree voxel generation it raises the trunk connectivity success rate from 6.5% to 30% (roughly 4.6× the baseline), with no retraining of the base denoiser and no architectural change.

Sampling progression: baseline DDPM (left) vs CAST with connectivity-aware guidance (right).

📄 Project page: https://drinkai.notion.site/Connectivity-Aware-Sampling-for-Topology-CAST-report-v2-36daa15702ab801f8291d71c2bc043c5
💻 Code: https://github.com/KyleLin0927/cast-gen3d-tree

Method (TL;DR)

At each denoising step inside the window t = 800 → 300 (t counts down during sampling), add the gradient ∇C(x_t) of a separately trained connectivity scorer C, taken with respect to x_t, to the standard DDPM update. The base denoiser stays frozen. The scorer (346K parameters, trained in about 145 seconds) acts as a lightweight oracle for topological consistency and is queried only inside that window.

The key empirical finding behind this design: connectivity failures form mid-sampling (t ≈ 800–600) and rarely self-recover. Guidance applied only within an intermediate window outperforms either always-on or late-stage-only intervention.
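The windowed update described above can be sketched as a single ancestral DDPM step. This is a minimal sketch under assumptions the card does not state: the scorer gradient is added to the posterior mean in the classifier-guidance form, scaled by a hypothetical weight `scale` and by σ_t² = β_t; vectors are plain Python lists; and the caller supplies `eps_pred` (the frozen denoiser's noise estimate) and `grad_c` (∇C(x_t)). The function names are illustrative, not the repository's API.

```python
import math
import random
from typing import List, Optional, Sequence

# Guidance window from the card: active while 300 <= t <= 800 (t counts down).
WINDOW_LO, WINDOW_HI = 300, 800


def in_guidance_window(t: int, lo: int = WINDOW_LO, hi: int = WINDOW_HI) -> bool:
    """True if the connectivity gradient should be applied at timestep t."""
    return lo <= t <= hi


def guided_ddpm_step(
    x_t: Sequence[float],
    eps_pred: Sequence[float],
    grad_c: Sequence[float],
    t: int,
    betas: Sequence[float],
    alpha_bars: Sequence[float],
    scale: float = 1.0,  # hypothetical guidance weight
    noise: Optional[Sequence[float]] = None,
) -> List[float]:
    """One ancestral DDPM step with windowed, classifier-style guidance.

    Assumption: guidance shifts the posterior mean by scale * sigma_t^2 * grad_c,
    with sigma_t^2 = beta_t. Outside the window this is a plain DDPM step.
    """
    beta_t = betas[t]
    alpha_t = 1.0 - beta_t
    coef = beta_t / math.sqrt(1.0 - alpha_bars[t])
    sigma2 = beta_t
    use_guidance = in_guidance_window(t)
    if noise is None:
        noise = [random.gauss(0.0, 1.0) for _ in x_t]
    out = []
    for x, e, g, z in zip(x_t, eps_pred, grad_c, noise):
        # Standard DDPM posterior mean from the predicted noise.
        mean = (x - coef * e) / math.sqrt(alpha_t)
        if use_guidance:
            mean += scale * sigma2 * g
        # No noise is added at the final step (t == 0).
        out.append(mean + (math.sqrt(sigma2) * z if t > 0 else 0.0))
    return out
```

In a full sampler, `grad_c` would come from automatic differentiation through the scorer (for example `torch.autograd.grad` of C(x_t) with respect to x_t). Keeping the window check inside the step makes always-on, late-only, and windowed variants differ only in `in_guidance_window`.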

Files

ckpt_0025_033_baseline_e0400.pt — Baseline 3D voxel DDPM (16³ resolution, ~313 MB)
ckpt_0026_001_scorer_e100.pt — Connectivity scorer used for sampling-time guidance (~1.4 MB)

How to Use

from huggingface_hub import hf_hub_download

diffusion_ckpt = hf_hub_download(
    repo_id="jenkai-lin/cast-tree-voxel-diffusion",
    filename="ckpt_0025_033_baseline_e0400.pt"
)
scorer_ckpt = hf_hub_download(
    repo_id="jenkai-lin/cast-tree-voxel-diffusion",
    filename="ckpt_0026_001_scorer_e100.pt"
)

Full sampling pipeline and reproduction scripts: GitHub repository.

Training Details

Baseline diffusion

Architecture: 3D U-Net
Resolution: 16³ voxels
Diffusion: T=1000 timesteps, cosine noise schedule (β: 0.0001 → 0.02)
Training data: 1,286 Minecraft tree voxel volumes
Training: 400 epochs, batch size 64, AdamW (lr 1e-4)
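The card names a cosine noise schedule but also lists β endpoints of 0.0001 → 0.02, which are the conventional endpoints of a linear schedule (Ho et al.). Because the card does not say which one the checkpoint uses, the sketch below builds both over T = 1000 steps, together with the cumulative products ᾱ_t and the forward noising process used in DDPM training. The cosine form follows Nichol & Dhariwal (offset s = 0.008, β clipped at 0.999); these constants are assumptions, not values taken from this repository.

```python
import math
from typing import List


def linear_betas(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linear schedule: beta_t evenly spaced from beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cosine_betas(T: int = 1000, s: float = 0.008, max_beta: float = 0.999) -> List[float]:
    """Cosine schedule: defined through alpha_bar(t) = f(t) / f(0), not a beta range."""
    def f(t: float) -> float:
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    # beta_t = 1 - alpha_bar(t+1) / alpha_bar(t), clipped to avoid a singular last step.
    return [min(1 - f(t + 1) / f(t), max_beta) for t in range(T)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative product alpha_bar_t = prod_{i<=t} (1 - beta_i)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: List[float], noise: List[float], t: int, abar: List[float]) -> List[float]:
    """Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, noise)]
```

Either list of betas can be passed to `alpha_bars`; the denoiser is then trained to predict `noise` from `q_sample(x0, noise, t, abar)` at a uniformly sampled t.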

Connectivity scorer

Architecture: 3D CNN (346K parameters)
Task: 4-class classification (positive / floating / disconnected / fragmented)
Training data: 5,000 samples (20% real + 80% baseline-generated, with hard-negative mining)
Loss: BCEWithLogitsLoss
Training: 100 epochs with the baseline checkpoint kept frozen (it supplies the generated training samples); completes in ~145 seconds on a single GPU
No human annotation required — labels are derived automatically from base-model samples
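The card says labels are derived automatically but does not give the labeling rules. One plausible ingredient is a connected-component check on the 16³ occupancy grid, sketched below. Everything here is hypothetical: face (6-)connectivity, a 0.5 occupancy threshold, y as the vertical axis (as in Minecraft), and "exactly one component that touches the ground layer" as the success criterion. How CAST actually assigns its four classes is not specified on this card.

```python
from collections import deque
from typing import List, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]
# 6-connectivity: voxels sharing a face are neighbours.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def occupied_from_grid(grid: Sequence[Sequence[Sequence[float]]], threshold: float = 0.5) -> Set[Voxel]:
    """Collect (x, y, z) indices of voxels whose value exceeds the threshold."""
    return {
        (x, y, z)
        for x, plane in enumerate(grid)
        for y, row in enumerate(plane)
        for z, v in enumerate(row)
        if v > threshold
    }


def components(occupied: Set[Voxel]) -> List[Set[Voxel]]:
    """Split occupied voxels into 6-connected components via breadth-first search."""
    seen: Set[Voxel] = set()
    comps: List[Set[Voxel]] = []
    for start in occupied:
        if start in seen:
            continue
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in occupied and nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        comps.append(comp)
    return comps


def connectivity_stats(occupied: Set[Voxel], up_axis: int = 1, ground: int = 0) -> Tuple[int, bool]:
    """Return (number of components, whether every component touches the ground layer)."""
    comps = components(occupied)
    grounded = all(any(v[up_axis] == ground for v in c) for c in comps)
    return len(comps), grounded


def is_connected_and_grounded(occupied: Set[Voxel]) -> bool:
    """Hypothetical success criterion: a single component resting on the ground."""
    n, grounded = connectivity_stats(occupied)
    return n == 1 and grounded
```

Statistics like these are cheap to compute on base-model samples, which is what makes annotation-free labeling feasible; a learned scorer is still needed for guidance because the component count is not differentiable.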

Examples of Minecraft-style tree voxel volumes used for training.

Limitations

Trained on 16³ resolution only; performance at higher resolutions not validated
Tree-specific; generalization to other voxel-based topological objects untested
Guidance produces a quantifiable trade-off between connectivity and structure — see project page for full ablation grid
