CAST: Connectivity-Aware Sampling for Topology

Training-free sampling-time guidance for 3D voxel diffusion models. Improves trunk connectivity success rate from 6.5% to 30% (β‰ˆ5Γ— baseline) on Minecraft-style tree voxel generation β€” no retraining of the base denoiser, no architectural change.

Comparison: baseline vs CAST sampling

Sampling progression: baseline DDPM (left) vs CAST with connectivity-aware guidance (right).

Method (TL;DR)

At each denoising step within the window t = 800–300, add βˆ‡ C(x_t) from a separately trained connectivity scorer to the standard DDPM update. The base denoiser is frozen; the scorer (346K parameters, trained in 145 seconds) acts as a lightweight, on-demand consistency oracle.

The key empirical finding behind this design: connectivity failures form mid-sampling (t β‰ˆ 800–600) and rarely self-recover. Guidance applied only within an intermediate window outperforms either always-on or late-stage-only intervention.

Files

  • ckpt_0025_033_baseline_e0400.pt β€” Baseline 3D voxel DDPM (16Β³ resolution, ~313 MB)
  • ckpt_0026_001_scorer_e100.pt β€” Connectivity scorer used for sampling-time guidance (~1.4 MB)

How to Use

from huggingface_hub import hf_hub_download

diffusion_ckpt = hf_hub_download(
    repo_id="jenkai-lin/cast-tree-voxel-diffusion",
    filename="ckpt_0025_033_baseline_e0400.pt"
)
scorer_ckpt = hf_hub_download(
    repo_id="jenkai-lin/cast-tree-voxel-diffusion",
    filename="ckpt_0026_001_scorer_e100.pt"
)

Full sampling pipeline and reproduction scripts: GitHub repository.

Training Details

Baseline diffusion

  • Architecture: 3D U-Net
  • Resolution: 16Β³ voxels
  • Diffusion: T=1000 timesteps, cosine noise schedule (Ξ²: 0.0001 β†’ 0.02)
  • Training data: 1,286 Minecraft tree voxel volumes
  • Training: 400 epochs, batch size 64, AdamW (lr 1e-4)

Connectivity scorer

  • Architecture: 3D CNN (346K parameters)
  • Task: 4-class classification (positive / floating / disconnected / fragmented)
  • Training data: 5,000 samples (20% real + 80% baseline-generated, with hard-negative mining)
  • Loss: BCEWithLogitsLoss
  • Training: 100 epochs on top of baseline checkpoint, completes in ~145 seconds on a single GPU
  • No human annotation required β€” labels are derived automatically from base-model samples

Training data examples

Examples of Minecraft-style tree voxel volumes used for training.

Limitations

  • Trained on 16Β³ resolution only; performance at higher resolutions not validated
  • Tree-specific; generalization to other voxel-based topological objects untested
  • Guidance produces a quantifiable trade-off between connectivity and structure β€” see project page for full ablation grid
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support