gcDAE Checkpoints

This repository contains the Stage 1 and Stage 2 checkpoints used for gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation.

gcDAE studies how the geometry of the latent space affects diffusion-based image generation. The pipeline follows a two-stage HypDAE-style setup, with separate inference code:

  • Stage 1: a diffusion autoencoder backbone adapted from Stable Diffusion 2.1.
  • Stage 2: a geometry-conditioned semantic encoder trained on top of the Stage 1 backbone.
  • HypDiffusion: inference code that combines a Stage 1 checkpoint with a dataset- and geometry-specific Stage 2 checkpoint.

The released checkpoints cover CUB-200-2011 and Caltech-256 with Euclidean, hyperbolic, and spherical Stage 2 latent spaces.

Released Checkpoints

Recommended repository layout:

gcDAE-models/
├── stage1/
│   ├── cub200/
│   │   └── stage1_cub200.ckpt
│   └── caltech256/
│       └── stage1_caltech256.ckpt
└── stage2/
    ├── cub200/
    │   ├── euc/
    │   │   └── stage2_cub200_euclidean.pt
    │   ├── hyp_c1/
    │   │   └── stage2_cub200_hyperbolic_c1.pt
    │   └── sph_c1/
    │       └── stage2_cub200_spherical_c1.pt
    └── caltech256/
        ├── euc/
        │   └── stage2_caltech256_euclidean.pt
        ├── hyp_c1/
        │   └── stage2_caltech256_hyperbolic_c1.pt
        └── sph_c1/
            └── stage2_caltech256_spherical_c1.pt

Dataset       Stage    Geometry            Curvature  Best epoch
CUB-200-2011  Stage 1  Diffusion backbone  –          7
Caltech-256   Stage 1  Diffusion backbone  –          20
CUB-200-2011  Stage 2  Euclidean           c=0        16
CUB-200-2011  Stage 2  Hyperbolic          c=1        32
CUB-200-2011  Stage 2  Spherical           c=1        32
Caltech-256   Stage 2  Euclidean           c=0        6
Caltech-256   Stage 2  Hyperbolic          c=1        12
Caltech-256   Stage 2  Spherical           c=1        6
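
To check a local copy against the recommended layout, a small script can enumerate the expected files and report any that are missing. This is a minimal sketch, assuming the local directory uses exactly the folder and file names shown in the layout above; the helper names expected_files and missing_files are hypothetical and not part of the gcDAE codebase.

```python
from pathlib import Path

# Folder names and file-name suffixes copied from the recommended layout.
DATASETS = ("cub200", "caltech256")
GEOMETRIES = {
    "euc": "euclidean",
    "hyp_c1": "hyperbolic_c1",
    "sph_c1": "spherical_c1",
}


def expected_files(root):
    """Return every checkpoint path the recommended layout implies under `root`."""
    root = Path(root)
    paths = []
    for dataset in DATASETS:
        # One Stage 1 checkpoint per dataset.
        paths.append(root / "stage1" / dataset / f"stage1_{dataset}.ckpt")
        # One Stage 2 checkpoint per dataset and geometry.
        for folder, suffix in GEOMETRIES.items():
            name = f"stage2_{dataset}_{suffix}.pt"
            paths.append(root / "stage2" / dataset / folder / name)
    return paths


def missing_files(root):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_files(root) if not p.is_file()]
```

Calling missing_files("gcDAE-models") returns an empty list when all eight checkpoints are in place.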

Code

The checkpoints are intended for use with the gcDAE codebase:

git clone https://github.com/damidirad/gcDAE.git
cd gcDAE

Create the HypDiffusion inference environment:

conda env create -f HypDiffusion/hypdiff_env.yaml
conda activate hypdiff_env

The Stage 2 training code also expects a CLIP ViT-L/14 image encoder, configured in the gcDAE scripts as clip-vit-large-patch14.

Download

Clone this checkpoint repository with Git LFS:

git lfs install
git clone https://huggingface.co/k-sert/gcDAE-checkpoints checkpoints/gcDAE-checkpoints

Or download files programmatically:

from huggingface_hub import hf_hub_download

stage1_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage1/cub200/epoch=000007.ckpt",
)

stage2_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage2/cub200/hyp_c1/epoch=000032.ckpt",
)
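
To fetch every file for one dataset and geometry without naming individual files, snapshot_download from huggingface_hub accepts glob patterns. The following is a sketch under the assumption that the repository folders are named stage1/<dataset>/ and stage2/<dataset>/<geometry>/, as in the paths above; checkpoint_patterns and download_pair are hypothetical helper names.

```python
def checkpoint_patterns(dataset, geometry):
    """Glob patterns selecting one Stage 1 folder and one Stage 2 folder."""
    return [f"stage1/{dataset}/*", f"stage2/{dataset}/{geometry}/*"]


def download_pair(dataset, geometry, local_dir="checkpoints/gcDAE-checkpoints"):
    """Download only the matching files and return the local directory path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="k-sert/gcDAE-checkpoints",
        allow_patterns=checkpoint_patterns(dataset, geometry),
        local_dir=local_dir,
    )
```

For example, download_pair("cub200", "hyp_c1") would fetch the CUB-200-2011 Stage 1 checkpoint together with the hyperbolic Stage 2 checkpoint.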

Inference

gcDAE inference uses two checkpoint fields:

  • s1_checkpoint_path: the Stage 1 diffusion checkpoint.
  • s2_checkpoint_path: the Stage 2 geometry-conditioned encoder checkpoint.

Example for CUB-200-2011 with the hyperbolic c=1 Stage 2 encoder:

cd gcDAE/HypDiffusion

python inference_for_eva.py \
  --config_path configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml \
  --ckpt ../checkpoints/gcDAE-checkpoints/stage1/cub200/epoch=000007.ckpt \
  --image_folder /path/to/cub200/test_split \
  --outdir /path/to/outputs/cub200_hyp_c1 \
  --strength 0.35 \
  --scale 1.0 \
  --seed 23 \
  --n_selected 5 \
  --n_samples 1 \
  --n_perturb_samples 4

Before running, update the s1_checkpoint_path, s2_checkpoint_path, and encoder_version fields in the corresponding gcDAE YAML file under:

HypDiffusion/configs/stable-diffusion/
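
The fields can be edited by hand, or patched with a short script. The sketch below does not assume where the fields sit inside the YAML file, since that nesting is not documented here; it searches the loaded config recursively and overwrites every matching key. The helper names set_fields and update_config are hypothetical, and PyYAML is assumed to be installed. Rewriting a file with PyYAML drops any comments it contained.

```python
def set_fields(node, updates):
    """Overwrite every key listed in `updates` wherever it appears in a nested config.

    Returns the number of values replaced, so 0 means no listed key was found.
    """
    replaced = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]
                replaced += 1
            else:
                replaced += set_fields(value, updates)
    elif isinstance(node, list):
        for item in node:
            replaced += set_fields(item, updates)
    return replaced


def update_config(path, updates):
    """Load a YAML config, patch the given fields, and write it back in place."""
    import yaml  # PyYAML, imported lazily; assumed to be available

    with open(path) as fh:
        cfg = yaml.safe_load(fh)
    count = set_fields(cfg, updates)
    with open(path, "w") as fh:
        yaml.safe_dump(cfg, fh, sort_keys=False)
    return count
```

A call such as update_config(config_file, {"s1_checkpoint_path": stage1_ckpt, "s2_checkpoint_path": stage2_ckpt}) should return 2 if both fields were found.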

Recommended config mapping:

Dataset       Geometry        Config
CUB-200-2011  Euclidean       v2_inference_cub200_euc.yaml
CUB-200-2011  Hyperbolic c=1  v2_inference_cub200_hyp_c1.yaml
CUB-200-2011  Spherical c=1   v2_inference_cub200_sph_c1.yaml
Caltech-256   Euclidean       v2_inference_caltech256_euc.yaml
Caltech-256   Hyperbolic c=1  v2_inference_caltech256_hyp_c1.yaml
Caltech-256   Spherical c=1   v2_inference_caltech256_sph_c1.yaml
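
Because the config names follow a regular pattern, the mapping can be expressed in a few lines when scripting runs over several datasets and geometries. This is an illustrative sketch: config_path and inference_command are hypothetical helpers, and the flag values are simply those from the CUB-200-2011 example above, not tuned recommendations.

```python
CONFIG_DIR = "configs/stable-diffusion"

# Short names as they appear in the config file names.
DATASETS = {"CUB-200-2011": "cub200", "Caltech-256": "caltech256"}
GEOMETRIES = {"Euclidean": "euc", "Hyperbolic c=1": "hyp_c1", "Spherical c=1": "sph_c1"}


def config_path(dataset, geometry):
    """Return the config path for a dataset and geometry, relative to HypDiffusion/."""
    name = f"v2_inference_{DATASETS[dataset]}_{GEOMETRIES[geometry]}.yaml"
    return f"{CONFIG_DIR}/{name}"


def inference_command(dataset, geometry, stage1_ckpt, image_folder, outdir):
    """Build the argument list for inference_for_eva.py with the example's flag values."""
    return [
        "python", "inference_for_eva.py",
        "--config_path", config_path(dataset, geometry),
        "--ckpt", stage1_ckpt,
        "--image_folder", image_folder,
        "--outdir", outdir,
        "--strength", "0.35",
        "--scale", "1.0",
        "--seed", "23",
        "--n_selected", "5",
        "--n_samples", "1",
        "--n_perturb_samples", "4",
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True, cwd="gcDAE/HypDiffusion") once the YAML fields have been updated.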

Training Data

These checkpoints were trained and evaluated on:

  • CUB-200-2011: fine-grained bird image classification dataset.
  • Caltech-256: object category recognition dataset.

Users are responsible for downloading the datasets from their official sources and complying with their terms of use.

Intended Use

These checkpoints are released for research on:

  • geometry-conditioned diffusion generation,
  • latent-space representation learning,
  • semantic perturbations and controllable diversity,
  • Euclidean, hyperbolic, and spherical latent-space comparisons.

They are not intended for production image-generation services or safety-critical applications.

Limitations

  • The checkpoints inherit limitations from Stable Diffusion 2.1 and the training datasets.
  • Generated images may contain artifacts, incorrect class attributes, or dataset biases.
  • The Stage 2 encoders are dataset-specific and should not be treated as general-purpose visual encoders.
  • Hyperbolic and spherical variants are experimental research checkpoints rather than universal improvements over Euclidean conditioning.

License

gcDAE has multiple upstream license considerations:

  • The Stage 1 diffusion backbone builds on Stable Diffusion 2.1, so the backbone and the checkpoints derived from it follow the CreativeML Open RAIL++-M license terms.
  • The Stage 2 encoder uses OpenAI CLIP ViT-L/14 features. The OpenAI CLIP repository is released under the MIT License.
  • CUB-200-2011 and Caltech-256 have their own dataset distribution and usage terms.

Please also consult:

Citation

@misc{sert2026gcDAE,
  title  = {gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation},
  author = {Sert, Kagan and Oey, Elyanne and Amidirad, Daniël and van Campenhout, Arthur and Schuttrups, Casper},
  year   = {2026}
}

Acknowledgments

This release builds on HypDAE, Hyperbolic-Flow-Matching, Stable Diffusion 2.1, and OpenAI CLIP. We thank the authors for their open-source contributions.
