gcDAE Checkpoints
This repository contains the Stage 1 and Stage 2 checkpoints used for gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation.
gcDAE studies how the geometry of different latent spaces affects diffusion-based image generation. The pipeline follows a two-stage HypDAE-style setup:
- Stage 1: a diffusion autoencoder backbone adapted from Stable Diffusion 2.1.
- Stage 2: a geometry-conditioned semantic encoder trained on top of the Stage 1 backbone.
- HypDiffusion: inference code that combines a Stage 1 checkpoint with a dataset- and geometry-specific Stage 2 checkpoint.
The released checkpoints cover CUB-200-2011 and Caltech-256 with Euclidean, hyperbolic, and spherical Stage 2 latent spaces.
Released Checkpoints
Recommended repository layout:
gcDAE-models/
├── stage1/
│   ├── cub200/
│   │   └── stage1_cub200.ckpt
│   └── caltech256/
│       └── stage1_caltech256.ckpt
└── stage2/
    ├── cub200/
    │   ├── euc/
    │   │   └── stage2_cub200_euclidean.pt
    │   ├── hyp_c1/
    │   │   └── stage2_cub200_hyperbolic_c1.pt
    │   └── sph_c1/
    │       └── stage2_cub200_spherical_c1.pt
    └── caltech256/
        ├── euc/
        │   └── stage2_caltech256_euclidean.pt
        ├── hyp_c1/
        │   └── stage2_caltech256_hyperbolic_c1.pt
        └── sph_c1/
            └── stage2_caltech256_spherical_c1.pt
| Dataset | Stage | Geometry | Curvature | Best Epoch |
|---|---|---|---|---|
| CUB-200-2011 | Stage 1 | Diffusion backbone | - | 7 |
| Caltech-256 | Stage 1 | Diffusion backbone | - | 20 |
| CUB-200-2011 | Stage 2 | Euclidean | c=0 | 16 |
| CUB-200-2011 | Stage 2 | Hyperbolic | c=1 | 32 |
| CUB-200-2011 | Stage 2 | Spherical | c=1 | 32 |
| Caltech-256 | Stage 2 | Euclidean | c=0 | 6 |
| Caltech-256 | Stage 2 | Hyperbolic | c=1 | 12 |
| Caltech-256 | Stage 2 | Spherical | c=1 | 6 |
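The layout above can be turned into a small path lookup. The following is a minimal sketch, assuming the files are stored exactly as in the recommended layout; the names `GEOMETRY_DIRS`, `stage1_path`, and `stage2_path` are illustrative and are not part of the gcDAE codebase.

```python
from pathlib import PurePosixPath

# Geometry name -> (directory name, file-name suffix), as in the recommended layout.
GEOMETRY_DIRS = {
    "euclidean": ("euc", "euclidean"),
    "hyperbolic": ("hyp_c1", "hyperbolic_c1"),
    "spherical": ("sph_c1", "spherical_c1"),
}
DATASETS = ("cub200", "caltech256")


def stage1_path(dataset: str, root: str = "gcDAE-models") -> str:
    """Return the Stage 1 checkpoint path for a dataset (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return str(PurePosixPath(root, "stage1", dataset, f"stage1_{dataset}.ckpt"))


def stage2_path(dataset: str, geometry: str, root: str = "gcDAE-models") -> str:
    """Return the Stage 2 checkpoint path for a dataset and geometry (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if geometry not in GEOMETRY_DIRS:
        raise ValueError(f"unknown geometry: {geometry!r}")
    directory, suffix = GEOMETRY_DIRS[geometry]
    file_name = f"stage2_{dataset}_{suffix}.pt"
    return str(PurePosixPath(root, "stage2", dataset, directory, file_name))
```

For example, `stage2_path("cub200", "hyperbolic")` resolves to the hyperbolic c=1 encoder under `stage2/cub200/hyp_c1/`.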
Code
The checkpoints are intended for use with the gcDAE codebase:
git clone https://github.com/damidirad/gcDAE.git
cd gcDAE
Create the HypDiffusion inference environment:
conda env create -f HypDiffusion/hypdiff_env.yaml
conda activate hypdiff_env
The Stage 2 training code also expects a CLIP ViT-L/14 image encoder, configured in the gcDAE scripts as clip-vit-large-patch14.
Download
Clone this checkpoint repository with Git LFS:
git lfs install
git clone https://huggingface.co/k-sert/gcDAE-checkpoints checkpoints/gcDAE-checkpoints
Or download files programmatically:
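from huggingface_hub import hf_hub_download

# hf_hub_download returns the local path of the cached file.

# Stage 1 diffusion backbone for CUB-200-2011
stage1_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage1/cub200/epoch=000007.ckpt",
)

# Stage 2 hyperbolic (c=1) encoder for CUB-200-2011
stage2_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage2/cub200/hyp_c1/epoch=000032.ckpt",
)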
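When the exact file names are not known in advance, one option is to fetch a whole dataset and geometry folder with `snapshot_download` and its `allow_patterns` filter. The sketch below assumes the repository keeps the `stage1/<dataset>/` and `stage2/<dataset>/<geometry>/` folder structure used in the paths above; `patterns_for` and `download_pair` are hypothetical helper names.

```python
def patterns_for(dataset: str, geometry_dir: str) -> list[str]:
    """Build glob patterns that select one Stage 1 folder and one Stage 2 folder."""
    return [
        f"stage1/{dataset}/*",
        f"stage2/{dataset}/{geometry_dir}/*",
    ]


def download_pair(
    dataset: str,
    geometry_dir: str,
    repo_id: str = "k-sert/gcDAE-checkpoints",
) -> str:
    """Download only the matching folders and return the local snapshot directory."""
    # Imported lazily so that patterns_for can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        allow_patterns=patterns_for(dataset, geometry_dir),
    )
```

Calling `download_pair("cub200", "hyp_c1")` would download the CUB-200-2011 Stage 1 folder and the hyperbolic c=1 Stage 2 folder, and nothing else from the repository.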
Inference
gcDAE inference uses two checkpoint paths:
- s1_checkpoint_path: the Stage 1 diffusion checkpoint.
- s2_checkpoint_path: the Stage 2 geometry-conditioned encoder checkpoint.
Example for CUB-200-2011 with the hyperbolic c=1 Stage 2 encoder:
cd gcDAE/HypDiffusion
python inference_for_eva.py \
    --config_path configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml \
    --ckpt ../checkpoints/gcDAE-checkpoints/stage1/cub200/epoch=000007.ckpt \
    --image_folder /path/to/cub200/test_split \
    --outdir /path/to/outputs/cub200_hyp_c1 \
    --strength 0.35 \
    --scale 1.0 \
    --seed 23 \
    --n_selected 5 \
    --n_samples 1 \
    --n_perturb_samples 4
Before running, update the s1_checkpoint_path, s2_checkpoint_path, and encoder_version fields in the corresponding gcDAE YAML file under:
HypDiffusion/configs/stable-diffusion/
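Editing the YAML by hand works fine. For scripted runs, the fields can also be updated programmatically. The sketch below makes two assumptions: the config is plain YAML that PyYAML can load, and the three keys may sit at any nesting depth (their exact location in the gcDAE configs is not specified here, so the code searches recursively). `set_keys` and `update_config` are hypothetical helpers.

```python
def set_keys(node, updates: dict) -> int:
    """Recursively overwrite matching keys in nested dicts and lists.

    Returns the number of values that were replaced.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]
                changed += 1
            else:
                changed += set_keys(value, updates)
    elif isinstance(node, list):
        for item in node:
            changed += set_keys(item, updates)
    return changed


def update_config(path: str, updates: dict) -> int:
    """Load a YAML config, apply the updates, and write it back in place."""
    import yaml  # PyYAML; assumed to be available in the environment

    with open(path) as handle:
        config = yaml.safe_load(handle)
    changed = set_keys(config, updates)
    with open(path, "w") as handle:
        yaml.safe_dump(config, handle, sort_keys=False)
    return changed
```

Note that rewriting a file with `yaml.safe_dump` drops any comments it contained, so it may be preferable to run this on a copy of the config. The return value makes it easy to check that all expected fields were found.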
Recommended config mapping:
| Dataset | Geometry | Config |
|---|---|---|
| CUB-200-2011 | Euclidean | v2_inference_cub200_euc.yaml |
| CUB-200-2011 | Hyperbolic c=1 | v2_inference_cub200_hyp_c1.yaml |
| CUB-200-2011 | Spherical c=1 | v2_inference_cub200_sph_c1.yaml |
| Caltech-256 | Euclidean | v2_inference_caltech256_euc.yaml |
| Caltech-256 | Hyperbolic c=1 | v2_inference_caltech256_hyp_c1.yaml |
| Caltech-256 | Spherical c=1 | v2_inference_caltech256_sph_c1.yaml |
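The config names in the table follow a regular pattern, so the inference command can be assembled from a dataset and geometry key. This is a minimal sketch that reuses the flags and default values from the example command above; `config_name`, `build_command`, and `run_inference` are illustrative names and not part of the gcDAE codebase.

```python
import subprocess

DATASETS = ("cub200", "caltech256")
GEOMETRIES = ("euc", "hyp_c1", "sph_c1")


def config_name(dataset: str, geometry: str) -> str:
    """Return the config file name for a dataset and geometry key."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if geometry not in GEOMETRIES:
        raise ValueError(f"unknown geometry: {geometry!r}")
    return f"v2_inference_{dataset}_{geometry}.yaml"


def build_command(dataset, geometry, ckpt, image_folder, outdir):
    """Assemble the argument list for inference_for_eva.py."""
    return [
        "python", "inference_for_eva.py",
        "--config_path", f"configs/stable-diffusion/{config_name(dataset, geometry)}",
        "--ckpt", ckpt,
        "--image_folder", image_folder,
        "--outdir", outdir,
        "--strength", "0.35",
        "--scale", "1.0",
        "--seed", "23",
        "--n_selected", "5",
        "--n_samples", "1",
        "--n_perturb_samples", "4",
    ]


def run_inference(dataset, geometry, ckpt, image_folder, outdir,
                  workdir="gcDAE/HypDiffusion"):
    """Run inference from the HypDiffusion directory; raises if the script fails."""
    command = build_command(dataset, geometry, ckpt, image_folder, outdir)
    subprocess.run(command, cwd=workdir, check=True)
```

Passing the command as a list avoids shell quoting issues with paths such as `epoch=000007.ckpt`. Relative paths given to `run_inference` are interpreted relative to `workdir`.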
Training Data
These checkpoints were trained and evaluated on:
- CUB-200-2011: fine-grained bird image classification dataset.
- Caltech-256: object category recognition dataset.
Users are responsible for downloading the datasets from their official sources and complying with their terms of use.
Intended Use
These checkpoints are released for research on:
- geometry-conditioned diffusion generation,
- latent-space representation learning,
- semantic perturbations and controllable diversity,
- Euclidean, hyperbolic, and spherical latent-space comparisons.
They are not intended for production image-generation services or safety-critical applications.
Limitations
- The checkpoints inherit limitations from Stable Diffusion 2.1 and the training datasets.
- Generated images may contain artifacts, incorrect class attributes, or dataset biases.
- The Stage 2 encoders are dataset-specific and should not be treated as general-purpose visual encoders.
- Hyperbolic and spherical variants are experimental research checkpoints rather than universal improvements over Euclidean conditioning.
License
gcDAE has multiple upstream license considerations:
- The Stage 1 diffusion backbone builds on Stable Diffusion 2.1, so the backbone and the checkpoints derived from it follow the CreativeML Open RAIL++-M license terms.
- The Stage 2 encoder uses OpenAI CLIP ViT-L/14 features. The OpenAI CLIP repository is released under the MIT License.
- CUB-200-2011 and Caltech-256 have their own dataset distribution and usage terms.
Please also consult:
- Stable Diffusion 2.1: https://huggingface.co/stabilityai/stable-diffusion-2-1-base
- OpenAI CLIP: https://github.com/openai/CLIP
Citation
@misc{sert2026gcDAE,
  title = {gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation},
  author = {Sert, Kagan and Oey, Elyanne and Amidirad, Daniël and van Campenhout, Arthur and Schuttrups, Casper},
  year = {2026}
}
Acknowledgments
This release builds on HypDAE, Hyperbolic-Flow-Matching, Stable Diffusion 2.1, and OpenAI CLIP. We thank the authors for their open-source contributions.