---
license: other
library_name: pytorch
pipeline_tag: image-to-image
tags:
- diffusion
- stable-diffusion
- latent-space
- representation-learning
- hyperbolic-geometry
- spherical-geometry
- cub-200-2011
- caltech-256
datasets:
- CUB-200-2011
- Caltech-256
---

# gcDAE Checkpoints

This repository contains the Stage 1 and Stage 2 checkpoints used for **gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation**.

gcDAE studies how the geometry of the latent space affects diffusion-based image generation. The pipeline follows a two-stage HypDAE-style setup:

- **Stage 1**: a diffusion autoencoder backbone adapted from Stable Diffusion 2.1.
- **Stage 2**: a geometry-conditioned semantic encoder trained on top of the Stage 1 backbone.
- **HypDiffusion**: inference code that combines a Stage 1 checkpoint with a dataset- and geometry-specific Stage 2 checkpoint.

The released checkpoints cover CUB-200-2011 and Caltech-256 with Euclidean, hyperbolic, and spherical Stage 2 latent spaces.

## Released Checkpoints

Recommended repository layout:

```text
gcDAE-models/
├── stage1/
│   ├── cub200/
│   │   └── stage1_cub200.ckpt
│   └── caltech256/
│       └── stage1_caltech256.ckpt
└── stage2/
    ├── cub200/
    │   ├── euc/
    │   │   └── stage2_cub200_euclidean.pt
    │   ├── hyp_c1/
    │   │   └── stage2_cub200_hyperbolic_c1.pt
    │   └── sph_c1/
    │       └── stage2_cub200_spherical_c1.pt
    └── caltech256/
        ├── euc/
        │   └── stage2_caltech256_euclidean.pt
        ├── hyp_c1/
        │   └── stage2_caltech256_hyperbolic_c1.pt
        └── sph_c1/
            └── stage2_caltech256_spherical_c1.pt
```

| Dataset | Stage | Geometry | Curvature | Best Epoch |
| :--- | :--- | :--- | :---: | :---: |
| CUB-200-2011 | Stage 1 | Diffusion backbone | – | 7 |
| Caltech-256 | Stage 1 | Diffusion backbone | – | 20 |
| CUB-200-2011 | Stage 2 | Euclidean | `c=0` | 16 |
| CUB-200-2011 | Stage 2 | Hyperbolic | `c=1` | 32 |
| CUB-200-2011 | Stage 2 | Spherical | `c=1` | 32 |
| Caltech-256 | Stage 2 | Euclidean | `c=0` | 6 |
| Caltech-256 | Stage 2 | Hyperbolic | `c=1` | 12 |
| Caltech-256 | Stage 2 | Spherical | `c=1` | 6 |

## Code

The checkpoints are intended for use with the gcDAE codebase:

```bash
git clone https://github.com/damidirad/gcDAE.git
cd gcDAE
```

Create the HypDiffusion inference environment:

```bash
conda env create -f HypDiffusion/hypdiff_env.yaml
conda activate hypdiff_env
```

The Stage 2 training code also expects a CLIP ViT-L/14 image encoder, configured in the gcDAE scripts as `clip-vit-large-patch14`.

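
When scripting over several dataset and geometry combinations, it can help to derive checkpoint paths from the recommended layout instead of typing them by hand. The sketch below is a hypothetical helper and not part of the gcDAE codebase: the function name `checkpoint_relpath` and the geometry keys are assumptions made for illustration, while the directory and file names are copied from the layout under Released Checkpoints. File names in the hosted repository may differ from the recommended layout, so check the result against the actual file listing before relying on it.

```python
from pathlib import PurePosixPath
from typing import Optional

# Directory name and file-name suffix per Stage 2 geometry,
# copied from the recommended repository layout.
_STAGE2_VARIANTS = {
    "euclidean": ("euc", "euclidean"),
    "hyperbolic": ("hyp_c1", "hyperbolic_c1"),
    "spherical": ("sph_c1", "spherical_c1"),
}
_DATASETS = ("cub200", "caltech256")


def checkpoint_relpath(dataset: str, geometry: Optional[str] = None) -> str:
    """Return a checkpoint path relative to the layout root.

    Hypothetical helper: geometry=None selects the Stage 1 checkpoint,
    otherwise the Stage 2 checkpoint for that geometry is returned.
    """
    if dataset not in _DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if geometry is None:
        return str(PurePosixPath("stage1", dataset, f"stage1_{dataset}.ckpt"))
    if geometry not in _STAGE2_VARIANTS:
        raise ValueError(f"unknown geometry: {geometry!r}")
    subdir, suffix = _STAGE2_VARIANTS[geometry]
    return str(
        PurePosixPath("stage2", dataset, subdir, f"stage2_{dataset}_{suffix}.pt")
    )


if __name__ == "__main__":
    # Print the Stage 1 and all Stage 2 paths for each dataset.
    for ds in _DATASETS:
        print(checkpoint_relpath(ds))
        for geo in _STAGE2_VARIANTS:
            print(checkpoint_relpath(ds, geo))
```

Keeping the mapping in one dictionary means a new curvature setting would only need one new entry rather than edits to every script.
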
## Download

Clone this checkpoint repository with Git LFS:

```bash
git lfs install
git clone https://huggingface.co/k-sert/gcDAE-checkpoints checkpoints/gcDAE-checkpoints
```

Or download files programmatically:

```python
from huggingface_hub import hf_hub_download

# Stage 1 diffusion backbone for CUB-200-2011
stage1_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage1/cub200/epoch=000007.ckpt",
)

# Stage 2 hyperbolic (c=1) encoder for CUB-200-2011
stage2_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage2/cub200/hyp_c1/epoch=000032.ckpt",
)
```

## Inference

gcDAE inference uses two checkpoint paths:

- `s1_checkpoint_path`: the Stage 1 diffusion checkpoint.
- `s2_checkpoint_path`: the Stage 2 geometry-conditioned encoder checkpoint.

Example for CUB-200-2011 with the hyperbolic `c=1` Stage 2 encoder:

```bash
cd gcDAE/HypDiffusion
python inference_for_eva.py \
  --config_path configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml \
  --ckpt ../checkpoints/gcDAE-checkpoints/stage1/cub200/epoch=000007.ckpt \
  --image_folder /path/to/cub200/test_split \
  --outdir /path/to/outputs/cub200_hyp_c1 \
  --strength 0.35 \
  --scale 1.0 \
  --seed 23 \
  --n_selected 5 \
  --n_samples 1 \
  --n_perturb_samples 4
```

Before running, update the `s1_checkpoint_path`, `s2_checkpoint_path`, and `encoder_version` fields in the corresponding gcDAE YAML file under:

```text
HypDiffusion/configs/stable-diffusion/
```

Recommended config mapping:

| Dataset | Geometry | Config |
| :--- | :--- | :--- |
| CUB-200-2011 | Euclidean | `v2_inference_cub200_euc.yaml` |
| CUB-200-2011 | Hyperbolic c=1 | `v2_inference_cub200_hyp_c1.yaml` |
| CUB-200-2011 | Spherical c=1 | `v2_inference_cub200_sph_c1.yaml` |
| Caltech-256 | Euclidean | `v2_inference_caltech256_euc.yaml` |
| Caltech-256 | Hyperbolic c=1 | `v2_inference_caltech256_hyp_c1.yaml` |
| Caltech-256 | Spherical c=1 | `v2_inference_caltech256_sph_c1.yaml` |

## Training Data

These checkpoints were trained and evaluated on:

- **CUB-200-2011**: fine-grained bird image classification dataset.
- **Caltech-256**: object category recognition dataset.

Users are responsible for downloading the datasets from their official sources and complying with their terms of use.

## Intended Use

These checkpoints are released for research on:

- geometry-conditioned diffusion generation,
- latent-space representation learning,
- semantic perturbations and controllable diversity,
- Euclidean, hyperbolic, and spherical latent-space comparisons.

They are not intended for production image-generation services or safety-critical applications.

## Limitations

- The checkpoints inherit limitations from Stable Diffusion 2.1 and the training datasets.
- Generated images may contain artifacts, incorrect class attributes, or dataset biases.
- The Stage 2 encoders are dataset-specific and should not be treated as general-purpose visual encoders.
- Hyperbolic and spherical variants are experimental research checkpoints rather than universal improvements over Euclidean conditioning.

## License

gcDAE has multiple upstream license considerations:

- The Stage 1 diffusion backbone builds on Stable Diffusion 2.1 and follows the CreativeML Open RAIL++-M license terms for the diffusion backbone and derived checkpoints.
- The Stage 2 encoder uses OpenAI CLIP ViT-L/14 features. The OpenAI CLIP repository is released under the MIT License.
- CUB-200-2011 and Caltech-256 have their own dataset distribution and usage terms.


Please also consult:

- Stable Diffusion 2.1: https://huggingface.co/stabilityai/stable-diffusion-2-1-base
- OpenAI CLIP: https://github.com/openai/CLIP

## Citation

```bibtex
@misc{sert2026gcDAE,
  title  = {gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation},
  author = {Sert, Kagan and Oey, Elyanne and Amidirad, Daniël and van Campenhout, Arthur and Schuttrups, Casper},
  year   = {2026}
}
```

## Acknowledgments

This release builds on [HypDAE](https://github.com/lingxiao-li/HypDAE), [Hyperbolic-Flow-Matching](https://github.com/federicavaleau/Hyperbolic-Flow-Matching), Stable Diffusion 2.1, and OpenAI CLIP. We thank the authors for their open-source contributions.