---
license: other
library_name: pytorch
pipeline_tag: image-to-image
tags:
- diffusion
- stable-diffusion
- latent-space
- representation-learning
- hyperbolic-geometry
- spherical-geometry
- cub-200-2011
- caltech-256
datasets:
- CUB-200-2011
- Caltech-256
---

# gcDAE Checkpoints

This repository contains the Stage 1 and Stage 2 checkpoints used for **gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation**.

gcDAE studies how the geometry of different latent spaces affects diffusion-based image generation. The pipeline follows a two-stage, HypDAE-style setup:

- **Stage 1**: a diffusion autoencoder backbone adapted from Stable Diffusion 2.1.
- **Stage 2**: a geometry-conditioned semantic encoder trained on top of the Stage 1 backbone.
- **HypDiffusion**: inference code that combines a Stage 1 checkpoint with a dataset- and geometry-specific Stage 2 checkpoint.

The released checkpoints cover CUB-200-2011 and Caltech-256 with Euclidean, hyperbolic, and spherical Stage 2 latent spaces.
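
The three Stage 2 geometries differ in the sign of their curvature: zero for Euclidean, negative for hyperbolic, and positive for spherical, with `c` in the checkpoint table presumably giving its magnitude. One common way to place encoder features in such a space is the exponential map at the origin of a constant-curvature model. The sketch below assumes that construction, using a Poincaré ball of curvature -c for the hyperbolic case and the stereographic sphere model of curvature +c for the spherical case. The gcDAE Stage 2 encoder may use a different model or projection, so treat this as an illustration only; `exp_map_origin` is a hypothetical helper, not part of the gcDAE codebase.

```python
import math
from typing import List, Sequence


def exp_map_origin(v: Sequence[float], c: float, geometry: str) -> List[float]:
    """Map a tangent vector at the origin into a constant-curvature model.

    Hypothetical illustration, not the gcDAE implementation:
      "euc": flat space; v is returned unchanged and c is ignored.
      "hyp": Poincare ball of curvature -c (outputs have norm < 1/sqrt(c)).
      "sph": stereographic sphere model of curvature +c.
    """
    if geometry not in ("euc", "hyp", "sph"):
        raise ValueError(f"unknown geometry: {geometry!r}")
    norm = math.sqrt(sum(x * x for x in v))
    if geometry == "euc" or norm == 0.0:
        return list(v)
    if c <= 0:
        raise ValueError("hyperbolic and spherical geometries need c > 0")
    s = math.sqrt(c)
    # tanh saturates at 1 (the ball boundary); tan diverges at pi/2 (the antipode).
    radial = math.tanh if geometry == "hyp" else math.tan
    scale = radial(s * norm) / (s * norm)
    return [scale * x for x in v]


z = exp_map_origin([3.0, 4.0], c=1.0, geometry="hyp")
print(math.hypot(*z))  # equals tanh(5): just inside the unit ball
```

In the Poincaré ball every output stays inside radius 1/√c, and the geodesic distance from the origin, `2 * atanh(sqrt(c) * |z|) / sqrt(c)`, equals twice the input norm. In the spherical case the map is only one-to-one for input norms below π/(2√c); at that norm the point reaches the antipode, which the stereographic model places at infinity.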

## Released Checkpoints

Recommended repository layout:

```text
gcDAE-models/
├── stage1/
│   ├── cub200/
│   │   └── stage1_cub200.ckpt
│   └── caltech256/
│       └── stage1_caltech256.ckpt
└── stage2/
    ├── cub200/
    │   ├── euc/
    │   │   └── stage2_cub200_euclidean.pt
    │   ├── hyp_c1/
    │   │   └── stage2_cub200_hyperbolic_c1.pt
    │   └── sph_c1/
    │       └── stage2_cub200_spherical_c1.pt
    └── caltech256/
        ├── euc/
        │   └── stage2_caltech256_euclidean.pt
        ├── hyp_c1/
        │   └── stage2_caltech256_hyperbolic_c1.pt
        └── sph_c1/
            └── stage2_caltech256_spherical_c1.pt
```
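
The recommended layout follows a regular naming scheme: a short directory name per geometry (`euc`, `hyp_c1`, `sph_c1`) and a longer geometry name inside the file name. As a minimal sketch, the helper below rebuilds each path from a dataset and geometry key, which can be handy when scripting config edits. `recommended_path`, `DATASETS`, and `GEOMETRY_NAMES` are hypothetical names; the `gcDAE-models` root and the file names are copied from the tree above, and the files published on the Hub may be named differently (the download example uses `epoch=NNNNNN.ckpt` names).

```python
from pathlib import PurePosixPath
from typing import Optional

DATASETS = ("cub200", "caltech256")

# Directory name in the layout -> geometry name used inside the file name.
GEOMETRY_NAMES = {
    "euc": "euclidean",
    "hyp_c1": "hyperbolic_c1",
    "sph_c1": "spherical_c1",
}


def recommended_path(dataset: str, geometry: Optional[str] = None,
                     root: str = "gcDAE-models") -> PurePosixPath:
    """Return a checkpoint path in the recommended layout.

    geometry=None selects the Stage 1 backbone; otherwise a key of GEOMETRY_NAMES.
    """
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if geometry is None:
        return PurePosixPath(root, "stage1", dataset, f"stage1_{dataset}.ckpt")
    if geometry not in GEOMETRY_NAMES:
        raise ValueError(f"unknown geometry: {geometry!r}")
    file_name = f"stage2_{dataset}_{GEOMETRY_NAMES[geometry]}.pt"
    return PurePosixPath(root, "stage2", dataset, geometry, file_name)


# List every path in the tree: one Stage 1 file and three Stage 2 files per dataset.
for dataset in DATASETS:
    for geometry in (None, *GEOMETRY_NAMES):
        print(recommended_path(dataset, geometry))
```

`PurePosixPath` keeps the forward-slash form shown in the tree regardless of the operating system.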

| Dataset | Stage | Geometry | Curvature | Best Epoch |
| :--- | :--- | :--- | :---: | :---: |
| CUB-200-2011 | Stage 1 | Diffusion backbone | – | 7 |
| Caltech-256 | Stage 1 | Diffusion backbone | – | 20 |
| CUB-200-2011 | Stage 2 | Euclidean | `c=0` | 16 |
| CUB-200-2011 | Stage 2 | Hyperbolic | `c=1` | 32 |
| CUB-200-2011 | Stage 2 | Spherical | `c=1` | 32 |
| Caltech-256 | Stage 2 | Euclidean | `c=0` | 6 |
| Caltech-256 | Stage 2 | Hyperbolic | `c=1` | 12 |
| Caltech-256 | Stage 2 | Spherical | `c=1` | 6 |

## Code

The checkpoints are intended for use with the gcDAE codebase:

```bash
git clone https://github.com/damidirad/gcDAE.git
cd gcDAE
```

Create the HypDiffusion inference environment:

```bash
conda env create -f HypDiffusion/hypdiff_env.yaml
conda activate hypdiff_env
```

The Stage 2 training code also expects a CLIP ViT-L/14 image encoder, configured in the gcDAE scripts as `clip-vit-large-patch14`.

## Download

From inside the `gcDAE` directory, clone this checkpoint repository with Git LFS so that the relative checkpoint paths in the inference example resolve:

```bash
git lfs install
git clone https://huggingface.co/k-sert/gcDAE-checkpoints checkpoints/gcDAE-checkpoints
```

Or download files programmatically:

```python
from huggingface_hub import hf_hub_download

# Stage 1 backbone for CUB-200-2011 (best epoch 7).
# hf_hub_download returns the local path of the cached file.
stage1_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage1/cub200/epoch=000007.ckpt",
)

# Stage 2 hyperbolic (c=1) encoder for CUB-200-2011 (best epoch 32).
stage2_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage2/cub200/hyp_c1/epoch=000032.ckpt",
)
```
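
Both filenames above follow the pattern `epoch=NNNNNN.ckpt`, with the epoch zero-padded to six digits and equal to the best epoch listed in the checkpoint table. Assuming the other six checkpoints use the same pattern (only the two paths above are confirmed here, so check the repository's file list first), a short loop could fetch everything. `hub_filename`, `download_all`, and `BEST_EPOCH` are hypothetical names; `hf_hub_download(repo_id=..., filename=...)` is the same `huggingface_hub` call used above.

```python
from typing import Dict, Optional, Tuple

REPO_ID = "k-sert/gcDAE-checkpoints"

# Best epochs from the checkpoint table; geometry None means the Stage 1 backbone.
BEST_EPOCH: Dict[Tuple[str, Optional[str]], int] = {
    ("cub200", None): 7,
    ("caltech256", None): 20,
    ("cub200", "euc"): 16,
    ("cub200", "hyp_c1"): 32,
    ("cub200", "sph_c1"): 32,
    ("caltech256", "euc"): 6,
    ("caltech256", "hyp_c1"): 12,
    ("caltech256", "sph_c1"): 6,
}


def hub_filename(dataset: str, geometry: Optional[str] = None) -> str:
    """Assumed in-repo path, e.g. stage2/cub200/hyp_c1/epoch=000032.ckpt."""
    epoch = BEST_EPOCH[(dataset, geometry)]
    folder = f"stage1/{dataset}" if geometry is None else f"stage2/{dataset}/{geometry}"
    return f"{folder}/epoch={epoch:06d}.ckpt"


def download_all() -> Dict[Tuple[str, Optional[str]], str]:
    """Fetch every checkpoint into the local Hugging Face cache (needs network access)."""
    # Imported here so hub_filename works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return {key: hf_hub_download(repo_id=REPO_ID, filename=hub_filename(*key))
            for key in BEST_EPOCH}


print(hub_filename("cub200"))            # stage1/cub200/epoch=000007.ckpt
print(hub_filename("cub200", "hyp_c1"))  # stage2/cub200/hyp_c1/epoch=000032.ckpt
```

`download_all()` returns a dictionary from `(dataset, geometry)` to the cached file path, which can then be written into the YAML configs.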

## Inference

gcDAE inference uses:

- `s1_checkpoint_path`: the Stage 1 diffusion checkpoint.
- `s2_checkpoint_path`: the Stage 2 geometry-conditioned encoder checkpoint.

Example for CUB-200-2011 with the hyperbolic `c=1` Stage 2 encoder:

```bash
cd gcDAE/HypDiffusion

python inference_for_eva.py \
    --config_path configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml \
    --ckpt ../checkpoints/gcDAE-checkpoints/stage1/cub200/epoch=000007.ckpt \
    --image_folder /path/to/cub200/test_split \
    --outdir /path/to/outputs/cub200_hyp_c1 \
    --strength 0.35 \
    --scale 1.0 \
    --seed 23 \
    --n_selected 5 \
    --n_samples 1 \
    --n_perturb_samples 4
```

Before running, update the `s1_checkpoint_path`, `s2_checkpoint_path`, and `encoder_version` fields in the corresponding gcDAE YAML file under:

```text
HypDiffusion/configs/stable-diffusion/
```
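
Editing the three fields by hand works; for scripted runs, a sketch like the following could patch them. Where these keys sit inside the YAML tree is not stated here, so `set_config_keys` (a hypothetical helper) walks the whole nested structure and overwrites every matching key. `patch_yaml_config` assumes the file loads with PyYAML's `yaml.safe_load`; `yaml.safe_dump` drops comments, so keep a copy of the original config. The nested example structure is made up, and the value for `encoder_version` depends on the gcDAE code, so it is left as a placeholder.

```python
from typing import Any, Dict


def set_config_keys(node: Any, updates: Dict[str, Any]) -> int:
    """Overwrite every key in `updates` wherever it occurs in a nested config.

    Walks dicts and lists recursively and returns the number of values replaced.
    """
    count = 0
    if isinstance(node, dict):
        for key in list(node):
            if key in updates:
                node[key] = updates[key]
                count += 1
            else:
                count += set_config_keys(node[key], updates)
    elif isinstance(node, list):
        for item in node:
            count += set_config_keys(item, updates)
    return count


def patch_yaml_config(path: str, updates: Dict[str, Any]) -> int:
    """Load a YAML config with PyYAML, patch it, and write it back to `path`."""
    import yaml  # PyYAML; imported here so set_config_keys has no dependency

    with open(path, encoding="utf-8") as f:
        cfg = yaml.safe_load(f)
    count = set_config_keys(cfg, updates)
    if count == 0:
        raise KeyError(f"none of {sorted(updates)} found in {path}")
    with open(path, "w", encoding="utf-8") as f:
        # safe_dump drops comments; sort_keys=False keeps the original key order.
        yaml.safe_dump(cfg, f, sort_keys=False)
    return count


# Usage on a made-up nested structure (the real gcDAE YAML layout may differ).
example = {
    "model": {
        "params": {
            "s1_checkpoint_path": None,
            "cond_stage_config": {"params": {"s2_checkpoint_path": None,
                                             "encoder_version": None}},
        }
    }
}
n = set_config_keys(example, {
    "s1_checkpoint_path": "/path/to/checkpoints/stage1/cub200/epoch=000007.ckpt",
    "s2_checkpoint_path": "/path/to/checkpoints/stage2/cub200/hyp_c1/epoch=000032.ckpt",
    "encoder_version": "<encoder version expected by gcDAE>",
})
print(n)  # 3
```

Absolute paths in the config avoid depending on the directory the inference script is launched from.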

Recommended config mapping:

| Dataset | Geometry | Config |
| :--- | :--- | :--- |
| CUB-200-2011 | Euclidean | `v2_inference_cub200_euc.yaml` |
| CUB-200-2011 | Hyperbolic c=1 | `v2_inference_cub200_hyp_c1.yaml` |
| CUB-200-2011 | Spherical c=1 | `v2_inference_cub200_sph_c1.yaml` |
| Caltech-256 | Euclidean | `v2_inference_caltech256_euc.yaml` |
| Caltech-256 | Hyperbolic c=1 | `v2_inference_caltech256_hyp_c1.yaml` |
| Caltech-256 | Spherical c=1 | `v2_inference_caltech256_sph_c1.yaml` |
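
With one config file per dataset and geometry, the example command above can be repeated for all six combinations. The sketch below only builds and prints the argument lists; it copies the flags and sampling values from the CUB-200-2011 example, which may not be the settings intended for Caltech-256. It assumes it runs from `gcDAE/HypDiffusion`, that the YAML files have already been edited, and that Stage 1 checkpoints follow the `epoch=NNNNNN.ckpt` naming from the example (epoch 20 for Caltech-256, per the checkpoint table). `build_command` is a hypothetical helper, and the image and output paths are placeholders.

```python
import shlex
from typing import List

CKPT_ROOT = "../checkpoints/gcDAE-checkpoints"
STAGE1_EPOCH = {"cub200": 7, "caltech256": 20}  # best Stage 1 epochs from the table
GEOMETRIES = ("euc", "hyp_c1", "sph_c1")


def build_command(dataset: str, geometry: str,
                  image_folder: str, out_root: str) -> List[str]:
    """Return the inference_for_eva.py argument list for one dataset/geometry pair."""
    if geometry not in GEOMETRIES:
        raise ValueError(f"unknown geometry: {geometry!r}")
    epoch = STAGE1_EPOCH[dataset]  # KeyError for an unknown dataset
    return [
        "python", "inference_for_eva.py",
        "--config_path", f"configs/stable-diffusion/v2_inference_{dataset}_{geometry}.yaml",
        "--ckpt", f"{CKPT_ROOT}/stage1/{dataset}/epoch={epoch:06d}.ckpt",
        "--image_folder", image_folder,
        "--outdir", f"{out_root}/{dataset}_{geometry}",
        # Sampling settings copied unchanged from the CUB-200-2011 example.
        "--strength", "0.35",
        "--scale", "1.0",
        "--seed", "23",
        "--n_selected", "5",
        "--n_samples", "1",
        "--n_perturb_samples", "4",
    ]


for dataset in STAGE1_EPOCH:
    for geometry in GEOMETRIES:
        cmd = build_command(dataset, geometry,
                            image_folder=f"/path/to/{dataset}/test_split",
                            out_root="/path/to/outputs")
        print(shlex.join(cmd))  # to run it: subprocess.run(cmd, check=True)
```

Building a list rather than a shell string lets `subprocess.run(cmd, check=True)` launch each job without shell quoting issues; `shlex.join` is used only for display.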

## Training Data

These checkpoints were trained and evaluated on:

- **CUB-200-2011**: a fine-grained bird image classification dataset.
- **Caltech-256**: an object category recognition dataset.

Users are responsible for downloading the datasets from their official sources and complying with their terms of use.

## Intended Use

These checkpoints are released for research on:

- geometry-conditioned diffusion generation,
- latent-space representation learning,
- semantic perturbations and controllable diversity,
- Euclidean, hyperbolic, and spherical latent-space comparisons.

They are not intended for production image-generation services or safety-critical applications.

## Limitations

- The checkpoints inherit limitations from Stable Diffusion 2.1 and the training datasets.
- Generated images may contain artifacts, incorrect class attributes, or dataset biases.
- The Stage 2 encoders are dataset-specific and should not be treated as general-purpose visual encoders.
- Hyperbolic and spherical variants are experimental research checkpoints rather than universal improvements over Euclidean conditioning.

## License

gcDAE is subject to several upstream licenses and terms:

- The Stage 1 diffusion backbone builds on Stable Diffusion 2.1, so the backbone and the checkpoints derived from it follow the CreativeML Open RAIL++-M license terms.
- The Stage 2 encoder uses OpenAI CLIP ViT-L/14 features; the OpenAI CLIP repository is released under the MIT License.
- CUB-200-2011 and Caltech-256 have their own distribution and usage terms.

Please also consult:

- Stable Diffusion 2.1: https://huggingface.co/stabilityai/stable-diffusion-2-1-base
- OpenAI CLIP: https://github.com/openai/CLIP

## Citation

```bibtex
@misc{sert2026gcDAE,
  title  = {gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation},
  author = {Sert, Kagan and Oey, Elyanne and Amidirad, Daniël and van Campenhout, Arthur and Schuttrups, Casper},
  year   = {2026}
}
```

## Acknowledgments

This release builds on [HypDAE](https://github.com/lingxiao-li/HypDAE), [Hyperbolic-Flow-Matching](https://github.com/federicavaleau/Hyperbolic-Flow-Matching), Stable Diffusion 2.1, and OpenAI CLIP. We thank the authors for their open-source contributions.