---
license: other
library_name: pytorch
pipeline_tag: image-to-image
tags:
  - diffusion
  - stable-diffusion
  - latent-space
  - representation-learning
  - hyperbolic-geometry
  - spherical-geometry
  - cub-200-2011
  - caltech-256
datasets:
  - CUB-200-2011
  - Caltech-256
---

# gcDAE Checkpoints

This repository contains the Stage 1 and Stage 2 checkpoints used for **gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation**.

gcDAE studies how the geometry of the latent space affects diffusion-based image generation. The pipeline follows a two-stage, HypDAE-style setup, with separate inference code:

- **Stage 1**: a diffusion autoencoder backbone adapted from Stable Diffusion 2.1.
- **Stage 2**: a geometry-conditioned semantic encoder trained on top of the Stage 1 backbone.
- **HypDiffusion**: inference code that combines a Stage 1 checkpoint with a dataset- and geometry-specific Stage 2 checkpoint.

The released checkpoints cover two datasets, CUB-200-2011 and Caltech-256. Each has one Stage 1 backbone and three Stage 2 encoders with Euclidean, hyperbolic, and spherical latent spaces.
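The three geometries differ in curvature (the `c` values listed under Released Checkpoints). A minimal sketch of how a Euclidean feature vector is commonly mapped into each space: the identity for Euclidean (`c=0`), the exponential map at the origin of the Poincaré ball for hyperbolic space (curvature `-c`), and radial projection onto a sphere of radius `1/sqrt(c)` for spherical space (curvature `+c`). These maps are a standard construction from the hyperbolic and spherical embedding literature and are an assumption here; the gcDAE Stage 2 encoder may use a different model or parameterization.

```python
import math
from typing import List, Sequence


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def to_euclidean(v: Sequence[float]) -> List[float]:
    """c = 0: flat space, so the feature vector is used unchanged."""
    return list(v)


def to_poincare_ball(v: Sequence[float], c: float = 1.0, eps: float = 1e-12) -> List[float]:
    """Exponential map at the origin of the Poincare ball with curvature -c.

    exp_0(v) = tanh(sqrt(c) * |v|) * v / (sqrt(c) * |v|), which mathematically
    always lands strictly inside the ball of radius 1 / sqrt(c).
    """
    sqrt_c = math.sqrt(c)
    n = max(_norm(v), eps)  # eps avoids division by zero for v = 0
    scale = math.tanh(sqrt_c * n) / (sqrt_c * n)
    return [scale * x for x in v]


def to_sphere(v: Sequence[float], c: float = 1.0, eps: float = 1e-12) -> List[float]:
    """Radial projection onto the sphere of radius 1 / sqrt(c) (curvature +c)."""
    n = max(_norm(v), eps)
    radius = 1.0 / math.sqrt(c)
    return [radius * x / n for x in v]
```

For example, `to_poincare_ball([3.0, 4.0])` has norm `tanh(5) ≈ 0.9999`, just inside the unit ball, whereas `to_sphere([3.0, 4.0])` gives `[0.6, 0.8]`, exactly on the unit sphere.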

## Released Checkpoints

Recommended repository layout:

```text
gcDAE-models/
├── stage1/
│   ├── cub200/
│   │   └── stage1_cub200.ckpt
│   └── caltech256/
│       └── stage1_caltech256.ckpt
└── stage2/
    ├── cub200/
    │   ├── euc/
    │   │   └── stage2_cub200_euclidean.pt
    │   ├── hyp_c1/
    │   │   └── stage2_cub200_hyperbolic_c1.pt
    │   └── sph_c1/
    │       └── stage2_cub200_spherical_c1.pt
    └── caltech256/
        ├── euc/
        │   └── stage2_caltech256_euclidean.pt
        ├── hyp_c1/
        │   └── stage2_caltech256_hyperbolic_c1.pt
        └── sph_c1/
            └── stage2_caltech256_spherical_c1.pt
```

| Dataset | Stage | Geometry | Curvature | Best Epoch |
| :--- | :--- | :--- | :---: | :---: |
| CUB-200-2011 | Stage 1 | Diffusion backbone | – | 7 |
| Caltech-256 | Stage 1 | Diffusion backbone | – | 20 |
| CUB-200-2011 | Stage 2 | Euclidean | `c=0` | 16 |
| CUB-200-2011 | Stage 2 | Hyperbolic | `c=1` | 32 |
| CUB-200-2011 | Stage 2 | Spherical | `c=1` | 32 |
| Caltech-256 | Stage 2 | Euclidean | `c=0` | 6 |
| Caltech-256 | Stage 2 | Hyperbolic | `c=1` | 12 |
| Caltech-256 | Stage 2 | Spherical | `c=1` | 6 |
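The download and inference examples below refer to checkpoints by their best epoch, for example `stage1/cub200/epoch=000007.ckpt`. A small sketch that encodes the table as a lookup and builds such paths, assuming every checkpoint follows the zero-padded `epoch=NNNNNN.ckpt` naming and the geometry folders (`euc`, `hyp_c1`, `sph_c1`) from the layout above. Only the two paths used in the examples appear on this card, so check the actual repository listing before relying on the others. `BEST_EPOCH` and `checkpoint_filename` are hypothetical names.

```python
from typing import Optional

# Best epochs from the table above, keyed by (dataset, stage, geometry folder).
# Stage 1 checkpoints have no geometry, so their key uses None.
BEST_EPOCH = {
    ("cub200", 1, None): 7,
    ("caltech256", 1, None): 20,
    ("cub200", 2, "euc"): 16,
    ("cub200", 2, "hyp_c1"): 32,
    ("cub200", 2, "sph_c1"): 32,
    ("caltech256", 2, "euc"): 6,
    ("caltech256", 2, "hyp_c1"): 12,
    ("caltech256", 2, "sph_c1"): 6,
}


def checkpoint_filename(dataset: str, stage: int, geometry: Optional[str] = None) -> str:
    """Repo-relative path of the best checkpoint, assuming epoch=NNNNNN.ckpt naming."""
    epoch = BEST_EPOCH[(dataset, stage, geometry)]
    name = f"epoch={epoch:06d}.ckpt"
    if stage == 1:
        return f"stage1/{dataset}/{name}"
    return f"stage2/{dataset}/{geometry}/{name}"
```

For instance, `checkpoint_filename("cub200", 2, "hyp_c1")` returns `"stage2/cub200/hyp_c1/epoch=000032.ckpt"`, the same path used in the `hf_hub_download` example.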

## Code


The checkpoints are intended for use with the gcDAE codebase:

```bash
git clone https://github.com/damidirad/gcDAE.git
cd gcDAE
```

Create the HypDiffusion inference environment:

```bash
conda env create -f HypDiffusion/hypdiff_env.yaml
conda activate hypdiff_env
```

The Stage 2 training code also expects a CLIP ViT-L/14 image encoder, configured in the gcDAE scripts as `clip-vit-large-patch14`.
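If the scripts resolve `clip-vit-large-patch14` as a local directory (an assumption; it may equally be a Hub model ID or a path you set yourself), one way to populate it is to mirror the public `openai/clip-vit-large-patch14` repository from the Hugging Face Hub. The sketch below uses `huggingface_hub.snapshot_download` and, for a quick sanity check, the `transformers` CLIP vision classes. `fetch_clip_encoder` and `load_clip_image_encoder` are hypothetical helpers, and the gcDAE code may load the encoder differently.

```python
from pathlib import Path

CLIP_REPO_ID = "openai/clip-vit-large-patch14"


def fetch_clip_encoder(local_dir: str = "clip-vit-large-patch14") -> Path:
    """Download the CLIP ViT-L/14 weights and configs into local_dir (assumed location)."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=CLIP_REPO_ID, local_dir=local_dir))


def load_clip_image_encoder(path: str = "clip-vit-large-patch14"):
    """Load the CLIP vision tower and its image preprocessor with transformers."""
    from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

    processor = CLIPImageProcessor.from_pretrained(path)
    model = CLIPVisionModelWithProjection.from_pretrained(path).eval()
    return processor, model
```

After `processor, model = load_clip_image_encoder()`, calling `model(**processor(images=image, return_tensors="pt")).image_embeds` yields one 768-dimensional projected embedding per image.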

## Download

From the root of your gcDAE clone, download this checkpoint repository with Git LFS:

```bash
git lfs install
git clone https://huggingface.co/k-sert/gcDAE-checkpoints checkpoints/gcDAE-checkpoints
```

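Alternatively, download individual files with `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download

# Each call downloads one file into the local Hugging Face cache and returns its path.

# Stage 1 diffusion backbone for CUB-200-2011 (best epoch 7).
stage1_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage1/cub200/epoch=000007.ckpt",
)

# Stage 2 hyperbolic (c=1) encoder for CUB-200-2011 (best epoch 32).
stage2_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage2/cub200/hyp_c1/epoch=000032.ckpt",
)
```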
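If you are unsure of the exact filenames (the layout above is only a recommendation), you can list the repository contents and fetch every checkpoint under a prefix instead. A minimal sketch using `huggingface_hub.list_repo_files` and `hf_hub_download`; `select_checkpoints` and `download_variant` are hypothetical helpers, and treating `.ckpt` and `.pt` as the checkpoint suffixes is an assumption based on the names on this card.

```python
from typing import Iterable, List

REPO_ID = "k-sert/gcDAE-checkpoints"


def select_checkpoints(files: Iterable[str], prefix: str) -> List[str]:
    """Keep checkpoint files whose repo path starts with prefix, e.g. 'stage2/cub200/hyp_c1/'."""
    return sorted(f for f in files if f.startswith(prefix) and f.endswith((".ckpt", ".pt")))


def download_variant(prefix: str) -> List[str]:
    """Download every checkpoint under prefix and return the local cache paths."""
    from huggingface_hub import hf_hub_download, list_repo_files

    files = select_checkpoints(list_repo_files(REPO_ID), prefix)
    return [hf_hub_download(repo_id=REPO_ID, filename=f) for f in files]
```

For example, `download_variant("stage2/caltech256/sph_c1/")` would return the local paths of the Caltech-256 spherical encoder files.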

## Inference

gcDAE inference requires two checkpoints:

- `s1_checkpoint_path`: the Stage 1 diffusion backbone checkpoint.
- `s2_checkpoint_path`: the Stage 2 geometry-conditioned encoder checkpoint.

Example for CUB-200-2011 with the hyperbolic `c=1` Stage 2 encoder:

```bash
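cd HypDiffusion  # run from the gcDAE repository root

# --ckpt is the Stage 1 checkpoint; the Stage 2 checkpoint is read from s2_checkpoint_path in the YAML config.
python inference_for_eva.py \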
  --config_path configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml \
  --ckpt ../checkpoints/gcDAE-checkpoints/stage1/cub200/epoch=000007.ckpt \
  --image_folder /path/to/cub200/test_split \
  --outdir /path/to/outputs/cub200_hyp_c1 \
  --strength 0.35 \
  --scale 1.0 \
  --seed 23 \
  --n_selected 5 \
  --n_samples 1 \
  --n_perturb_samples 4
```

Before running, update the `s1_checkpoint_path`, `s2_checkpoint_path`, and `encoder_version` fields in the corresponding gcDAE YAML file under:

```text
HypDiffusion/configs/stable-diffusion/
```
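The three fields can also be set programmatically. A minimal sketch, assuming PyYAML is installed, the config is plain YAML, and each key appears somewhere in the config tree; the exact nesting inside the gcDAE YAML files is not documented here, so the helper searches recursively. `set_keys` and `update_config` are hypothetical names. Round-tripping through `yaml.safe_load` and `yaml.safe_dump` discards any comments in the file.

```python
from typing import Any, Dict, Optional, Set


def set_keys(node: Any, updates: Dict[str, Any], found: Optional[Set[str]] = None) -> Set[str]:
    """Overwrite every occurrence of the keys in `updates`, at any depth; return the keys found."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in list(node.items()):
            if key in updates:
                node[key] = updates[key]
                found.add(key)
            else:
                set_keys(value, updates, found)
    elif isinstance(node, list):
        for item in node:
            set_keys(item, updates, found)
    return found


def update_config(path: str, s1_ckpt: str, s2_ckpt: str, encoder: str) -> None:
    """Rewrite the three gcDAE fields of a YAML config in place."""
    import yaml  # PyYAML, imported lazily

    with open(path) as f:
        cfg = yaml.safe_load(f)
    updates = {
        "s1_checkpoint_path": s1_ckpt,
        "s2_checkpoint_path": s2_ckpt,
        "encoder_version": encoder,
    }
    missing = set(updates) - set_keys(cfg, updates)
    if missing:
        raise KeyError(f"{path} has no field(s): {sorted(missing)}")
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f, sort_keys=False)
```

The checkpoint arguments can be the local paths returned by `hf_hub_download`; raising on missing keys avoids silently writing a config that still points at stale checkpoints.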

Recommended config mapping:

| Dataset | Geometry | Config |
| :--- | :--- | :--- |
| CUB-200-2011 | Euclidean | `v2_inference_cub200_euc.yaml` |
| CUB-200-2011 | Hyperbolic c=1 | `v2_inference_cub200_hyp_c1.yaml` |
| CUB-200-2011 | Spherical c=1 | `v2_inference_cub200_sph_c1.yaml` |
| Caltech-256 | Euclidean | `v2_inference_caltech256_euc.yaml` |
| Caltech-256 | Hyperbolic c=1 | `v2_inference_caltech256_hyp_c1.yaml` |
| Caltech-256 | Spherical c=1 | `v2_inference_caltech256_sph_c1.yaml` |
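The config names follow a regular `v2_inference_<dataset>_<geometry>.yaml` pattern, so the example command can be parameterized for any row of the table. A sketch that builds the argument list for `inference_for_eva.py`; `config_name` and `build_inference_cmd` are hypothetical helpers, the paths you pass are placeholders, and the sampling values (`--strength 0.35`, `--seed 23`, and so on) are copied from the single CUB-200-2011 hyperbolic example rather than being recommended per-dataset settings.

```python
from typing import List

DATASETS = ("cub200", "caltech256")
GEOMETRIES = ("euc", "hyp_c1", "sph_c1")


def config_name(dataset: str, geometry: str) -> str:
    """Config file name for one row of the mapping table."""
    if dataset not in DATASETS or geometry not in GEOMETRIES:
        raise ValueError(f"unknown combination: {dataset}, {geometry}")
    return f"v2_inference_{dataset}_{geometry}.yaml"


def build_inference_cmd(dataset: str, geometry: str, stage1_ckpt: str,
                        image_folder: str, outdir: str) -> List[str]:
    """Argument list for inference_for_eva.py, to be run from gcDAE/HypDiffusion."""
    return [
        "python", "inference_for_eva.py",
        "--config_path", f"configs/stable-diffusion/{config_name(dataset, geometry)}",
        "--ckpt", stage1_ckpt,
        "--image_folder", image_folder,
        "--outdir", outdir,
        # Sampling settings copied verbatim from the CUB-200-2011 hyp_c1 example.
        "--strength", "0.35",
        "--scale", "1.0",
        "--seed", "23",
        "--n_selected", "5",
        "--n_samples", "1",
        "--n_perturb_samples", "4",
    ]
```

The list can then be launched with `subprocess.run(cmd, cwd="gcDAE/HypDiffusion", check=True)`, once the YAML fields for that config point at the matching Stage 1 and Stage 2 checkpoints.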

## Training Data

These checkpoints were trained and evaluated on:

- **CUB-200-2011**: a fine-grained bird classification dataset covering 200 species.
- **Caltech-256**: an object category recognition dataset covering 256 object categories (plus a clutter class).

Users are responsible for downloading the datasets from their official sources and complying with their terms of use.

## Intended Use

These checkpoints are released for research on:

- geometry-conditioned diffusion generation,
- latent-space representation learning,
- semantic perturbations and controllable diversity,
- Euclidean, hyperbolic, and spherical latent-space comparisons.

They are not intended for production image-generation services or safety-critical applications.

## Limitations

- The checkpoints inherit limitations from Stable Diffusion 2.1 and the training datasets.
- Generated images may contain artifacts, incorrect class attributes, or dataset biases.
- The Stage 2 encoders are dataset-specific and should not be treated as general-purpose visual encoders.
- Hyperbolic and spherical variants are experimental research checkpoints rather than universal improvements over Euclidean conditioning.

## License

gcDAE is subject to several upstream licenses and terms:

- The Stage 1 diffusion backbone builds on Stable Diffusion 2.1; the backbone and the checkpoints derived from it follow the CreativeML Open RAIL++-M license.
- The Stage 2 encoder uses OpenAI CLIP ViT-L/14 features. The OpenAI CLIP repository is released under the MIT License.
- CUB-200-2011 and Caltech-256 have their own dataset distribution and usage terms.

Please also consult:

- Stable Diffusion 2.1: https://huggingface.co/stabilityai/stable-diffusion-2-1-base
- OpenAI CLIP: https://github.com/openai/CLIP

## Citation

```bibtex
@misc{sert2026gcDAE,
  title  = {gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation},
  author = {Sert, Kagan and Oey, Elyanne and Amidirad, Daniël and van Campenhout, Arthur and Schuttrups, Casper},
  year   = {2026}
}
```

## Acknowledgments

This release builds on [HypDAE](https://github.com/lingxiao-li/HypDAE), [Hyperbolic-Flow-Matching](https://github.com/federicavaleau/Hyperbolic-Flow-Matching), Stable Diffusion 2.1, and OpenAI CLIP. We thank the authors for their open-source contributions.