---
license: other
library_name: pytorch
pipeline_tag: image-to-image
tags:
- diffusion
- stable-diffusion
- latent-space
- representation-learning
- hyperbolic-geometry
- spherical-geometry
- cub-200-2011
- caltech-256
datasets:
- CUB-200-2011
- Caltech-256
---
# gcDAE Checkpoints
This repository contains the Stage 1 and Stage 2 checkpoints used for **gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation**.
gcDAE studies how the geometry of the latent space affects diffusion-based image generation. The pipeline follows a two-stage, HypDAE-style setup:
- **Stage 1**: a diffusion autoencoder backbone adapted from Stable Diffusion 2.1.
- **Stage 2**: a geometry-conditioned semantic encoder trained on top of the Stage 1 backbone.
- **HypDiffusion**: inference code that combines a Stage 1 checkpoint with a dataset- and geometry-specific Stage 2 checkpoint.
The released checkpoints cover CUB-200-2011 and Caltech-256 with Euclidean, hyperbolic, and spherical Stage 2 latent spaces.
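To make the curvature values concrete, here is a minimal sketch of how a Euclidean feature vector can be mapped into each geometry through the exponential map at the origin. It assumes the Poincaré ball model for the hyperbolic case (curvature −c) and the stereographic model for the spherical case (curvature +c). The gcDAE code may use different models (for example the Lorentz/hyperboloid model) or a different projection, and the name `expmap0` is illustrative rather than part of the codebase.

```python
import math


def expmap0(v, geometry, c=1.0):
    """Map a tangent vector v at the origin into the chosen latent geometry.

    Assumed models (illustrative only, not necessarily what gcDAE uses):
      - "euc": identity map (curvature 0); c is ignored
      - "hyp": Poincaré ball with curvature -c
      - "sph": stereographic model of a sphere with curvature +c
    """
    norm = math.sqrt(sum(x * x for x in v))
    if geometry == "euc" or norm == 0.0:
        return list(v)
    sc = math.sqrt(c)
    if geometry == "hyp":
        # tanh saturates at 1, so outputs stay inside the ball of radius 1/sqrt(c).
        scale = math.tanh(sc * norm) / (sc * norm)
    elif geometry == "sph":
        # tan diverges as sc * norm approaches pi / 2 (the antipode of the origin).
        scale = math.tan(sc * norm) / (sc * norm)
    else:
        raise ValueError(f"unknown geometry: {geometry!r}")
    return [scale * x for x in v]


print(expmap0([3.0, 4.0], "hyp"))
print(expmap0([0.3, 0.4], "sph"))
```

With `c=1`, hyperbolic outputs always lie inside the unit ball, while the spherical map grows without bound as the input norm approaches π/(2√c).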
## Released Checkpoints
Recommended repository layout:
```text
gcDAE-models/
├── stage1/
│ ├── cub200/
│ │ └── stage1_cub200.ckpt
│ └── caltech256/
│ └── stage1_caltech256.ckpt
└── stage2/
├── cub200/
│ ├── euc/
│ │ └── stage2_cub200_euclidean.pt
│ ├── hyp_c1/
│ │ └── stage2_cub200_hyperbolic_c1.pt
│ └── sph_c1/
│ └── stage2_cub200_spherical_c1.pt
└── caltech256/
├── euc/
│ └── stage2_caltech256_euclidean.pt
├── hyp_c1/
│ └── stage2_caltech256_hyperbolic_c1.pt
└── sph_c1/
└── stage2_caltech256_spherical_c1.pt
```
| Dataset | Stage | Geometry | Curvature | Best Epoch |
| :--- | :--- | :--- | :---: | :---: |
| CUB-200-2011 | Stage 1 | Diffusion backbone | – | 7 |
| Caltech-256 | Stage 1 | Diffusion backbone | – | 20 |
| CUB-200-2011 | Stage 2 | Euclidean | `c=0` | 16 |
| CUB-200-2011 | Stage 2 | Hyperbolic | `c=1` | 32 |
| CUB-200-2011 | Stage 2 | Spherical | `c=1` | 32 |
| Caltech-256 | Stage 2 | Euclidean | `c=0` | 6 |
| Caltech-256 | Stage 2 | Hyperbolic | `c=1` | 12 |
| Caltech-256 | Stage 2 | Spherical | `c=1` | 6 |
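The layout and the table above can be combined into a small lookup. This is a sketch with hypothetical names (`checkpoint_paths` and `BEST_EPOCH` are not part of gcDAE) that assumes the files have been placed exactly as in the recommended layout; the files on the Hub may carry other names, such as the `epoch=000007.ckpt`-style filenames used in the download example.

```python
from pathlib import PurePosixPath

DATASETS = ("cub200", "caltech256")

# Geometry folder -> file-name suffix, following the recommended layout.
GEOMETRIES = {
    "euc": "euclidean",
    "hyp_c1": "hyperbolic_c1",
    "sph_c1": "spherical_c1",
}

# Best epochs as listed in the checkpoint table.
BEST_EPOCH = {
    ("cub200", "stage1"): 7,
    ("caltech256", "stage1"): 20,
    ("cub200", "euc"): 16,
    ("cub200", "hyp_c1"): 32,
    ("cub200", "sph_c1"): 32,
    ("caltech256", "euc"): 6,
    ("caltech256", "hyp_c1"): 12,
    ("caltech256", "sph_c1"): 6,
}


def checkpoint_paths(dataset, geometry, root="gcDAE-models"):
    """Return (stage1_path, stage2_path) in the recommended local layout."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if geometry not in GEOMETRIES:
        raise ValueError(f"unknown geometry: {geometry!r}")
    base = PurePosixPath(root)
    s1 = base / "stage1" / dataset / f"stage1_{dataset}.ckpt"
    s2 = base / "stage2" / dataset / geometry / f"stage2_{dataset}_{GEOMETRIES[geometry]}.pt"
    return str(s1), str(s2)


print(checkpoint_paths("caltech256", "sph_c1"))
```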
## Code
The checkpoints are intended for use with the gcDAE codebase:
```bash
git clone https://github.com/damidirad/gcDAE.git
cd gcDAE
```
Create the HypDiffusion inference environment:
```bash
conda env create -f HypDiffusion/hypdiff_env.yaml
conda activate hypdiff_env
```
The Stage 2 training code also expects a CLIP ViT-L/14 image encoder, configured in the gcDAE scripts as `clip-vit-large-patch14`.
## Download
Clone this checkpoint repository with Git LFS:
```bash
git lfs install
git clone https://huggingface.co/k-sert/gcDAE-checkpoints checkpoints/gcDAE-checkpoints
```
Or download files programmatically:
```python
from huggingface_hub import hf_hub_download

# Stage 1 diffusion backbone for CUB-200-2011 (best epoch 7).
stage1_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage1/cub200/epoch=000007.ckpt",
)
# Stage 2 hyperbolic (c=1) encoder for CUB-200-2011 (best epoch 32).
stage2_ckpt = hf_hub_download(
    repo_id="k-sert/gcDAE-checkpoints",
    filename="stage2/cub200/hyp_c1/epoch=000032.ckpt",
)
```
## Inference
gcDAE inference uses two checkpoints:
- `s1_checkpoint_path`: the Stage 1 diffusion checkpoint.
- `s2_checkpoint_path`: the Stage 2 geometry-conditioned encoder checkpoint.
Example for CUB-200-2011 with the hyperbolic `c=1` Stage 2 encoder:
```bash
cd HypDiffusion  # from the gcDAE repository root
python inference_for_eva.py \
--config_path configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml \
--ckpt ../checkpoints/gcDAE-checkpoints/stage1/cub200/epoch=000007.ckpt \
--image_folder /path/to/cub200/test_split \
--outdir /path/to/outputs/cub200_hyp_c1 \
--strength 0.35 \
--scale 1.0 \
--seed 23 \
--n_selected 5 \
--n_samples 1 \
--n_perturb_samples 4
```
Before running, update the `s1_checkpoint_path`, `s2_checkpoint_path`, and `encoder_version` fields in the corresponding gcDAE YAML file under:
```text
HypDiffusion/configs/stable-diffusion/
```
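The three fields can be edited by hand or with a small script. A minimal sketch, assuming each field appears on its own line as a plain `key: value` pair (the exact nesting inside the gcDAE YAML files is not documented here). It rewrites the text line by line instead of round-tripping through a YAML parser, so comments and key order in the config survive. `set_yaml_fields` is a hypothetical helper, and the correct `encoder_version` value is not specified in this README, so it is not filled in below.

```python
import re

# Matches "<indent><key>: <value>" on a single line; [ \t] avoids crossing newlines.
_FIELD = re.compile(r"^([ \t]*)([A-Za-z0-9_]+)([ \t]*:[ \t]*)(.*?)(\r?\n?)$")


def set_yaml_fields(text, updates):
    """Replace the value of every `key: value` line whose key is in `updates`.

    Indentation, line endings, and all other lines are kept unchanged.
    Raises KeyError if any requested key is not found in the text.
    """
    missing = set(updates)
    out = []
    for line in text.splitlines(keepends=True):
        m = _FIELD.match(line)
        if m and m.group(2) in updates:
            indent, key, sep, _old, eol = m.groups()
            out.append(f"{indent}{key}{sep}{updates[key]}{eol}")
            missing.discard(key)
        else:
            out.append(line)
    if missing:
        raise KeyError(f"fields not found: {sorted(missing)}")
    return "".join(out)


# Usage with a real config file (paths are placeholders):
# from pathlib import Path
# cfg = Path("configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml")
# cfg.write_text(set_yaml_fields(cfg.read_text(), {
#     "s1_checkpoint_path": "/path/to/stage1_cub200.ckpt",
#     "s2_checkpoint_path": "/path/to/stage2_cub200_hyperbolic_c1.pt",
# }))
```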
Recommended config mapping:
| Dataset | Geometry | Config |
| :--- | :--- | :--- |
| CUB-200-2011 | Euclidean | `v2_inference_cub200_euc.yaml` |
| CUB-200-2011 | Hyperbolic c=1 | `v2_inference_cub200_hyp_c1.yaml` |
| CUB-200-2011 | Spherical c=1 | `v2_inference_cub200_sph_c1.yaml` |
| Caltech-256 | Euclidean | `v2_inference_caltech256_euc.yaml` |
| Caltech-256 | Hyperbolic c=1 | `v2_inference_caltech256_hyp_c1.yaml` |
| Caltech-256 | Spherical c=1 | `v2_inference_caltech256_sph_c1.yaml` |
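To run other dataset and geometry combinations, the config name can be derived from the mapping above. A sketch of a hypothetical helper, `inference_command`, that assembles the same `inference_for_eva.py` arguments as the CUB-200-2011 example; its default sampling values are copied from that example and are not confirmed as recommended for Caltech-256 or for other geometries. The Stage 2 checkpoint is still read from the YAML config, not from the command line.

```python
import shlex

CONFIG_DIR = "configs/stable-diffusion"


def inference_command(dataset, geometry, stage1_ckpt, image_folder, outdir,
                      strength=0.35, scale=1.0, seed=23,
                      n_selected=5, n_samples=1, n_perturb_samples=4):
    """Build the argv list for inference_for_eva.py, to be run from HypDiffusion/."""
    if dataset not in ("cub200", "caltech256"):
        raise ValueError(f"unknown dataset: {dataset!r}")
    if geometry not in ("euc", "hyp_c1", "sph_c1"):
        raise ValueError(f"unknown geometry: {geometry!r}")
    config = f"{CONFIG_DIR}/v2_inference_{dataset}_{geometry}.yaml"
    return [
        "python", "inference_for_eva.py",
        "--config_path", config,
        "--ckpt", stage1_ckpt,
        "--image_folder", image_folder,
        "--outdir", outdir,
        "--strength", str(strength),
        "--scale", str(scale),
        "--seed", str(seed),
        "--n_selected", str(n_selected),
        "--n_samples", str(n_samples),
        "--n_perturb_samples", str(n_perturb_samples),
    ]


cmd = inference_command(
    "caltech256", "sph_c1",
    stage1_ckpt="/path/to/stage1_caltech256.ckpt",
    image_folder="/path/to/caltech256/test_split",
    outdir="/path/to/outputs/caltech256_sph_c1",
)
print(shlex.join(cmd))
```

From the gcDAE repository root, the list can be passed directly to `subprocess.run(cmd, cwd="HypDiffusion", check=True)`, which avoids shell quoting issues with paths such as `epoch=000007.ckpt`.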
## Training Data
These checkpoints were trained and evaluated on:
- **CUB-200-2011**: a fine-grained bird species classification dataset.
- **Caltech-256**: an object category recognition dataset.
Users are responsible for downloading the datasets from their official sources and complying with their terms of use.
## Intended Use
These checkpoints are released for research on:
- geometry-conditioned diffusion generation,
- latent-space representation learning,
- semantic perturbations and controllable diversity,
- Euclidean, hyperbolic, and spherical latent-space comparisons.
They are not intended for production image-generation services or safety-critical applications.
## Limitations
- The checkpoints inherit limitations from Stable Diffusion 2.1 and the training datasets.
- Generated images may contain artifacts, incorrect class attributes, or dataset biases.
- The Stage 2 encoders are dataset-specific and should not be treated as general-purpose visual encoders.
- Hyperbolic and spherical variants are experimental research checkpoints rather than universal improvements over Euclidean conditioning.
## License
gcDAE is subject to several upstream licenses:
- The Stage 1 diffusion backbone builds on Stable Diffusion 2.1; the backbone and its derived checkpoints follow the CreativeML Open RAIL++-M License.
- The Stage 2 encoder uses OpenAI CLIP ViT-L/14 features. The OpenAI CLIP repository is released under the MIT License.
- CUB-200-2011 and Caltech-256 have their own dataset distribution and usage terms.
Please also consult:
- Stable Diffusion 2.1: https://huggingface.co/stabilityai/stable-diffusion-2-1-base
- OpenAI CLIP: https://github.com/openai/CLIP
## Citation
```bibtex
@misc{sert2026gcDAE,
title = {gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation},
author = {Sert, Kagan and Oey, Elyanne and Amidirad, Daniël and van Campenhout, Arthur and Schuttrups, Casper},
year = {2026}
}
```
## Acknowledgments
This release builds on [HypDAE](https://github.com/lingxiao-li/HypDAE), [Hyperbolic-Flow-Matching](https://github.com/federicavaleau/Hyperbolic-Flow-Matching), Stable Diffusion 2.1, and OpenAI CLIP. We thank the authors for their open-source contributions.