	---
	license: other
	library_name: pytorch
	pipeline_tag: image-to-image
	tags:
	- diffusion
	- stable-diffusion
	- latent-space
	- representation-learning
	- hyperbolic-geometry
	- spherical-geometry
	- cub-200-2011
	- caltech-256
	datasets:
	- CUB-200-2011
	- Caltech-256
	---

	# gcDAE Checkpoints

	This repository contains the Stage 1 and Stage 2 checkpoints used for gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation.

	gcDAE studies how the geometry of the latent space affects diffusion-based image generation. The pipeline follows a two-stage HypDAE-style setup:

	- Stage 1: a diffusion autoencoder backbone adapted from Stable Diffusion 2.1.
	- Stage 2: a geometry-conditioned semantic encoder trained on top of the Stage 1 backbone.
	- HypDiffusion: inference code that combines a Stage 1 checkpoint with a dataset- and geometry-specific Stage 2 checkpoint.

	The released checkpoints cover CUB-200-2011 and Caltech-256 with Euclidean, hyperbolic, and spherical Stage 2 latent spaces.
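For readers less familiar with the three geometries, the sketch below shows one way a Euclidean feature vector could be placed in each kind of space for a curvature magnitude `c`. It is an illustration under assumptions: the function names are hypothetical, and the Poincaré-ball exponential map and the radial projection onto a sphere are standard textbook constructions, not necessarily the mappings used in the gcDAE code.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def to_euclidean(v):
    """Flat space (c=0): the vector is used unchanged."""
    return list(v)


def to_poincare_ball(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c.

    The result always has norm below 1/sqrt(c).
    """
    n = norm(v)
    if n == 0.0:
        return list(v)
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * n) / (sqrt_c * n)
    return [scale * x for x in v]


def to_sphere(v, c=1.0):
    """Radial projection onto the sphere of curvature c (radius 1/sqrt(c))."""
    n = norm(v)
    if n == 0.0:
        raise ValueError("the zero vector has no direction to project")
    radius = 1.0 / math.sqrt(c)
    return [radius * x / n for x in v]
```

With `c=1`, hyperbolic points lie strictly inside the unit ball and spherical points lie exactly on the unit sphere, which is the qualitative difference the three Stage 2 variants are meant to compare.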

	## Released Checkpoints

	Recommended repository layout:

	```text
	gcDAE-models/
	├── stage1/
	│   ├── cub200/
	│   │   └── stage1_cub200.ckpt
	│   └── caltech256/
	│       └── stage1_caltech256.ckpt
	└── stage2/
	    ├── cub200/
	    │   ├── euc/
	    │   │   └── stage2_cub200_euclidean.pt
	    │   ├── hyp_c1/
	    │   │   └── stage2_cub200_hyperbolic_c1.pt
	    │   └── sph_c1/
	    │       └── stage2_cub200_spherical_c1.pt
	    └── caltech256/
	        ├── euc/
	        │   └── stage2_caltech256_euclidean.pt
	        ├── hyp_c1/
	        │   └── stage2_caltech256_hyperbolic_c1.pt
	        └── sph_c1/
	            └── stage2_caltech256_spherical_c1.pt
	```

	| Dataset | Stage | Geometry | Curvature | Best Epoch |
	| :--- | :--- | :--- | :---: | :---: |
	| CUB-200-2011 | Stage 1 | Diffusion backbone | – | 7 |
	| Caltech-256 | Stage 1 | Diffusion backbone | – | 20 |
	| CUB-200-2011 | Stage 2 | Euclidean | `c=0` | 16 |
	| CUB-200-2011 | Stage 2 | Hyperbolic | `c=1` | 32 |
	| CUB-200-2011 | Stage 2 | Spherical | `c=1` | 32 |
	| Caltech-256 | Stage 2 | Euclidean | `c=0` | 6 |
	| Caltech-256 | Stage 2 | Hyperbolic | `c=1` | 12 |
	| Caltech-256 | Stage 2 | Spherical | `c=1` | 6 |
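
The recommended layout maps each dataset and geometry to a predictable relative path. The helper below is a minimal sketch of that mapping, assuming files are named exactly as in the layout tree; `stage1_path`, `stage2_path`, and the geometry keys are illustrative names, not part of the gcDAE codebase. The download and inference examples in this card use epoch-style file names instead, so check the repository's file listing before relying on either convention.

```python
from pathlib import PurePosixPath

DATASETS = ("cub200", "caltech256")

# Geometry name as it appears in the file name -> directory name in the layout.
GEOMETRY_DIRS = {
    "euclidean": "euc",
    "hyperbolic_c1": "hyp_c1",
    "spherical_c1": "sph_c1",
}


def stage1_path(dataset):
    """Relative path of the Stage 1 checkpoint for a dataset."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return str(PurePosixPath("stage1") / dataset / f"stage1_{dataset}.ckpt")


def stage2_path(dataset, geometry):
    """Relative path of the Stage 2 checkpoint for a dataset and geometry."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if geometry not in GEOMETRY_DIRS:
        raise ValueError(f"unknown geometry: {geometry!r}")
    filename = f"stage2_{dataset}_{geometry}.pt"
    return str(PurePosixPath("stage2") / dataset / GEOMETRY_DIRS[geometry] / filename)


if __name__ == "__main__":
    for dataset in DATASETS:
        print(stage1_path(dataset))
        for geometry in GEOMETRY_DIRS:
            print(stage2_path(dataset, geometry))
```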

	## Code


	The checkpoints are intended for use with the gcDAE codebase:

	```bash
	git clone https://github.com/damidirad/gcDAE.git
	cd gcDAE
	```

	Create the HypDiffusion inference environment:

	```bash
	conda env create -f HypDiffusion/hypdiff_env.yaml
	conda activate hypdiff_env
	```

	The Stage 2 training code also expects a CLIP ViT-L/14 image encoder, configured in the gcDAE scripts as `clip-vit-large-patch14`.

	## Download

	Clone this checkpoint repository with Git LFS:

	```bash
	git lfs install
	git clone https://huggingface.co/k-sert/gcDAE-checkpoints checkpoints/gcDAE-checkpoints
	```
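
If Git LFS is not set up when cloning, checkpoint files are left as small text pointer files rather than the real weights. The following sketch scans a clone for such pointers; it relies only on the fact that LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The function names and the `.ckpt`/`.pt` suffix filter are assumptions chosen for this repository.

```python
from pathlib import Path

# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file is an LFS pointer rather than real content."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def find_lfs_pointers(root, suffixes=(".ckpt", ".pt")):
    """List checkpoint files under `root` that are still LFS pointers."""
    return sorted(
        p
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in suffixes and is_lfs_pointer(p)
    )
```

Any path returned by `find_lfs_pointers("checkpoints/gcDAE-checkpoints")` can usually be fixed by running `git lfs pull` inside the clone.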

	Or download files programmatically:

	```python
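	from huggingface_hub import hf_hub_download

	# File names below use epoch-style naming. If the repository follows the
	# layout listed under "Released Checkpoints", use those file names instead.

	# Stage 1 diffusion backbone for CUB-200-2011.
	stage1_ckpt = hf_hub_download(
	    repo_id="k-sert/gcDAE-checkpoints",
	    filename="stage1/cub200/epoch=000007.ckpt",
	)

	# Stage 2 hyperbolic (c=1) encoder for CUB-200-2011.
	stage2_ckpt = hf_hub_download(
	    repo_id="k-sert/gcDAE-checkpoints",
	    filename="stage2/cub200/hyp_c1/epoch=000032.ckpt",
	)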
	```
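
To fetch every checkpoint for one dataset instead of individual files, `snapshot_download` from `huggingface_hub` accepts `allow_patterns` globs. The sketch below assumes the repository uses `stage1/<dataset>/` and `stage2/<dataset>/` directories; `patterns_for` and `download_dataset_checkpoints` are illustrative helper names, not part of the gcDAE code.

```python
REPO_ID = "k-sert/gcDAE-checkpoints"


def patterns_for(dataset):
    """Glob patterns selecting the Stage 1 and Stage 2 files of one dataset."""
    return [f"stage1/{dataset}/*", f"stage2/{dataset}/*"]


def download_dataset_checkpoints(dataset, local_dir="checkpoints/gcDAE-checkpoints"):
    """Download only the files for `dataset` and return the local folder path."""
    # Imported lazily so the pattern helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=patterns_for(dataset),
        local_dir=local_dir,
    )


# Example (requires network access):
# folder = download_dataset_checkpoints("cub200")
```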

	## Inference

	gcDAE inference uses two checkpoint paths:

	- `s1_checkpoint_path`: the Stage 1 diffusion checkpoint.
	- `s2_checkpoint_path`: the Stage 2 geometry-conditioned encoder checkpoint.

	Example for CUB-200-2011 with the hyperbolic `c=1` Stage 2 encoder:

	```bash
	cd gcDAE/HypDiffusion

	python inference_for_eva.py \
	    --config_path configs/stable-diffusion/v2_inference_cub200_hyp_c1.yaml \
	    --ckpt ../checkpoints/gcDAE-checkpoints/stage1/cub200/epoch=000007.ckpt \
	    --image_folder /path/to/cub200/test_split \
	    --outdir /path/to/outputs/cub200_hyp_c1 \
	    --strength 0.35 \
	    --scale 1.0 \
	    --seed 23 \
	    --n_selected 5 \
	    --n_samples 1 \
	    --n_perturb_samples 4
	```

	Before running, update the `s1_checkpoint_path`, `s2_checkpoint_path`, and `encoder_version` fields in the corresponding gcDAE YAML file under:

	```text
	HypDiffusion/configs/stable-diffusion/
	```
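
Editing these fields by hand across six configs is error-prone, so a small script may help. The sketch below is written under assumptions: this card does not say where the keys sit inside the YAML, so `set_fields` searches nested mappings and lists for matching key names; both function names are hypothetical; and `update_config` relies on PyYAML, which discards comments when the file is rewritten. No value is suggested for `encoder_version` because this card does not specify one.

```python
def set_fields(node, updates):
    """Overwrite every key named in `updates` found anywhere in `node`.

    `node` is a nested structure of dicts and lists, as produced by a YAML
    loader. Returns the number of values that were replaced.
    """
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key in updates:
                node[key] = updates[key]
                changed += 1
            else:
                changed += set_fields(value, updates)
    elif isinstance(node, list):
        for item in node:
            changed += set_fields(item, updates)
    return changed


def update_config(config_path, updates):
    """Load a YAML config, apply `updates`, and write it back in place."""
    import yaml  # PyYAML, assumed to be installed in the environment

    with open(config_path) as f:
        config = yaml.safe_load(f)
    changed = set_fields(config, updates)
    with open(config_path, "w") as f:
        yaml.safe_dump(config, f, sort_keys=False)
    return changed
```

A return value of 0 from `update_config` means none of the requested keys were found, which is a sign that the config uses different field names than expected.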

	Recommended config mapping:

	| Dataset | Geometry | Config |
	| :--- | :--- | :--- |
	| CUB-200-2011 | Euclidean | `v2_inference_cub200_euc.yaml` |
	| CUB-200-2011 | Hyperbolic c=1 | `v2_inference_cub200_hyp_c1.yaml` |
	| CUB-200-2011 | Spherical c=1 | `v2_inference_cub200_sph_c1.yaml` |
	| Caltech-256 | Euclidean | `v2_inference_caltech256_euc.yaml` |
	| Caltech-256 | Hyperbolic c=1 | `v2_inference_caltech256_hyp_c1.yaml` |
	| Caltech-256 | Spherical c=1 | `v2_inference_caltech256_sph_c1.yaml` |
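
Because the config names follow a regular `v2_inference_<dataset>_<geometry>.yaml` pattern, the inference command can be generated for all six combinations. The following is a sketch only: `build_command` is a hypothetical helper, the flag values are copied from the CUB-200-2011 example above, and the checkpoint, image, and output paths are placeholders to replace with your own.

```python
import shlex

DATASETS = ("cub200", "caltech256")
GEOMETRIES = ("euc", "hyp_c1", "sph_c1")


def build_command(dataset, geometry, stage1_ckpt, image_folder, outdir):
    """Return the inference command as an argument list for one combination."""
    config = f"configs/stable-diffusion/v2_inference_{dataset}_{geometry}.yaml"
    return [
        "python", "inference_for_eva.py",
        "--config_path", config,
        "--ckpt", stage1_ckpt,
        "--image_folder", image_folder,
        "--outdir", outdir,
        "--strength", "0.35",
        "--scale", "1.0",
        "--seed", "23",
        "--n_selected", "5",
        "--n_samples", "1",
        "--n_perturb_samples", "4",
    ]


if __name__ == "__main__":
    # Print one shell-ready command per dataset and geometry.
    for dataset in DATASETS:
        for geometry in GEOMETRIES:
            cmd = build_command(
                dataset,
                geometry,
                stage1_ckpt=f"/path/to/stage1/{dataset}.ckpt",  # placeholder
                image_folder=f"/path/to/{dataset}/test_split",  # placeholder
                outdir=f"/path/to/outputs/{dataset}_{geometry}",  # placeholder
            )
            print(shlex.join(cmd))
```

The printed commands are meant to be run from `gcDAE/HypDiffusion`, after the matching YAML file has been pointed at the right Stage 1 and Stage 2 checkpoints.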

	## Training Data

	These checkpoints were trained and evaluated on:

	- CUB-200-2011: fine-grained bird image classification dataset.
	- Caltech-256: object category recognition dataset.

	Users are responsible for downloading the datasets from their official sources and complying with their terms of use.

	## Intended Use

	These checkpoints are released for research on:

	- geometry-conditioned diffusion generation,
	- latent-space representation learning,
	- semantic perturbations and controllable diversity,
	- Euclidean, hyperbolic, and spherical latent-space comparisons.

	They are not intended for production image-generation services or safety-critical applications.

	## Limitations

	- The checkpoints inherit limitations from Stable Diffusion 2.1 and the training datasets.
	- Generated images may contain artifacts, incorrect class attributes, or dataset biases.
	- The Stage 2 encoders are dataset-specific and should not be treated as general-purpose visual encoders.
	- Hyperbolic and spherical variants are experimental research checkpoints rather than universal improvements over Euclidean conditioning.

	## License

	gcDAE is subject to several upstream licenses:

	- The Stage 1 diffusion backbone builds on Stable Diffusion 2.1, so the backbone and its derived checkpoints follow the CreativeML Open RAIL++-M license terms.
	- The Stage 2 encoder uses OpenAI CLIP ViT-L/14 features. The OpenAI CLIP repository is released under the MIT License.
	- CUB-200-2011 and Caltech-256 have their own dataset distribution and usage terms.

	Please also consult:

	- Stable Diffusion 2.1: https://huggingface.co/stabilityai/stable-diffusion-2-1-base
	- OpenAI CLIP: https://github.com/openai/CLIP

	## Citation

	```bibtex
	@misc{sert2026gcDAE,
	  title  = {gcDAE: Geometry-Conditioned Latent Spaces for Diffusion-Based Generation},
	  author = {Sert, Kagan and Oey, Elyanne and Amidirad, Daniël and van Campenhout, Arthur and Schuttrups, Casper},
	  year   = {2026}
	}
	```

	## Acknowledgments

	This release builds on [HypDAE](https://github.com/lingxiao-li/HypDAE), [Hyperbolic-Flow-Matching](https://github.com/federicavaleau/Hyperbolic-Flow-Matching), Stable Diffusion 2.1, and OpenAI CLIP. We thank the authors for their open-source contributions.