Upload folder using huggingface_hub

Files changed (6)

README.md +118 -0
galaxy10_classifier.config.json +4 -0
galaxy10_classifier.model.safetensors +3 -0
latent_diffusion_galaxy10_xattn_v1.config.json +28 -0
latent_diffusion_galaxy10_xattn_v1.model.safetensors +3 -0
latent_diffusion_galaxy10_xattn_v1.vae.safetensors +3 -0

README.md ADDED

	@@ -0,0 +1,118 @@

+---
+license: cc0-1.0
+tags:
+  - diffusion
+  - latent-diffusion
+  - classifier-free-guidance
+  - astronomy
+  - galaxies
+  - galaxy10
+library_name: pytorch
+---
+# Galaxy Diffusion — latent diffusion weights (Galaxy10 DECaLS)
+Conditional **latent diffusion model** (VAE + classifier-free guidance) for generating
+galaxy images by morphology class, trained on **Galaxy10 DECaLS** (17,736 RGB images,
+256×256, 10 morphological classes).
+These are the `.safetensors` weights. The model uses a **custom architecture** — it is
+**not** a `transformers` / `diffusers` model and does not load via `AutoModel`. You need
+the `galaxy_diffusion` package from the code repository to instantiate it.
+- **Code:** https://github.com/LLapsus/galaxy-diffusion
+- **License:** CC0 1.0 (public domain)
+## Files
+| File | Contents |
+|---|---|
+| `latent_diffusion_galaxy10_xattn_v1.model.safetensors` | UNet denoiser (`LatentUNetCA`, cross-attention conditioning), ~27.9M params |
+| `latent_diffusion_galaxy10_xattn_v1.vae.safetensors` | VAE (image ↔ 4×32×32 latent), ~1.09M params |
+| `latent_diffusion_galaxy10_xattn_v1.config.json` | constructor args (`vae_config`, `unet_config`, `unet_type`) + latent normalisation stats (`latents_mean`, `latents_std`) |
+| `galaxy10_classifier.model.safetensors` | `GalaxyCNN` evaluation classifier, ~1.75M params (validation accuracy 0.829) |
+| `galaxy10_classifier.config.json` | classifier metadata (`val_acc`, `epoch`) |
+## Installation
+```bash
+pip install "git+https://github.com/LLapsus/galaxy-diffusion.git"
+pip install huggingface_hub safetensors
+```
+## Load the weights
+```python
+import json
+import torch
+from huggingface_hub import snapshot_download
+from safetensors.torch import load_file
+from galaxy_diffusion.models.vae import VAE
+from galaxy_diffusion.models.unet import LatentUNetCA
+path = snapshot_download("LLapsus/galaxy-diffusion")  # downloads all files
+cfg  = json.load(open(f"{path}/latent_diffusion_galaxy10_xattn_v1.config.json"))
+vae = VAE(**cfg["vae_config"])
+vae.load_state_dict(load_file(f"{path}/latent_diffusion_galaxy10_xattn_v1.vae.safetensors"))
+vae.eval()
+unet = LatentUNetCA(**cfg["unet_config"])
+unet.load_state_dict(load_file(f"{path}/latent_diffusion_galaxy10_xattn_v1.model.safetensors"))
+unet.eval()
+```
+## Generate images
+```python
+from galaxy_diffusion.diffusion.ddpm import cosine_schedule, sample_cfg
+device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU without a GPU
+vae, unet = vae.to(device), unet.to(device)
+_, alpha, alpha_bar = cosine_schedule(1000)
+alpha, alpha_bar = alpha.to(device), alpha_bar.to(device)
+latents_mean = torch.tensor(cfg["latents_mean"])
+latents_std  = torch.tensor(cfg["latents_std"])
+images = sample_cfg(
+    unet, vae,
+    classes=list(range(10)),                       # one image per class
+    alpha=alpha, alpha_bar=alpha_bar,
+    latent_shape=(cfg["unet_config"]["latent_channels"], 32, 32),
+    latents_mean=latents_mean, latents_std=latents_std,
+    device=device,
+    guidance_scale=2.5,                            # see "Guidance scale" below
+    cfg_rescale=0.7,                               # CFG rescaling (Lin et al., 2023)
+)   # -> tensor (10, 3, 256, 256) in [-1, 1]
+```
+The classifier is loaded analogously with `GalaxyCNN` from
+`galaxy_diffusion.models.classifier`.
+## Model details
+- **VAE:** 8× spatial compression, 3×256×256 ↔ 4×32×32, KL-regularised.
+- **UNet (`LatentUNetCA`):** time conditioning via AdaGN, class conditioning via a
+  cross-attention block after each encoder/decoder level + bottleneck; cosine noise
+  schedule (T=1000); trained with Min-SNR-weighted MSE and 10% CFG label dropout.
+- **Classifier (`GalaxyCNN`):** trained on VAE-reconstructed images (to match the
+  distribution of diffusion outputs) for evaluating class fidelity of generated samples.
+### Guidance scale
+Classifier recall on generated images peaks around `w ≈ 3`, but latent-space coverage
+analysis shows `w ≈ 2.5` is the better fidelity/diversity operating point (matched
+within-class spread). Higher `w` over-extrapolates samples toward neighbouring classes.
+See the coverage analysis in the code repository.
+## Training data
+**Galaxy10 DECaLS** — https://astronn.readthedocs.io/en/latest/galaxy10.html
+(17,736 images; 10 classes; not redistributed here).
+## Citation
+TODO(pavel): citation / blog post link.
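
The README passes `guidance_scale` and `cfg_rescale` to `sample_cfg`, but that function lives in the code repository and is not part of this commit. As a rough illustration only, the sketch below assumes the common classifier-free guidance rule, `eps = eps_uncond + w * (eps_cond - eps_uncond)`, followed by the standard-deviation rescaling described by Lin et al. (2023). The function name `cfg_combine` and the toy inputs are hypothetical, and the repository's implementation may differ, for example in which tensor axes the standard deviation is taken over.

```python
from statistics import pstdev


def cfg_combine(eps_cond, eps_uncond, guidance_scale=2.5, cfg_rescale=0.7):
    """Classifier-free guidance with std-based rescaling on flat lists of floats."""
    # Standard CFG: extrapolate from the unconditional prediction
    # towards (and past) the conditional one.
    guided = [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]

    std_cond, std_guided = pstdev(eps_cond), pstdev(guided)
    if cfg_rescale == 0.0 or std_guided == 0.0:
        return guided

    # Rescale so the guided prediction has the same std as the conditional one,
    # then blend the rescaled and raw guided predictions with weight cfg_rescale.
    factor = std_cond / std_guided
    return [cfg_rescale * (g * factor) + (1.0 - cfg_rescale) * g for g in guided]


# Toy stand-ins for the UNet's conditional and unconditional noise predictions.
eps_cond = [0.9, -0.4, 0.1, 1.2]
eps_uncond = [0.5, -0.1, 0.0, 0.6]

no_guidance = cfg_combine(eps_cond, eps_uncond, guidance_scale=1.0, cfg_rescale=0.0)
fully_rescaled = cfg_combine(eps_cond, eps_uncond, guidance_scale=2.5, cfg_rescale=1.0)
```

With `guidance_scale=1.0` and no rescaling, the result reduces to the conditional prediction. With `cfg_rescale=1.0`, the guided prediction is scaled back to the conditional prediction's standard deviation, which is the mechanism intended to counter over-amplification at larger `w`.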

galaxy10_classifier.config.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "val_acc": 0.8291925465838509,
+  "epoch": 84
+}

galaxy10_classifier.model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:fbed56693211bafd98ab136c5935f84e3dfe6ca3faaf870b02dc67afe0baf16a
+size 7022264

latent_diffusion_galaxy10_xattn_v1.config.json ADDED

	@@ -0,0 +1,28 @@

+{
+  "latents_mean": -0.03181709349155426,
+  "latents_std": 0.8395193219184875,
+  "vae_config": {
+    "in_channels": 3,
+    "latent_channels": 4,
+    "base_channels": 32,
+    "num_downsamples": 3
+  },
+  "unet_config": {
+    "latent_channels": 4,
+    "base_channels": 128,
+    "channel_mult": [
+      1,
+      2,
+      4
+    ],
+    "num_res_blocks": 2,
+    "time_emb_dim": 256,
+    "class_emb_dim": 256,
+    "num_classes": 10,
+    "attn_levels": [
+      1
+    ],
+    "cross_attn_heads": 4
+  },
+  "unet_type": "LatentUNetCA"
+}
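
The config above can be cross-checked against the shapes quoted in the README. The sketch below is a sanity check under two assumptions that this commit does not confirm: each VAE downsample halves the spatial resolution, and `channel_mult` multiplies `base_channels` per UNet level. The 256-pixel image size is taken from the README, and the variable names are illustrative.

```python
import json

# Values copied from latent_diffusion_galaxy10_xattn_v1.config.json.
cfg = json.loads("""
{
  "vae_config": {"in_channels": 3, "latent_channels": 4,
                 "base_channels": 32, "num_downsamples": 3},
  "unet_config": {"latent_channels": 4, "base_channels": 128,
                  "channel_mult": [1, 2, 4]}
}
""")

IMAGE_SIZE = 256  # Galaxy10 DECaLS resolution, as stated in the README

vae_cfg, unet_cfg = cfg["vae_config"], cfg["unet_config"]

# Assumption: each VAE downsample halves height and width.
spatial_factor = 2 ** vae_cfg["num_downsamples"]
latent_size = IMAGE_SIZE // spatial_factor
latent_shape = (vae_cfg["latent_channels"], latent_size, latent_size)

# Assumption: channel_mult scales base_channels at each UNet level.
unet_widths = [unet_cfg["base_channels"] * m for m in unet_cfg["channel_mult"]]

# Ratio of values per image to values per latent.
compression = (vae_cfg["in_channels"] * IMAGE_SIZE ** 2) // (
    latent_shape[0] * latent_size ** 2
)

print(latent_shape, unet_widths, compression)
```

Under these assumptions the derived latent shape agrees with the README's 4×32×32 latent and 8× spatial compression, and the VAE and UNet configs agree on `latent_channels`.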

latent_diffusion_galaxy10_xattn_v1.model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:744d5491c1f6318e795b0aaa4bbba9eb9be6c40a96d7e110039fc0050c782663
+size 111486200

latent_diffusion_galaxy10_xattn_v1.vae.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ea48697060ddb9ea816876d90c6090d3383c3803f30643a38384c7f694844917
+size 4367516
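
The `size` fields in the three Git LFS pointers give a quick plausibility check on the parameter counts quoted in the README. The sketch below assumes the tensors are stored as 32-bit floats (4 bytes each), which this commit does not state. Because a `.safetensors` file also contains a small JSON header and may hold non-trainable buffers, each figure is an upper bound rather than an exact count.

```python
BYTES_PER_PARAM = 4  # assumption: float32 storage

# File sizes in bytes, copied from the Git LFS pointers in this commit.
lfs_sizes = {
    "latent_diffusion_galaxy10_xattn_v1.model.safetensors": 111_486_200,
    "latent_diffusion_galaxy10_xattn_v1.vae.safetensors": 4_367_516,
    "galaxy10_classifier.model.safetensors": 7_022_264,
}

# Upper bound on the number of stored values per file.
param_upper_bound = {name: size // BYTES_PER_PARAM for name, size in lfs_sizes.items()}

for name, count in param_upper_bound.items():
    print(f"{name}: <= {count / 1e6:.2f}M values")
```

The resulting bounds (about 27.87M, 1.09M, and 1.76M values) are in line with the README's ~27.9M, ~1.09M, and ~1.75M parameter figures.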