---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- remote-sensing
- diffusion
- controlnet
- custom-pipeline
language:
- en
---
> [!WARNING]
> We have not fully validated the checkpoint conversion. If you encounter pipeline loading failures or unexpected output, please contact me at bili_sakura@zju.edu.cn.
# BiliSakura/CRS-Diff
Diffusers-style packaging for the CRS-Diff checkpoint, with a custom Hugging Face `DiffusionPipeline` implementation.
## Model Details
- **Base project**: `CRS-Diff` (Controllable Remote Sensing Image Generation with Diffusion Model)
- **Checkpoint source**: `/root/worksapce/models/raw/CRS-Diff/last.ckpt`
- **Pipeline class**: `CRSDiffPipeline` (in `pipeline.py`)
- **Scheduler**: `DDIMScheduler`
- **Resolution**: 512x512 (default in training/inference config)
## Repository Structure
```text
CRS-Diff/
pipeline.py
modular_pipeline.py
crs_core/
autoencoder.py
text_encoder.py
local_adapter.py
global_adapter.py
metadata_embedding.py
modules/
model_index.json
scheduler/
scheduler_config.json
unet/
vae/
text_encoder/
local_adapter/
global_content_adapter/
global_text_adapter/
metadata_encoder/
```
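Each top-level component folder is wired together through `model_index.json`, which tells diffusers which class to instantiate for each subfolder. A minimal sketch of reading that mapping follows; the example entries are illustrative assumptions written to a temporary file, not the repo's actual index:

```python
import json
import pathlib
import tempfile

def list_components(repo_root):
    """Return the component -> (library, class) mapping from model_index.json.

    Non-list entries (e.g. "_class_name") are pipeline metadata, not components.
    """
    index = json.loads((pathlib.Path(repo_root) / "model_index.json").read_text())
    return {name: tuple(entry) for name, entry in index.items() if isinstance(entry, list)}

# Illustrative index (entries are assumptions; read the real file for the full mapping).
with tempfile.TemporaryDirectory() as root:
    (pathlib.Path(root) / "model_index.json").write_text(json.dumps({
        "_class_name": "CRSDiffPipeline",
        "scheduler": ["diffusers", "DDIMScheduler"],
        "vae": ["diffusers", "AutoencoderKL"],
    }))
    print(list_components(root))
```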
## Usage
Install dependencies first:
```bash
pip install diffusers transformers torch torchvision omegaconf einops safetensors pytorch-lightning
```
Load the pipeline (from a local path or the Hub repo), then run inference:
```python
import torch
import numpy as np
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
"/root/worksapce/models/BiliSakura/CRS-Diff",
custom_pipeline="pipeline.py",
trust_remote_code=True,
model_path="/root/worksapce/models/BiliSakura/CRS-Diff",
)
pipe = pipe.to("cuda")
# Example placeholder controls; replace with real CRS condition inputs.
b = 1  # batch size
local_control = torch.zeros((b, 18, 512, 512), device="cuda", dtype=torch.float32)  # spatial condition maps
global_control = torch.zeros((b, 1536), device="cuda", dtype=torch.float32)         # global condition embedding
metadata = torch.zeros((b, 7), device="cuda", dtype=torch.float32)                  # scalar metadata fields
out = pipe(
prompt=["a remote sensing image of an urban area"],
negative_prompt=["blurry, distorted, overexposed"],
local_control=local_control,
global_control=global_control,
metadata=metadata,
num_inference_steps=50,
guidance_scale=7.5,
eta=0.0,
strength=1.0,
global_strength=1.0,
output_type="pil",
)
image = out.images[0]
image.save("crs_diff_sample.png")
```
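For real inference, the zero placeholders above must be replaced with actual condition maps. Below is a minimal NumPy sketch of assembling an 18-channel `local_control` tensor; the per-condition channel split shown is an assumption for illustration (consult the CRS-Diff training config for the actual layout), and the result can be moved to the GPU with `torch.from_numpy(...)`:

```python
import numpy as np

def stack_local_control(maps, height=512, width=512):
    """Concatenate per-condition maps along the channel axis.

    `maps` is a list of arrays shaped (C_i, H, W); the local adapter
    expects the concatenation to total 18 channels at 512x512.
    """
    stacked = np.concatenate(maps, axis=0).astype(np.float32)
    assert stacked.shape == (18, height, width), stacked.shape
    return stacked[None]  # add batch dimension -> (1, 18, H, W)

# Hypothetical condition maps: a 3-channel sketch, a 3-channel semantic
# map, and a 12-channel one-hot mask (channel semantics are assumptions).
maps = [
    np.zeros((3, 512, 512)),
    np.zeros((3, 512, 512)),
    np.zeros((12, 512, 512)),
]
local_control = stack_local_control(maps)
print(local_control.shape)  # (1, 18, 512, 512)
```

Convert with `torch.from_numpy(local_control).to("cuda")` before passing it to the pipeline.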
## Notes
- This repository is packaged in a diffusers-compatible layout with a custom pipeline.
- The loading path follows the same placeholder-aware custom-pipeline pattern as HSIGene.
- Split component weights are provided in diffusers-style folders (`unet/`, `vae/`, adapters, and encoders).
- The monolithic `crs_model/last.ckpt` fallback has been intentionally removed; this repo ships split components only.
- Legacy external source trees (`models/`, `ldm/`) have been removed; the runtime code lives in the lightweight `crs_core/` package.
- `CRSDiffPipeline` expects CRS-specific condition tensors (`local_control`, `global_control`, `metadata`).
- If you publish to the Hugging Face Hub, users must still pass `trust_remote_code=True` when loading.
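Because the pipeline expects fixed-size condition tensors, a small shape check before calling `pipe(...)` can catch mismatches early. This helper is a sketch, not part of the pipeline; the expected shapes mirror the placeholder example above:

```python
import numpy as np

# Per-sample shapes CRSDiffPipeline expects (batch dimension excluded).
EXPECTED = {
    "local_control": (18, 512, 512),
    "global_control": (1536,),
    "metadata": (7,),
}

def check_controls(local_control, global_control, metadata):
    """Raise ValueError if any condition tensor deviates from the expected shape."""
    for name, tensor in [("local_control", local_control),
                         ("global_control", global_control),
                         ("metadata", metadata)]:
        expected = EXPECTED[name]
        if tuple(tensor.shape[1:]) != expected:
            raise ValueError(
                f"{name}: expected (batch, *{expected}), got {tuple(tensor.shape)}"
            )

# Works for any batch size; torch tensors expose .shape the same way.
b = 2
check_controls(np.zeros((b, 18, 512, 512)),
               np.zeros((b, 1536)),
               np.zeros((b, 7)))
```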
## Citation
```bibtex
@article{tang2024crs,
title={{CRS-Diff}: Controllable remote sensing image generation with diffusion model},
author={Tang, Datao and Cao, Xiangyong and Hou, Xingsong and Jiang, Zhongyuan and Liu, Junmin and Meng, Deyu},
journal={IEEE Transactions on Geoscience and Remote Sensing},
year={2024},
publisher={IEEE}
}
```