---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- remote-sensing
- diffusion
- controlnet
- custom-pipeline
language:
- en
---
> [!WARNING]
> We have not fully validated the checkpoint conversion. If you encounter pipeline loading failures or unexpected output, please contact me at bili_sakura@zju.edu.cn.
# BiliSakura/CRS-Diff
Diffusers-style packaging for the CRS-Diff checkpoint, with a custom Hugging Face `DiffusionPipeline` implementation.
## Model Details
- **Base project**: `CRS-Diff` (Controllable Remote Sensing Image Generation with Diffusion Model)
- **Checkpoint source**: `/root/worksapce/models/raw/CRS-Diff/last.ckpt`
- **Pipeline class**: `CRSDiffPipeline` (in `pipeline.py`)
- **Scheduler**: `DDIMScheduler`
- **Resolution**: 512x512 (default in training/inference config)
## Repository Structure
```text
CRS-Diff/
pipeline.py
modular_pipeline.py
crs_core/
autoencoder.py
text_encoder.py
local_adapter.py
global_adapter.py
metadata_embedding.py
modules/
model_index.json
scheduler/
scheduler_config.json
unet/
vae/
text_encoder/
local_adapter/
global_content_adapter/
global_text_adapter/
metadata_encoder/
```
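Each top-level component folder is wired together through `model_index.json`, which tells diffusers which class to instantiate for each subfolder. A minimal sketch of reading that mapping follows; the example entries are illustrative assumptions written to a temporary file, not the repo's actual index:

```python
import json
import pathlib
import tempfile

def list_components(repo_root):
    """Return the component -> (library, class) mapping from model_index.json.

    Non-list entries (e.g. "_class_name") are pipeline metadata, not components.
    """
    index = json.loads((pathlib.Path(repo_root) / "model_index.json").read_text())
    return {name: tuple(entry) for name, entry in index.items() if isinstance(entry, list)}

# Illustrative index (entries are assumptions; read the real file for the full mapping).
with tempfile.TemporaryDirectory() as root:
    (pathlib.Path(root) / "model_index.json").write_text(json.dumps({
        "_class_name": "CRSDiffPipeline",
        "scheduler": ["diffusers", "DDIMScheduler"],
        "vae": ["diffusers", "AutoencoderKL"],
    }))
    print(list_components(root))
```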
## Usage
Install dependencies first:
```bash
pip install diffusers transformers torch torchvision omegaconf einops safetensors pytorch-lightning
```
Load the pipeline (from a local path or the Hub repo), then run inference:
```python
import torch
import numpy as np
from diffusers import DiffusionPipeline
pipe = DiffusionPipeline.from_pretrained(
"/root/worksapce/models/BiliSakura/CRS-Diff",
custom_pipeline="pipeline.py",
trust_remote_code=True,
model_path="/root/worksapce/models/BiliSakura/CRS-Diff",
)
pipe = pipe.to("cuda")
# Example placeholder controls; replace with real CRS condition inputs.
b = 1  # batch size
local_control = torch.zeros((b, 18, 512, 512), device="cuda", dtype=torch.float32)  # spatial condition maps
global_control = torch.zeros((b, 1536), device="cuda", dtype=torch.float32)         # global condition embedding
metadata = torch.zeros((b, 7), device="cuda", dtype=torch.float32)                  # scalar metadata fields
out = pipe(
prompt=["a remote sensing image of an urban area"],
negative_prompt=["blurry, distorted, overexposed"],
local_control=local_control,
global_control=global_control,
metadata=metadata,
num_inference_steps=50,
guidance_scale=7.5,
eta=0.0,
strength=1.0,
global_strength=1.0,
output_type="pil",
)
image = out.images[0]
image.save("crs_diff_sample.png")
```
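For real inference, the zero placeholders above must be replaced with actual condition maps. Below is a minimal NumPy sketch of assembling an 18-channel `local_control` tensor; the per-condition channel split shown is an assumption for illustration (consult the CRS-Diff training config for the actual layout), and the result can be moved to the GPU with `torch.from_numpy(...)`:

```python
import numpy as np

def stack_local_control(maps, height=512, width=512):
    """Concatenate per-condition maps along the channel axis.

    `maps` is a list of arrays shaped (C_i, H, W); the local adapter
    expects the concatenation to total 18 channels at 512x512.
    """
    stacked = np.concatenate(maps, axis=0).astype(np.float32)
    assert stacked.shape == (18, height, width), stacked.shape
    return stacked[None]  # add batch dimension -> (1, 18, H, W)

# Hypothetical condition maps: a 3-channel sketch, a 3-channel semantic
# map, and a 12-channel one-hot mask (channel semantics are assumptions).
maps = [
    np.zeros((3, 512, 512)),
    np.zeros((3, 512, 512)),
    np.zeros((12, 512, 512)),
]
local_control = stack_local_control(maps)
print(local_control.shape)  # (1, 18, 512, 512)
```

Convert with `torch.from_numpy(local_control).to("cuda")` before passing it to the pipeline.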
## Notes
- This repository is packaged in a diffusers-compatible layout with a custom pipeline.
- The loading path follows the same placeholder-aware custom-pipeline pattern as HSIGene.
- Split component weights are provided in diffusers-style folders (`unet/`, `vae/`, adapters, and encoders).
- The monolithic `crs_model/last.ckpt` fallback has been intentionally removed; this repo ships split components only.
- Legacy external source trees (`models/`, `ldm/`) have been removed; the runtime code lives in the lightweight `crs_core/` package.
- `CRSDiffPipeline` expects CRS-specific condition tensors (`local_control`, `global_control`, `metadata`).
- If you publish to the Hugging Face Hub, users must still pass `trust_remote_code=True` when loading.
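Because the pipeline expects fixed-size condition tensors, a small shape check before calling `pipe(...)` can catch mismatches early. This helper is a sketch, not part of the pipeline; the expected shapes mirror the placeholder example above:

```python
import numpy as np

# Per-sample shapes CRSDiffPipeline expects (batch dimension excluded).
EXPECTED = {
    "local_control": (18, 512, 512),
    "global_control": (1536,),
    "metadata": (7,),
}

def check_controls(local_control, global_control, metadata):
    """Raise ValueError if any condition tensor deviates from the expected shape."""
    for name, tensor in [("local_control", local_control),
                         ("global_control", global_control),
                         ("metadata", metadata)]:
        expected = EXPECTED[name]
        if tuple(tensor.shape[1:]) != expected:
            raise ValueError(
                f"{name}: expected (batch, *{expected}), got {tuple(tensor.shape)}"
            )

# Works for any batch size; torch tensors expose .shape the same way.
b = 2
check_controls(np.zeros((b, 18, 512, 512)),
               np.zeros((b, 1536)),
               np.zeros((b, 7)))
```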
## Citation
```bibtex
@article{tang2024crs,
title={{CRS-Diff}: Controllable remote sensing image generation with diffusion model},
author={Tang, Datao and Cao, Xiangyong and Hou, Xingsong and Jiang, Zhongyuan and Liu, Junmin and Meng, Deyu},
journal={IEEE Transactions on Geoscience and Remote Sensing},
year={2024},
publisher={IEEE}
}
```