---
license: apache-2.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- remote-sensing
- diffusion
- controlnet
- custom-pipeline
language:
- en
---

> [!WARNING]
> Full checkpoint-conversion validation has not been performed. If you encounter pipeline loading failures or unexpected output, please contact me at bili_sakura@zju.edu.cn.

# BiliSakura/CRS-Diff

Diffusers-style packaging for the CRS-Diff checkpoint, with a custom Hugging Face `DiffusionPipeline` implementation.

## Model Details

- **Base project**: `CRS-Diff` (Controllable Remote Sensing Image Generation with Diffusion Model)
- **Checkpoint source**: `/root/worksapce/models/raw/CRS-Diff/last.ckpt`
- **Pipeline class**: `CRSDiffPipeline` (in `pipeline.py`)
- **Scheduler**: `DDIMScheduler`
- **Resolution**: 512x512 (default in the training/inference config)

## Repository Structure

```text
CRS-Diff/
  pipeline.py
  modular_pipeline.py
  crs_core/
    autoencoder.py
    text_encoder.py
    local_adapter.py
    global_adapter.py
    metadata_embedding.py
    modules/
  model_index.json
  scheduler/
    scheduler_config.json
  unet/
  vae/
  text_encoder/
  local_adapter/
  global_content_adapter/
  global_text_adapter/
  metadata_encoder/
```

## Usage

Install the dependencies first:

```bash
pip install diffusers transformers torch torchvision omegaconf einops safetensors pytorch-lightning
```

Load the pipeline (from a local path or a Hub repo), then run inference:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "/root/worksapce/models/BiliSakura/CRS-Diff",
    custom_pipeline="pipeline.py",
    trust_remote_code=True,
    model_path="/root/worksapce/models/BiliSakura/CRS-Diff",
)
pipe = pipe.to("cuda")

# Example placeholder controls; replace with real CRS controls.
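# Condition-tensor shapes expected by CRSDiffPipeline (shapes taken from
# this card's defaults; the channel/dimension semantics below are
# assumptions -- verify against pipeline.py and the upstream CRS-Diff code):
#   local_control : (B, 18, 512, 512)  stacked spatial condition maps
#   global_control: (B, 1536)          global content/text conditioning vector
#   metadata      : (B, 7)             numeric imaging metadata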
b = 1
local_control = torch.zeros((b, 18, 512, 512), device="cuda", dtype=torch.float32)
global_control = torch.zeros((b, 1536), device="cuda", dtype=torch.float32)
metadata = torch.zeros((b, 7), device="cuda", dtype=torch.float32)

out = pipe(
    prompt=["a remote sensing image of an urban area"],
    negative_prompt=["blurry, distorted, overexposed"],
    local_control=local_control,
    global_control=global_control,
    metadata=metadata,
    num_inference_steps=50,
    guidance_scale=7.5,
    eta=0.0,
    strength=1.0,
    global_strength=1.0,
    output_type="pil",
)
image = out.images[0]
image.save("crs_diff_sample.png")
```

## Notes

- This repository is packaged in a diffusers-compatible layout with a custom pipeline.
- The loading path follows the same placeholder-aware custom-pipeline pattern as HSIGene.
- Split component weights are provided in diffusers-style folders (`unet/`, `vae/`, adapters, and encoders).
- The monolithic `crs_model/last.ckpt` fallback has been intentionally removed; this repo ships split components only.
- Legacy external source trees (`models/`, `ldm/`) have been removed; the runtime code lives in the lightweight `crs_core/` package.
- `CRSDiffPipeline` expects CRS-specific condition tensors (`local_control`, `global_control`, `metadata`).
- When loading from the Hugging Face Hub, keep `trust_remote_code=True`.

## Citation

```bibtex
@article{tang2024crs,
  title={CRS-Diff: Controllable remote sensing image generation with diffusion model},
  author={Tang, Datao and Cao, Xiangyong and Hou, Xingsong and Jiang, Zhongyuan and Liu, Junmin and Meng, Deyu},
  journal={IEEE Transactions on Geoscience and Remote Sensing},
  year={2024},
  publisher={IEEE}
}
```
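The Usage example above passes zero tensors as controls just to exercise the pipeline; for meaningful generation, the 18-channel `local_control` presumably stacks several per-pixel condition maps. A minimal sketch of assembling such a tensor from hypothetical single-channel maps (the names `sketch_map` and `depth_map` and the channel ordering are illustrative assumptions, not the documented CRS-Diff layout):

```python
import torch

# Hypothetical single-channel 512x512 condition maps; in practice these
# would come from CRS-Diff's condition extractors (names are illustrative).
sketch_map = torch.zeros(1, 512, 512)
depth_map = torch.zeros(1, 512, 512)
other_maps = torch.zeros(16, 512, 512)  # remaining assumed condition channels

# Stack along the channel axis, then add a batch dimension.
local_control = torch.cat([sketch_map, depth_map, other_maps], dim=0).unsqueeze(0)
print(tuple(local_control.shape))  # -> (1, 18, 512, 512)
```

The resulting tensor matches the `(b, 18, 512, 512)` shape used in the Usage example; move it to the pipeline's device before passing it in.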