+# RSEdit-UNet Text Encoder Ablation Models - Inference Guide
+A quick guide to running inference with the RSEdit UNet ablation models, which differ in the text encoder they use.
+## Quick Start
+### Python Code Example
+```python
+import torch
+from PIL import Image
+from diffusers import StableDiffusionInstructPix2PixPipeline, UNet2DConditionModel
+# Example: DGTRS-CLIP-ViT-L-14 ablation model
+# Each checkpoint directory is self-contained with all components
+checkpoint_path = "/data/models/ours/BiliSakura/RSEdit-UNet-text-ablation/DGTRS-CLIP-ViT-L-14"
+# Load the pipeline from the checkpoint directory (vae, text_encoder, tokenizer, scheduler, and the base unet)
+pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
+    checkpoint_path,
+    torch_dtype=torch.bfloat16,
+    safety_checker=None,
+    requires_safety_checker=False,
+)
+# Override UNet with trained EMA weights
+pipe.unet = UNet2DConditionModel.from_pretrained(
+    f"{checkpoint_path}/checkpoint-30000/unet_ema",
+    torch_dtype=torch.bfloat16,
+)
+pipe = pipe.to("cuda")
+# Load source image
+source_image = Image.open("satellite_image.png").convert("RGB")
+# Edit with instruction
+prompt = "Flood the coastal area"
+edited_image = pipe(
+    prompt=prompt,
+    image=source_image,
+    num_inference_steps=50,
+    guidance_scale=7.5,
+    image_guidance_scale=1.5,
+).images[0]
+# Save result
+edited_image.save("edited_image.png")
+```
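The two guidance values in the call above pull in different directions: `guidance_scale` weights the text instruction, while `image_guidance_scale` weights fidelity to the source image. When comparing text-encoder variants, it can help to run the same small grid of settings on each one. The following is a minimal sketch, assuming the `pipe`, `prompt`, and `source_image` names from the example; `guidance_grid` is a hypothetical helper and the values are illustrative, not recommended settings.

```python
from itertools import product


def guidance_grid(text_scales, image_scales, steps=50):
    """Return one dict of pipeline keyword arguments per scale combination."""
    return [
        {
            "num_inference_steps": steps,
            "guidance_scale": text_scale,
            "image_guidance_scale": image_scale,
        }
        for text_scale, image_scale in product(text_scales, image_scales)
    ]


# Illustrative values only; tune them for your own data.
settings = guidance_grid(text_scales=[5.0, 7.5], image_scales=[1.0, 1.5])

for kwargs in settings:
    print(kwargs)
    # With a loaded pipeline, each combination could be run as:
    # edited = pipe(prompt=prompt, image=source_image, **kwargs).images[0]
```

Keeping the grid as plain dicts makes it easy to reuse the same settings across every ablation checkpoint.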
+## Model Structure
+Each ablation model directory is self-contained and includes:
+- `text_encoder/`: Text encoder component
+- `tokenizer/`: Tokenizer component
+- `vae/`: VAE component
+- `scheduler/`: Scheduler component
+- `unet/`: Base UNet (not used for inference)
+- `checkpoint-30000/unet_ema/`: Trained UNet EMA weights (use for inference)
+- `model_index.json`: Pipeline configuration
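Because inference needs both the pipeline components and the EMA weights, a quick layout check before loading can catch an incomplete download. The following is a minimal sketch using only the standard library; `missing_components` is a hypothetical helper, and the expected entries are taken from the list above.

```python
from pathlib import Path

# Entries listed in the Model Structure section.
EXPECTED_ENTRIES = (
    "text_encoder",
    "tokenizer",
    "vae",
    "scheduler",
    "unet",
    "checkpoint-30000/unet_ema",
    "model_index.json",
)


def missing_components(model_dir):
    """Return the expected entries that do not exist under model_dir."""
    root = Path(model_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_components(checkpoint_path)` should return an empty list for a complete model directory.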