Javad Taghia committed on
Commit afef3b7 · 1 Parent(s): 6e02353

update the model card

Files changed (1): README.md +57 -1
README.md CHANGED
tags:
- stable-diffusion-xl
- realistic
- photorealistic
- diffusers
library_name: diffusers
base_model: stabilityai/stable-diffusion-xl-base-1.0
---
# WAI REALCN (SDXL)

Photorealistic Stable Diffusion XL checkpoint released by the community as “WAI REALCN”. The model keeps the standard SDXL architecture (two CLIP text encoders, a latent UNet, and a VAE) and was shared on [Civitai](https://civitai.com/models/469902/wai-realcn).

## Model Summary
- Task: text-to-image generation, natively at 1024×1024 (lower resolutions also work).
- Architecture: SDXL with two CLIP text encoders (`CLIPTextModel` + `CLIPTextModelWithProjection`), a cross-attention UNet, and an AutoencoderKL VAE (latent scaling factor 0.13025).
- Scheduler: `EulerDiscreteScheduler` by default; other SDXL-compatible schedulers from `diffusers` also work.
- Format: Diffusers pipeline (`StableDiffusionXLPipeline`); load FP16 weights for GPU inference.

## Recommended Use
- Photorealistic portraits and lifestyle imagery; neutral prompting works best (avoid over-stylized prompts).
- Works with standard SDXL negative prompting (e.g., “blurry, low quality, artifacts, extra limbs”).
- 1024×1024 is the native resolution; smaller sizes are fine, while larger outputs may need upscaling.

## Quickstart (Diffusers)
```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the pipeline in half precision and move it to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    "YOUR_USERNAME_HERE/deewaiREALCN",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a candid street portrait of a young adult, soft daylight, shallow depth of field, high detail"
negative_prompt = "blurry, low quality, extra fingers, distorted face"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("sample.png")
```

## Files and Architecture Notes
- `model_index.json`: Declares `StableDiffusionXLPipeline` with dual tokenizers/encoders (standard SDXL design).
- `tokenizer/` & `tokenizer_2/`: Separate CLIP tokenizers matching the two text encoders; keep both to preserve padding/special-token behavior.
- `text_encoder/`: 12-layer CLIP text encoder (hidden size 768, quick-GELU activation).
- `text_encoder_2/`: 32-layer CLIP text encoder with projection (hidden size 1280, GELU activation).
- `unet/`: Latent UNet with cross-attention (`sample_size`: 128 → 1024-pixel images).
- `vae/`: AutoencoderKL with `scaling_factor: 0.13025` for latents.
- `scheduler/`: Default Euler scheduler settings.

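The `sample_size: 128` → 1024-pixel relationship comes from the VAE's spatial compression; a quick sketch of the arithmetic, assuming the standard 8× downsampling factor of SDXL's AutoencoderKL:

```python
# The UNet denoises latents that are 8x smaller than the output image
sample_size = 128          # from unet/config.json
vae_downscale_factor = 8   # AutoencoderKL spatial compression (standard for SDXL)
pixel_resolution = sample_size * vae_downscale_factor
print(pixel_resolution)  # 1024

# Latents are multiplied/divided by this factor between the UNet and the VAE
scaling_factor = 0.13025   # from vae/config.json
```
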
## Prompting Tips
- Start concise: subject + setting + lighting + camera feel (e.g., “portrait, indoor window light, 85mm, f/1.8”).
- Add quality anchors sparingly (“high detail”, “natural skin”, “cinematic lighting”).
- Keep negative prompts short; overlong negatives can reduce fidelity.

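The pattern above can be sketched as a small helper; `build_prompt` is a hypothetical convenience function for illustration, not part of the model or `diffusers`:

```python
def build_prompt(subject, setting="", lighting="", camera="", anchors=()):
    # Hypothetical helper: joins subject + setting + lighting + camera feel,
    # then appends a few quality anchors, skipping any empty pieces.
    parts = [subject, setting, lighting, camera, *anchors]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "portrait of a young adult",
    setting="indoor cafe",
    lighting="window light",
    camera="85mm, f/1.8",
    anchors=("high detail",),
)
print(prompt)  # portrait of a young adult, indoor cafe, window light, 85mm, f/1.8, high detail
```
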
## Safety and Limitations
- May reproduce training-data biases or generate sensitive/NSFW content; review outputs before use.
- Not suited for medical, legal, or safety-critical applications.
- Respect the CreativeML Open RAIL-M license and comply with its downstream use restrictions.