Javad Taghia committed on
Commit afef3b7 · 1 Parent(s): 6e02353

update the model card

Files changed (1): README.md +57 -1
README.md CHANGED
tags:
- stable-diffusion-xl
- realistic
- photorealistic
- diffusers
library_name: diffusers
base_model: stabilityai/stable-diffusion-xl-base-1.0
---
# WAI REALCN (SDXL)

Photorealistic Stable Diffusion XL checkpoint released by the community as “WAI REALCN”. The model keeps the standard SDXL architecture (two CLIP text encoders, a latent UNet, and a VAE) and was shared on [Civitai](https://civitai.com/models/469902/wai-realcn).

## Model Summary
- Task: text-to-image generation, natively at 1024×1024 (lower resolutions also work).
- Architecture: SDXL with two CLIP text encoders (`CLIPTextModel` + `CLIPTextModelWithProjection`), a cross-attention UNet, and an AutoencoderKL VAE (latent scaling factor 0.13025).
- Scheduler: `EulerDiscreteScheduler` by default; other SDXL-compatible schedulers from `diffusers` also work.
- Format: Diffusers pipeline (`StableDiffusionXLPipeline`); load FP16 weights for GPU inference.

## Recommended Use
- Photorealistic portraits and lifestyle imagery; neutral prompting works best (avoid over-stylized prompts).
- Works with standard SDXL negative prompting (e.g., “blurry, low quality, artifacts, extra limbs”).
- 1024×1024 is the native resolution; smaller sizes are fine, while larger outputs may need upscaling.

## Quickstart (Diffusers)
```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the pipeline in half precision and move it to the GPU
pipe = StableDiffusionXLPipeline.from_pretrained(
    "YOUR_USERNAME_HERE/deewaiREALCN",
    torch_dtype=torch.float16,
).to("cuda")

prompt = "a candid street portrait of a young adult, soft daylight, shallow depth of field, high detail"
negative_prompt = "blurry, low quality, extra fingers, distorted face"

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("sample.png")
```

## Files and Architecture Notes
- `model_index.json`: Declares `StableDiffusionXLPipeline` with dual tokenizers/encoders (standard SDXL design).
- `tokenizer/` & `tokenizer_2/`: Separate CLIP tokenizers matching the two text encoders; keep both to preserve padding/special-token behavior.
- `text_encoder/`: 12-layer CLIP text encoder (hidden size 768, quick-GELU activation).
- `text_encoder_2/`: 32-layer CLIP text encoder with projection (hidden size 1280, GELU activation).
- `unet/`: Latent UNet with cross-attention (`sample_size`: 128 → 1024-pixel images).
- `vae/`: AutoencoderKL with `scaling_factor: 0.13025` for latents.
- `scheduler/`: Default Euler scheduler settings.

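The `sample_size: 128` → 1024-pixel relationship comes from the VAE's spatial compression; a quick sketch of the arithmetic, assuming the standard 8× downsampling factor of SDXL's AutoencoderKL:

```python
# The UNet denoises latents that are 8x smaller than the output image
sample_size = 128          # from unet/config.json
vae_downscale_factor = 8   # AutoencoderKL spatial compression (standard for SDXL)
pixel_resolution = sample_size * vae_downscale_factor
print(pixel_resolution)  # 1024

# Latents are multiplied/divided by this factor between the UNet and the VAE
scaling_factor = 0.13025   # from vae/config.json
```
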
## Prompting Tips
- Start concise: subject + setting + lighting + camera feel (e.g., “portrait, indoor window light, 85mm, f/1.8”).
- Add quality anchors sparingly (“high detail”, “natural skin”, “cinematic lighting”).
- Keep negative prompts short; overlong negatives can reduce fidelity.

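The pattern above can be sketched as a small helper; `build_prompt` is a hypothetical convenience function for illustration, not part of the model or `diffusers`:

```python
def build_prompt(subject, setting="", lighting="", camera="", anchors=()):
    # Hypothetical helper: joins subject + setting + lighting + camera feel,
    # then appends a few quality anchors, skipping any empty pieces.
    parts = [subject, setting, lighting, camera, *anchors]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    "portrait of a young adult",
    setting="indoor cafe",
    lighting="window light",
    camera="85mm, f/1.8",
    anchors=("high detail",),
)
print(prompt)  # portrait of a young adult, indoor cafe, window light, 85mm, f/1.8, high detail
```
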
## Safety and Limitations
- May reproduce training-data biases or generate sensitive/NSFW content; review outputs before use.
- Not suited for medical, legal, or safety-critical applications.
- Respect the CreativeML Open RAIL-M license and comply with its downstream use restrictions.