mingyi456 committed on
Commit
38f3380
·
verified ·
1 Parent(s): d2b75ee

Update README.md

Files changed (1)
  1. README.md +81 -1
README.md CHANGED
@@ -10,4 +10,84 @@ pipeline_tag: text-to-image
  library_name: diffusers
  tags:
  - diffusion-single-file
- ---
+ ---
+ For more information (including how to compress models yourself), see https://huggingface.co/DFloat11 and https://github.com/LeanModels/DFloat11
+
+ Feel free to request compression of other models as well (whether for the `diffusers` library, ComfyUI, or anything else), although models built on architectures unfamiliar to me may be more difficult.
+
+ ### How to Use
+
+ #### `diffusers`
+
+ ```python
+ import torch
+ from dfloat11 import DFloat11Model
+ from diffusers import ErnieImagePipeline, ErnieImageTransformer2DModel
+
+ # from transformers.modeling_utils import no_init_weights # for transformers version < 5.0.0
+ from transformers.initialization import no_init_weights # for transformers version >= 5.0.0
+
+ # Build the transformer from its config only, skipping weight initialization
+ with no_init_weights():
+     transformer = ErnieImageTransformer2DModel.from_config(
+         ErnieImageTransformer2DModel.load_config(
+             "baidu/ERNIE-Image", subfolder="transformer"
+         ),
+         torch_dtype=torch.bfloat16
+     ).to(torch.bfloat16)
+
+ # Load the DF11-compressed weights into the empty BF16 transformer
+ DFloat11Model.from_pretrained(
+     "mingyi456/ERNIE-Image-DF11",
+     device="cpu",
+     bfloat16_model=transformer,
+ )
+
+ pipe = ErnieImagePipeline.from_pretrained(
+     "baidu/ERNIE-Image",
+     transformer=transformer,
+     torch_dtype=torch.bfloat16
+ )
+
+ pipe.enable_model_cpu_offload()
+
+ prompt = "This is a photograph depicting an urban street scene. Shot at eye level, it shows a covered pedestrian or commercial street. Slightly below the center of the frame, a cyclist rides away from the camera toward the background, appearing as a dark silhouette against backlighting with indistinct details. The ground is paved with regular square tiles, bisected by a prominent tactile paving strip running through the scene, whose raised textures are clearly visible under the light. Light streams in diagonally from the right side of the frame, creating a strong backlight effect with a distinct Tyndall effect—visible light beams illuminating dust or vapor in the air and casting long shadows across the street. Several pedestrians appear on the left side and in the distance, some with their backs to the camera and others walking sideways, all rendered as silhouettes or semi-silhouettes. The overall color palette is warm, dominated by golden yellows and dark browns, evoking the atmosphere of dusk or early morning."
+
+ # `torch.inference_mode()` is important for `ErnieImagePipeline`, as by default it invokes
+ # `_gradient_checkpointing_func()` which makes the DF11 result differ from BF16
+ with torch.inference_mode():
+     image = pipe(
+         prompt=prompt,
+         height=1264,
+         width=848,
+         num_inference_steps=50,
+         guidance_scale=4.0,
+         use_pe=True # use prompt enhancer
+     ).images[0]
+
+ image.save('image ernie-image.png')
+ ```
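The snippet above asks you to comment or uncomment the `no_init_weights` import depending on the installed `transformers` version. As a minimal sketch of an alternative, the helper below tries each candidate module in turn and returns the first one that provides the requested name. `import_first` is a hypothetical helper introduced here for illustration; it is not part of `transformers`, `diffusers`, or DFloat11.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that has it.

    Hypothetical helper: modules that cannot be imported, or that lack the
    attribute, are skipped in order.
    """
    for path in module_paths:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{name!r} not found in any of {list(module_paths)}")
```

With the two module paths named in the snippet above, this could be used as `no_init_weights = import_first("no_init_weights", ["transformers.initialization", "transformers.modeling_utils"])`, so the same script runs on either side of the 5.0.0 boundary.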
+
+ #### ComfyUI
+ Refer to this [model](https://huggingface.co/mingyi456/ERNIE-Image-DF11-ComfyUI) instead.
+
+
+ ### Compression details
+
+ This is the `pattern_dict` for compression:
+
+ ```python
+ pattern_dict = {
+     r"time_embedding": (
+         "linear_1",
+         "linear_2",
+     ),
+     r"adaLN_modulation.1": [],
+
+     r"layers\.\d+": (
+         "self_attention.to_q",
+         "self_attention.to_k",
+         "self_attention.to_v",
+         "self_attention.to_out.0",
+         "mlp.gate_proj",
+         "mlp.up_proj",
+         "mlp.linear_fc2",
+     ),
+     r"final_norm.linear": [],
+ }
+ ```
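To make the `pattern_dict` above easier to read, here is a rough sketch of how its keys could select layers. It rests on two assumptions that this README does not confirm: that each key is a regular expression matched against a module's full name, and that an empty list means the matched module itself is compressed rather than named submodules. `selected_layers` and the module names passed to it are hypothetical, chosen only for illustration.

```python
import re

# Copied from the `pattern_dict` above.
pattern_dict = {
    r"time_embedding": ("linear_1", "linear_2"),
    r"adaLN_modulation.1": [],
    r"layers\.\d+": (
        "self_attention.to_q",
        "self_attention.to_k",
        "self_attention.to_v",
        "self_attention.to_out.0",
        "mlp.gate_proj",
        "mlp.up_proj",
        "mlp.linear_fc2",
    ),
    r"final_norm.linear": [],
}


def selected_layers(module_name, patterns):
    """List the layers selected for `module_name` under the assumed semantics."""
    for pattern, submodules in patterns.items():
        if re.fullmatch(pattern, module_name):
            # Assumption: an empty list selects the matched module itself.
            return [f"{module_name}.{sub}" for sub in submodules] or [module_name]
    return []  # no pattern matched: the module is left uncompressed


for name in ("time_embedding", "layers.3", "final_norm.linear", "some_other_block"):
    print(name, "->", selected_layers(name, pattern_dict))
```

Under these assumptions, every transformer block `layers.N` contributes its four attention projections and three MLP projections, while `adaLN_modulation.1` and `final_norm.linear` are handled as single layers.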