SichengMo-UCLA committed
Commit 1c0d5df · verified · 1 Parent(s): 713515b

Upload config.yaml

Files changed (1):
  config.yaml +156 -0
config.yaml ADDED
@@ -0,0 +1,156 @@
+ model:
+   target: simgen.models.cascade_controlnet.UniControlNet
+   params:
+     linear_start: 0.00085
+     linear_end: 0.0120
+     num_timesteps_cond: 1
+     log_every_t: 200
+     timesteps: 1000
+     first_stage_key: "jpg"
+     cond_stage_key: "txt"
+     image_size: 64
+     channels: 4
+     cond_stage_trainable: false
+     conditioning_key: crossattn
+     monitor: val/loss_simple_ema
+     scale_factor: 0.18215
+     use_ema: False
+     mode: local
+     parameterization: "v"
+
+     local_control_config:
+       target: simgen.models.local_adapter.LocalAdapter
+       params:
+         in_channels: 4
+         model_channels: 320
+         local_channels: 6 # 21, then 6 for 2 condition, now 15 for 5
+         inject_channels: [192, 256, 384, 512]
+         inject_layers: [1, 4, 7, 10]
+         num_res_blocks: 2
+         attention_resolutions: [4, 2, 1]
+         channel_mult: [1, 2, 4, 4]
+         use_checkpoint: True
+         # num_heads: 8
+         num_head_channels: 64 # need to fix for flash-attn
+         use_spatial_transformer: True
+         use_linear_in_transformer: True
+         transformer_depth: 1
+         context_dim: 1024 # 768
+         legacy: False
+
+     unet_config:
+       target: simgen.models.local_adapter.LocalControlUNetModel
+       params:
+         image_size: 32
+         in_channels: 4
+         model_channels: 320
+         out_channels: 4
+         num_res_blocks: 2
+         attention_resolutions: [4, 2, 1]
+         channel_mult: [1, 2, 4, 4]
+         use_checkpoint: True
+         # num_heads: 8
+         num_head_channels: 64 # need to fix for flash-attn
+         use_spatial_transformer: True
+         use_linear_in_transformer: True
+         transformer_depth: 1
+         context_dim: 1024 # 768
+         legacy: False
+
+     first_stage_config:
+       target: simgen.ldm.models.autoencoder.AutoencoderKL
+       params:
+         embed_dim: 4
+         monitor: val/rec_loss
+         ddconfig:
+           double_z: true
+           z_channels: 4
+           resolution: 256
+           in_channels: 3
+           out_ch: 3
+           ch: 128
+           ch_mult:
+           - 1
+           - 2
+           - 4
+           - 4
+           num_res_blocks: 2
+           attn_resolutions: []
+           dropout: 0.0
+         lossconfig:
+           target: torch.nn.Identity
+
+     cond_stage_config:
+       target: simgen.ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
+       params:
+         freeze: True
+         layer: "penultimate"
+       # target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
+
+     first_cond_config:
+       target: simgen.models.t2i_model.T2IModel
+       params:
+         linear_start: 0.00085
+         linear_end: 0.0120
+         num_timesteps_cond: 1
+         log_every_t: 200
+         timesteps: 1000
+         first_stage_key: "jpg"
+         cond_stage_key: "txt"
+         image_size: 64
+         channels: 4
+         cond_stage_trainable: false
+         conditioning_key: crossattn
+         monitor: val/loss_simple_ema
+         scale_factor: 0.18215
+         use_ema: False
+         parameterization: "v"
+
+         unet_config:
+           target: simgen.ldm.modules.diffusionmodules.openaimodel.UNetModel
+           params:
+             image_size: 32
+             in_channels: 4
+             model_channels: 320
+             out_channels: 4
+             num_res_blocks: 2
+             attention_resolutions: [4, 2, 1]
+             channel_mult: [1, 2, 4, 4]
+             use_checkpoint: True
+             # num_heads: 8
+             num_head_channels: 64 # need to fix for flash-attn
+             use_spatial_transformer: True
+             use_linear_in_transformer: True
+             transformer_depth: 1
+             context_dim: 1024 # 768
+             legacy: False
+
+         first_stage_config:
+           target: simgen.ldm.models.autoencoder.AutoencoderKL
+           params:
+             embed_dim: 4
+             monitor: val/rec_loss
+             ddconfig:
+               double_z: true
+               z_channels: 4
+               resolution: 256
+               in_channels: 3
+               out_ch: 3
+               ch: 128
+               ch_mult:
+               - 1
+               - 2
+               - 4
+               - 4
+               num_res_blocks: 2
+               attn_resolutions: []
+               dropout: 0.0
+             lossconfig:
+               target: torch.nn.Identity
+
+         cond_stage_config:
+           target: simgen.ldm.modules.encoders.modules.FrozenOpenCLIPEmbedder
+           params:
+             freeze: True
+             layer: "penultimate"
+           # target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
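Each block in this config follows the `target`/`params` convention common to latent-diffusion-style codebases: `target` is a dotted import path to a class, and `params` holds its constructor keyword arguments. As a minimal sketch of how such a block could be resolved (the actual `simgen` loader is not shown on this page, and the helper name `instantiate_from_config` is an assumption borrowed from similar repositories; a stdlib class stands in for the non-importable `simgen.*` targets):

```python
import importlib


def instantiate_from_config(config):
    # Hypothetical helper, assumed from latent-diffusion-style codebases:
    # split the dotted path, import the module, and call the attribute
    # with the optional `params` mapping as keyword arguments.
    module_path, _, attr = config["target"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), attr)
    return cls(**config.get("params", {}))


# Demo with a stdlib target, since simgen.* is not importable here.
block = {"target": "datetime.timedelta", "params": {"days": 2, "hours": 3}}
delta = instantiate_from_config(block)
print(delta.total_seconds())  # → 183600.0
```

A block with no `params` key (like `lossconfig: target: torch.nn.Identity` above) simply calls the class with no arguments.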