JasonXF committed on
Commit 4069eda · verified · 1 Parent(s): 78fe081

Upload folder using huggingface_hub

Files changed (1)
  1. output/_smoke_test_1gpu/train.log +160 -0
output/_smoke_test_1gpu/train.log ADDED
@@ -0,0 +1,160 @@
+ Run dir : output/_smoke_test_1gpu
+ Log file: output/_smoke_test_1gpu/train.log
+ GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition | VRAM: 95.0 GiB | PyTorch: 2.10.0+cu128
+
+ Final Configuration:
+ Paths:
+ transformer_path weights/flux2_dev_fp8mixed.safetensors
+ vae_path weights/flux2-vae.safetensors
+ controlnet_path weights/FLUX.2-dev-Fun-Controlnet-Union-2602.safetensors
+ dataset_dir dataset
+ color_map_path configs/color_map.json
+ output_dir output/_smoke_test_1gpu
+ text_encoder_path weights/mistral_3_small_flux2_fp8.safetensors
+ precomputed_embeddings output/text_embeddings_global.pt
+ Model:
+ image_size 1024
+ num_classes 6
+ control_in_dim 3072
+ fusion_dim 768
+ num_fusion_blocks 3
+ num_heads 12
+ num_fourier_bands 32
+ boundary_threshold 0.1
+ Training:
+ num_epochs 1
+ batch_size 4
+ learning_rate 0.0003
+ weight_decay 0.01
+ max_grad_norm 1.0
+ grad_accum_steps 4
+ guidance_scale 3.5
+ num_workers 0
+ Text Encoder:
+ text_seq_len 512
+ text_dim 15360
+ Logging:
+ log_interval 1
+ save_every_n_epochs 5
+ val_every_n_epochs 1
+ WandB:
+ wandb_entity
+ wandb_project _smoke_test_1gpu
+ Resume:
+ resume_from (not set)
+ [MEM @ pre-flight] RAM: 25.5/188.2 GiB (13.6%) | VRAM: 0.0/95.0 GiB (0.0%)
+
+ ============================================================
+ [1/8] Text Embeddings
+ ============================================================
+ Loading cached embedding from output/text_embeddings_global.pt
+ Loaded global text embedding from output/text_embeddings_global.pt (shape: torch.Size([512, 15360]))
+
+ ============================================================
+ [2/8] Loading VAE
+ ============================================================
+ Done (4.3s), VRAM: 0.16 GiB
+ [MEM @ after VAE] RAM: 25.9/188.2 GiB (13.8%) | VRAM: 0.2/95.0 GiB (0.2%)
+
+ ============================================================
+ [3/8] Loading Transformer
+ ============================================================
+ Dequantizing FP8 transformer weights...
+ Dequantized 128 FP8 tensors
+ Converting ComfyUI → diffusers keys...
+ Converted: 331 diffusers keys
+ Loading ControlNet weights...
+ ControlNet: 76 keys
+ Creating Flux2ControlTransformer2DModel (control_in_dim=3072)...
+ Skipped 2 control_img_in keys (dim mismatch):
+ control_img_in.bias [6144]
+ control_img_in.weight [6144, 260]
+ Missing: 2, Unexpected: 0
+ Initialized control_img_in.weight [6144, 3072] on cuda
+ Initialized control_img_in.bias [6144] on cuda
+ FP8 compression: 203 frozen Linears, 67.9 → 37.9 GiB (saved 30.0 GiB)
+ Done (30.8s), VRAM: 37.87 GiB
+ Gradient checkpointing: enabled
+ Backbone FROZEN: all transformer params set requires_grad=False
+ Gradients will still propagate to HDC²A via control_context autograd
+ [MEM @ after Transformer] RAM: 27.0/188.2 GiB (14.3%) | VRAM: 37.9/95.0 GiB (39.9%)
+
+ ============================================================
+ [4/8] Creating HDC²A Adapter
+ ============================================================
+ HDC²A: 52.4M params
+ Control: 0.0M params
+ Total trainable: 52.4M params
+
+ ============================================================
+ [4.5/8] Applying LoRA to ControlNet Control Blocks
+ ============================================================
+ LoRA rank=32, alpha=32.0, dropout=0
+ LoRA control_transformer_blocks.0.attn.to_q [6144→6144]
+ LoRA control_transformer_blocks.0.attn.to_k [6144→6144]
+ LoRA control_transformer_blocks.0.attn.to_v [6144→6144]
+ LoRA control_transformer_blocks.0.attn.add_q_proj [6144→6144]
+ LoRA control_transformer_blocks.0.attn.add_k_proj [6144→6144]
+ LoRA control_transformer_blocks.0.attn.add_v_proj [6144→6144]
+ LoRA control_transformer_blocks.0.attn.to_out.0 [6144→6144]
+ LoRA control_transformer_blocks.1.attn.to_q [6144→6144]
+ LoRA control_transformer_blocks.1.attn.to_k [6144→6144]
+ LoRA control_transformer_blocks.1.attn.to_v [6144→6144]
+ LoRA control_transformer_blocks.1.attn.add_q_proj [6144→6144]
+ LoRA control_transformer_blocks.1.attn.add_k_proj [6144→6144]
+ LoRA control_transformer_blocks.1.attn.add_v_proj [6144→6144]
+ LoRA control_transformer_blocks.1.attn.to_out.0 [6144→6144]
+ LoRA control_transformer_blocks.2.attn.to_q [6144→6144]
+ LoRA control_transformer_blocks.2.attn.to_k [6144→6144]
+ LoRA control_transformer_blocks.2.attn.to_v [6144→6144]
+ LoRA control_transformer_blocks.2.attn.add_q_proj [6144→6144]
+ LoRA control_transformer_blocks.2.attn.add_k_proj [6144→6144]
+ LoRA control_transformer_blocks.2.attn.add_v_proj [6144→6144]
+ LoRA control_transformer_blocks.2.attn.to_out.0 [6144→6144]
+ LoRA control_transformer_blocks.3.attn.to_q [6144→6144]
+ LoRA control_transformer_blocks.3.attn.to_k [6144→6144]
+ LoRA control_transformer_blocks.3.attn.to_v [6144→6144]
+ LoRA control_transformer_blocks.3.attn.to_out.0 [6144→6144]
+
+ LoRA modules injected: 25
+ LoRA trainable params: 9.83M
+
+ Parameter Statistics:
+ HDC²A Adapter: total=52.4M trainable=52.4M
+ ControlNet (frozen): total=4143.4M LoRA trainable=9.83M
+ Flux2 backbone: total=0.0M trainable=0.0M ✓
+ ──────────────────────────────────────────────────
+ Total trainable: HDC²A 52.4M + LoRA 9.83M = 62.19M
+
+ ============================================================
+ [5/8] Building Optimizer
+ ============================================================
+ AdamW: adapter_lr=3.00e-04, backbone_lr=0.00e+00
+ param_group 'adapter': 112 tensors, lr=3.00e-04
+ Scheduler: 400 warmup steps → cosine over ~25 steps
+ [6/8] Resume: skipped (no checkpoint specified)
+
+ ============================================================
+ [7/8] Forward Sanity Check
+ ============================================================
+ [test 1/4] Forward pass (eval mode)...
+ Output shape: torch.Size([1, 4096, 128])
+ Output stats: mean=0.0427, std=0.5156
+ VRAM peak (forward): 68.44 GiB
+ [test 2/4] Loss computation (train mode)...
+ Loss value: 1.437658
+ [test 3/4] Backward pass...
+ Backward completed. VRAM peak (backward): 49.17 GiB
+ [test 4/4] Gradient flow check...
+ HDC²A: 112/112 params have non-zero grad
+ Control: 25/50 params have non-zero grad
+ Top grad norms (HDC²A):
+ semantic_encoder.conv_stem.6.weight: 0.005524
+ depth_encoder.conv_stem.6.weight: 0.004883
+ W_s.weight: 0.004456
+ W_d.weight: 0.004181
+ fusion_blocks.0.ffn_sem.2.weight: 0.003784
+ Test result: PASSED
+ [MEM @ after test] RAM: 27.5/188.2 GiB (14.6%) | VRAM: 38.0/95.0 GiB (40.0%)
+
+ *** --test passed: all models loaded, forward test OK. Exiting. ***
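
Reviewer note: the LoRA figure in step [4.5/8] can be cross-checked by hand. The sketch below is not taken from the training script. It assumes each adapted Linear receives two bias-free low-rank factors, A (rank x in_features) and B (out_features x rank). The helper name lora_param_count is hypothetical; the numeric inputs are the ones printed in the log.

```python
# Sketch: reproduce the "LoRA trainable params" figure from the log.
# Assumption (not confirmed by the log): each adapted Linear gets two
# bias-free factors, A (rank x in_features) and B (out_features x rank).

def lora_param_count(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters added to one Linear layer by a LoRA of the given rank."""
    return rank * in_features + out_features * rank

# Values read from the log: 25 injected modules, each 6144 -> 6144, rank 32.
NUM_MODULES = 25
IN_FEATURES = OUT_FEATURES = 6144
RANK = 32

per_module = lora_param_count(IN_FEATURES, OUT_FEATURES, RANK)
total = NUM_MODULES * per_module

print(f"per module: {per_module:,}")
print(f"total: {total / 1e6:.2f}M")
```

Under that assumption the total comes to 9,830,400, which matches the logged 9.83M. The same layout would give 25 x 2 = 50 LoRA tensors, the denominator in "Control: 25/50 params have non-zero grad". If B is zero-initialized, as is common for LoRA, the gradient with respect to A is exactly zero on the first backward pass, so only half the tensors would show non-zero gradients. Whether this run uses that initialization is not stated in the log.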