Run dir : output/_smoke_test_1gpu
Log file: output/_smoke_test_1gpu/train.log
GPU: NVIDIA RTX PRO 6000 Blackwell Workstation Edition | VRAM: 95.0 GiB | PyTorch: 2.10.0+cu128

Final Configuration:
Paths:
transformer_path weights/flux2_dev_fp8mixed.safetensors
vae_path weights/flux2-vae.safetensors
controlnet_path weights/FLUX.2-dev-Fun-Controlnet-Union-2602.safetensors
dataset_dir dataset
color_map_path configs/color_map.json
output_dir output/_smoke_test_1gpu
text_encoder_path weights/mistral_3_small_flux2_fp8.safetensors
precomputed_embeddings output/text_embeddings_global.pt
Model:
image_size 1024
num_classes 6
control_in_dim 3072
fusion_dim 768
num_fusion_blocks 3
num_heads 12
num_fourier_bands 32
boundary_threshold 0.1
Training:
num_epochs 1
batch_size 4
learning_rate 0.0003
weight_decay 0.01
max_grad_norm 1.0
grad_accum_steps 4
guidance_scale 3.5
num_workers 0
Text Encoder:
text_seq_len 512
text_dim 15360
Logging:
log_interval 1
save_every_n_epochs 5
val_every_n_epochs 1
WandB:
wandb_entity
wandb_project _smoke_test_1gpu
Resume:
resume_from (not set)
[MEM @ pre-flight] RAM: 25.5/188.2 GiB (13.6%) | VRAM: 0.0/95.0 GiB (0.0%)

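With `batch_size 4` and `grad_accum_steps 4` on this single-GPU run, each optimizer step should see 4 × 4 = 16 samples, assuming gradients are accumulated over micro-batches in the usual way. A minimal sketch of that arithmetic; `num_samples` is a hypothetical dataset size, since the log does not report one, and the rounding of partial batches is an assumption:

```python
import math


def optimizer_steps_per_epoch(num_samples: int, batch_size: int,
                              grad_accum_steps: int, num_gpus: int = 1) -> int:
    """Optimizer steps per epoch under plain gradient accumulation (assumed)."""
    # Micro-batches per epoch, keeping a final partial batch (drop_last=False assumed).
    micro_batches = math.ceil(num_samples / (batch_size * num_gpus))
    # One optimizer step per `grad_accum_steps` micro-batches; a trailing partial
    # accumulation window is counted as a full step here (an assumption).
    return math.ceil(micro_batches / grad_accum_steps)


effective_batch = 4 * 4 * 1  # batch_size * grad_accum_steps * num_gpus
print(effective_batch)  # 16
print(optimizer_steps_per_epoch(num_samples=1000, batch_size=4, grad_accum_steps=4))  # 63
```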
============================================================
[1/8] Text Embeddings
============================================================
Loading cached embedding from output/text_embeddings_global.pt
Loaded global text embedding from output/text_embeddings_global.pt (shape: torch.Size([512, 15360]))

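The cached embedding is `[512, 15360]`: `text_seq_len` 512 tokens by `text_dim` 15360 features. One reading, which is an assumption and not stated in the log, is that 15360 = 3 × 5120, i.e. hidden states from three layers of a 5120-wide Mistral Small text encoder concatenated along the feature axis. A shape-only sketch of that concatenation (the layer count and `HIDDEN_SIZE` are assumptions):

```python
from typing import List, Tuple


def concat_hidden_states_shape(layer_shapes: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Shape of per-layer hidden states concatenated on the last (feature) axis."""
    seq_lens = {s for s, _ in layer_shapes}
    # Layers must share the sequence length to be concatenated feature-wise.
    if len(seq_lens) != 1:
        raise ValueError(f"sequence lengths differ: {sorted(seq_lens)}")
    return (seq_lens.pop(), sum(d for _, d in layer_shapes))


# Hypothetical: three selected layers, each [text_seq_len, hidden_size].
HIDDEN_SIZE = 5120  # assumed encoder width
shape = concat_hidden_states_shape([(512, HIDDEN_SIZE)] * 3)
print(shape)  # (512, 15360)
```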
============================================================
[2/8] Loading VAE
============================================================
Done (4.3s), VRAM: 0.16 GiB
[MEM @ after VAE] RAM: 25.9/188.2 GiB (13.8%) | VRAM: 0.2/95.0 GiB (0.2%)

============================================================
[3/8] Loading Transformer
============================================================
Dequantizing FP8 transformer weights...
Dequantized 128 FP8 tensors
Converting ComfyUI → diffusers keys...
Converted: 331 diffusers keys
Loading ControlNet weights...
ControlNet: 76 keys
Creating Flux2ControlTransformer2DModel (control_in_dim=3072)...
Skipped 2 control_img_in keys (dim mismatch):
control_img_in.bias [6144]
control_img_in.weight [6144, 260]
Missing: 2, Unexpected: 0
Initialized control_img_in.weight [6144, 3072] on cuda
Initialized control_img_in.bias [6144] on cuda
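The ControlNet checkpoint stores `control_img_in.weight` as `[6144, 260]`, while the model is built with `control_in_dim=3072`, so those keys are skipped, the rest is loaded non-strictly (hence `Missing: 2`), and `control_img_in` is initialized fresh. The bias is skipped too even though its `[6144]` shape matches, which suggests the loader drops a whole module when any of its tensors mismatches; the sketch below assumes that rule. It works on plain `{name: shape}` dicts; with real tensors the filtered dict would go to `model.load_state_dict(kept, strict=False)`. The helper names and the third key are hypothetical:

```python
from typing import Dict, List, Set, Tuple

Shape = Tuple[int, ...]


def module_of(name: str) -> str:
    """'control_img_in.weight' -> 'control_img_in'."""
    return name.rsplit(".", 1)[0]


def filter_by_shape(ckpt: Dict[str, Shape], model: Dict[str, Shape]
                    ) -> Tuple[Dict[str, Shape], List[str], List[str]]:
    """Drop every tensor of a module if any of its tensors has a shape mismatch.

    Returns (kept, skipped, missing); `missing` lists model keys absent from
    `kept`, i.e. what a non-strict load would report as missing.
    """
    bad_modules: Set[str] = {
        module_of(n) for n, s in ckpt.items() if n in model and model[n] != s
    }
    kept = {n: s for n, s in ckpt.items() if module_of(n) not in bad_modules}
    skipped = sorted(n for n in ckpt if module_of(n) in bad_modules)
    missing = sorted(n for n in model if n not in kept)
    return kept, skipped, missing


# Shapes for control_img_in come from the log; the third key is hypothetical.
model = {"control_img_in.weight": (6144, 3072), "control_img_in.bias": (6144,),
         "control_block.proj.weight": (6144, 6144)}
ckpt = {"control_img_in.weight": (6144, 260), "control_img_in.bias": (6144,),
        "control_block.proj.weight": (6144, 6144)}

kept, skipped, missing = filter_by_shape(ckpt, model)
print(skipped)       # ['control_img_in.bias', 'control_img_in.weight']
print(len(missing))  # 2
```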
FP8 compression: 203 frozen Linears, 67.9 → 37.9 GiB (saved 30.0 GiB)
Done (30.8s), VRAM: 37.87 GiB
Gradient checkpointing: enabled
Backbone FROZEN: all transformer params set requires_grad=False
Gradients will still propagate to HDC²A via control_context autograd
[MEM @ after Transformer] RAM: 27.0/188.2 GiB (14.3%) | VRAM: 37.9/95.0 GiB (39.9%)

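The FP8 compression line is consistent with storing the 203 frozen Linear weights at one byte per element instead of two. If those weights took W GiB at 16-bit precision, FP8 saves W/2, so a 30.0 GiB saving implies W ≈ 60 GiB and leaves 67.9 - 30.0 = 37.9 GiB, matching the reported VRAM. That the weights were BF16 before compression is an assumption; the log only gives the totals. A sketch of the arithmetic:

```python
from typing import Tuple


def fp8_compression(total_gib: float, saved_gib: float,
                    src_bytes: int = 2, dst_bytes: int = 1) -> Tuple[float, float, float]:
    """Back out the size of the compressed Linears from the reported saving.

    Assumes each compressed weight goes from `src_bytes` (BF16 = 2, assumed) to
    `dst_bytes` (FP8 = 1) per element, ignoring any per-tensor scale overhead.
    """
    fraction_saved = 1 - dst_bytes / src_bytes     # 0.5 for BF16 -> FP8
    linears_before = saved_gib / fraction_saved    # GiB those Linears used before
    untouched = total_gib - linears_before         # everything not compressed
    after = total_gib - saved_gib
    return linears_before, untouched, after


before, untouched, after = fp8_compression(67.9, 30.0)
print(f"{before:.1f} {untouched:.1f} {after:.1f}")  # 60.0 7.9 37.9
```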
============================================================
[4/8] Creating HDC²A Adapter
============================================================
HDC²A: 52.4M params
Control: 0.0M params
Total trainable: 52.4M params

============================================================
[4.5/8] Applying LoRA to ControlNet Control Blocks
============================================================
LoRA rank=32, alpha=32.0, dropout=0
LoRA control_transformer_blocks.0.attn.to_q [6144→6144]
LoRA control_transformer_blocks.0.attn.to_k [6144→6144]
LoRA control_transformer_blocks.0.attn.to_v [6144→6144]
LoRA control_transformer_blocks.0.attn.add_q_proj [6144→6144]
LoRA control_transformer_blocks.0.attn.add_k_proj [6144→6144]
LoRA control_transformer_blocks.0.attn.add_v_proj [6144→6144]
LoRA control_transformer_blocks.0.attn.to_out.0 [6144→6144]
LoRA control_transformer_blocks.1.attn.to_q [6144→6144]
LoRA control_transformer_blocks.1.attn.to_k [6144→6144]
LoRA control_transformer_blocks.1.attn.to_v [6144→6144]
LoRA control_transformer_blocks.1.attn.add_q_proj [6144→6144]
LoRA control_transformer_blocks.1.attn.add_k_proj [6144→6144]
LoRA control_transformer_blocks.1.attn.add_v_proj [6144→6144]
LoRA control_transformer_blocks.1.attn.to_out.0 [6144→6144]
LoRA control_transformer_blocks.2.attn.to_q [6144→6144]
LoRA control_transformer_blocks.2.attn.to_k [6144→6144]
LoRA control_transformer_blocks.2.attn.to_v [6144→6144]
LoRA control_transformer_blocks.2.attn.add_q_proj [6144→6144]
LoRA control_transformer_blocks.2.attn.add_k_proj [6144→6144]
LoRA control_transformer_blocks.2.attn.add_v_proj [6144→6144]
LoRA control_transformer_blocks.2.attn.to_out.0 [6144→6144]
LoRA control_transformer_blocks.3.attn.to_q [6144→6144]
LoRA control_transformer_blocks.3.attn.to_k [6144→6144]
LoRA control_transformer_blocks.3.attn.to_v [6144→6144]
LoRA control_transformer_blocks.3.attn.to_out.0 [6144→6144]

LoRA modules injected: 25
LoRA trainable params: 9.83M

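The LoRA count can be checked by hand. Each adapted `[6144→6144]` projection gains an A of shape `[r, in]` and a B of shape `[out, r]`, i.e. r × (in + out) parameters. The log lists seven projections in each of control blocks 0-2 and four in block 3, so 3 × 7 + 4 = 25 modules, and 25 × 32 × (6144 + 6144) = 9,830,400 ≈ 9.83M. Under the usual LoRA convention (assumed here), the update is scaled by alpha / r = 32.0 / 32 = 1.0. A sketch of the count:

```python
from typing import List, Tuple


def lora_params(shapes: List[Tuple[int, int]], rank: int) -> int:
    """Trainable LoRA parameters: A is [rank, in], B is [out, rank] per module."""
    return sum(rank * (d_in + d_out) for d_in, d_out in shapes)


# Per-block projections as listed in the log.
FULL = ["to_q", "to_k", "to_v", "add_q_proj", "add_k_proj", "add_v_proj", "to_out.0"]
PARTIAL = ["to_q", "to_k", "to_v", "to_out.0"]
targets = [f"control_transformer_blocks.{b}.attn.{p}" for b in range(3) for p in FULL]
targets += [f"control_transformer_blocks.3.attn.{p}" for p in PARTIAL]

n = lora_params([(6144, 6144)] * len(targets), rank=32)
print(len(targets), n, f"{n / 1e6:.2f}M")  # 25 9830400 9.83M
```

Two tensors per module also gives the 50 control tensors that the gradient check later reports.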
Parameter Statistics:
HDC²A Adapter: total=52.4M trainable=52.4M
ControlNet (frozen): total=4143.4M LoRA trainable=9.83M
Flux2 backbone: total=0.0M trainable=0.0M β
──────────────────────────────────────────────────
Total trainable: HDC²A 52.4M + LoRA 9.83M = 62.19M

============================================================
[5/8] Building Optimizer
============================================================
AdamW: adapter_lr=3.00e-04, backbone_lr=0.00e+00
param_group 'adapter': 112 tensors, lr=3.00e-04
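The optimizer reports a single `'adapter'` group of 112 tensors, the same number as the HDC²A tensors in the gradient check, while the 25 LoRA modules add 25 × 2 = 50 more trainable tensors. Whether those are registered in another group is not visible in the log. A check like the following makes it explicit; it is sketched over parameter names with a hypothetical helper, whereas a real check would compare the `requires_grad` tensors from `model.named_parameters()` against the tensors held in `optimizer.param_groups` (by identity):

```python
from typing import Iterable, List


def unoptimized(trainable: Iterable[str], in_optimizer: Iterable[str]) -> List[str]:
    """Names of trainable parameters that no optimizer param group contains."""
    registered = set(in_optimizer)
    return [name for name in trainable if name not in registered]


# Hypothetical names: 112 adapter tensors plus 25 LoRA (A, B) pairs.
adapter = [f"hdc2a.p{i}" for i in range(112)]
lora = [f"lora{i}.{m}" for i in range(25) for m in ("A", "B")]

# Scenario where the optimizer holds only the 112 adapter tensors.
missing = unoptimized(adapter + lora, adapter)
print(len(missing))  # 50
```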
Scheduler: 400 warmup steps → cosine over ~25 steps
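The scheduler line describes linear warmup followed by cosine decay. A common form of that schedule is sketched below as a pure function of the optimizer step; the exact form used by the training script (warmup start value, LR floor, when the cosine phase begins) is an assumption. If ~25 is the total number of optimizer steps in this one-epoch run, a schedule of this shape never leaves the warmup ramp:

```python
import math


def lr_at(step: int, base_lr: float, warmup_steps: int, total_steps: int,
          min_lr: float = 0.0) -> float:
    """Linear warmup to base_lr, then cosine decay to min_lr (assumed form)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Progress through the decay phase in [0, 1]; clamp past the end.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


# Base LR and warmup from the log; treating ~25 as the total step count is assumed.
lrs = [lr_at(s, 3e-4, warmup_steps=400, total_steps=25) for s in range(25)]
print(max(lrs) < 3e-4)  # True: every step is still on the warmup ramp
```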
[6/8] Resume: skipped (no checkpoint specified)

============================================================
[7/8] Forward Sanity Check
============================================================
[test 1/4] Forward pass (eval mode)...
Output shape: torch.Size([1, 4096, 128])
Output stats: mean=0.0427, std=0.5156
VRAM peak (forward): 68.44 GiB
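The forward output `[1, 4096, 128]` for a 1024 px input is consistent with a VAE that downsamples by 8 into 32 latent channels, followed by 2×2 patch packing: (1024 / 8 / 2)² = 64² = 4096 tokens, each carrying 32 × 2 × 2 = 128 values. The downsampling factor, channel count and patch size are assumptions inferred from the shapes, not read from the log:

```python
from typing import Tuple


def packed_latent_shape(image_size: int, vae_downsample: int = 8,
                        latent_channels: int = 32, patch: int = 2,
                        batch: int = 1) -> Tuple[int, int, int]:
    """(batch, tokens, features) after VAE encoding and patch packing (assumed)."""
    if image_size % (vae_downsample * patch):
        raise ValueError("image_size must be divisible by vae_downsample * patch")
    side = image_size // vae_downsample // patch  # tokens per side
    return (batch, side * side, latent_channels * patch * patch)


print(packed_latent_shape(1024))  # (1, 4096, 128)
```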
[test 2/4] Loss computation (train mode)...
Loss value: 1.437658
[test 3/4] Backward pass...
Backward completed. VRAM peak (backward): 49.17 GiB
[test 4/4] Gradient flow check...
HDC²A: 112/112 params have non-zero grad
Control: 25/50 params have non-zero grad
Top grad norms (HDC²A):
semantic_encoder.conv_stem.6.weight: 0.005524
depth_encoder.conv_stem.6.weight: 0.004883
W_s.weight: 0.004456
W_d.weight: 0.004181
fusion_blocks.0.ffn_sem.2.weight: 0.003784
Test result: PASSED
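`Control: 25/50 params have non-zero grad` matches 25 LoRA modules with two tensors each, where only one tensor of each pair received gradient. That is what the standard LoRA initialization predicts: B starts at zero, so with y = Wx + (alpha/r)·B·A·x and g = dL/dy, the gradient reaching A is (alpha/r)·Bᵀ·g·xᵀ = 0 on the first step, while B's gradient (alpha/r)·g·(Ax)ᵀ is non-zero. Whether this script uses that initialization is an assumption. A tiny pure-Python check with the hand-derived gradients (toy sizes, hypothetical values):

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def outer(u: List[float], v: List[float]) -> Matrix:
    return [[a * b for b in v] for a in u]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def lora_grads(A: Matrix, B: Matrix, x: List[float], g: List[float],
               scale: float) -> Tuple[Matrix, Matrix]:
    """dL/dA and dL/dB for y = W x + scale * B A x, given g = dL/dy (W frozen)."""
    ax = matvec(A, x)                                               # [r]
    grad_B = [[scale * v for v in row] for row in outer(g, ax)]     # [out, r]
    bt_g = matvec(transpose(B), g)                                  # [r]
    grad_A = [[scale * v for v in row] for row in outer(bt_g, x)]   # [r, in]
    return grad_A, grad_B


r, d_out = 2, 3
A = [[0.1, -0.2, 0.3], [0.05, 0.4, -0.1]]   # small random-style init, [r, in]
B = [[0.0] * r for _ in range(d_out)]        # zero init, [out, r]
x, g = [1.0, 2.0, -1.0], [0.5, -0.3, 0.2]

grad_A, grad_B = lora_grads(A, B, x, g, scale=32.0 / 32)
print(any(v != 0 for row in grad_A for v in row))  # False
print(any(v != 0 for row in grad_B for v in row))  # True
```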
[MEM @ after test] RAM: 27.5/188.2 GiB (14.6%) | VRAM: 38.0/95.0 GiB (40.0%)

*** --test passed: all models loaded, forward test OK. Exiting. ***