11/29/2025 14:21:48 - INFO - __main__ - Distributed environment: DistributedType.NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

11/29/2025 14:21:48 - INFO - __main__ - Starting script: train_controlnet.py
11/29/2025 14:21:50 - INFO - __main__ - Initializing controlnet weights from unet
11/29/2025 14:21:52 - INFO - __main__ - Training Arguments: 
 pretrained_model_name_or_path: stable-diffusion-v1-5/stable-diffusion-v1-5
 controlnet_model_name_or_path: None
 revision: None
 variant: None
 trust_remote_code: False
 dataset_name_or_path: /home/23132798r/workspace/tmp-smoke/data/controlnet
 dataset_config_name: None
 image_column: image
 conditioning_image_column: conditioning_image
 caption_column: text
 resolution: 512
 center_crop: False
 random_flip: False
 validation_ids: [1500, 5500, 8500]
 validation_steps: 10000
 output_dir: ./output-controlnet
 cache_dir: None
 logging_dir: logs
 tracker_project_name: controlnet-training
 checkpointing_steps: None
 checkpoints_total_limit: None
 resume_from_checkpoint: None
 report_to: tensorboard
 seed: 42
 train_batch_size: 16
 num_train_epochs: 3
 max_train_steps: None
 gradient_accumulation_steps: 1
 gradient_checkpointing: False
 dataloader_num_workers: 8
 noise_offset: 0.1
 prediction_type: None
 adam_beta1: 0.9
 adam_beta2: 0.999
 adam_weight_decay: 0.01
 adam_epsilon: 1e-08
 max_grad_norm: 1.0
 learning_rate: 1e-05
 scale_lr: False
 lr_scheduler: constant
 lr_warmup_steps: 0
 mixed_precision: fp16
 use_8bit_adam: False
 allow_tf32: False
 enable_xformers_memory_efficient_attention: False
 local_rank: -1 
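The arguments above leave `scale_lr` off, so the optimizer should receive `learning_rate` (1e-05) unchanged. In the upstream diffusers `train_controlnet.py`, enabling `scale_lr` multiplies the base rate by the gradient accumulation steps, the per-device batch size, and the number of processes. The sketch below assumes, without confirmation from this log, that this modified script uses the same rule; `effective_learning_rate` is a hypothetical helper name.

```python
def effective_learning_rate(
    base_lr: float,
    scale_lr: bool,
    grad_accum_steps: int,
    per_device_batch: int,
    num_processes: int,
) -> float:
    """Learning rate handed to the optimizer, assuming diffusers-style scale_lr."""
    if scale_lr:
        # Scale linearly with the number of samples per optimizer update.
        return base_lr * grad_accum_steps * per_device_batch * num_processes
    return base_lr


# Values from the Training Arguments above (one process, as logged).
logged_lr = effective_learning_rate(1e-5, False, 1, 16, 1)
# What the same run would have used with scale_lr enabled (hypothetical).
scaled_lr = effective_learning_rate(1e-5, True, 1, 16, 1)
print(logged_lr, scaled_lr)
```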

11/29/2025 14:21:52 - INFO - __main__ - ControlNet Model Config: 
 FrozenDict({'in_channels': 4, 'conditioning_channels': 3, 'flip_sin_to_cos': True, 'freq_shift': 0, 'down_block_types': ['CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'DownBlock2D'], 'mid_block_type': 'UNetMidBlock2DCrossAttn', 'only_cross_attention': False, 'block_out_channels': [320, 640, 1280, 1280], 'layers_per_block': 2, 'downsample_padding': 1, 'mid_block_scale_factor': 1, 'act_fn': 'silu', 'norm_num_groups': 32, 'norm_eps': 1e-05, 'cross_attention_dim': 768, 'transformer_layers_per_block': 1, 'encoder_hid_dim': None, 'encoder_hid_dim_type': None, 'attention_head_dim': 8, 'num_attention_heads': None, 'use_linear_projection': False, 'class_embed_type': None, 'addition_embed_type': None, 'addition_time_embed_dim': None, 'num_class_embeds': None, 'upcast_attention': False, 'resnet_time_scale_shift': 'default', 'projection_class_embeddings_input_dim': None, 'controlnet_conditioning_channel_order': 'rgb', 'conditioning_embedding_out_channels': (16, 32, 96, 256), 'global_pool_conditions': False, 'addition_embed_type_num_heads': 64, '_use_default_values': ['global_pool_conditions', 'addition_embed_type_num_heads']})
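The config fixes the spatial bookkeeping: `in_channels: 4` matches the Stable Diffusion latent space, and the four `conditioning_embedding_out_channels` imply three stride-2 stages that should shrink a 512×512 conditioning image to the 64×64 latent grid. The sketch below traces those sizes with the standard convolution output formula. It assumes (from the diffusers implementation, not from this log) 3×3 convolutions with padding 1, one stride-2 convolution between consecutive conditioning widths, an 8× VAE downsampling factor, and a downsampler on every down block except the final `DownBlock2D`. The helper names are illustrative.

```python
def conv_out(size: int, kernel: int = 3, stride: int = 1, padding: int = 1) -> int:
    """Output size of a 2D convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def conditioning_sizes(resolution: int, channels: tuple[int, ...]) -> list[int]:
    """Spatial size after each stage of the conditioning embedding (assumed layout)."""
    sizes = [resolution]  # first conv is stride 1, padding 1: size unchanged
    for _ in range(len(channels) - 1):
        sizes.append(conv_out(sizes[-1], stride=2))  # one stride-2 conv per width change
    return sizes


def down_block_sizes(latent: int, num_blocks: int) -> list[int]:
    """Input size of each down block; the last block is assumed not to downsample."""
    sizes = [latent]
    for _ in range(num_blocks - 1):
        sizes.append(conv_out(sizes[-1], stride=2))
    return sizes


resolution = 512                      # resolution from the Training Arguments
cond = conditioning_sizes(resolution, (16, 32, 96, 256))
latent = resolution // 8              # assumed VAE downsampling factor
print(cond, latent, down_block_sizes(latent, 4))
```

The last conditioning size has to equal the latent size so that the conditioning features can be added to the latent-resolution features at the first ControlNet block.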
11/29/2025 14:21:54 - INFO - __main__ - ============ Training Begins ============
11/29/2025 14:21:54 - INFO - __main__ -   Num Epochs = 3
11/29/2025 14:21:54 - INFO - __main__ -   Instantaneous batch size per device = 16
11/29/2025 14:21:54 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 16
11/29/2025 14:21:54 - INFO - __main__ -   Gradient Accumulation steps = 1
11/29/2025 14:21:54 - INFO - __main__ -   Total optimization steps = 45000
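The totals above follow from the arguments: one process, no accumulation, and a per-device batch of 16 give an effective batch of 16, and 45,000 steps over 3 epochs means 15,000 optimizer updates per epoch. The sketch assumes the usual diffusers formula (updates per epoch = ceil(batches per epoch / accumulation steps), total = epochs × updates per epoch). The dataset size is not logged; the bounds computed below are an inference that additionally assumes the dataloader keeps a final partial batch (`drop_last=False`).

```python
import math


def effective_batch_size(per_device: int, num_processes: int, grad_accum: int) -> int:
    """Samples consumed per optimizer update."""
    return per_device * num_processes * grad_accum


def total_optimization_steps(batches_per_epoch: int, grad_accum: int, epochs: int) -> int:
    """Optimizer updates for the whole run (assumed diffusers-style formula)."""
    return epochs * math.ceil(batches_per_epoch / grad_accum)


# Values copied from the log.
per_device, processes, accum, epochs, logged_steps = 16, 1, 1, 3, 45_000

batch = effective_batch_size(per_device, processes, accum)
updates_per_epoch = logged_steps // epochs  # with accum == 1, also batches per epoch
assert total_optimization_steps(updates_per_epoch, accum, epochs) == logged_steps

# Inferred, not logged: ceil(N / batch) == updates_per_epoch
# holds exactly when (updates_per_epoch - 1) * batch < N <= updates_per_epoch * batch.
n_min = (updates_per_epoch - 1) * batch + 1
n_max = updates_per_epoch * batch
print(batch, updates_per_epoch, n_min, n_max)
```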
11/29/2025 16:56:53 - INFO - __main__ - Running validation... 
11/29/2025 18:15:04 - INFO - accelerate.accelerator - Saving current state to output-controlnet/checkpoint-15000
11/29/2025 18:15:11 - INFO - accelerate.checkpointing - Optimizer state saved in output-controlnet/checkpoint-15000/optimizer.bin
11/29/2025 18:15:11 - INFO - accelerate.checkpointing - Scheduler state saved in output-controlnet/checkpoint-15000/scheduler.bin
11/29/2025 18:15:11 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in output-controlnet/checkpoint-15000/sampler.bin
11/29/2025 18:15:11 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in output-controlnet/checkpoint-15000/sampler_1.bin
11/29/2025 18:15:11 - INFO - accelerate.checkpointing - Gradient scaler state saved in output-controlnet/checkpoint-15000/scaler.pt
11/29/2025 18:15:11 - INFO - accelerate.checkpointing - Random states saved in output-controlnet/checkpoint-15000/random_states_0.pkl
11/29/2025 18:15:11 - INFO - __main__ - Saved state to output-controlnet/checkpoint-15000
11/29/2025 18:15:12 - INFO - __main__ - Epoch   0 | Global Step 15000
11/29/2025 19:32:33 - INFO - __main__ - Running validation... 
11/29/2025 22:08:22 - INFO - accelerate.accelerator - Saving current state to output-controlnet/checkpoint-30000
11/29/2025 22:08:29 - INFO - accelerate.checkpointing - Optimizer state saved in output-controlnet/checkpoint-30000/optimizer.bin
11/29/2025 22:08:29 - INFO - accelerate.checkpointing - Scheduler state saved in output-controlnet/checkpoint-30000/scheduler.bin
11/29/2025 22:08:29 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in output-controlnet/checkpoint-30000/sampler.bin
11/29/2025 22:08:29 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in output-controlnet/checkpoint-30000/sampler_1.bin
11/29/2025 22:08:29 - INFO - accelerate.checkpointing - Gradient scaler state saved in output-controlnet/checkpoint-30000/scaler.pt
11/29/2025 22:08:29 - INFO - accelerate.checkpointing - Random states saved in output-controlnet/checkpoint-30000/random_states_0.pkl
11/29/2025 22:08:29 - INFO - __main__ - Saved state to output-controlnet/checkpoint-30000
11/29/2025 22:08:29 - INFO - __main__ - Running validation... 
11/29/2025 22:08:32 - INFO - __main__ - Epoch   1 | Global Step 30000
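Step 30000 is where the two schedules meet: a checkpoint and a validation pass fire at the same global step, with the checkpoint first. The sketch below reproduces every save and validation event in this log under assumptions inferred from the log rather than from the script: with `checkpointing_steps: None`, state is saved at the end of each epoch; validation runs every `validation_steps` (10000) steps; and one final validation runs after the last step. `event_schedule` is a hypothetical helper.

```python
def event_schedule(steps_per_epoch: int, epochs: int, validation_steps: int) -> list[tuple[int, str]]:
    """(global_step, event) pairs in the order this log suggests they fire."""
    last_step = steps_per_epoch * epochs
    events = []
    for step in range(1, last_step + 1):
        if step % steps_per_epoch == 0:
            events.append((step, "checkpoint"))  # assumed end-of-epoch save
        if step % validation_steps == 0:
            events.append((step, "validation"))  # periodic validation
    events.append((last_step, "final validation"))  # after the training loop
    return events


for step, kind in event_schedule(steps_per_epoch=15_000, epochs=3, validation_steps=10_000):
    print(f"{step:>6}  {kind}")
```

Because 45000 is not a multiple of 10000, the last periodic validation falls at step 40000, so only the post-training validation follows the final checkpoint.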
11/30/2025 00:43:43 - INFO - __main__ - Running validation... 
11/30/2025 02:01:21 - INFO - accelerate.accelerator - Saving current state to output-controlnet/checkpoint-45000
11/30/2025 02:01:28 - INFO - accelerate.checkpointing - Optimizer state saved in output-controlnet/checkpoint-45000/optimizer.bin
11/30/2025 02:01:28 - INFO - accelerate.checkpointing - Scheduler state saved in output-controlnet/checkpoint-45000/scheduler.bin
11/30/2025 02:01:28 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in output-controlnet/checkpoint-45000/sampler.bin
11/30/2025 02:01:28 - INFO - accelerate.checkpointing - Sampler state for dataloader 1 saved in output-controlnet/checkpoint-45000/sampler_1.bin
11/30/2025 02:01:28 - INFO - accelerate.checkpointing - Gradient scaler state saved in output-controlnet/checkpoint-45000/scaler.pt
11/30/2025 02:01:28 - INFO - accelerate.checkpointing - Random states saved in output-controlnet/checkpoint-45000/random_states_0.pkl
11/30/2025 02:01:28 - INFO - __main__ - Saved state to output-controlnet/checkpoint-45000
11/30/2025 02:01:28 - INFO - __main__ - Epoch   2 | Global Step 45000
11/30/2025 02:01:34 - INFO - __main__ - Running validation... 
11/30/2025 02:01:37 - INFO - __main__ - Finished!
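The timestamps also give the training throughput. The sketch parses the three `Saving current state` lines with a regular expression and measures the wall-clock time between consecutive checkpoints, starting from the `Training Begins` timestamp. The durations include the validation passes and any dataloader start-up, so they slightly overstate pure training time; `epoch_durations` and `SAVE_RE` are illustrative names, not part of the training script.

```python
import re
from datetime import datetime

TIME_FMT = "%m/%d/%Y %H:%M:%S"
# Matches the accelerate "Saving current state" lines in this log.
SAVE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - .*Saving current state to \S*checkpoint-(\d+)"
)


def epoch_durations(start: str, lines: list[str]) -> list[tuple[int, float]]:
    """Return (checkpoint_step, seconds since the previous checkpoint) pairs."""
    prev = datetime.strptime(start, TIME_FMT)
    out = []
    for line in lines:
        m = SAVE_RE.match(line)
        if m:
            t = datetime.strptime(m.group(1), TIME_FMT)
            out.append((int(m.group(2)), (t - prev).total_seconds()))
            prev = t
    return out


log_excerpt = [
    "11/29/2025 18:15:04 - INFO - accelerate.accelerator - Saving current state to output-controlnet/checkpoint-15000",
    "11/29/2025 22:08:22 - INFO - accelerate.accelerator - Saving current state to output-controlnet/checkpoint-30000",
    "11/30/2025 02:01:21 - INFO - accelerate.accelerator - Saving current state to output-controlnet/checkpoint-45000",
]
durations = epoch_durations("11/29/2025 14:21:54", log_excerpt)
for step, secs in durations:
    print(f"checkpoint-{step}: {secs:.0f} s, {secs / 15_000:.3f} s/step")
total = sum(secs for _, secs in durations)
print(f"total: {total:.0f} s, {45_000 * 16 / total:.1f} samples/s")
```

On this log each epoch takes between 13,979 s and 13,998 s (about 3 h 53 min), roughly 0.93 s per step, or about 17 samples per second over the 41,967 s run.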