[2025-10-24 11:27:55,703][main][INFO] - Will write tensorboard logs inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/tensorboard_logs [2025-10-24 11:27:55,722][main][INFO] - Runtime at /workspace/DC_SSDAE [2025-10-24 11:27:55,723][main][INFO] - Running inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM [2025-10-24 11:27:55,724][main][INFO] - Running args: ['main.py', 'run_name=train_enc_vq_f8c4_FM', 'dataset.im_size=128', 'dataset.aug_scale=2', 'training.epochs=20', 'dc_ssdae.encoder_train=true'] [2025-10-24 11:27:55,725][main][INFO] - Command: 'main.py' 'run_name=train_enc_vq_f8c4_FM' 'dataset.im_size=128' 'dataset.aug_scale=2' 'training.epochs=20' 'dc_ssdae.encoder_train=true' [2025-10-24 11:27:55,726][main][INFO] - Accelerator with 8 processes, running on cuda:0 [2025-10-24 11:27:55,729][main][INFO] - Hydra configuration: seed: 0 task: train runtime_path: ${hydra:runtime.cwd} ckpt_dir: ${runtime_path}/runs run_name: train_enc_vq_f8c4_FM cache_dir: ${ckpt_dir}/cache run_dir: ${ckpt_dir}/jobs/${run_name} checkpoint_path: ${run_dir}/checkpoints dataset: imagenet_root: imagenet_data im_size: 128 batch_size: 192 aug_scale: 2 limit: null distill_teacher: false dc_ssdae: compile: false checkpoint: null encoder: f8c4 encoder_checkpoint: null encoder_train: true decoder: S trainer_type: FM encoder_type: vq sampler: steps: 10 ema: decay: 0.999 start_iter: 50000 aux_losses: compile: ${dc_ssdae.compile} repa: i_extract: 4 n_layers: 2 lpips: true training: sdpa_kernel: 2 mixed_precision: bf16 grad_accumulate: 1 grad_clip: 0.1 epochs: 20 eval_freq: 1 save_on_best: FID log_freq: 100 lr: 0.0003 weight_decay: 0.001 losses: diffusion: 1 repa: 0.25 lpips: 0.5 kl: 1.0e-06 show_samples: 8 [2025-10-24 11:28:09,494][main][INFO] - Loaded ImageNet dataset: {'train': Dataset ImageNet Number of datapoints: 1279867 Root location: ../../../imagenet_data Split: train StandardTransform Transform: Compose( RandomResize(min_size=128, max_size=256, 
interpolation=InterpolationMode.LANCZOS, antialias=True) RandomCrop(size=(128, 128), pad_if_needed=False, fill=0, padding_mode=constant) RandomHorizontalFlip(p=0.5) ToImage() ToDtype(scale=True) Normalize(mean=[0.5], std=[0.5], inplace=False) ), 'test': Dataset ImageNet Number of datapoints: 49950 Root location: ../../../imagenet_data Split: validation StandardTransform Transform: Compose( Resize(size=[128], interpolation=InterpolationMode.BILINEAR, antialias=True) CenterCrop(size=(128, 128)) ToImage() ToDtype(scale=True) Normalize(mean=[0.5], std=[0.5], inplace=False) )} [2025-10-24 11:28:18,537][main][INFO] - ae parameters count: [2025-10-24 11:28:18,540][main][INFO] - Total: #46.0M (trainable: #46.0M) [2025-10-24 11:28:18,541][main][INFO] - - encoder: #32.6M (trainable: #32.6M) [2025-10-24 11:28:18,542][main][INFO] - - conv_in: #3.5K (trainable: #3.5K) [2025-10-24 11:28:18,543][main][INFO] - - down: #22.5M (trainable: #22.5M) [2025-10-24 11:28:18,543][main][INFO] - - mid: #10.0M (trainable: #10.0M) [2025-10-24 11:28:18,544][main][INFO] - - norm_out: #1.0K (trainable: #1.0K) [2025-10-24 11:28:18,545][main][INFO] - - act_out: #0 (trainable: #0) [2025-10-24 11:28:18,545][main][INFO] - - conv_out: #36.0K (trainable: #36.0K) [2025-10-24 11:28:18,546][main][INFO] - - out_proj: #72 (trainable: #72) [2025-10-24 11:28:18,547][main][INFO] - - decoder: #13.4M (trainable: #13.4M) [2025-10-24 11:28:18,548][main][INFO] - - conv_in_img: #896 (trainable: #896) [2025-10-24 11:28:18,548][main][INFO] - - conv_in_z: #1.2K (trainable: #1.2K) [2025-10-24 11:28:18,549][main][INFO] - - conv_in: #36.1K (trainable: #36.1K) [2025-10-24 11:28:18,550][main][INFO] - - batch_norm_z: #8 (trainable: #8) [2025-10-24 11:28:18,550][main][INFO] - - time_proj: #0 (trainable: #0) [2025-10-24 11:28:18,551][main][INFO] - - time_embedding: #80.5K (trainable: #80.5K) [2025-10-24 11:28:18,551][main][INFO] - - ada_ctx_proj: #38.4K (trainable: #38.4K) [2025-10-24 11:28:18,552][main][INFO] - - down_blocks: 
#3.0M (trainable: #3.0M) [2025-10-24 11:28:18,553][main][INFO] - - mid_block: #3.4M (trainable: #3.4M) [2025-10-24 11:28:18,554][main][INFO] - - up_blocks: #6.9M (trainable: #6.9M) [2025-10-24 11:28:18,554][main][INFO] - - conv_norm_out: #128 (trainable: #128) [2025-10-24 11:28:18,555][main][INFO] - - conv_out_act: #0 (trainable: #0) [2025-10-24 11:28:18,555][main][INFO] - - conv_out: #1.7K (trainable: #1.7K) [2025-10-24 11:28:18,557][main][INFO] - ae: EMAWrapper( (model): DistributedDataParallel( (module): DC_SSDAE( (encoder): VQEncoder( (conv_in): Conv2d(3, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (down): ModuleList( (0): Module( (block): ModuleList( (0-1): 2 x VQGResnetBlock( (norm1): GroupNorm(32, 128, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 128, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (attn): ModuleList() (downsample): VQGDownsample( (conv): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2)) ) ) (1): Module( (block): ModuleList( (0): VQGResnetBlock( (norm1): GroupNorm(32, 128, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 256, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nin_shortcut): Conv2d(128, 256, kernel_size=(1, 1), stride=(1, 1)) ) (1): VQGResnetBlock( (norm1): GroupNorm(32, 256, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 256, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 
1), padding=(1, 1)) ) ) (attn): ModuleList() (downsample): VQGDownsample( (conv): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2)) ) ) (2): Module( (block): ModuleList( (0): VQGResnetBlock( (norm1): GroupNorm(32, 256, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 512, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nin_shortcut): Conv2d(256, 512, kernel_size=(1, 1), stride=(1, 1)) ) (1): VQGResnetBlock( (norm1): GroupNorm(32, 512, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 512, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (attn): ModuleList() (downsample): VQGDownsample( (conv): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2)) ) ) (3): Module( (block): ModuleList( (0-1): 2 x VQGResnetBlock( (norm1): GroupNorm(32, 512, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 512, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (attn): ModuleList() ) ) (mid): Module( (block_1): VQGResnetBlock( (norm1): GroupNorm(32, 512, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 512, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (attn_1): VQGAttnBlock( (norm): GroupNorm(32, 512, 
eps=1e-06, affine=True) (q): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1)) (k): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1)) (v): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1)) (proj_out): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1)) ) (block_2): VQGResnetBlock( (norm1): GroupNorm(32, 512, eps=1e-06, affine=True) (act1): SwishActivation() (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (norm2): GroupNorm(32, 512, eps=1e-06, affine=True) (act2): SwishActivation() (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) (norm_out): GroupNorm(32, 512, eps=1e-06, affine=True) (act_out): SwishActivation() (conv_out): Conv2d(512, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (out_proj): Conv2d(8, 8, kernel_size=(1, 1), stride=(1, 1)) ) (decoder): UViTDecoder( (conv_in_img): Conv2d(3, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv_in_z): Conv2d(4, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (conv_in): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (batch_norm_z): BatchNorm2d(4, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (time_proj): Timesteps() (time_embedding): TimestepEmbedding( (linear_1): Linear(in_features=64, out_features=256, bias=True) (act): SiLU() (linear_2): Linear(in_features=256, out_features=256, bias=True) ) (ada_ctx_proj): Sequential( (0): Conv2d(4, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (1): SiLU() (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) (down_blocks): ModuleList( (0): DownBlock2D( (resnets): ModuleList( (0-1): 2 x ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=128, bias=True) (norm2): GroupNorm(32, 64, eps=1e-05, affine=True) 
(dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() ) ) (downsamplers): ModuleList( (0): Downsample2D( (conv): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) ) ) ) (1): DownBlock2D( (resnets): ModuleList( (0): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 128, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(64, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=192, bias=True) (norm2): GroupNorm(32, 96, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(64, 96, kernel_size=(1, 1), stride=(1, 1)) ) (1): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 192, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=192, bias=True) (norm2): GroupNorm(32, 96, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() ) ) (downsamplers): ModuleList( (0): Downsample2D( (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) ) ) ) (2): DownBlock2D( (resnets): ModuleList( (0): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 192, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(96, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=320, bias=True) (norm2): GroupNorm(32, 160, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(96, 160, kernel_size=(1, 1), stride=(1, 1)) ) (1): ResnetBlock2D( (norm1): 
AdaGroupNorm2D( (ctx_proj): Conv2d(64, 320, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=320, bias=True) (norm2): GroupNorm(32, 160, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() ) ) (downsamplers): ModuleList( (0): Downsample2D( (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1)) ) ) ) (3): DownBlock2D( (resnets): ModuleList( (0-1): 2 x ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 320, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=320, bias=True) (norm2): GroupNorm(32, 160, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() ) ) ) ) (mid_block): UViTMiddleTransformer( (proj_in): Linear(in_features=160, out_features=160, bias=True) (transformer_blocks): ModuleList( (0-7): 8 x TransformerBlock( (norm1): AdaLayerNorm( (silu): SiLU() (linear): Linear(in_features=64, out_features=320, bias=True) (norm): LayerNorm((160,), eps=1e-05, elementwise_affine=False) ) (attn1): Attention( (to_q): Linear(in_features=160, out_features=160, bias=False) (to_k): Linear(in_features=160, out_features=160, bias=False) (to_v): Linear(in_features=160, out_features=160, bias=False) (out_proj): Linear(in_features=160, out_features=160, bias=True) (out_drop): Dropout(p=0.0, inplace=False) ) (norm2): LayerNorm((160,), eps=1e-05, elementwise_affine=True) (ff): FeedForward( (proj_in_act): GEGLU( (proj): Linear(in_features=160, out_features=1280, bias=True) ) (drop): Dropout(p=0.0, inplace=False) (proj_out): Linear(in_features=640, out_features=160, bias=True) ) 
(relative_position_bias): RelativePositionBias() ) ) (proj_out): Linear(in_features=160, out_features=160, bias=True) (norm): GroupNorm(32, 160, eps=1e-06, affine=True) ) (up_blocks): ModuleList( (0): UpBlock2D( (resnets): ModuleList( (0-2): 3 x ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 640, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(320, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=320, bias=True) (norm2): GroupNorm(32, 160, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(320, 160, kernel_size=(1, 1), stride=(1, 1)) ) ) (upsamplers): ModuleList( (0): Upsample2D( (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) ) (1): UpBlock2D( (resnets): ModuleList( (0-1): 2 x ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 640, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(320, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=320, bias=True) (norm2): GroupNorm(32, 160, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(320, 160, kernel_size=(1, 1), stride=(1, 1)) ) (2): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(256, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=320, bias=True) (norm2): GroupNorm(32, 160, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(256, 160, kernel_size=(1, 1), stride=(1, 1)) ) ) (upsamplers): 
ModuleList( (0): Upsample2D( (conv): Conv2d(160, 160, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) ) (2): UpBlock2D( (resnets): ModuleList( (0): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 512, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(256, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=192, bias=True) (norm2): GroupNorm(32, 96, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(256, 96, kernel_size=(1, 1), stride=(1, 1)) ) (1): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 384, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=192, bias=True) (norm2): GroupNorm(32, 96, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(192, 96, kernel_size=(1, 1), stride=(1, 1)) ) (2): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 320, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(160, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=192, bias=True) (norm2): GroupNorm(32, 96, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(160, 96, kernel_size=(1, 1), stride=(1, 1)) ) ) (upsamplers): ModuleList( (0): Upsample2D( (conv): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) ) (3): UpBlock2D( (resnets): ModuleList( (0): ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 320, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(160, 64, 
kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=128, bias=True) (norm2): GroupNorm(32, 64, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(160, 64, kernel_size=(1, 1), stride=(1, 1)) ) (1-2): 2 x ResnetBlock2D( (norm1): AdaGroupNorm2D( (ctx_proj): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1)) ) (conv1): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (time_emb_proj): Linear(in_features=256, out_features=128, bias=True) (norm2): GroupNorm(32, 64, eps=1e-05, affine=True) (dropout): Dropout(p=0.0, inplace=False) (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) (nonlinearity): SiLU() (conv_shortcut): Conv2d(128, 64, kernel_size=(1, 1), stride=(1, 1)) ) ) ) ) (conv_norm_out): GroupNorm(32, 64, eps=1e-05, affine=True) (conv_out_act): SiLU() (conv_out): Conv2d(64, 3, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)) ) ) ) (ema): EMA(ema_model=DC_SSDAE, decay=0.999, start_iter=50000) ) [2025-10-24 11:28:18,558][main][INFO] - aux_losses parameters count: [2025-10-24 11:28:18,559][main][INFO] - Total: #96.7M (trainable: #145.9K) [2025-10-24 11:28:18,560][main][INFO] - - repa_loss: #82.7M (trainable: #145.9K) [2025-10-24 11:28:18,561][main][INFO] - - lpips_loss: #14.0M (trainable: #0) [2025-10-24 11:28:18,561][main][INFO] - aux_losses: DistributedDataParallel( (module): SSDDLosses( (repa_loss): REPALoss( (features_extractor): Frozen(DinoEncoder/Dinov2Model) (repa_mlp): Sequential( (0): Linear(in_features=160, out_features=160, bias=True) (1): SiLU() (2): Linear(in_features=160, out_features=768, bias=True) ) (repa_loss): CosineSimilarity() ) (lpips_loss): Frozen(LPIPS) ) ) [2025-10-24 11:28:18,565][main][INFO] - Optimizer for autoencoder: RAdamScheduleFree ( Parameter Group 0 betas: (0.9, 0.999) eps: 1e-08 foreach: True k: 0 lr: 
0.0003 lr_max: -1.0 r: 0.0 scheduled_lr: 0.0 silent_sgd_phase: True train_mode: False weight_decay: 0.001 weight_lr_power: 2.0 weight_sum: 0.0 Parameter Group 1 betas: (0.9, 0.999) eps: 1e-08 foreach: True k: 0 lr: 0.0003 lr_max: -1.0 r: 0.0 scheduled_lr: 0.0 silent_sgd_phase: True train_mode: False weight_decay: 0.0 weight_lr_power: 2.0 weight_sum: 0.0 ) [2025-10-24 11:28:18,570][main][INFO] - No training state found to resume from None [2025-10-24 11:28:18,571][main][INFO] - ====================== RUNNING TASK train [2025-10-24 11:28:18,572][main][INFO] - Starting training [2025-10-24 11:28:18,572][main][INFO] - Batch size of 192 (24 per GPU, 1 acumulation step(s) 8 process(es)) [2025-10-24 11:28:18,582][main][INFO] - --- [2025-10-24 11:28:18,583][main][INFO] - [T_total=00:00:22 | T_train=00:00:00] Start epoch 0 [2025-10-24 12:31:01,697][main][INFO] - [T_total=01:03:06 | T_train=01:02:43 | T_epoch=01:02:43] End of epoch 0 (6666 steps) train loss 0.379739 [2025-10-24 12:31:01,700][main][INFO] - [Epoch 0] All losses: [[diffusion=0.0877689 ; kl=3611.6 ; lpips=0.251927 ; repa=0.64958]] [2025-10-24 12:34:30,741][main][INFO] - [Epoch 1] Test metrics: [[MSE=14.16 | MAE=0.0884 | LPIPS=0.1332 | PSNR=18.49 | SSIM=0.6156 | dreamsim=0.2237 | FID=21.41]] [2025-10-24 12:34:30,743][main][INFO] - [Epoch 1] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1332 | max_PSNR=18.49 | max_SSIM=0.6156 | min_dreamsim=0.2237 | min_FID=21.41]] [2025-10-24 12:34:30,744][main][DEBUG] - Writing images to disk... 
[2025-10-24 12:34:31,976][main][DEBUG] - Image(s) saved on disk
[2025-10-24 12:34:32,219][main][INFO] - End of epoch timers: [T_train=01:02:43 | T_epoch=01:02:43 | T_eval=00:03:30 | T_total=01:06:36]
[2025-10-24 12:34:32,220][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 12:34:34,794][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 12:34:36,825][main][INFO] - ---
[2025-10-24 12:34:36,826][main][INFO] - [T_total=01:06:41 | T_train=01:02:43] Start epoch 1
[2025-10-24 13:37:11,223][main][INFO] - [T_total=02:09:15 | T_train=02:05:17 | T_epoch=01:02:34] End of epoch 1 (13332 steps) train loss 0.295457
[2025-10-24 13:37:11,224][main][INFO] - [Epoch 1] All losses: [[diffusion=0.0670763 ; kl=3707.57 ; lpips=0.1699 ; repa=0.55889]]
[2025-10-24 13:40:38,432][main][INFO] - [Epoch 2] Test metrics: [[MSE=18.03 | MAE=0.1014 | LPIPS=0.1322 | PSNR=17.44 | SSIM=0.6126 | dreamsim=0.2068 | FID=15.49]]
[2025-10-24 13:40:38,434][main][INFO] - [Epoch 2] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1322 | max_PSNR=18.49 | max_SSIM=0.6156 | min_dreamsim=0.2068 | min_FID=15.49]]
[2025-10-24 13:40:38,435][main][DEBUG] - Writing images to disk...
[2025-10-24 13:40:39,512][main][DEBUG] - Image(s) saved on disk
[2025-10-24 13:40:39,760][main][INFO] - End of epoch timers: [T_train=02:05:17 | T_epoch=01:02:34 | T_eval=00:06:58 | T_total=02:12:44]
[2025-10-24 13:40:39,762][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 13:40:42,329][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 13:40:44,960][main][INFO] - ---
[2025-10-24 13:40:44,961][main][INFO] - [T_total=02:12:49 | T_train=02:05:17] Start epoch 2
[2025-10-24 14:43:20,521][main][INFO] - [T_total=03:15:24 | T_train=03:07:53 | T_epoch=01:02:35] End of epoch 2 (19998 steps) train loss 0.278652
[2025-10-24 14:43:20,523][main][INFO] - [Epoch 2] All losses: [[diffusion=0.0649925 ; kl=3695.06 ; lpips=0.154526 ; repa=0.530805]]
[2025-10-24 14:46:47,655][main][INFO] - [Epoch 3] Test metrics: [[MSE=20.47 | MAE=0.1086 | LPIPS=0.13 | PSNR=16.89 | SSIM=0.6123 | dreamsim=0.1954 | FID=12.6]]
[2025-10-24 14:46:47,656][main][INFO] - [Epoch 3] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.13 | max_PSNR=18.49 | max_SSIM=0.6156 | min_dreamsim=0.1954 | min_FID=12.6]]
[2025-10-24 14:46:47,657][main][DEBUG] - Writing images to disk...
[2025-10-24 14:46:48,722][main][DEBUG] - Image(s) saved on disk
[2025-10-24 14:46:48,970][main][INFO] - End of epoch timers: [T_train=03:07:53 | T_epoch=01:02:35 | T_eval=00:10:27 | T_total=03:18:53]
[2025-10-24 14:46:48,971][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 14:46:51,733][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 14:46:54,361][main][INFO] - ---
[2025-10-24 14:46:54,362][main][INFO] - [T_total=03:18:58 | T_train=03:07:53] Start epoch 3
[2025-10-24 15:49:29,065][main][INFO] - [T_total=04:21:33 | T_train=04:10:27 | T_epoch=01:02:34] End of epoch 3 (26664 steps) train loss 0.268908
[2025-10-24 15:49:29,066][main][INFO] - [Epoch 3] All losses: [[diffusion=0.0635387 ; kl=3692.76 ; lpips=0.146519 ; repa=0.513671]]
[2025-10-24 15:52:56,207][main][INFO] - [Epoch 4] Test metrics: [[MSE=21.69 | MAE=0.112 | LPIPS=0.127 | PSNR=16.64 | SSIM=0.6152 | dreamsim=0.1867 | FID=10.75]]
[2025-10-24 15:52:56,209][main][INFO] - [Epoch 4] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.127 | max_PSNR=18.49 | max_SSIM=0.6156 | min_dreamsim=0.1867 | min_FID=10.75]]
[2025-10-24 15:52:56,210][main][DEBUG] - Writing images to disk...
[2025-10-24 15:52:57,298][main][DEBUG] - Image(s) saved on disk
[2025-10-24 15:52:57,498][main][INFO] - End of epoch timers: [T_train=04:10:27 | T_epoch=01:02:34 | T_eval=00:13:55 | T_total=04:25:01]
[2025-10-24 15:52:57,500][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 15:52:59,857][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 15:53:02,578][main][INFO] - ---
[2025-10-24 15:53:02,579][main][INFO] - [T_total=04:25:06 | T_train=04:10:27] Start epoch 4
[2025-10-24 16:55:38,098][main][INFO] - [T_total=05:27:42 | T_train=05:13:03 | T_epoch=01:02:35] End of epoch 4 (33330 steps) train loss 0.262561
[2025-10-24 16:55:38,102][main][INFO] - [Epoch 4] All losses: [[diffusion=0.06292 ; kl=3688.83 ; lpips=0.141097 ; repa=0.501614]]
[2025-10-24 16:59:05,267][main][INFO] - [Epoch 5] Test metrics: [[MSE=21.7 | MAE=0.1119 | LPIPS=0.1238 | PSNR=16.64 | SSIM=0.6186 | dreamsim=0.1799 | FID=9.549]]
[2025-10-24 16:59:05,270][main][INFO] - [Epoch 5] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1238 | max_PSNR=18.49 | max_SSIM=0.6186 | min_dreamsim=0.1799 | min_FID=9.549]]
[2025-10-24 16:59:05,271][main][DEBUG] - Writing images to disk...
[2025-10-24 16:59:06,351][main][DEBUG] - Image(s) saved on disk
[2025-10-24 16:59:06,591][main][INFO] - End of epoch timers: [T_train=05:13:03 | T_epoch=01:02:35 | T_eval=00:17:23 | T_total=05:31:10]
[2025-10-24 16:59:06,592][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 16:59:09,275][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 16:59:11,878][main][INFO] - ---
[2025-10-24 16:59:11,879][main][INFO] - [T_total=05:31:16 | T_train=05:13:03] Start epoch 5
[2025-10-24 18:01:46,540][main][INFO] - [T_total=06:33:50 | T_train=06:15:37 | T_epoch=01:02:34] End of epoch 5 (39996 steps) train loss 0.257655
[2025-10-24 18:01:46,542][main][INFO] - [Epoch 5] All losses: [[diffusion=0.0621701 ; kl=3687.45 ; lpips=0.137338 ; repa=0.492512]]
[2025-10-24 18:05:13,288][main][INFO] - [Epoch 6] Test metrics: [[MSE=21.93 | MAE=0.1125 | LPIPS=0.1213 | PSNR=16.59 | SSIM=0.6218 | dreamsim=0.1746 | FID=8.68]]
[2025-10-24 18:05:13,290][main][INFO] - [Epoch 6] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1213 | max_PSNR=18.49 | max_SSIM=0.6218 | min_dreamsim=0.1746 | min_FID=8.68]]
[2025-10-24 18:05:13,291][main][DEBUG] - Writing images to disk...
[2025-10-24 18:05:14,398][main][DEBUG] - Image(s) saved on disk
[2025-10-24 18:05:14,601][main][INFO] - End of epoch timers: [T_train=06:15:37 | T_epoch=01:02:34 | T_eval=00:20:51 | T_total=06:37:18]
[2025-10-24 18:05:14,604][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 18:05:17,445][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 18:05:19,868][main][INFO] - ---
[2025-10-24 18:05:19,869][main][INFO] - [T_total=06:37:24 | T_train=06:15:37] Start epoch 6
[2025-10-24 19:07:56,898][main][INFO] - [T_total=07:40:01 | T_train=07:18:14 | T_epoch=01:02:37] End of epoch 6 (46662 steps) train loss 0.253725
[2025-10-24 19:07:56,900][main][INFO] - [Epoch 6] All losses: [[diffusion=0.0615326 ; kl=3688.36 ; lpips=0.134359 ; repa=0.485297]]
[2025-10-24 19:11:24,094][main][INFO] - [Epoch 7] Test metrics: [[MSE=22.28 | MAE=0.1135 | LPIPS=0.1196 | PSNR=16.52 | SSIM=0.624 | dreamsim=0.1707 | FID=8.082]]
[2025-10-24 19:11:24,096][main][INFO] - [Epoch 7] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1196 | max_PSNR=18.49 | max_SSIM=0.624 | min_dreamsim=0.1707 | min_FID=8.082]]
[2025-10-24 19:11:24,097][main][DEBUG] - Writing images to disk...
[2025-10-24 19:11:25,201][main][DEBUG] - Image(s) saved on disk
[2025-10-24 19:11:25,400][main][INFO] - End of epoch timers: [T_train=07:18:14 | T_epoch=01:02:37 | T_eval=00:24:19 | T_total=07:43:29]
[2025-10-24 19:11:25,403][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 19:11:28,161][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 19:11:30,853][main][INFO] - ---
[2025-10-24 19:11:30,853][main][INFO] - [T_total=07:43:35 | T_train=07:18:14] Start epoch 7
[2025-10-24 20:14:06,645][main][INFO] - [T_total=08:46:10 | T_train=08:20:50 | T_epoch=01:02:35] End of epoch 7 (53328 steps) train loss 0.250756
[2025-10-24 20:14:06,647][main][INFO] - [Epoch 7] All losses: [[diffusion=0.06134 ; kl=3691.13 ; lpips=0.131829 ; repa=0.479243]]
[2025-10-24 20:17:33,486][main][INFO] - [Epoch 8] Test metrics: [[MSE=22.07 | MAE=0.1128 | LPIPS=0.1169 | PSNR=16.56 | SSIM=0.6267 | dreamsim=0.1663 | FID=7.509]]
[2025-10-24 20:17:33,488][main][INFO] - [Epoch 8] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1169 | max_PSNR=18.49 | max_SSIM=0.6267 | min_dreamsim=0.1663 | min_FID=7.509]]
[2025-10-24 20:17:33,489][main][DEBUG] - Writing images to disk...
[2025-10-24 20:17:34,577][main][DEBUG] - Image(s) saved on disk
[2025-10-24 20:17:34,803][main][INFO] - End of epoch timers: [T_train=08:20:50 | T_epoch=01:02:35 | T_eval=00:27:47 | T_total=08:49:39]
[2025-10-24 20:17:34,804][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 20:17:37,556][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 20:17:40,188][main][INFO] - ---
[2025-10-24 20:17:40,189][main][INFO] - [T_total=08:49:44 | T_train=08:20:50] Start epoch 8
[2025-10-24 21:20:17,007][main][INFO] - [T_total=09:52:21 | T_train=09:23:27 | T_epoch=01:02:36] End of epoch 8 (59994 steps) train loss 0.248044
[2025-10-24 21:20:17,008][main][INFO] - [Epoch 8] All losses: [[diffusion=0.0607502 ; kl=3693.54 ; lpips=0.130101 ; repa=0.474199]]
[2025-10-24 21:23:44,408][main][INFO] - [Epoch 9] Test metrics: [[MSE=21.7 | MAE=0.1117 | LPIPS=0.1145 | PSNR=16.64 | SSIM=0.6294 | dreamsim=0.1627 | FID=7.034]]
[2025-10-24 21:23:44,410][main][INFO] - [Epoch 9] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1145 | max_PSNR=18.49 | max_SSIM=0.6294 | min_dreamsim=0.1627 | min_FID=7.034]]
[2025-10-24 21:23:44,411][main][DEBUG] - Writing images to disk...
[2025-10-24 21:23:45,508][main][DEBUG] - Image(s) saved on disk
[2025-10-24 21:23:45,708][main][INFO] - End of epoch timers: [T_train=09:23:27 | T_epoch=01:02:36 | T_eval=00:31:16 | T_total=09:55:50]
[2025-10-24 21:23:45,709][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 21:23:48,374][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 21:23:51,020][main][INFO] - ---
[2025-10-24 21:23:51,020][main][INFO] - [T_total=09:55:55 | T_train=09:23:27] Start epoch 9
[2025-10-24 22:26:26,704][main][INFO] - [T_total=10:58:31 | T_train=10:26:03 | T_epoch=01:02:35] End of epoch 9 (66660 steps) train loss 0.245806
[2025-10-24 22:26:26,706][main][INFO] - [Epoch 9] All losses: [[diffusion=0.0604599 ; kl=3695.63 ; lpips=0.128417 ; repa=0.469767]]
[2025-10-24 22:29:54,018][main][INFO] - [Epoch 10] Test metrics: [[MSE=21.54 | MAE=0.1112 | LPIPS=0.1129 | PSNR=16.67 | SSIM=0.6308 | dreamsim=0.1598 | FID=6.658]]
[2025-10-24 22:29:54,020][main][INFO] - [Epoch 10] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1129 | max_PSNR=18.49 | max_SSIM=0.6308 | min_dreamsim=0.1598 | min_FID=6.658]]
[2025-10-24 22:29:54,021][main][DEBUG] - Writing images to disk...
[2025-10-24 22:29:55,134][main][DEBUG] - Image(s) saved on disk
[2025-10-24 22:29:55,337][main][INFO] - End of epoch timers: [T_train=10:26:03 | T_epoch=01:02:35 | T_eval=00:34:44 | T_total=11:01:59]
[2025-10-24 22:29:55,338][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 22:29:58,129][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 22:30:00,883][main][INFO] - ---
[2025-10-24 22:30:00,884][main][INFO] - [T_total=11:02:05 | T_train=10:26:03] Start epoch 10
[2025-10-24 23:32:38,551][main][INFO] - [T_total=12:04:42 | T_train=11:28:40 | T_epoch=01:02:37] End of epoch 10 (73326 steps) train loss 0.243893
[2025-10-24 23:32:38,553][main][INFO] - [Epoch 10] All losses: [[diffusion=0.0602009 ; kl=3698.58 ; lpips=0.126997 ; repa=0.465981]]
[2025-10-24 23:36:06,224][main][INFO] - [Epoch 11] Test metrics: [[MSE=21.29 | MAE=0.1104 | LPIPS=0.1112 | PSNR=16.72 | SSIM=0.6335 | dreamsim=0.1568 | FID=6.331]]
[2025-10-24 23:36:06,230][main][INFO] - [Epoch 11] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1112 | max_PSNR=18.49 | max_SSIM=0.6335 | min_dreamsim=0.1568 | min_FID=6.331]]
[2025-10-24 23:36:06,231][main][DEBUG] - Writing images to disk...
[2025-10-24 23:36:07,086][main][DEBUG] - Image(s) saved on disk
[2025-10-24 23:36:07,296][main][INFO] - End of epoch timers: [T_train=11:28:40 | T_epoch=01:02:37 | T_eval=00:38:13 | T_total=12:08:11]
[2025-10-24 23:36:07,298][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-24 23:36:10,288][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-24 23:36:12,899][main][INFO] - ---
[2025-10-24 23:36:12,900][main][INFO] - [T_total=12:08:17 | T_train=11:28:40] Start epoch 11
[2025-10-25 00:38:51,954][main][INFO] - [T_total=13:10:56 | T_train=12:31:19 | T_epoch=01:02:39] End of epoch 11 (79992 steps) train loss 0.242062
[2025-10-25 00:38:51,955][main][INFO] - [Epoch 11] All losses: [[diffusion=0.0598045 ; kl=3702.03 ; lpips=0.125852 ; repa=0.46252]]
[2025-10-25 00:42:19,563][main][INFO] - [Epoch 12] Test metrics: [[MSE=21.05 | MAE=0.1097 | LPIPS=0.1098 | PSNR=16.77 | SSIM=0.6344 | dreamsim=0.1546 | FID=6.035]]
[2025-10-25 00:42:19,565][main][INFO] - [Epoch 12] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1098 | max_PSNR=18.49 | max_SSIM=0.6344 | min_dreamsim=0.1546 | min_FID=6.035]]
[2025-10-25 00:42:19,566][main][DEBUG] - Writing images to disk...
[2025-10-25 00:42:20,661][main][DEBUG] - Image(s) saved on disk
[2025-10-25 00:42:20,894][main][INFO] - End of epoch timers: [T_train=12:31:19 | T_epoch=01:02:39 | T_eval=00:41:42 | T_total=13:14:25]
[2025-10-25 00:42:20,895][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-25 00:42:23,582][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-25 00:42:26,171][main][INFO] - ---
[2025-10-25 00:42:26,172][main][INFO] - [T_total=13:14:30 | T_train=12:31:19] Start epoch 12
[2025-10-25 01:45:03,014][main][INFO] - [T_total=14:17:07 | T_train=13:33:56 | T_epoch=01:02:36] End of epoch 12 (86658 steps) train loss 0.240598
[2025-10-25 01:45:03,015][main][INFO] - [Epoch 12] All losses: [[diffusion=0.0596262 ; kl=3704.82 ; lpips=0.124782 ; repa=0.459504]]
[2025-10-25 01:48:30,676][main][INFO] - [Epoch 13] Test metrics: [[MSE=21.07 | MAE=0.1098 | LPIPS=0.1087 | PSNR=16.76 | SSIM=0.6359 | dreamsim=0.1527 | FID=5.793]]
[2025-10-25 01:48:30,678][main][INFO] - [Epoch 13] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1087 | max_PSNR=18.49 | max_SSIM=0.6359 | min_dreamsim=0.1527 | min_FID=5.793]]
[2025-10-25 01:48:30,679][main][DEBUG] - Writing images to disk...
[2025-10-25 01:48:31,757][main][DEBUG] - Image(s) saved on disk
[2025-10-25 01:48:31,960][main][INFO] - End of epoch timers: [T_train=13:33:56 | T_epoch=01:02:36 | T_eval=00:45:11 | T_total=14:20:36]
[2025-10-25 01:48:31,961][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-25 01:48:34,485][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-25 01:48:37,247][main][INFO] - ---
[2025-10-25 01:48:37,248][main][INFO] - [T_total=14:20:41 | T_train=13:33:56] Start epoch 13
[2025-10-25 02:51:12,948][main][INFO] - [T_total=15:23:17 | T_train=14:36:32 | T_epoch=01:02:35] End of epoch 13 (93324 steps) train loss 0.239412
[2025-10-25 02:51:12,949][main][INFO] - [Epoch 13] All losses: [[diffusion=0.0596692 ; kl=3706.94 ; lpips=0.123694 ; repa=0.456758]]
[2025-10-25 02:54:40,598][main][INFO] - [Epoch 14] Test metrics: [[MSE=20.86 | MAE=0.1092 | LPIPS=0.1076 | PSNR=16.81 | SSIM=0.6381 | dreamsim=0.1507 | FID=5.573]]
[2025-10-25 02:54:40,600][main][INFO] - [Epoch 14] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1076 | max_PSNR=18.49 | max_SSIM=0.6381 | min_dreamsim=0.1507 | min_FID=5.573]]
[2025-10-25 02:54:40,605][main][DEBUG] - Writing images to disk...
[2025-10-25 02:54:41,487][main][DEBUG] - Image(s) saved on disk
[2025-10-25 02:54:41,728][main][INFO] - End of epoch timers: [T_train=14:36:32 | T_epoch=01:02:35 | T_eval=00:48:39 | T_total=15:26:46]
[2025-10-25 02:54:41,731][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-25 02:54:45,053][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-25 02:54:47,715][main][INFO] - ---
[2025-10-25 02:54:47,717][main][INFO] - [T_total=15:26:52 | T_train=14:36:32] Start epoch 14
[2025-10-25 03:57:24,404][main][INFO] - [T_total=16:29:28 | T_train=15:39:09 | T_epoch=01:02:36] End of epoch 14 (99990 steps) train loss 0.238048
[2025-10-25 03:57:24,406][main][INFO] - [Epoch 14] All losses: [[diffusion=0.0592936 ; kl=3709.87 ; lpips=0.122931 ; repa=0.454315]]
[2025-10-25 04:00:51,619][main][INFO] - [Epoch 15] Test metrics: [[MSE=20.7 | MAE=0.1087 | LPIPS=0.1065 | PSNR=16.84 | SSIM=0.6397 | dreamsim=0.149 | FID=5.367]]
[2025-10-25 04:00:51,621][main][INFO] - [Epoch 15] Best metrics: [[min_MSE=14.16 | min_MAE=0.0884 | min_LPIPS=0.1065 | max_PSNR=18.49 | max_SSIM=0.6397 | min_dreamsim=0.149 | min_FID=5.367]]
[2025-10-25 04:00:51,622][main][DEBUG] - Writing images to disk...
[2025-10-25 04:00:52,707][main][DEBUG] - Image(s) saved on disk
[2025-10-25 04:00:52,907][main][INFO] - End of epoch timers: [T_train=15:39:09 | T_epoch=01:02:36 | T_eval=00:52:07 | T_total=16:32:57]
[2025-10-25 04:00:52,908][main][INFO] - Storing model checkpoint inside /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/last
[2025-10-25 04:00:55,517][main][INFO] - Best FID so far, storing a copy of the model checkpoint to /workspace/DC_SSDAE/runs/jobs/train_enc_vq_f8c4_FM/checkpoints/best
[2025-10-25 04:00:57,799][main][INFO] - ---
[2025-10-25 04:00:57,800][main][INFO] - [T_total=16:33:02 | T_train=15:39:09] Start epoch 15