2024-08-16,03:25:52 | INFO | Running with a single process. Device cuda:0.
2024-08-16,03:25:52 | INFO | Loaded Align-fMRI-Encoder-small model config.
2024-08-16,03:25:54 | INFO | Model:
2024-08-16,03:25:54 | INFO | CustomTextCLIP(
  (visual): VisionTransformer(
    (conv1): Conv1d(1, 768, kernel_size=(32,), stride=(32,), bias=False)
    (patch_dropout): Identity()
    (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (transformer): Transformer(
      (resblocks): ModuleList(
        (0-11): 12 x ResidualAttentionBlock(
          (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (attn): MultiheadAttention(
            (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True)
          )
          (ls_1): Identity()
          (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (mlp): Sequential(
            (c_fc): Linear(in_features=768, out_features=3072, bias=True)
            (gelu): GELU(approximate='none')
            (c_proj): Linear(in_features=3072, out_features=768, bias=True)
          )
          (ls_2): Identity()
        )
      )
    )
    (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (text): HFTextEncoder(
    (transformer): RobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(50265, 768, padding_idx=1)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0-11): 12 x RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense): Linear(in_features=768, out_features=768, bias=True)
                (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
            (intermediate): RobertaIntermediate(
              (dense): Linear(in_features=768, out_features=3072, bias=True)
              (intermediate_act_fn): GELUActivation()
            )
            (output): RobertaOutput(
              (dense): Linear(in_features=3072, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
          )
        )
      )
    )
    (pooler): MeanPooler()
    (proj): Sequential(
      (0): Linear(in_features=768, out_features=640, bias=False)
      (1): GELU(approximate='none')
      (2): Linear(in_features=640, out_features=512, bias=False)
    )
  )
)
2024-08-16,03:25:54 | INFO | Params:
2024-08-16,03:25:54 | INFO | accum_freq: 1
2024-08-16,03:25:54 | INFO | aug_cfg: {}
2024-08-16,03:25:54 | INFO | batch_size: 256
2024-08-16,03:25:54 | INFO | beta1: 0.9
2024-08-16,03:25:54 | INFO | beta2: 0.999
2024-08-16,03:25:54 | INFO | checkpoint_path: ./logs/2024_08_16-03_25_52-model_Align-fMRI-Encoder-small-lr_0.0005-b_256-j_4-p_amp/checkpoints
2024-08-16,03:25:54 | INFO | coca_caption_loss_weight: 2.0
2024-08-16,03:25:54 | INFO | coca_contrastive_loss_weight: 1.0
2024-08-16,03:25:54 | INFO | copy_codebase: False
2024-08-16,03:25:54 | INFO | csv_caption_key: title
2024-08-16,03:25:54 | INFO | csv_img_key: filepath
2024-08-16,03:25:54 | INFO | csv_separator: ,
2024-08-16,03:25:54 | INFO | dataset_resampled: False
2024-08-16,03:25:54 | INFO | dataset_type: auto
2024-08-16,03:25:54 | INFO | ddp_static_graph: False
2024-08-16,03:25:54 | INFO | debug: False
2024-08-16,03:25:54 | INFO | delete_previous_checkpoint: False
2024-08-16,03:25:54 | INFO | device: cuda:0
2024-08-16,03:25:54 | INFO | dist_backend: nccl
2024-08-16,03:25:54 | INFO | dist_url: env://
2024-08-16,03:25:54 | INFO | distill: False
2024-08-16,03:25:54 | INFO | distill_model: None
2024-08-16,03:25:54 | INFO | distill_pretrained: None
2024-08-16,03:25:54 | INFO | distributed: False
2024-08-16,03:25:54 | INFO | epochs: 100
2024-08-16,03:25:54 | INFO | epochs_cooldown: None
2024-08-16,03:25:54 | INFO | eps: 1e-08
2024-08-16,03:25:54 | INFO | force_custom_text: False
2024-08-16,03:25:54 | INFO | force_image_size: None
2024-08-16,03:25:54 | INFO | force_patch_dropout: None
2024-08-16,03:25:54 | INFO | force_quick_gelu: False
2024-08-16,03:25:54 | INFO | gather_with_grad: False
2024-08-16,03:25:54 | INFO | grad_checkpointing: False
2024-08-16,03:25:54 | INFO | grad_clip_norm: None
2024-08-16,03:25:54 | INFO | horovod: False
2024-08-16,03:25:54 | INFO | image_interpolation: None
2024-08-16,03:25:54 | INFO | image_mean: None
2024-08-16,03:25:54 | INFO | image_resize_mode: None
2024-08-16,03:25:54 | INFO | image_std: None
2024-08-16,03:25:54 | INFO | imagenet_v2: None
2024-08-16,03:25:54 | INFO | imagenet_val: None
2024-08-16,03:25:54 | INFO | local_loss: False
2024-08-16,03:25:54 | INFO | local_rank: 0
2024-08-16,03:25:54 | INFO | lock_image: False
2024-08-16,03:25:54 | INFO | lock_image_freeze_bn_stats: False
2024-08-16,03:25:54 | INFO | lock_image_unlocked_groups: 0
2024-08-16,03:25:54 | INFO | lock_text: True
2024-08-16,03:25:54 | INFO | lock_text_freeze_layer_norm: False
2024-08-16,03:25:54 | INFO | lock_text_unlocked_layers: 0
2024-08-16,03:25:54 | INFO | log_every_n_steps: 100
2024-08-16,03:25:54 | INFO | log_level: 20
2024-08-16,03:25:54 | INFO | log_local: False
2024-08-16,03:25:54 | INFO | log_path: ./logs/2024_08_16-03_25_52-model_Align-fMRI-Encoder-small-lr_0.0005-b_256-j_4-p_amp/out.log
2024-08-16,03:25:54 | INFO | logs: ./logs/
2024-08-16,03:25:54 | INFO | lr: 0.0005
2024-08-16,03:25:54 | INFO | lr_cooldown_end: 0.0
2024-08-16,03:25:54 | INFO | lr_cooldown_power: 1.0
2024-08-16,03:25:54 | INFO | lr_scheduler: cosine
2024-08-16,03:25:54 | INFO | model: Align-fMRI-Encoder-small
2024-08-16,03:25:54 | INFO | name: 2024_08_16-03_25_52-model_Align-fMRI-Encoder-small-lr_0.0005-b_256-j_4-p_amp
2024-08-16,03:25:54 | INFO | no_set_device_rank: False
2024-08-16,03:25:54 | INFO | precision: amp
2024-08-16,03:25:54 | INFO | pretrained:
2024-08-16,03:25:54 | INFO | pretrained_image: False
2024-08-16,03:25:54 | INFO | rank: 0
2024-08-16,03:25:54 | INFO | remote_sync: None
2024-08-16,03:25:54 | INFO | remote_sync_frequency: 300
2024-08-16,03:25:54 | INFO | remote_sync_protocol: s3
2024-08-16,03:25:54 | INFO | report_to:
2024-08-16,03:25:54 | INFO | resume: None
2024-08-16,03:25:54 | INFO | save_frequency: 1
2024-08-16,03:25:54 | INFO | save_most_recent: False
2024-08-16,03:25:54 | INFO | seed: 0
2024-08-16,03:25:54 | INFO | siglip: False
2024-08-16,03:25:54 | INFO | skip_scheduler: False
2024-08-16,03:25:54 | INFO | tensorboard: False
2024-08-16,03:25:54 | INFO | tensorboard_path:
2024-08-16,03:25:54 | INFO | torchcompile: False
2024-08-16,03:25:54 | INFO | torchscript: False
2024-08-16,03:25:54 | INFO | trace: False
2024-08-16,03:25:54 | INFO | train_data: /root/autodl-tmp/.autodl/Projects/fMRI2TextAligner/notebooks/train.csv
2024-08-16,03:25:54 | INFO | train_data_upsampling_factors: None
2024-08-16,03:25:54 | INFO | train_num_samples: None
2024-08-16,03:25:54 | INFO | use_bn_sync: False
2024-08-16,03:25:54 | INFO | use_bnb_linear: None
2024-08-16,03:25:54 | INFO | val_data: /root/autodl-tmp/.autodl/Projects/fMRI2TextAligner/notebooks/val.csv
2024-08-16,03:25:54 | INFO | val_frequency: 1
2024-08-16,03:25:54 | INFO | val_num_samples: None
2024-08-16,03:25:54 | INFO | wandb: False
2024-08-16,03:25:54 | INFO | wandb_notes:
2024-08-16,03:25:54 | INFO | wandb_project_name: open-clip
2024-08-16,03:25:54 | INFO | warmup: 10000
2024-08-16,03:25:54 | INFO | wd: 0.2
2024-08-16,03:25:54 | INFO | workers: 4
2024-08-16,03:25:54 | INFO | world_size: 1
2024-08-16,03:25:54 | INFO | zeroshot_frequency: 2
2024-08-16,03:25:58 | INFO | Start epoch 0
2024-08-16,03:26:01 | INFO | Train Epoch: 0 [ 256/27000 (1%)] Data (t): 1.639 Batch (t): 3.538, 72.3607/s, 72.3607/s/gpu LR: 0.000000 Logit Scale: 14.286 Contrastive_loss: 5.5485 (5.5485) Loss: 5.5485 (5.5485)
2024-08-16,03:28:06 | INFO | Train Epoch: 0 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.520/s, 204.520/s/gpu LR: 0.000005 Logit Scale: 14.285 Contrastive_loss: 5.5459 (5.5472) Loss: 5.5459 (5.5472)
2024-08-16,03:28:11 | INFO | Train Epoch: 0 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.359/s, 204.359/s/gpu LR: 0.000005 Logit Scale: 14.285 Contrastive_loss: 5.5479 (5.5474) Loss: 5.5479 (5.5474)
2024-08-16,03:28:13 | INFO | Eval Epoch: 1 [256 / 3000] Clip Loss: 5.542412
2024-08-16,03:28:18 | INFO | Eval Epoch: 1 image_to_text_mean_rank: 1451.4443 image_to_text_median_rank: 1410.0000 image_to_text_R@1: 0.0003 image_to_text_R@5: 0.0023 image_to_text_R@10: 0.0060 text_to_image_mean_rank: 1439.4327 text_to_image_median_rank: 1409.0000 text_to_image_R@1: 0.0007 text_to_image_R@5: 0.0020 text_to_image_R@10: 0.0043 clip_val_loss: 5.5223 epoch: 1.0000 num_samples: 3000.0000
2024-08-16,03:28:19 | INFO | Start epoch 1
2024-08-16,03:28:22 | INFO | Train Epoch: 1 [ 256/27000 (1%)] Data (t): 1.447 Batch (t): 2.695, 95.0061/s, 95.0061/s/gpu LR: 0.000005 Logit Scale: 14.285 Contrastive_loss: 5.5397 (5.5397) Loss: 5.5397 (5.5397)
2024-08-16,03:30:27 | INFO | Train Epoch: 1 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.426/s, 204.426/s/gpu LR: 0.000010 Logit Scale: 14.290 Contrastive_loss: 5.4991 (5.5194) Loss: 5.4991 (5.5194)
2024-08-16,03:30:32 | INFO | Train Epoch: 1 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.419/s, 204.419/s/gpu LR: 0.000010 Logit Scale: 14.290 Contrastive_loss: 5.4470 (5.4953) Loss: 5.4470 (5.4953)
2024-08-16,03:30:34 | INFO | Eval Epoch: 2 [256 / 3000] Clip Loss: 5.452873
2024-08-16,03:30:38 | INFO | Eval Epoch: 2 image_to_text_mean_rank: 1193.8437 image_to_text_median_rank: 1062.0000 image_to_text_R@1: 0.0013 image_to_text_R@5: 0.0037 image_to_text_R@10: 0.0067 text_to_image_mean_rank: 1196.7497 text_to_image_median_rank: 1078.0000 text_to_image_R@1: 0.0013 text_to_image_R@5: 0.0060 text_to_image_R@10: 0.0090 clip_val_loss: 5.4537 epoch: 2.0000 num_samples: 3000.0000
2024-08-16,03:30:40 | INFO | Start epoch 2
2024-08-16,03:30:42 | INFO | Train Epoch: 2 [ 256/27000 (1%)] Data (t): 1.420 Batch (t): 2.666, 96.0263/s, 96.0263/s/gpu LR: 0.000011 Logit Scale: 14.290 Contrastive_loss: 5.4566 (5.4566) Loss: 5.4566 (5.4566)
2024-08-16,03:32:48 | INFO | Train Epoch: 2 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.495/s, 204.495/s/gpu LR: 0.000016 Logit Scale: 14.324 Contrastive_loss: 5.0180 (5.2373) Loss: 5.0180 (5.2373)
2024-08-16,03:32:53 | INFO | Train Epoch: 2 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.358/s, 204.358/s/gpu LR: 0.000016 Logit Scale: 14.325 Contrastive_loss: 5.1190 (5.1979) Loss: 5.1190 (5.1979)
2024-08-16,03:32:54 | INFO | Eval Epoch: 3 [256 / 3000] Clip Loss: 5.063042
2024-08-16,03:32:59 | INFO | Eval Epoch: 3 image_to_text_mean_rank: 782.7933 image_to_text_median_rank: 592.0000 image_to_text_R@1: 0.0017 image_to_text_R@5: 0.0083 image_to_text_R@10: 0.0157 text_to_image_mean_rank: 727.7393 text_to_image_median_rank: 536.0000 text_to_image_R@1: 0.0030 text_to_image_R@5: 0.0117 text_to_image_R@10: 0.0230 clip_val_loss: 5.0752 epoch: 3.0000 num_samples: 3000.0000
2024-08-16,03:33:00 | INFO | Start epoch 3
2024-08-16,03:33:03 | INFO | Train Epoch: 3 [ 256/27000 (1%)] Data (t): 1.597 Batch (t): 2.842, 90.0810/s, 90.0810/s/gpu LR: 0.000016 Logit Scale: 14.326 Contrastive_loss: 5.1011 (5.1011) Loss: 5.1011 (5.1011)
2024-08-16,03:35:09 | INFO | Train Epoch: 3 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.253, 204.414/s, 204.414/s/gpu LR: 0.000021 Logit Scale: 14.339 Contrastive_loss: 4.9292 (5.0151) Loss: 4.9292 (5.0151)
2024-08-16,03:35:14 | INFO | Train Epoch: 3 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.467/s, 204.467/s/gpu LR: 0.000021 Logit Scale: 14.339 Contrastive_loss: 4.8818 (4.9707) Loss: 4.8818 (4.9707)
2024-08-16,03:35:15 | INFO | Eval Epoch: 4 [256 / 3000] Clip Loss: 4.868340
2024-08-16,03:35:20 | INFO | Eval Epoch: 4 image_to_text_mean_rank: 683.8327 image_to_text_median_rank: 457.0000 image_to_text_R@1: 0.0043 image_to_text_R@5: 0.0123 image_to_text_R@10: 0.0230 text_to_image_mean_rank: 612.9693 text_to_image_median_rank: 408.0000 text_to_image_R@1: 0.0033 text_to_image_R@5: 0.0143 text_to_image_R@10: 0.0283 clip_val_loss: 4.9190 epoch: 4.0000 num_samples: 3000.0000
2024-08-16,03:35:21 | INFO | Start epoch 4
2024-08-16,03:35:24 | INFO | Train Epoch: 4 [ 256/27000 (1%)] Data (t): 1.489 Batch (t): 2.735, 93.5960/s, 93.5960/s/gpu LR: 0.000021 Logit Scale: 14.339 Contrastive_loss: 4.6774 (4.6774) Loss: 4.6774 (4.6774)
2024-08-16,03:37:29 | INFO | Train Epoch: 4 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.427/s, 204.427/s/gpu LR: 0.000026 Logit Scale: 14.352 Contrastive_loss: 4.6595 (4.6684) Loss: 4.6595 (4.6684)
2024-08-16,03:37:34 | INFO | Train Epoch: 4 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.444/s, 204.444/s/gpu LR: 0.000026 Logit Scale: 14.352 Contrastive_loss: 4.7835 (4.7068) Loss: 4.7835 (4.7068)
2024-08-16,03:37:36 | INFO | Eval Epoch: 5 [256 / 3000] Clip Loss: 4.786659
2024-08-16,03:37:41 | INFO | Eval Epoch: 5 image_to_text_mean_rank: 620.0710 image_to_text_median_rank: 405.0000 image_to_text_R@1: 0.0033 image_to_text_R@5: 0.0150 image_to_text_R@10: 0.0250 text_to_image_mean_rank: 564.3297 text_to_image_median_rank: 358.0000 text_to_image_R@1: 0.0047 text_to_image_R@5: 0.0173 text_to_image_R@10: 0.0367 clip_val_loss: 4.8233 epoch: 5.0000 num_samples: 3000.0000
2024-08-16,03:37:42 | INFO | Start epoch 5
2024-08-16,03:37:45 | INFO | Train Epoch: 5 [ 256/27000 (1%)] Data (t): 1.495 Batch (t): 2.740, 93.4321/s, 93.4321/s/gpu LR: 0.000026 Logit Scale: 14.352 Contrastive_loss: 4.5845 (4.5845) Loss: 4.5845 (4.5845)
2024-08-16,03:39:50 | INFO | Train Epoch: 5 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.440/s, 204.440/s/gpu LR: 0.000031 Logit Scale: 14.392 Contrastive_loss: 4.5224 (4.5534) Loss: 4.5224 (4.5534)
2024-08-16,03:39:55 | INFO | Train Epoch: 5 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.301/s, 204.301/s/gpu LR: 0.000031 Logit Scale: 14.393 Contrastive_loss: 4.4956 (4.5342) Loss: 4.4956 (4.5342)
2024-08-16,03:39:56 | INFO | Eval Epoch: 6 [256 / 3000] Clip Loss: 4.666817
2024-08-16,03:40:01 | INFO | Eval Epoch: 6 image_to_text_mean_rank: 560.2250 image_to_text_median_rank: 363.0000 image_to_text_R@1: 0.0037 image_to_text_R@5: 0.0190 image_to_text_R@10: 0.0360 text_to_image_mean_rank: 524.7307 text_to_image_median_rank: 326.0000 text_to_image_R@1: 0.0037 text_to_image_R@5: 0.0223 text_to_image_R@10: 0.0403 clip_val_loss: 4.7456 epoch: 6.0000 num_samples: 3000.0000
2024-08-16,03:40:03 | INFO | Start epoch 6
2024-08-16,03:40:05 | INFO | Train Epoch: 6 [ 256/27000 (1%)] Data (t): 1.443 Batch (t): 2.691, 95.1276/s, 95.1276/s/gpu LR: 0.000032 Logit Scale: 14.394 Contrastive_loss: 4.1144 (4.1144) Loss: 4.1144 (4.1144)
2024-08-16,03:42:10 | INFO | Train Epoch: 6 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.240/s, 204.240/s/gpu LR: 0.000037 Logit Scale: 14.484 Contrastive_loss: 4.3783 (4.2463) Loss: 4.3783 (4.2463)
2024-08-16,03:42:15 | INFO | Train Epoch: 6 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.514/s, 204.514/s/gpu LR: 0.000037 Logit Scale: 14.486 Contrastive_loss: 4.4021 (4.2983) Loss: 4.4021 (4.2983)
2024-08-16,03:42:17 | INFO | Eval Epoch: 7 [256 / 3000] Clip Loss: 4.655993
2024-08-16,03:42:22 | INFO | Eval Epoch: 7 image_to_text_mean_rank: 563.7200 image_to_text_median_rank: 352.0000 image_to_text_R@1: 0.0037 image_to_text_R@5: 0.0177 image_to_text_R@10: 0.0317 text_to_image_mean_rank: 515.4990 text_to_image_median_rank: 306.0000 text_to_image_R@1: 0.0067 text_to_image_R@5: 0.0250 text_to_image_R@10: 0.0453 clip_val_loss: 4.7377 epoch: 7.0000 num_samples: 3000.0000
2024-08-16,03:42:23 | INFO | Start epoch 7
2024-08-16,03:42:26 | INFO | Train Epoch: 7 [ 256/27000 (1%)] Data (t): 1.452 Batch (t): 2.698, 94.8848/s, 94.8848/s/gpu LR: 0.000037 Logit Scale: 14.487 Contrastive_loss: 3.9120 (3.9120) Loss: 3.9120 (3.9120)
2024-08-16,03:44:31 | INFO | Train Epoch: 7 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.415/s, 204.415/s/gpu LR: 0.000042 Logit Scale: 14.608 Contrastive_loss: 3.9964 (3.9542) Loss: 3.9964 (3.9542)
2024-08-16,03:44:36 | INFO | Train Epoch: 7 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.414/s, 204.414/s/gpu LR: 0.000042 Logit Scale: 14.612 Contrastive_loss: 3.9585 (3.9556) Loss: 3.9585 (3.9556)
2024-08-16,03:44:38 | INFO | Eval Epoch: 8 [256 / 3000] Clip Loss: 4.634455
2024-08-16,03:44:42 | INFO | Eval Epoch: 8 image_to_text_mean_rank: 551.6537 image_to_text_median_rank: 340.0000 image_to_text_R@1: 0.0050 image_to_text_R@5: 0.0223 image_to_text_R@10: 0.0390 text_to_image_mean_rank: 516.1967 text_to_image_median_rank: 309.0000 text_to_image_R@1: 0.0057 text_to_image_R@5: 0.0260 text_to_image_R@10: 0.0487 clip_val_loss: 4.7750 epoch: 8.0000 num_samples: 3000.0000
2024-08-16,03:44:44 | INFO | Start epoch 8
2024-08-16,03:44:47 | INFO | Train Epoch: 8 [ 256/27000 (1%)] Data (t): 1.500 Batch (t): 2.746, 93.2370/s, 93.2370/s/gpu LR: 0.000042 Logit Scale: 14.613 Contrastive_loss: 3.6235 (3.6235) Loss: 3.6235 (3.6235)
2024-08-16,03:46:52 | INFO | Train Epoch: 8 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.478/s, 204.478/s/gpu LR: 0.000047 Logit Scale: 14.756 Contrastive_loss: 3.9351 (3.7793) Loss: 3.9351 (3.7793)
2024-08-16,03:46:57 | INFO | Train Epoch: 8 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.524/s, 204.524/s/gpu LR: 0.000047 Logit Scale: 14.760 Contrastive_loss: 3.8363 (3.7983) Loss: 3.8363 (3.7983)
2024-08-16,03:46:58 | INFO | Eval Epoch: 9 [256 / 3000] Clip Loss: 4.764020
2024-08-16,03:47:03 | INFO | Eval Epoch: 9 image_to_text_mean_rank: 587.4340 image_to_text_median_rank: 353.0000 image_to_text_R@1: 0.0040 image_to_text_R@5: 0.0187 image_to_text_R@10: 0.0360 text_to_image_mean_rank: 546.2733 text_to_image_median_rank: 318.0000 text_to_image_R@1: 0.0053 text_to_image_R@5: 0.0230 text_to_image_R@10: 0.0460 clip_val_loss: 4.8689 epoch: 9.0000 num_samples: 3000.0000
2024-08-16,03:47:04 | INFO | Start epoch 9
2024-08-16,03:47:07 | INFO | Train Epoch: 9 [ 256/27000 (1%)] Data (t): 1.432 Batch (t): 2.678, 95.6108/s, 95.6108/s/gpu LR: 0.000047 Logit Scale: 14.761 Contrastive_loss: 3.1292 (3.1292) Loss: 3.1292 (3.1292)
2024-08-16,03:49:12 | INFO | Train Epoch: 9 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.417/s, 204.417/s/gpu LR: 0.000052 Logit Scale: 14.924 Contrastive_loss: 3.3670 (3.2481) Loss: 3.3670 (3.2481)
2024-08-16,03:49:17 | INFO | Train Epoch: 9 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.451/s, 204.451/s/gpu LR: 0.000053 Logit Scale: 14.929 Contrastive_loss: 3.4293 (3.3085) Loss: 3.4293 (3.3085)
2024-08-16,03:49:19 | INFO | Eval Epoch: 10 [256 / 3000] Clip Loss: 4.830852
2024-08-16,03:49:24 | INFO | Eval Epoch: 10 image_to_text_mean_rank: 604.2760 image_to_text_median_rank: 366.0000 image_to_text_R@1: 0.0033 image_to_text_R@5: 0.0187 image_to_text_R@10: 0.0343 text_to_image_mean_rank: 571.8533 text_to_image_median_rank: 339.0000 text_to_image_R@1: 0.0050 text_to_image_R@5: 0.0233 text_to_image_R@10: 0.0443 clip_val_loss: 4.9887 epoch: 10.0000 num_samples: 3000.0000
2024-08-16,03:49:25 | INFO | Start epoch 10
2024-08-16,03:49:28 | INFO | Train Epoch: 10 [ 256/27000 (1%)] Data (t): 1.458 Batch (t): 2.703, 94.7107/s, 94.7107/s/gpu LR: 0.000053 Logit Scale: 14.930 Contrastive_loss: 2.6423 (2.6423) Loss: 2.6423 (2.6423)
2024-08-16,03:51:33 | INFO | Train Epoch: 10 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.377/s, 204.377/s/gpu LR: 0.000058 Logit Scale: 15.109 Contrastive_loss: 2.7973 (2.7198) Loss: 2.7973 (2.7198)
2024-08-16,03:51:38 | INFO | Train Epoch: 10 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.584/s, 204.584/s/gpu LR: 0.000058 Logit Scale: 15.115 Contrastive_loss: 3.1476 (2.8624) Loss: 3.1476 (2.8624)
2024-08-16,03:51:40 | INFO | Eval Epoch: 11 [256 / 3000] Clip Loss: 4.916827
2024-08-16,03:51:44 | INFO | Eval Epoch: 11 image_to_text_mean_rank: 645.3330 image_to_text_median_rank: 392.0000 image_to_text_R@1: 0.0053 image_to_text_R@5: 0.0183 image_to_text_R@10: 0.0343 text_to_image_mean_rank: 606.6477 text_to_image_median_rank: 378.0000 text_to_image_R@1: 0.0057 text_to_image_R@5: 0.0220 text_to_image_R@10: 0.0393 clip_val_loss: 5.1492 epoch: 11.0000 num_samples: 3000.0000
2024-08-16,03:51:46 | INFO | Start epoch 11
2024-08-16,03:51:48 | INFO | Train Epoch: 11 [ 256/27000 (1%)] Data (t): 1.468 Batch (t): 2.714, 94.3351/s, 94.3351/s/gpu LR: 0.000058 Logit Scale: 15.116 Contrastive_loss: 2.1677 (2.1677) Loss: 2.1677 (2.1677)
2024-08-16,03:53:54 | INFO | Train Epoch: 11 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.580/s, 204.580/s/gpu LR: 0.000063 Logit Scale: 15.304 Contrastive_loss: 2.1280 (2.1479) Loss: 2.1280 (2.1479)
2024-08-16,03:53:59 | INFO | Train Epoch: 11 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.443/s, 204.443/s/gpu LR: 0.000063 Logit Scale: 15.311 Contrastive_loss: 2.1967 (2.1642) Loss: 2.1967 (2.1642)
2024-08-16,03:54:00 | INFO | Eval Epoch: 12 [256 / 3000] Clip Loss: 5.227501
2024-08-16,03:54:05 | INFO | Eval Epoch: 12 image_to_text_mean_rank: 701.0917 image_to_text_median_rank: 446.0000 image_to_text_R@1: 0.0033 image_to_text_R@5: 0.0133 image_to_text_R@10: 0.0277 text_to_image_mean_rank: 672.1147 text_to_image_median_rank: 418.0000 text_to_image_R@1: 0.0060 text_to_image_R@5: 0.0197 text_to_image_R@10: 0.0397 clip_val_loss: 5.4125 epoch: 12.0000 num_samples: 3000.0000
2024-08-16,03:54:06 | INFO | Start epoch 12
2024-08-16,03:54:09 | INFO | Train Epoch: 12 [ 256/27000 (1%)] Data (t): 1.458 Batch (t): 2.704, 94.6843/s, 94.6843/s/gpu LR: 0.000063 Logit Scale: 15.313 Contrastive_loss: 1.4394 (1.4394) Loss: 1.4394 (1.4394)
2024-08-16,03:56:14 | INFO | Train Epoch: 12 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.444/s, 204.444/s/gpu LR: 0.000068 Logit Scale: 15.499 Contrastive_loss: 1.4908 (1.4651) Loss: 1.4908 (1.4651)
2024-08-16,03:56:19 | INFO | Train Epoch: 12 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.382/s, 204.382/s/gpu LR: 0.000068 Logit Scale: 15.506 Contrastive_loss: 1.5923 (1.5075) Loss: 1.5923 (1.5075)
2024-08-16,03:56:21 | INFO | Eval Epoch: 13 [256 / 3000] Clip Loss: 5.440553
2024-08-16,03:56:26 | INFO | Eval Epoch: 13 image_to_text_mean_rank: 746.4853 image_to_text_median_rank: 468.0000 image_to_text_R@1: 0.0023 image_to_text_R@5: 0.0130 image_to_text_R@10: 0.0290 text_to_image_mean_rank: 718.4413 text_to_image_median_rank: 455.0000 text_to_image_R@1: 0.0043 text_to_image_R@5: 0.0197 text_to_image_R@10: 0.0323 clip_val_loss: 5.6023 epoch: 13.0000 num_samples: 3000.0000
2024-08-16,03:56:27 | INFO | Start epoch 13
2024-08-16,03:56:30 | INFO | Train Epoch: 13 [ 256/27000 (1%)] Data (t): 1.452 Batch (t): 2.698, 94.8883/s, 94.8883/s/gpu LR: 0.000068 Logit Scale: 15.507 Contrastive_loss: 0.92882 (0.92882) Loss: 0.92882 (0.92882)
2024-08-16,03:58:35 | INFO | Train Epoch: 13 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.386/s, 204.386/s/gpu LR: 0.000073 Logit Scale: 15.674 Contrastive_loss: 0.86548 (0.89715) Loss: 0.86548 (0.89715)
2024-08-16,03:58:40 | INFO | Train Epoch: 13 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.431/s, 204.431/s/gpu LR: 0.000073 Logit Scale: 15.681 Contrastive_loss: 0.90084 (0.89838) Loss: 0.90084 (0.89838)
2024-08-16,03:58:42 | INFO | Eval Epoch: 14 [256 / 3000] Clip Loss: 5.552146
2024-08-16,03:58:46 | INFO | Eval Epoch: 14 image_to_text_mean_rank: 811.2757 image_to_text_median_rank: 553.0000 image_to_text_R@1: 0.0027 image_to_text_R@5: 0.0127 image_to_text_R@10: 0.0207 text_to_image_mean_rank: 788.4850 text_to_image_median_rank: 516.0000 text_to_image_R@1: 0.0047 text_to_image_R@5: 0.0197 text_to_image_R@10: 0.0333 clip_val_loss: 5.8483 epoch: 14.0000 num_samples: 3000.0000
2024-08-16,03:58:48 | INFO | Start epoch 14
2024-08-16,03:58:50 | INFO | Train Epoch: 14 [ 256/27000 (1%)] Data (t): 1.521 Batch (t): 2.767, 92.5317/s, 92.5317/s/gpu LR: 0.000074 Logit Scale: 15.682 Contrastive_loss: 0.52879 (0.52879) Loss: 0.52879 (0.52879)
2024-08-16,04:00:56 | INFO | Train Epoch: 14 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.407/s, 204.407/s/gpu LR: 0.000079 Logit Scale: 15.819 Contrastive_loss: 0.53437 (0.53158) Loss: 0.53437 (0.53158)
2024-08-16,04:01:01 | INFO | Train Epoch: 14 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.393/s, 204.393/s/gpu LR: 0.000079 Logit Scale: 15.825 Contrastive_loss: 0.52500 (0.52939) Loss: 0.52500 (0.52939)
2024-08-16,04:01:02 | INFO | Eval Epoch: 15 [256 / 3000] Clip Loss: 5.805412
2024-08-16,04:01:07 | INFO | Eval Epoch: 15 image_to_text_mean_rank: 865.4823 image_to_text_median_rank: 598.0000 image_to_text_R@1: 0.0050 image_to_text_R@5: 0.0143 image_to_text_R@10: 0.0270 text_to_image_mean_rank: 846.2273 text_to_image_median_rank: 575.0000 text_to_image_R@1: 0.0033 text_to_image_R@5: 0.0163 text_to_image_R@10: 0.0297 clip_val_loss: 6.0065 epoch: 15.0000 num_samples: 3000.0000
2024-08-16,04:01:08 | INFO | Start epoch 15
2024-08-16,04:01:11 | INFO | Train Epoch: 15 [ 256/27000 (1%)] Data (t): 1.513 Batch (t): 2.760, 92.7525/s, 92.7525/s/gpu LR: 0.000079 Logit Scale: 15.826 Contrastive_loss: 0.34025 (0.34025) Loss: 0.34025 (0.34025)
2024-08-16,04:03:16 | INFO | Train Epoch: 15 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.496/s, 204.496/s/gpu LR: 0.000084 Logit Scale: 15.934 Contrastive_loss: 0.29246 (0.31636) Loss: 0.29246 (0.31636)
2024-08-16,04:03:21 | INFO | Train Epoch: 15 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.414/s, 204.414/s/gpu LR: 0.000084 Logit Scale: 15.938 Contrastive_loss: 0.26719 (0.29997) Loss: 0.26719 (0.29997)
2024-08-16,04:03:23 | INFO | Eval Epoch: 16 [256 / 3000] Clip Loss: 6.014488
2024-08-16,04:03:28 | INFO | Eval Epoch: 16 image_to_text_mean_rank: 887.4880 image_to_text_median_rank: 627.0000 image_to_text_R@1: 0.0030 image_to_text_R@5: 0.0140 image_to_text_R@10: 0.0260 text_to_image_mean_rank: 880.3730 text_to_image_median_rank: 619.0000 text_to_image_R@1: 0.0037 text_to_image_R@5: 0.0183 text_to_image_R@10: 0.0320 clip_val_loss: 6.1342 epoch: 16.0000 num_samples: 3000.0000
2024-08-16,04:03:29 | INFO | Start epoch 16
2024-08-16,04:03:32 | INFO | Train Epoch: 16 [ 256/27000 (1%)] Data (t): 1.600 Batch (t): 2.846, 89.9529/s, 89.9529/s/gpu LR: 0.000084 Logit Scale: 15.939 Contrastive_loss: 0.20845 (0.20845) Loss: 0.20845 (0.20845)
2024-08-16,04:05:37 | INFO | Train Epoch: 16 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.440/s, 204.440/s/gpu LR: 0.000089 Logit Scale: 16.026 Contrastive_loss: 0.21264 (0.21055) Loss: 0.21264 (0.21055)
2024-08-16,04:05:42 | INFO | Train Epoch: 16 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.576/s, 204.576/s/gpu LR: 0.000089 Logit Scale: 16.030 Contrastive_loss: 0.18559 (0.20223) Loss: 0.18559 (0.20223)
2024-08-16,04:05:44 | INFO | Eval Epoch: 17 [256 / 3000] Clip Loss: 6.084400
2024-08-16,04:05:49 | INFO | Eval Epoch: 17 image_to_text_mean_rank: 935.7397 image_to_text_median_rank: 700.0000 image_to_text_R@1: 0.0033 image_to_text_R@5: 0.0137 image_to_text_R@10: 0.0237 text_to_image_mean_rank: 929.2380 text_to_image_median_rank: 691.0000 text_to_image_R@1: 0.0040 text_to_image_R@5: 0.0160 text_to_image_R@10: 0.0290 clip_val_loss: 6.2851 epoch: 17.0000 num_samples: 3000.0000
2024-08-16,04:05:50 | INFO | Start epoch 17
2024-08-16,04:05:53 | INFO | Train Epoch: 17 [ 256/27000 (1%)] Data (t): 1.469 Batch (t): 2.715, 94.3037/s, 94.3037/s/gpu LR: 0.000089 Logit Scale: 16.031 Contrastive_loss: 0.13997 (0.13997) Loss: 0.13997 (0.13997)
2024-08-16,04:07:58 | INFO | Train Epoch: 17 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.280/s, 204.280/s/gpu LR: 0.000094 Logit Scale: 16.107 Contrastive_loss: 0.14680 (0.14339) Loss: 0.14680 (0.14339)
2024-08-16,04:08:03 | INFO | Train Epoch: 17 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.427/s, 204.427/s/gpu LR: 0.000095 Logit Scale: 16.110 Contrastive_loss: 0.14784 (0.14487) Loss: 0.14784 (0.14487)
2024-08-16,04:08:05 | INFO | Eval Epoch: 18 [256 / 3000] Clip Loss: 6.179710
2024-08-16,04:08:09 | INFO | Eval Epoch: 18 image_to_text_mean_rank: 959.2040 image_to_text_median_rank: 740.0000 image_to_text_R@1: 0.0017 image_to_text_R@5: 0.0143 image_to_text_R@10: 0.0240 text_to_image_mean_rank: 952.1377 text_to_image_median_rank: 737.0000 text_to_image_R@1: 0.0017 text_to_image_R@5: 0.0143 text_to_image_R@10: 0.0293 clip_val_loss: 6.3885 epoch: 18.0000 num_samples: 3000.0000
2024-08-16,04:08:11 | INFO | Start epoch 18
2024-08-16,04:08:13 | INFO | Train Epoch: 18 [ 256/27000 (1%)] Data (t): 1.515 Batch (t): 2.760, 92.7391/s, 92.7391/s/gpu LR: 0.000095 Logit Scale: 16.111 Contrastive_loss: 0.10571 (0.10571) Loss: 0.10571 (0.10571)
2024-08-16,04:10:19 | INFO | Train Epoch: 18 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.597/s, 204.597/s/gpu LR: 0.000100 Logit Scale: 16.180 Contrastive_loss: 0.11256 (0.10913) Loss: 0.11256 (0.10913)
2024-08-16,04:10:24 | INFO | Train Epoch: 18 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.521/s, 204.521/s/gpu LR: 0.000100 Logit Scale: 16.183 Contrastive_loss: 0.10602 (0.10809) Loss: 0.10602 (0.10809)
2024-08-16,04:10:25 | INFO | Eval Epoch: 19 [256 / 3000] Clip Loss: 6.244460
2024-08-16,04:10:30 | INFO | Eval Epoch: 19 image_to_text_mean_rank: 984.8337 image_to_text_median_rank: 751.0000 image_to_text_R@1: 0.0033 image_to_text_R@5: 0.0107 image_to_text_R@10: 0.0230 text_to_image_mean_rank: 974.3037 text_to_image_median_rank: 724.0000 text_to_image_R@1: 0.0030 text_to_image_R@5: 0.0130 text_to_image_R@10: 0.0267 clip_val_loss: 6.4495 epoch: 19.0000 num_samples: 3000.0000
2024-08-16,04:10:31 | INFO | Start epoch 19
2024-08-16,04:10:34 | INFO | Train Epoch: 19 [ 256/27000 (1%)] Data (t): 1.482 Batch (t): 2.730, 93.7704/s, 93.7704/s/gpu LR: 0.000100 Logit Scale: 16.184 Contrastive_loss: 0.091815 (0.091815) Loss: 0.091815 (0.091815)
2024-08-16,04:12:39 | INFO | Train Epoch: 19 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.414/s, 204.414/s/gpu LR: 0.000105 Logit Scale: 16.250 Contrastive_loss: 0.10051 (0.096161) Loss: 0.10051 (0.096161)
2024-08-16,04:12:44 | INFO | Train Epoch: 19 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.567/s, 204.567/s/gpu LR: 0.000105 Logit Scale: 16.253 Contrastive_loss: 0.083744 (0.092022) Loss: 0.083744 (0.092022)
2024-08-16,04:12:46 | INFO | Eval Epoch: 20 [256 / 3000] Clip Loss: 6.373804
2024-08-16,04:12:51 | INFO | Eval Epoch: 20 image_to_text_mean_rank: 998.9283 image_to_text_median_rank: 775.0000 image_to_text_R@1: 0.0023 image_to_text_R@5: 0.0120 image_to_text_R@10: 0.0223 text_to_image_mean_rank: 995.4970 text_to_image_median_rank: 768.0000 text_to_image_R@1: 0.0023 text_to_image_R@5: 0.0143 text_to_image_R@10: 0.0240 clip_val_loss: 6.5515 epoch: 20.0000 num_samples: 3000.0000
2024-08-16,04:12:52 | INFO | Start epoch 20
2024-08-16,04:12:55 | INFO | Train Epoch: 20 [ 256/27000 (1%)] Data (t): 1.470 Batch (t): 2.715, 94.2863/s, 94.2863/s/gpu LR: 0.000105 Logit Scale: 16.254 Contrastive_loss: 0.082728 (0.082728) Loss: 0.082728 (0.082728)
2024-08-16,04:15:00 | INFO | Train Epoch: 20 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.462/s, 204.462/s/gpu LR: 0.000110 Logit Scale: 16.322 Contrastive_loss: 0.092753 (0.087741) Loss: 0.092753 (0.087741)
2024-08-16,04:15:05 | INFO | Train Epoch: 20 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.462/s, 204.462/s/gpu LR: 0.000110 Logit Scale: 16.325 Contrastive_loss: 0.10030 (0.091928) Loss: 0.10030 (0.091928)
2024-08-16,04:15:07 | INFO | Eval Epoch: 21 [256 / 3000] Clip Loss: 6.443340
2024-08-16,04:15:11 | INFO | Eval Epoch: 21 image_to_text_mean_rank: 1025.5073 image_to_text_median_rank: 807.0000 image_to_text_R@1: 0.0023 image_to_text_R@5: 0.0103 image_to_text_R@10: 0.0207 text_to_image_mean_rank: 1019.2447 text_to_image_median_rank: 826.0000 text_to_image_R@1: 0.0027 text_to_image_R@5: 0.0127 text_to_image_R@10: 0.0250 clip_val_loss: 6.6143 epoch: 21.0000 num_samples: 3000.0000
2024-08-16,04:15:13 | INFO | Start epoch 21
2024-08-16,04:15:15 | INFO | Train Epoch: 21 [ 256/27000 (1%)] Data (t): 1.476 Batch (t): 2.722, 94.0350/s, 94.0350/s/gpu LR: 0.000110 Logit Scale: 16.325 Contrastive_loss: 0.067669 (0.067669) Loss: 0.067669 (0.067669)
2024-08-16,04:17:21 | INFO | Train Epoch: 21 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.508/s, 204.508/s/gpu LR: 0.000115 Logit Scale: 16.396 Contrastive_loss: 0.081561 (0.074615) Loss: 0.081561 (0.074615)
2024-08-16,04:17:26 | INFO | Train Epoch: 21 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.474/s, 204.474/s/gpu LR: 0.000116 Logit Scale: 16.399 Contrastive_loss: 0.089710 (0.079647) Loss: 0.089710 (0.079647)
2024-08-16,04:17:27 | INFO | Eval Epoch: 22 [256 / 3000] Clip Loss: 6.464468
2024-08-16,04:17:32 | INFO | Eval Epoch: 22 image_to_text_mean_rank: 1013.4487 image_to_text_median_rank: 788.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0103 image_to_text_R@10: 0.0203 text_to_image_mean_rank: 1005.7700 text_to_image_median_rank: 764.0000 text_to_image_R@1: 0.0017 text_to_image_R@5: 0.0140 text_to_image_R@10: 0.0250 clip_val_loss: 6.6335 epoch: 22.0000 num_samples: 3000.0000
2024-08-16,04:17:33 | INFO | Start epoch 22
2024-08-16,04:17:36 | INFO | Train Epoch: 22 [ 256/27000 (1%)] Data (t): 1.477 Batch (t): 2.724, 93.9871/s, 93.9871/s/gpu LR: 0.000116 Logit Scale: 16.400 Contrastive_loss: 0.075971 (0.075971) Loss: 0.075971 (0.075971)
2024-08-16,04:19:41 | INFO | Train Epoch: 22 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.527/s, 204.527/s/gpu LR: 0.000121 Logit Scale: 16.478 Contrastive_loss: 0.094408 (0.085189) Loss: 0.094408 (0.085189)
2024-08-16,04:19:46 | INFO | Train Epoch: 22 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.417/s, 204.417/s/gpu LR: 0.000121 Logit Scale: 16.482 Contrastive_loss: 0.098228 (0.089536) Loss: 0.098228 (0.089536)
2024-08-16,04:19:48 | INFO | Eval Epoch: 23 [256 / 3000] Clip Loss: 6.571661
2024-08-16,04:19:53 | INFO | Eval Epoch: 23 image_to_text_mean_rank: 996.3893 image_to_text_median_rank: 789.0000 image_to_text_R@1: 0.0037 image_to_text_R@5: 0.0113 image_to_text_R@10: 0.0207 text_to_image_mean_rank: 992.0480 text_to_image_median_rank: 786.0000 text_to_image_R@1: 0.0030 text_to_image_R@5: 0.0147 text_to_image_R@10: 0.0270 clip_val_loss: 6.6228 epoch: 23.0000 num_samples: 3000.0000
2024-08-16,04:19:54 | INFO | Start epoch 23
2024-08-16,04:19:57 | INFO | Train Epoch: 23 [ 256/27000 (1%)] Data (t): 1.547 Batch (t): 2.793, 91.6730/s, 91.6730/s/gpu LR: 0.000121 Logit Scale: 16.483 Contrastive_loss: 0.078356 (0.078356) Loss: 0.078356 (0.078356)
2024-08-16,04:22:02 | INFO | Train Epoch: 23 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.462/s, 204.462/s/gpu LR: 0.000126 Logit Scale: 16.572 Contrastive_loss: 0.090556 (0.084456) Loss: 0.090556 (0.084456)
2024-08-16,04:22:07 | INFO | Train Epoch: 23 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.327/s, 204.327/s/gpu LR: 0.000126 Logit Scale: 16.576 Contrastive_loss: 0.086313 (0.085075) Loss: 0.086313 (0.085075)
2024-08-16,04:22:09 | INFO | Eval Epoch: 24 [256 / 3000] Clip Loss: 6.592369
2024-08-16,04:22:14 | INFO | Eval Epoch: 24 image_to_text_mean_rank: 1030.9110 image_to_text_median_rank: 812.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0127 image_to_text_R@10: 0.0210 text_to_image_mean_rank: 1022.6720 text_to_image_median_rank: 805.0000 text_to_image_R@1: 0.0030 text_to_image_R@5: 0.0163 text_to_image_R@10: 0.0290 clip_val_loss: 6.7181 epoch: 24.0000 num_samples: 3000.0000
2024-08-16,04:22:15 | INFO | Start epoch 24
2024-08-16,04:22:18 | INFO | Train Epoch: 24 [ 256/27000 (1%)] Data (t): 1.778 Batch (t): 3.023, 84.6777/s, 84.6777/s/gpu LR: 0.000126 Logit Scale: 16.577 Contrastive_loss: 0.079637 (0.079637) Loss: 0.079637 (0.079637)
2024-08-16,04:24:23 | INFO | Train Epoch: 24 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.535/s, 204.535/s/gpu LR: 0.000131 Logit Scale: 16.685 Contrastive_loss: 0.11149 (0.095563) Loss: 0.11149 (0.095563)
2024-08-16,04:24:28 | INFO | Train Epoch: 24 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.539/s, 204.539/s/gpu LR: 0.000131 Logit Scale: 16.690 Contrastive_loss: 0.10690 (0.099341) Loss: 0.10690 (0.099341)
2024-08-16,04:24:30 | INFO | Eval Epoch: 25 [256 / 3000] Clip Loss: 6.576798
2024-08-16,04:24:35 | INFO | Eval Epoch: 25 image_to_text_mean_rank: 1011.1870 image_to_text_median_rank: 780.0000 image_to_text_R@1: 0.0033 image_to_text_R@5: 0.0097 image_to_text_R@10: 0.0180 text_to_image_mean_rank: 994.8833 text_to_image_median_rank: 768.0000 text_to_image_R@1: 0.0023 text_to_image_R@5: 0.0147 text_to_image_R@10: 0.0243 clip_val_loss: 6.6493 epoch: 25.0000 num_samples: 3000.0000
2024-08-16,04:24:36 | INFO | Start epoch 25
2024-08-16,04:24:39 | INFO | Train Epoch: 25 [ 256/27000 (1%)] Data (t): 1.483 Batch (t): 2.726, 93.8938/s, 93.8938/s/gpu LR: 0.000131 Logit Scale: 16.691 Contrastive_loss: 0.094896 (0.094896) Loss: 0.094896 (0.094896)
2024-08-16,04:26:44 | INFO | Train Epoch: 25 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.456/s, 204.456/s/gpu LR: 0.000136 Logit Scale: 16.831 Contrastive_loss: 0.24732 (0.17111) Loss: 0.24732 (0.17111)
2024-08-16,04:26:49 | INFO | Train Epoch: 25 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.546/s, 204.546/s/gpu LR: 0.000137 Logit Scale: 16.840 Contrastive_loss: 0.42561 (0.25594) Loss: 0.42561 (0.25594)
2024-08-16,04:26:50 | INFO | Eval Epoch: 26 [256 / 3000] Clip Loss: 6.370559
2024-08-16,04:26:55 | INFO | Eval Epoch: 26 image_to_text_mean_rank: 1114.0630 image_to_text_median_rank: 930.0000 image_to_text_R@1: 0.0010 image_to_text_R@5: 0.0053 image_to_text_R@10: 0.0103 text_to_image_mean_rank: 1001.3303 text_to_image_median_rank: 768.0000 text_to_image_R@1: 0.0023 text_to_image_R@5: 0.0113 text_to_image_R@10: 0.0217 clip_val_loss: 6.6953 epoch: 26.0000 num_samples: 3000.0000
2024-08-16,04:26:57 | INFO | Start epoch 26
2024-08-16,04:26:59 | INFO | Train Epoch: 26 [ 256/27000 (1%)] Data (t): 1.514 Batch (t): 2.756, 92.8774/s, 92.8774/s/gpu LR: 0.000137 Logit Scale: 16.842 Contrastive_loss: 0.68641 (0.68641) Loss: 0.68641 (0.68641)
2024-08-16,04:29:04 | INFO | Train Epoch: 26 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.423/s, 204.423/s/gpu LR: 0.000142 Logit Scale: 16.877 Contrastive_loss: 2.6772 (1.6818) Loss: 2.6772 (1.6818)
2024-08-16,04:29:09 | INFO | Train Epoch: 26 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.620/s, 204.620/s/gpu LR: 0.000142 Logit Scale: 16.883 Contrastive_loss: 2.6573 (2.0070) Loss: 2.6573 (2.0070)
2024-08-16,04:29:11 | INFO | Eval Epoch: 27 [256 / 3000] Clip Loss: 5.865312
2024-08-16,04:29:16 | INFO | Eval Epoch: 27 image_to_text_mean_rank: 782.6180 image_to_text_median_rank: 546.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0103 image_to_text_R@10: 0.0223 text_to_image_mean_rank: 727.9463 text_to_image_median_rank: 478.0000 text_to_image_R@1: 0.0047 text_to_image_R@5: 0.0193 text_to_image_R@10: 0.0297 clip_val_loss: 5.9258 epoch: 27.0000 num_samples: 3000.0000
2024-08-16,04:29:17 | INFO | Start epoch 27
2024-08-16,04:29:20 | INFO | Train Epoch: 27 [ 256/27000 (1%)] Data (t): 1.459 Batch (t): 2.705, 94.6292/s, 94.6292/s/gpu LR: 0.000142 Logit Scale: 16.884 Contrastive_loss: 1.7112 (1.7112) Loss: 1.7112 (1.7112)
2024-08-16,04:31:25 | INFO | Train Epoch: 27 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.360/s, 204.360/s/gpu LR: 0.000147 Logit Scale: 17.268 Contrastive_loss: 0.84065 (1.2759) Loss: 0.84065 (1.2759)
2024-08-16,04:31:30 | INFO | Train Epoch: 27 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.422/s, 204.422/s/gpu LR: 0.000147 Logit Scale: 17.283 Contrastive_loss: 0.70991 (1.0872) Loss: 0.70991 (1.0872)
2024-08-16,04:31:32 | INFO | Eval Epoch: 28 [256 / 3000] Clip Loss: 6.629393
2024-08-16,04:31:37 | INFO | Eval Epoch: 28 image_to_text_mean_rank: 896.8010 image_to_text_median_rank: 667.0000 image_to_text_R@1: 0.0013 image_to_text_R@5: 0.0097 image_to_text_R@10: 0.0167 text_to_image_mean_rank: 876.8693 text_to_image_median_rank: 625.0000 text_to_image_R@1: 0.0020 text_to_image_R@5: 0.0127 text_to_image_R@10: 0.0240 clip_val_loss: 6.5428 epoch: 28.0000 num_samples: 3000.0000
2024-08-16,04:31:38 | INFO | Start epoch 28
2024-08-16,04:31:41 | INFO | Train Epoch: 28 [ 256/27000 (1%)] Data (t): 1.474 Batch (t): 2.719, 94.1445/s, 94.1445/s/gpu LR: 0.000147 Logit Scale: 17.287 Contrastive_loss: 0.30942 (0.30942) Loss: 0.30942 (0.30942)
2024-08-16,04:33:46 | INFO | Train Epoch: 28 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.480/s, 204.480/s/gpu LR: 0.000152 Logit Scale: 17.513 Contrastive_loss: 0.14982 (0.22962) Loss: 0.14982 (0.22962)
2024-08-16,04:33:51 | INFO | Train Epoch: 28 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.602/s, 204.602/s/gpu LR: 0.000152 Logit Scale: 17.520 Contrastive_loss: 0.17185 (0.21036) Loss: 0.17185 (0.21036)
2024-08-16,04:33:53 | INFO | Eval Epoch: 29 [256 / 3000] Clip Loss: 6.860646
2024-08-16,04:33:57 | INFO | Eval Epoch: 29 image_to_text_mean_rank: 948.3863 image_to_text_median_rank: 727.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0110 image_to_text_R@10: 0.0230 text_to_image_mean_rank: 941.6530 text_to_image_median_rank: 707.0000 text_to_image_R@1: 0.0023 text_to_image_R@5: 0.0147 text_to_image_R@10: 0.0250 clip_val_loss: 6.7876 epoch: 29.0000 num_samples: 3000.0000
2024-08-16,04:33:59 | INFO | Start epoch 29
2024-08-16,04:34:01 | INFO | Train Epoch: 29 [ 256/27000 (1%)] Data (t): 1.484 Batch (t): 2.730, 93.7852/s, 93.7852/s/gpu LR: 0.000152 Logit Scale: 17.521 Contrastive_loss: 0.073008 (0.073008) Loss: 0.073008 (0.073008)
2024-08-16,04:36:06 | INFO | Train Epoch: 29 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.650/s, 204.650/s/gpu LR: 0.000157 Logit Scale: 17.624 Contrastive_loss: 0.061296 (0.067152) Loss: 0.061296 (0.067152)
2024-08-16,04:36:11 | INFO | Train Epoch: 29 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.251, 204.599/s, 204.599/s/gpu LR: 0.000158 Logit Scale: 17.627 Contrastive_loss: 0.069361 (0.067888) Loss: 0.069361 (0.067888)
2024-08-16,04:36:13 | INFO | Eval Epoch: 30 [256 / 3000] Clip Loss: 7.037732
2024-08-16,04:36:18 | INFO | Eval Epoch: 30 image_to_text_mean_rank: 987.8290 image_to_text_median_rank: 769.0000 image_to_text_R@1: 0.0017 image_to_text_R@5: 0.0083 image_to_text_R@10: 0.0157 text_to_image_mean_rank: 982.1127 text_to_image_median_rank: 774.0000 text_to_image_R@1: 0.0037 text_to_image_R@5: 0.0110 text_to_image_R@10: 0.0200 clip_val_loss: 6.9645 epoch: 30.0000 num_samples: 3000.0000
2024-08-16,04:36:19 | INFO | Start epoch 30
2024-08-16,04:36:22 | INFO | Train Epoch: 30 [ 256/27000 (1%)] Data (t): 1.468 Batch (t): 2.714, 94.3415/s, 94.3415/s/gpu LR: 0.000158 Logit Scale: 17.628 Contrastive_loss: 0.040057 (0.040057) Loss: 0.040057 (0.040057)
2024-08-16,04:38:27 | INFO | Train Epoch: 30 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.454/s, 204.454/s/gpu LR: 0.000163 Logit Scale: 17.701 Contrastive_loss: 0.035407 (0.037732) Loss: 0.035407 (0.037732)
2024-08-16,04:38:32 | INFO | Train Epoch: 30 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.606/s, 204.606/s/gpu LR: 0.000163 Logit Scale: 17.704 Contrastive_loss: 0.046378 (0.040614) Loss: 0.046378 (0.040614)
2024-08-16,04:38:34 | INFO | Eval Epoch: 31 [256 / 3000] Clip Loss: 7.103376
2024-08-16,04:38:38 | INFO | Eval Epoch: 31 image_to_text_mean_rank: 1028.7160 image_to_text_median_rank: 827.0000 image_to_text_R@1: 0.0030 image_to_text_R@5: 0.0087 image_to_text_R@10: 0.0193 text_to_image_mean_rank: 1023.7647 text_to_image_median_rank: 810.0000 text_to_image_R@1: 0.0020 text_to_image_R@5: 0.0117 text_to_image_R@10: 0.0193 clip_val_loss: 7.0932 epoch: 31.0000 num_samples: 3000.0000
2024-08-16,04:38:40 | INFO | Start epoch 31
2024-08-16,04:38:42 | INFO | Train Epoch: 31 [ 256/27000 (1%)] Data (t): 1.446 Batch (t): 2.692, 95.1004/s, 95.1004/s/gpu LR: 0.000163 Logit Scale: 17.704 Contrastive_loss: 0.045176 (0.045176) Loss: 0.045176 (0.045176)
2024-08-16,04:40:47 | INFO | Train Epoch: 31 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.631/s, 204.631/s/gpu LR: 0.000168 Logit Scale: 17.770 Contrastive_loss: 0.063451 (0.054314) Loss: 0.063451 (0.054314)
2024-08-16,04:40:53 | INFO | Train Epoch: 31 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.251, 204.576/s, 204.576/s/gpu LR: 0.000168 Logit Scale: 17.772 Contrastive_loss: 0.025344 (0.044657) Loss: 0.025344 (0.044657)
2024-08-16,04:40:54 | INFO | Eval Epoch: 32 [256 / 3000] Clip Loss: 7.116255
2024-08-16,04:40:59 | INFO | Eval Epoch: 32 image_to_text_mean_rank: 1041.3660 image_to_text_median_rank: 826.0000 image_to_text_R@1: 0.0013 image_to_text_R@5: 0.0103 image_to_text_R@10: 0.0180 text_to_image_mean_rank: 1038.6310 text_to_image_median_rank: 819.0000 text_to_image_R@1: 0.0020 text_to_image_R@5: 0.0107 text_to_image_R@10: 0.0223 clip_val_loss: 7.1470 epoch: 32.0000 num_samples: 3000.0000
2024-08-16,04:41:00 | INFO | Start epoch 32
2024-08-16,04:41:03 | INFO | Train Epoch: 32 [ 256/27000 (1%)] Data (t): 1.481 Batch (t): 2.728, 93.8307/s, 93.8307/s/gpu LR: 0.000168 Logit Scale: 17.773 Contrastive_loss: 0.024506 (0.024506) Loss: 0.024506 (0.024506)
2024-08-16,04:43:08 | INFO | Train Epoch: 32 [25856/27000 (96%)] Data (t): 0.001 Batch (t): 1.251, 204.585/s, 204.585/s/gpu LR: 0.000173 Logit Scale: 17.837 Contrastive_loss: 0.042150 (0.033328) Loss: 0.042150 (0.033328)
2024-08-16,04:43:13 | INFO | Train Epoch: 32 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.710/s, 204.710/s/gpu LR: 0.000173 Logit Scale: 17.840 Contrastive_loss: 0.021264 (0.029307) Loss: 0.021264 (0.029307)
2024-08-16,04:43:15 | INFO | Eval Epoch: 33 [256 / 3000] Clip Loss: 7.302838
2024-08-16,04:43:20 | INFO | Eval Epoch: 33 image_to_text_mean_rank: 1051.0060 image_to_text_median_rank: 844.0000 image_to_text_R@1: 0.0010 image_to_text_R@5: 0.0083 image_to_text_R@10: 0.0143 text_to_image_mean_rank: 1047.7423 text_to_image_median_rank: 858.0000 text_to_image_R@1: 0.0017 text_to_image_R@5: 0.0107 text_to_image_R@10: 0.0167 clip_val_loss: 7.2055 epoch: 33.0000 num_samples: 3000.0000
2024-08-16,04:43:21 | INFO | Start epoch 33
2024-08-16,04:43:24 | INFO | Train Epoch: 33 [ 256/27000 (1%)] Data (t): 1.471 Batch (t): 2.714, 94.3115/s, 94.3115/s/gpu LR: 0.000173 Logit Scale: 17.840 Contrastive_loss: 0.036071 (0.036071) Loss: 0.036071 (0.036071)
2024-08-16,04:45:29 | INFO | Train Epoch: 33 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.581/s, 204.581/s/gpu LR: 0.000178 Logit Scale: 17.908 Contrastive_loss: 0.032550 (0.034310) Loss: 0.032550 (0.034310)
2024-08-16,04:45:34 | INFO | Train Epoch: 33 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.538/s, 204.538/s/gpu LR: 0.000179 Logit Scale: 17.911 Contrastive_loss: 0.045624 (0.038082) Loss: 0.045624 (0.038082)
2024-08-16,04:45:36 | INFO | Eval Epoch: 34 [256 / 3000] Clip Loss: 7.142300
2024-08-16,04:45:40 | INFO | Eval Epoch: 34 image_to_text_mean_rank: 1043.7917 image_to_text_median_rank: 850.0000 image_to_text_R@1: 0.0017 image_to_text_R@5: 0.0097 image_to_text_R@10: 0.0183 text_to_image_mean_rank: 1041.0867 text_to_image_median_rank: 850.0000 text_to_image_R@1: 0.0027 text_to_image_R@5: 0.0113 text_to_image_R@10: 0.0200 clip_val_loss: 7.1728 epoch: 34.0000 num_samples: 3000.0000
2024-08-16,04:45:42 | INFO | Start epoch 34
2024-08-16,04:45:44 | INFO | Train Epoch: 34 [ 256/27000 (1%)] Data (t): 1.475 Batch (t): 2.721, 94.0990/s, 94.0990/s/gpu LR: 0.000179 Logit Scale: 17.911 Contrastive_loss: 0.030825 (0.030825) Loss: 0.030825 (0.030825)
2024-08-16,04:47:49 | INFO | Train Epoch: 34 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.590/s, 204.590/s/gpu LR: 0.000184 Logit Scale: 17.981 Contrastive_loss: 0.046251 (0.038538) Loss: 0.046251 (0.038538)
2024-08-16,04:47:54 | INFO | Train Epoch: 34 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.251, 204.829/s, 204.829/s/gpu LR: 0.000184 Logit Scale: 17.984 Contrastive_loss: 0.041009 (0.039362) Loss: 0.041009 (0.039362)
2024-08-16,04:47:56 | INFO | Eval Epoch: 35 [256 / 3000] Clip Loss: 7.147564
2024-08-16,04:48:01 | INFO | Eval Epoch: 35 image_to_text_mean_rank: 1042.3890 image_to_text_median_rank: 852.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0090 image_to_text_R@10: 0.0173 text_to_image_mean_rank: 1035.9400 text_to_image_median_rank: 837.0000 text_to_image_R@1: 0.0020 text_to_image_R@5: 0.0110 text_to_image_R@10: 0.0207 clip_val_loss: 7.1603 epoch: 35.0000 num_samples: 3000.0000
2024-08-16,04:48:02 | INFO | Start epoch 35
2024-08-16,04:48:05 | INFO | Train Epoch: 35 [ 256/27000 (1%)] Data (t): 1.466 Batch (t): 2.711, 94.4327/s, 94.4327/s/gpu LR: 0.000184 Logit Scale: 17.985 Contrastive_loss: 0.032819 (0.032819) Loss: 0.032819 (0.032819)
2024-08-16,04:50:10 | INFO | Train Epoch: 35 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.250, 204.457/s, 204.457/s/gpu LR: 0.000189 Logit Scale: 18.066 Contrastive_loss: 0.047232 (0.040025) Loss: 0.047232 (0.040025)
2024-08-16,04:50:15 | INFO | Train Epoch: 35 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.251, 204.589/s, 204.589/s/gpu LR: 0.000189 Logit Scale: 18.069 Contrastive_loss: 0.056291 (0.045447) Loss: 0.056291 (0.045447)
2024-08-16,04:50:17 | INFO | Eval Epoch: 36 [256 / 3000] Clip Loss: 7.278274
2024-08-16,04:50:22 | INFO | Eval Epoch: 36 image_to_text_mean_rank: 1058.5333 image_to_text_median_rank: 847.0000 image_to_text_R@1: 0.0030 image_to_text_R@5: 0.0147 image_to_text_R@10: 0.0223 text_to_image_mean_rank: 1052.5207 text_to_image_median_rank: 836.0000 text_to_image_R@1: 0.0023 text_to_image_R@5: 0.0133 text_to_image_R@10: 0.0233 clip_val_loss: 7.2311 epoch: 36.0000 num_samples: 3000.0000
2024-08-16,04:50:23 | INFO | Start epoch 36
2024-08-16,04:50:26 | INFO | Train Epoch: 36 [ 256/27000 (1%)] Data (t): 1.454 Batch (t): 2.699, 94.8579/s, 94.8579/s/gpu LR: 0.000189 Logit Scale: 18.070 Contrastive_loss: 0.030523 (0.030523) Loss: 0.030523 (0.030523)
2024-08-16,04:52:31 | INFO | Train Epoch: 36 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.636/s, 204.636/s/gpu LR: 0.000194 Logit Scale: 18.156 Contrastive_loss: 0.043999 (0.037261) Loss: 0.043999 (0.037261)
2024-08-16,04:52:36 | INFO | Train Epoch: 36 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.251, 204.744/s, 204.744/s/gpu LR: 0.000194 Logit Scale: 18.160 Contrastive_loss: 0.037191 (0.037238) Loss: 0.037191 (0.037238)
2024-08-16,04:52:38 | INFO | Eval Epoch: 37 [256 / 3000] Clip Loss: 7.176402
2024-08-16,04:52:42 | INFO | Eval Epoch: 37 image_to_text_mean_rank: 1054.3230 image_to_text_median_rank: 875.0000 image_to_text_R@1: 0.0017 image_to_text_R@5: 0.0090 image_to_text_R@10: 0.0180 text_to_image_mean_rank: 1051.4680 text_to_image_median_rank: 860.0000 text_to_image_R@1: 0.0027 text_to_image_R@5: 0.0127 text_to_image_R@10: 0.0197 clip_val_loss: 7.2518 epoch: 37.0000 num_samples: 3000.0000
2024-08-16,04:52:44 | INFO | Start epoch 37
2024-08-16,04:52:46 | INFO | Train Epoch: 37 [ 256/27000 (1%)] Data (t): 1.471 Batch (t): 2.715, 94.2910/s, 94.2910/s/gpu LR: 0.000194 Logit Scale: 18.161 Contrastive_loss: 0.049741 (0.049741) Loss: 0.049741 (0.049741)
2024-08-16,04:54:51 | INFO | Train Epoch: 37 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.555/s, 204.555/s/gpu LR: 0.000199 Logit Scale: 18.264 Contrastive_loss: 0.054978 (0.052359) Loss: 0.054978 (0.052359)
2024-08-16,04:54:56 | INFO | Train Epoch: 37 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.389/s, 204.389/s/gpu LR: 0.000199 Logit Scale: 18.269 Contrastive_loss: 0.049099 (0.051273) Loss: 0.049099 (0.051273)
2024-08-16,04:54:58 | INFO | Eval Epoch: 38 [256 / 3000] Clip Loss: 7.144962
2024-08-16,04:55:03 | INFO | Eval Epoch: 38 image_to_text_mean_rank: 1057.0257 image_to_text_median_rank: 858.0000 image_to_text_R@1: 0.0023 image_to_text_R@5: 0.0100 image_to_text_R@10: 0.0217 text_to_image_mean_rank: 1049.7160 text_to_image_median_rank: 842.0000 text_to_image_R@1: 0.0027 text_to_image_R@5: 0.0123 text_to_image_R@10: 0.0223 clip_val_loss: 7.2849 epoch: 38.0000 num_samples: 3000.0000
2024-08-16,04:55:04 | INFO | Start epoch 38
2024-08-16,04:55:07 | INFO | Train Epoch: 38 [ 256/27000 (1%)] Data (t): 1.472 Batch (t): 2.718, 94.1733/s, 94.1733/s/gpu LR: 0.000200 Logit Scale: 18.270 Contrastive_loss: 0.066994 (0.066994) Loss: 0.066994 (0.066994)
2024-08-16,04:57:12 | INFO | Train Epoch: 38 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.448/s, 204.448/s/gpu LR: 0.000205 Logit Scale: 18.393 Contrastive_loss: 0.075224 (0.071109) Loss: 0.075224 (0.071109)
2024-08-16,04:57:17 | INFO | Train Epoch: 38 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.568/s, 204.568/s/gpu LR: 0.000205 Logit Scale: 18.399 Contrastive_loss: 0.057282 (0.066500) Loss: 0.057282 (0.066500)
2024-08-16,04:57:19 | INFO | Eval Epoch: 39 [256 / 3000] Clip Loss: 7.157355
2024-08-16,04:57:24 | INFO | Eval Epoch: 39 image_to_text_mean_rank: 1028.1930 image_to_text_median_rank: 819.0000 image_to_text_R@1: 0.0033 image_to_text_R@5: 0.0100 image_to_text_R@10: 0.0190 text_to_image_mean_rank: 1020.0573 text_to_image_median_rank: 827.0000 text_to_image_R@1: 0.0043 text_to_image_R@5: 0.0140 text_to_image_R@10: 0.0230 clip_val_loss: 7.2089 epoch: 39.0000 num_samples: 3000.0000
2024-08-16,04:57:25 | INFO | Start epoch 39
2024-08-16,04:57:28 | INFO | Train Epoch: 39 [ 256/27000 (1%)] Data (t): 1.482 Batch (t): 2.727, 93.8673/s, 93.8673/s/gpu LR: 0.000205 Logit Scale: 18.400 Contrastive_loss: 0.056646 (0.056646) Loss: 0.056646 (0.056646)
2024-08-16,04:59:33 | INFO | Train Epoch: 39 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.481/s, 204.481/s/gpu LR: 0.000210 Logit Scale: 18.561 Contrastive_loss: 0.055771 (0.056209) Loss: 0.055771 (0.056209)
2024-08-16,04:59:38 | INFO | Train Epoch: 39 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.646/s, 204.646/s/gpu LR: 0.000210 Logit Scale: 18.567 Contrastive_loss: 0.057177 (0.056531) Loss: 0.057177 (0.056531)
2024-08-16,04:59:39 | INFO | Eval Epoch: 40 [256 / 3000] Clip Loss: 7.104994
2024-08-16,04:59:44 | INFO | Eval Epoch: 40 image_to_text_mean_rank: 1018.7757 image_to_text_median_rank: 782.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0103 image_to_text_R@10: 0.0183 text_to_image_mean_rank: 1008.3393 text_to_image_median_rank: 782.0000 text_to_image_R@1: 0.0017 text_to_image_R@5: 0.0150 text_to_image_R@10: 0.0247 clip_val_loss: 7.2402 epoch: 40.0000 num_samples: 3000.0000
2024-08-16,04:59:46 | INFO | Start epoch 40
2024-08-16,04:59:48 | INFO | Train Epoch: 40 [ 256/27000 (1%)] Data (t): 1.463 Batch (t): 2.709, 94.4969/s, 94.4969/s/gpu LR: 0.000210 Logit Scale: 18.569 Contrastive_loss: 0.090862 (0.090862) Loss: 0.090862 (0.090862)
2024-08-16,05:01:54 | INFO | Train Epoch: 40 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.537/s, 204.537/s/gpu LR: 0.000215 Logit Scale: 18.769 Contrastive_loss: 0.094119 (0.092491) Loss: 0.094119 (0.092491)
2024-08-16,05:01:59 | INFO | Train Epoch: 40 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.528/s, 204.528/s/gpu LR: 0.000215 Logit Scale: 18.779 Contrastive_loss: 0.10408 (0.096353) Loss: 0.10408 (0.096353)
2024-08-16,05:02:00 | INFO | Eval Epoch: 41 [256 / 3000] Clip Loss: 7.132548
2024-08-16,05:02:05 | INFO | Eval Epoch: 41 image_to_text_mean_rank: 1011.8893 image_to_text_median_rank: 803.0000 image_to_text_R@1: 0.0030 image_to_text_R@5: 0.0137 image_to_text_R@10: 0.0220 text_to_image_mean_rank: 999.8167 text_to_image_median_rank: 784.0000 text_to_image_R@1: 0.0033 text_to_image_R@5: 0.0133 text_to_image_R@10: 0.0220 clip_val_loss: 7.2351 epoch: 41.0000 num_samples: 3000.0000
2024-08-16,05:02:06 | INFO | Start epoch 41
2024-08-16,05:02:09 | INFO | Train Epoch: 41 [ 256/27000 (1%)] Data (t): 1.459 Batch (t): 2.706, 94.6002/s, 94.6002/s/gpu LR: 0.000215 Logit Scale: 18.782 Contrastive_loss: 0.085834 (0.085834) Loss: 0.085834 (0.085834)
2024-08-16,05:04:14 | INFO | Train Epoch: 41 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.523/s, 204.523/s/gpu LR: 0.000220 Logit Scale: 19.049 Contrastive_loss: 4.7974 (2.4416) Loss: 4.7974 (2.4416)
2024-08-16,05:04:19 | INFO | Train Epoch: 41 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.451/s, 204.451/s/gpu LR: 0.000221 Logit Scale: 19.044 Contrastive_loss: 4.6688 (3.1840) Loss: 4.6688 (3.1840)
2024-08-16,05:04:21 | INFO | Eval Epoch: 42 [256 / 3000] Clip Loss: 4.982571
2024-08-16,05:04:26 | INFO | Eval Epoch: 42 image_to_text_mean_rank: 754.0190 image_to_text_median_rank: 534.0000 image_to_text_R@1: 0.0013 image_to_text_R@5: 0.0113 image_to_text_R@10: 0.0213 text_to_image_mean_rank: 667.4670 text_to_image_median_rank: 441.0000 text_to_image_R@1: 0.0027 text_to_image_R@5: 0.0110 text_to_image_R@10: 0.0227 clip_val_loss: 5.0287 epoch: 42.0000 num_samples: 3000.0000
2024-08-16,05:04:27 | INFO | Start epoch 42
2024-08-16,05:04:30 | INFO | Train Epoch: 42 [ 256/27000 (1%)] Data (t): 1.476 Batch (t): 2.722, 94.0346/s, 94.0346/s/gpu LR: 0.000221 Logit Scale: 19.043 Contrastive_loss: 4.5297 (4.5297) Loss: 4.5297 (4.5297)
2024-08-16,05:06:35 | INFO | Train Epoch: 42 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.332/s, 204.332/s/gpu LR: 0.000226 Logit Scale: 19.079 Contrastive_loss: 2.9309 (3.7303) Loss: 2.9309 (3.7303)
2024-08-16,05:06:40 | INFO | Train Epoch: 42 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.256/s, 204.256/s/gpu LR: 0.000226 Logit Scale: 19.079 Contrastive_loss: 3.1745 (3.5451) Loss: 3.1745 (3.5451)
2024-08-16,05:06:42 | INFO | Eval Epoch: 43 [256 / 3000] Clip Loss: 5.689608
2024-08-16,05:06:47 | INFO | Eval Epoch: 43 image_to_text_mean_rank: 761.4207 image_to_text_median_rank: 525.0000 image_to_text_R@1: 0.0027 image_to_text_R@5: 0.0130 image_to_text_R@10: 0.0227 text_to_image_mean_rank: 737.0500 text_to_image_median_rank: 503.0000 text_to_image_R@1: 0.0033 text_to_image_R@5: 0.0167 text_to_image_R@10: 0.0263 clip_val_loss: 5.9352 epoch: 43.0000 num_samples: 3000.0000
2024-08-16,05:06:48 | INFO | Start epoch 43
2024-08-16,05:06:51 | INFO | Train Epoch: 43 [ 256/27000 (1%)] Data (t): 1.538 Batch (t): 2.784, 91.9524/s, 91.9524/s/gpu LR: 0.000226 Logit Scale: 19.080 Contrastive_loss: 1.8070 (1.8070) Loss: 1.8070 (1.8070)
2024-08-16,05:08:56 | INFO | Train Epoch: 43 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.467/s, 204.467/s/gpu LR: 0.000231 Logit Scale: 19.650 Contrastive_loss: 1.5848 (1.6959) Loss: 1.5848 (1.6959)
2024-08-16,05:09:01 | INFO | Train Epoch: 43 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.253, 204.317/s, 204.317/s/gpu LR: 0.000231 Logit Scale: 19.674 Contrastive_loss: 1.6317 (1.6745) Loss: 1.6317 (1.6745)
2024-08-16,05:09:03 | INFO | Eval Epoch: 44 [256 / 3000] Clip Loss: 6.758216
2024-08-16,05:09:07 | INFO | Eval Epoch: 44 image_to_text_mean_rank: 887.7003 image_to_text_median_rank: 669.0000 image_to_text_R@1: 0.0010 image_to_text_R@5: 0.0093 image_to_text_R@10: 0.0173 text_to_image_mean_rank: 871.1403 text_to_image_median_rank: 657.0000 text_to_image_R@1: 0.0043 text_to_image_R@5: 0.0137 text_to_image_R@10: 0.0223 clip_val_loss: 6.9089 epoch: 44.0000 num_samples: 3000.0000
2024-08-16,05:09:09 | INFO | Start epoch 44
2024-08-16,05:09:11 | INFO | Train Epoch: 44 [ 256/27000 (1%)] Data (t): 1.472 Batch (t): 2.719, 94.1646/s, 94.1646/s/gpu LR: 0.000231 Logit Scale: 19.680 Contrastive_loss: 0.64078 (0.64078) Loss: 0.64078 (0.64078)
2024-08-16,05:11:17 | INFO | Train Epoch: 44 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.252, 204.727/s, 204.727/s/gpu LR: 0.000236 Logit Scale: 20.250 Contrastive_loss: 0.27891 (0.45985) Loss: 0.27891 (0.45985)
2024-08-16,05:11:22 | INFO | Train Epoch: 44 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.648/s, 204.648/s/gpu LR: 0.000236 Logit Scale: 20.270 Contrastive_loss: 0.23861 (0.38610) Loss: 0.23861 (0.38610)
2024-08-16,05:11:23 | INFO | Eval Epoch: 45 [256 / 3000] Clip Loss: 7.229187
2024-08-16,05:11:28 | INFO | Eval Epoch: 45 image_to_text_mean_rank: 923.1860 image_to_text_median_rank: 682.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0097 image_to_text_R@10: 0.0190 text_to_image_mean_rank: 912.5870 text_to_image_median_rank: 674.0000 text_to_image_R@1: 0.0027 text_to_image_R@5: 0.0107 text_to_image_R@10: 0.0200 clip_val_loss: 7.3445 epoch: 45.0000 num_samples: 3000.0000
2024-08-16,05:11:29 | INFO | Start epoch 45
2024-08-16,05:11:32 | INFO | Train Epoch: 45 [ 256/27000 (1%)] Data (t): 1.496 Batch (t): 2.740, 93.4189/s, 93.4189/s/gpu LR: 0.000236 Logit Scale: 20.275 Contrastive_loss: 0.11670 (0.11670) Loss: 0.11670 (0.11670)
2024-08-16,05:13:37 | INFO | Train Epoch: 45 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.578/s, 204.578/s/gpu LR: 0.000241 Logit Scale: 20.541 Contrastive_loss: 0.10041 (0.10856) Loss: 0.10041 (0.10856)
2024-08-16,05:13:42 | INFO | Train Epoch: 45 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.456/s, 204.456/s/gpu LR: 0.000242 Logit Scale: 20.549 Contrastive_loss: 0.039388 (0.085500) Loss: 0.039388 (0.085500)
2024-08-16,05:13:44 | INFO | Eval Epoch: 46 [256 / 3000] Clip Loss: 7.590382
2024-08-16,05:13:49 | INFO | Eval Epoch: 46 image_to_text_mean_rank: 955.1153 image_to_text_median_rank: 728.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0107 image_to_text_R@10: 0.0187 text_to_image_mean_rank: 954.1097 text_to_image_median_rank: 736.0000 text_to_image_R@1: 0.0013 text_to_image_R@5: 0.0130 text_to_image_R@10: 0.0230 clip_val_loss: 7.6622 epoch: 46.0000 num_samples: 3000.0000
2024-08-16,05:13:50 | INFO | Start epoch 46
2024-08-16,05:13:53 | INFO | Train Epoch: 46 [ 256/27000 (1%)] Data (t): 1.480 Batch (t): 2.723, 94.0089/s, 94.0089/s/gpu LR: 0.000242 Logit Scale: 20.551 Contrastive_loss: 0.056549 (0.056549) Loss: 0.056549 (0.056549)
2024-08-16,05:15:58 | INFO | Train Epoch: 46 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.542/s, 204.542/s/gpu LR: 0.000247 Logit Scale: 20.695 Contrastive_loss: 0.031992 (0.044271) Loss: 0.031992 (0.044271)
2024-08-16,05:16:03 | INFO | Train Epoch: 46 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.581/s, 204.581/s/gpu LR: 0.000247 Logit Scale: 20.701 Contrastive_loss: 0.046248 (0.044930) Loss: 0.046248 (0.044930)
2024-08-16,05:16:05 | INFO | Eval Epoch: 47 [256 / 3000] Clip Loss: 7.512671
2024-08-16,05:16:10 | INFO | Eval Epoch: 47 image_to_text_mean_rank: 973.5813 image_to_text_median_rank: 747.0000 image_to_text_R@1: 0.0017 image_to_text_R@5: 0.0097 image_to_text_R@10: 0.0177 text_to_image_mean_rank: 973.0503 text_to_image_median_rank: 743.0000 text_to_image_R@1: 0.0010 text_to_image_R@5: 0.0100 text_to_image_R@10: 0.0210 clip_val_loss: 7.7592 epoch: 47.0000 num_samples: 3000.0000
2024-08-16,05:16:11 | INFO | Start epoch 47
2024-08-16,05:16:14 | INFO | Train Epoch: 47 [ 256/27000 (1%)] Data (t): 1.489 Batch (t): 2.734, 93.6261/s, 93.6261/s/gpu LR: 0.000247 Logit Scale: 20.702 Contrastive_loss: 0.042419 (0.042419) Loss: 0.042419 (0.042419)
2024-08-16,05:18:19 | INFO | Train Epoch: 47 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.251, 204.663/s, 204.663/s/gpu LR: 0.000252 Logit Scale: 20.817 Contrastive_loss: 0.037442 (0.039930) Loss: 0.037442 (0.039930)
2024-08-16,05:18:24 | INFO | Train Epoch: 47 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.251, 204.630/s, 204.630/s/gpu LR: 0.000252 Logit Scale: 20.821 Contrastive_loss: 0.057514 (0.045792) Loss: 0.057514 (0.045792)
2024-08-16,05:18:25 | INFO | Eval Epoch: 48 [256 / 3000] Clip Loss: 7.678425
2024-08-16,05:18:30 | INFO | Eval Epoch: 48 image_to_text_mean_rank: 995.8417 image_to_text_median_rank: 768.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0120 image_to_text_R@10: 0.0183 text_to_image_mean_rank: 996.3487 text_to_image_median_rank: 770.0000 text_to_image_R@1: 0.0030 text_to_image_R@5: 0.0130 text_to_image_R@10: 0.0197 clip_val_loss: 7.8886 epoch: 48.0000 num_samples: 3000.0000
2024-08-16,05:18:32 | INFO | Start epoch 48
2024-08-16,05:18:34 | INFO | Train Epoch: 48 [ 256/27000 (1%)] Data (t): 1.486 Batch (t): 2.731, 93.7329/s, 93.7329/s/gpu LR: 0.000252 Logit Scale: 20.822 Contrastive_loss: 0.037203 (0.037203) Loss: 0.037203 (0.037203)
2024-08-16,05:20:39 | INFO | Train Epoch: 48 [25856/27000 (96%)] Data (t): 0.000 Batch (t): 1.250, 204.601/s, 204.601/s/gpu LR: 0.000257 Logit Scale: 20.930 Contrastive_loss: 0.030419 (0.033811) Loss: 0.030419 (0.033811)
2024-08-16,05:20:44 | INFO | Train Epoch: 48 [26880/27000 (100%)] Data (t): 0.002 Batch (t): 1.252, 204.401/s, 204.401/s/gpu LR: 0.000257 Logit Scale: 20.934 Contrastive_loss: 0.017202 (0.028274) Loss: 0.017202 (0.028274)
2024-08-16,05:20:46 | INFO | Eval Epoch: 49 [256 / 3000] Clip Loss: 7.609719
2024-08-16,05:20:51 | INFO | Eval Epoch: 49 image_to_text_mean_rank: 1004.3947 image_to_text_median_rank: 827.0000 image_to_text_R@1: 0.0020 image_to_text_R@5: 0.0107 image_to_text_R@10: 0.0193 text_to_image_mean_rank: 1001.3020 text_to_image_median_rank: 814.0000 text_to_image_R@1: 0.0020 text_to_image_R@5: 0.0107 text_to_image_R@10: 0.0200 clip_val_loss: 7.9246 epoch: 49.0000 num_samples: 3000.0000
2024-08-16,05:20:52 | INFO | Start epoch 49
2024-08-16,05:20:55 | INFO | Train Epoch: 49 [ 256/27000 (1%)] Data (t): 1.470 Batch (t): 2.716, 94.2600/s, 94.2600/s/gpu LR: 0.000257 Logit Scale: 20.936 Contrastive_loss: 0.031021 (0.031021) Loss: 0.031021 (0.031021)