| 2025-04-29,10:46:51 | INFO | No latest resume checkpoint found in /mnt/personal/zhudongy/datacomp_results/small/low_inter_only/checkpoints. | |
| 2025-04-29,10:46:53 | INFO | Running in distributed mode with multiple processes. Device: cuda:0. Process (global: 0, local: 0), total 2. | |
| 2025-04-29,10:46:53 | INFO | Loaded ViT-B-32 model config. | |
| 2025-04-29,10:46:54 | INFO | Model: | |
| 2025-04-29,10:46:54 | INFO | CLIP( | |
| (visual): VisionTransformer( | |
| (patchnorm_pre_ln): Identity() | |
| (conv1): Conv2d(3, 768, kernel_size=(32, 32), stride=(32, 32), bias=False) | |
| (patch_dropout): Identity() | |
| (ln_pre): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (transformer): Transformer( | |
| (resblocks): ModuleList( | |
| (0): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (1): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (2): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (3): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (4): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (5): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (6): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (7): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (8): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (9): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (10): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (11): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=768, out_features=768, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=768, out_features=3072, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=3072, out_features=768, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| ) | |
| ) | |
| (ln_post): LayerNorm((768,), eps=1e-05, elementwise_affine=True) | |
| ) | |
| (transformer): Transformer( | |
| (resblocks): ModuleList( | |
| (0): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (1): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (2): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (3): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (4): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (5): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (6): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (7): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (8): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (9): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (10): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| (11): ResidualAttentionBlock( | |
| (ln_1): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (attn): MultiheadAttention( | |
| (out_proj): NonDynamicallyQuantizableLinear(in_features=512, out_features=512, bias=True) | |
| ) | |
| (ls_1): Identity() | |
| (ln_2): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| (mlp): Sequential( | |
| (c_fc): Linear(in_features=512, out_features=2048, bias=True) | |
| (gelu): GELU(approximate='none') | |
| (c_proj): Linear(in_features=2048, out_features=512, bias=True) | |
| ) | |
| (ls_2): Identity() | |
| ) | |
| ) | |
| ) | |
| (token_embedding): Embedding(49408, 512) | |
| (ln_final): LayerNorm((512,), eps=1e-05, elementwise_affine=True) | |
| ) | |
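The repr above is the standard ViT-B-32 CLIP layout: a 12-block vision tower of width 768 and a 12-block text tower of width 512, with each block's MLP expanding by the usual 4x ratio. A minimal sketch of that dimension arithmetic (the 4x ratio is inferred from the printed `c_fc` sizes, not stated explicitly in the log):

```python
# Dimensions read off the model printout above (ViT-B-32 CLIP).
VISION_WIDTH = 768   # conv1 out_channels / LayerNorm size in the visual tower
TEXT_WIDTH = 512     # LayerNorm size in the text tower
MLP_RATIO = 4        # standard transformer MLP expansion (inferred from c_fc)

def mlp_hidden(width: int, ratio: int = MLP_RATIO) -> int:
    """Hidden size of the c_fc layer in each ResidualAttentionBlock."""
    return width * ratio

# c_fc maps 768 -> 3072 in the vision blocks and 512 -> 2048 in the text blocks,
# matching the Linear layers printed above.
print(mlp_hidden(VISION_WIDTH), mlp_hidden(TEXT_WIDTH))
```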
| 2025-04-29,10:46:54 | INFO | Params: | |
| 2025-04-29,10:46:54 | INFO | accum_freq: 1 | |
| 2025-04-29,10:46:54 | INFO | aug_cfg: {} | |
| 2025-04-29,10:46:54 | INFO | batch_size: 2048 | |
| 2025-04-29,10:46:54 | INFO | beta1: 0.9 | |
| 2025-04-29,10:46:54 | INFO | beta2: 0.98 | |
| 2025-04-29,10:46:54 | INFO | checkpoint_path: /mnt/personal/zhudongy/datacomp_results/small/low_inter_only/checkpoints | |
| 2025-04-29,10:46:54 | INFO | coca_caption_loss_weight: 2.0 | |
| 2025-04-29,10:46:54 | INFO | coca_contrastive_loss_weight: 1.0 | |
| 2025-04-29,10:46:54 | INFO | copy_codebase: False | |
| 2025-04-29,10:46:54 | INFO | csv_caption_key: title | |
| 2025-04-29,10:46:54 | INFO | csv_img_key: filepath | |
| 2025-04-29,10:46:54 | INFO | csv_separator: | |
| 2025-04-29,10:46:54 | INFO | dataset_resampled: True | |
| 2025-04-29,10:46:54 | INFO | dataset_type: webdataset | |
| 2025-04-29,10:46:54 | INFO | ddp_static_graph: True | |
| 2025-04-29,10:46:54 | INFO | debug: False | |
| 2025-04-29,10:46:54 | INFO | delete_previous_checkpoint: False | |
| 2025-04-29,10:46:54 | INFO | device: cuda:0 | |
| 2025-04-29,10:46:54 | INFO | dist_backend: nccl | |
| 2025-04-29,10:46:54 | INFO | dist_url: env:// | |
| 2025-04-29,10:46:54 | INFO | distill: False | |
| 2025-04-29,10:46:54 | INFO | distill_model: None | |
| 2025-04-29,10:46:54 | INFO | distill_pretrained: None | |
| 2025-04-29,10:46:54 | INFO | distributed: True | |
| 2025-04-29,10:46:54 | INFO | epochs: 8 | |
| 2025-04-29,10:46:54 | INFO | epochs_cooldown: None | |
| 2025-04-29,10:46:54 | INFO | eps: 1e-06 | |
| 2025-04-29,10:46:54 | INFO | force_custom_text: False | |
| 2025-04-29,10:46:54 | INFO | force_image_size: None | |
| 2025-04-29,10:46:54 | INFO | force_patch_dropout: None | |
| 2025-04-29,10:46:54 | INFO | force_quick_gelu: False | |
| 2025-04-29,10:46:54 | INFO | gather_with_grad: True | |
| 2025-04-29,10:46:54 | INFO | grad_checkpointing: True | |
| 2025-04-29,10:46:54 | INFO | grad_clip_norm: None | |
| 2025-04-29,10:46:54 | INFO | horovod: False | |
| 2025-04-29,10:46:54 | INFO | image_mean: None | |
| 2025-04-29,10:46:54 | INFO | image_std: None | |
| 2025-04-29,10:46:54 | INFO | imagenet_v2: None | |
| 2025-04-29,10:46:54 | INFO | imagenet_val: None | |
| 2025-04-29,10:46:54 | INFO | local_loss: True | |
| 2025-04-29,10:46:54 | INFO | local_rank: 0 | |
| 2025-04-29,10:46:54 | INFO | lock_image: False | |
| 2025-04-29,10:46:54 | INFO | lock_image_freeze_bn_stats: False | |
| 2025-04-29,10:46:54 | INFO | lock_image_unlocked_groups: 0 | |
| 2025-04-29,10:46:54 | INFO | lock_text: False | |
| 2025-04-29,10:46:54 | INFO | lock_text_freeze_layer_norm: False | |
| 2025-04-29,10:46:54 | INFO | lock_text_unlocked_layers: 0 | |
| 2025-04-29,10:46:54 | INFO | log_every_n_steps: 100 | |
| 2025-04-29,10:46:54 | INFO | log_level: 20 | |
| 2025-04-29,10:46:54 | INFO | log_local: False | |
| 2025-04-29,10:46:54 | INFO | log_path: /mnt/personal/zhudongy/datacomp_results/small/low_inter_only/out.log | |
| 2025-04-29,10:46:54 | INFO | logs: /mnt/personal/zhudongy/datacomp_results/small | |
| 2025-04-29,10:46:54 | INFO | lr: 0.0005 | |
| 2025-04-29,10:46:54 | INFO | lr_cooldown_end: 0.0 | |
| 2025-04-29,10:46:54 | INFO | lr_cooldown_power: 1.0 | |
| 2025-04-29,10:46:54 | INFO | lr_scheduler: cosine | |
| 2025-04-29,10:46:54 | INFO | model: ViT-B-32 | |
| 2025-04-29,10:46:54 | INFO | name: low_inter_only | |
| 2025-04-29,10:46:54 | INFO | no_set_device_rank: False | |
| 2025-04-29,10:46:54 | INFO | precision: amp_bfloat16 | |
| 2025-04-29,10:46:54 | INFO | pretrained: | |
| 2025-04-29,10:46:54 | INFO | pretrained_image: False | |
| 2025-04-29,10:46:54 | INFO | rank: 0 | |
| 2025-04-29,10:46:54 | INFO | remote_sync: None | |
| 2025-04-29,10:46:54 | INFO | remote_sync_frequency: 300 | |
| 2025-04-29,10:46:54 | INFO | remote_sync_protocol: s3 | |
| 2025-04-29,10:46:54 | INFO | report_to: | |
| 2025-04-29,10:46:54 | INFO | resume: None | |
| 2025-04-29,10:46:54 | INFO | save_frequency: 0 | |
| 2025-04-29,10:46:54 | INFO | save_most_recent: True | |
| 2025-04-29,10:46:54 | INFO | seed: 0 | |
| 2025-04-29,10:46:54 | INFO | skip_scheduler: False | |
| 2025-04-29,10:46:54 | INFO | tensorboard: False | |
| 2025-04-29,10:46:54 | INFO | tensorboard_path: | |
| 2025-04-29,10:46:54 | INFO | torchscript: False | |
| 2025-04-29,10:46:54 | INFO | trace: False | |
| 2025-04-29,10:46:54 | INFO | train_data: /mnt/personal/zhudongy/datacomp-small/shards/0000{0000..1287}.tar | |
| 2025-04-29,10:46:54 | INFO | train_data_upsampling_factors: None | |
| 2025-04-29,10:46:54 | INFO | train_num_samples: 1600000 | |
| 2025-04-29,10:46:54 | INFO | use_bn_sync: False | |
| 2025-04-29,10:46:54 | INFO | val_data: None | |
| 2025-04-29,10:46:54 | INFO | val_frequency: 1 | |
| 2025-04-29,10:46:54 | INFO | val_num_samples: None | |
| 2025-04-29,10:46:54 | INFO | wandb: False | |
| 2025-04-29,10:46:54 | INFO | wandb_notes: | |
| 2025-04-29,10:46:54 | INFO | wandb_project_name: open-clip | |
| 2025-04-29,10:46:54 | INFO | warmup: 500 | |
| 2025-04-29,10:46:54 | INFO | wd: 0.2 | |
| 2025-04-29,10:46:54 | INFO | workers: 4 | |
| 2025-04-29,10:46:54 | INFO | world_size: 2 | |
| 2025-04-29,10:46:54 | INFO | zeroshot_frequency: 2 | |
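With batch_size 2048 per process and world_size 2, the effective global batch is 4096. The progress lines below report 1605632 samples per epoch rather than the configured train_num_samples of 1600000 because the webdataset loader rounds the batch count up to a whole multiple of the dataloader worker count (4 workers here). A sketch of that rounding, stated as an assumption about the loader's behavior:

```python
import math

# Values from the params dump above.
batch_size = 2048          # per process
world_size = 2
workers = 4                # dataloader workers per process
train_num_samples = 1_600_000

global_batch = batch_size * world_size                      # 4096
num_batches = math.ceil(train_num_samples / global_batch)   # 391
# Assumed rounding: batch count padded up to a multiple of the worker count
# so every worker yields the same number of batches.
num_batches = math.ceil(num_batches / workers) * workers    # 392
samples_per_epoch = num_batches * global_batch
print(samples_per_epoch)  # 1605632, the figure in the progress lines below
```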
| 2025-04-29,10:46:54 | INFO | Start epoch 0 | |
| 2025-04-29,10:47:09 | INFO | Train Epoch: 0 [ 4096/1605632 (0%)] Data (t): 12.367 Batch (t): 14.958, 273.829/s, 136.915/s/gpu LR: 0.000001 Logit Scale: 14.286 Contrastive_loss: 8.3764 (8.3764) Loss: 8.3764 (8.3764) | |
| 2025-04-29,10:47:11 | INFO | Reducer buckets have been rebuilt in this iteration. | |
| 2025-04-29,10:51:10 | INFO | Train Epoch: 0 [ 413696/1605632 (26%)] Data (t): 0.340 Batch (t): 2.410, 1686.32/s, 843.158/s/gpu LR: 0.000101 Logit Scale: 14.265 Contrastive_loss: 8.2427 (8.3096) Loss: 8.2427 (8.3096) | |
| 2025-04-29,10:55:13 | INFO | Train Epoch: 0 [ 823296/1605632 (51%)] Data (t): 0.296 Batch (t): 2.436, 1684.03/s, 842.017/s/gpu LR: 0.000201 Logit Scale: 14.243 Contrastive_loss: 8.1236 (8.2476) Loss: 8.1236 (8.2476) | |
| 2025-04-29,10:59:20 | INFO | Train Epoch: 0 [1232896/1605632 (77%)] Data (t): 0.357 Batch (t): 2.468, 1624.40/s, 812.201/s/gpu LR: 0.000301 Logit Scale: 14.221 Contrastive_loss: 7.9420 (8.1712) Loss: 7.9420 (8.1712) | |
| 2025-04-29,11:03:10 | INFO | Train Epoch: 0 [1605632/1605632 (100%)] Data (t): 0.417 Batch (t): 2.530, 1728.75/s, 864.375/s/gpu LR: 0.000392 Logit Scale: 14.198 Contrastive_loss: 7.8148 (8.0999) Loss: 7.8148 (8.0999) | |
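The first logged contrastive loss of 8.3764 sits close to the chance-level value for this batch size: with random initial embeddings, the InfoNCE cross-entropy over a global batch of N candidates is roughly ln(N). A quick check of that baseline:

```python
import math

global_batch = 4096  # 2048 per GPU x 2 GPUs, from the params above
# For near-uniform logits, the expected contrastive (InfoNCE) loss is ~ln(N).
chance_loss = math.log(global_batch)
print(round(chance_loss, 4))  # ~8.32, vs. 8.3764 logged at the first step
```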
| 2025-04-29,11:03:12 | INFO | Start epoch 1 | |
| 2025-04-29,11:03:26 | INFO | Train Epoch: 1 [ 4096/1605632 (0%)] Data (t): 11.181 Batch (t): 13.231, 309.584/s, 154.792/s/gpu LR: 0.000393 Logit Scale: 14.198 Contrastive_loss: 7.8044 (7.8044) Loss: 7.8044 (7.8044) | |
| 2025-04-29,11:07:36 | INFO | Train Epoch: 1 [ 413696/1605632 (26%)] Data (t): 0.380 Batch (t): 2.507, 1574.67/s, 787.334/s/gpu LR: 0.000493 Logit Scale: 14.191 Contrastive_loss: 7.7343 (7.7694) Loss: 7.7343 (7.7694) | |
| 2025-04-29,11:11:47 | INFO | Train Epoch: 1 [ 823296/1605632 (51%)] Data (t): 0.374 Batch (t): 2.502, 1134.29/s, 567.146/s/gpu LR: 0.000498 Logit Scale: 14.212 Contrastive_loss: 7.6846 (7.7411) Loss: 7.6846 (7.7411) | |
| 2025-04-29,11:16:16 | INFO | Train Epoch: 1 [1232896/1605632 (77%)] Data (t): 0.619 Batch (t): 2.694, 1629.58/s, 814.789/s/gpu LR: 0.000493 Logit Scale: 14.257 Contrastive_loss: 7.5485 (7.6930) Loss: 7.5485 (7.6930) | |
| 2025-04-29,11:20:17 | INFO | Train Epoch: 1 [1605632/1605632 (100%)] Data (t): 0.482 Batch (t): 2.646, 1703.80/s, 851.899/s/gpu LR: 0.000486 Logit Scale: 14.345 Contrastive_loss: 7.3599 (7.6263) Loss: 7.3599 (7.6263) | |
| 2025-04-29,11:20:19 | INFO | Start epoch 2 | |
| 2025-04-29,11:20:31 | INFO | Train Epoch: 2 [ 4096/1605632 (0%)] Data (t): 10.642 Batch (t): 12.727, 321.831/s, 160.916/s/gpu LR: 0.000486 Logit Scale: 14.347 Contrastive_loss: 7.2405 (7.2405) Loss: 7.2405 (7.2405) | |
| 2025-04-29,11:24:41 | INFO | Train Epoch: 2 [ 413696/1605632 (26%)] Data (t): 0.355 Batch (t): 2.492, 1714.72/s, 857.358/s/gpu LR: 0.000474 Logit Scale: 14.450 Contrastive_loss: 7.4679 (7.3542) Loss: 7.4679 (7.3542) | |
| 2025-04-29,11:28:51 | INFO | Train Epoch: 2 [ 823296/1605632 (51%)] Data (t): 0.408 Batch (t): 2.510, 1154.44/s, 577.221/s/gpu LR: 0.000460 Logit Scale: 14.545 Contrastive_loss: 7.2964 (7.3349) Loss: 7.2964 (7.3349) | |
| 2025-04-29,11:33:00 | INFO | Train Epoch: 2 [1232896/1605632 (77%)] Data (t): 0.394 Batch (t): 2.484, 1699.79/s, 849.896/s/gpu LR: 0.000442 Logit Scale: 14.677 Contrastive_loss: 7.1794 (7.2960) Loss: 7.1794 (7.2960) | |
| 2025-04-29,11:36:40 | INFO | Train Epoch: 2 [1605632/1605632 (100%)] Data (t): 0.343 Batch (t): 2.421, 1718.76/s, 859.379/s/gpu LR: 0.000423 Logit Scale: 14.812 Contrastive_loss: 7.2101 (7.2789) Loss: 7.2101 (7.2789) | |
| 2025-04-29,11:36:42 | INFO | Start epoch 3 | |
| 2025-04-29,11:36:55 | INFO | Train Epoch: 3 [ 4096/1605632 (0%)] Data (t): 10.902 Batch (t): 12.965, 315.925/s, 157.962/s/gpu LR: 0.000423 Logit Scale: 14.814 Contrastive_loss: 7.1536 (7.1536) Loss: 7.1536 (7.1536) | |
| 2025-04-29,11:41:22 | INFO | Train Epoch: 3 [ 413696/1605632 (26%)] Data (t): 0.491 Batch (t): 2.665, 1701.82/s, 850.912/s/gpu LR: 0.000400 Logit Scale: 14.984 Contrastive_loss: 7.1237 (7.1387) Loss: 7.1237 (7.1387) | |
| 2025-04-29,11:45:33 | INFO | Train Epoch: 3 [ 823296/1605632 (51%)] Data (t): 0.387 Batch (t): 2.511, 1014.82/s, 507.408/s/gpu LR: 0.000376 Logit Scale: 15.179 Contrastive_loss: 6.8644 (7.0472) Loss: 6.8644 (7.0472) | |
| 2025-04-29,11:49:47 | INFO | Train Epoch: 3 [1232896/1605632 (77%)] Data (t): 0.452 Batch (t): 2.541, 1693.74/s, 846.870/s/gpu LR: 0.000349 Logit Scale: 15.355 Contrastive_loss: 6.6430 (6.9462) Loss: 6.6430 (6.9462) | |
| 2025-04-29,11:53:29 | INFO | Train Epoch: 3 [1605632/1605632 (100%)] Data (t): 0.371 Batch (t): 2.446, 1680.22/s, 840.108/s/gpu LR: 0.000324 Logit Scale: 15.544 Contrastive_loss: 6.8729 (6.9315) Loss: 6.8729 (6.9315) | |
| 2025-04-29,11:53:31 | INFO | Start epoch 4 | |
| 2025-04-29,11:53:45 | INFO | Train Epoch: 4 [ 4096/1605632 (0%)] Data (t): 11.803 Batch (t): 13.888, 294.923/s, 147.461/s/gpu LR: 0.000323 Logit Scale: 15.546 Contrastive_loss: 6.8561 (6.8561) Loss: 6.8561 (6.8561) | |
| 2025-04-29,11:57:50 | INFO | Train Epoch: 4 [ 413696/1605632 (26%)] Data (t): 0.373 Batch (t): 2.446, 1742.13/s, 871.064/s/gpu LR: 0.000294 Logit Scale: 15.744 Contrastive_loss: 6.6737 (6.7649) Loss: 6.6737 (6.7649) | |
| 2025-04-29,12:01:49 | INFO | Train Epoch: 4 [ 823296/1605632 (51%)] Data (t): 0.263 Batch (t): 2.394, 1726.45/s, 863.225/s/gpu LR: 0.000265 Logit Scale: 15.914 Contrastive_loss: 6.7540 (6.7613) Loss: 6.7540 (6.7613) | |
| 2025-04-29,12:05:52 | INFO | Train Epoch: 4 [1232896/1605632 (77%)] Data (t): 0.278 Batch (t): 2.430, 1668.21/s, 834.103/s/gpu LR: 0.000235 Logit Scale: 16.076 Contrastive_loss: 6.7919 (6.7689) Loss: 6.7919 (6.7689) | |
| 2025-04-29,12:09:34 | INFO | Train Epoch: 4 [1605632/1605632 (100%)] Data (t): 0.279 Batch (t): 2.440, 1555.69/s, 777.845/s/gpu LR: 0.000208 Logit Scale: 16.253 Contrastive_loss: 6.6502 (6.7452) Loss: 6.6502 (6.7452) | |
| 2025-04-29,12:09:36 | INFO | Start epoch 5 | |
| 2025-04-29,12:09:50 | INFO | Train Epoch: 5 [ 4096/1605632 (0%)] Data (t): 11.275 Batch (t): 13.347, 306.892/s, 153.446/s/gpu LR: 0.000208 Logit Scale: 16.255 Contrastive_loss: 6.6999 (6.6999) Loss: 6.6999 (6.6999) | |
| 2025-04-29,12:14:05 | INFO | Train Epoch: 5 [ 413696/1605632 (26%)] Data (t): 0.431 Batch (t): 2.551, 1723.17/s, 861.587/s/gpu LR: 0.000179 Logit Scale: 16.426 Contrastive_loss: 6.5666 (6.6333) Loss: 6.5666 (6.6333) | |
| 2025-04-29,12:18:18 | INFO | Train Epoch: 5 [ 823296/1605632 (51%)] Data (t): 0.404 Batch (t): 2.528, 1332.54/s, 666.268/s/gpu LR: 0.000151 Logit Scale: 16.569 Contrastive_loss: 6.4771 (6.5812) Loss: 6.4771 (6.5812) | |
| 2025-04-29,12:22:19 | INFO | Train Epoch: 5 [1232896/1605632 (77%)] Data (t): 0.336 Batch (t): 2.412, 1709.31/s, 854.657/s/gpu LR: 0.000124 Logit Scale: 16.692 Contrastive_loss: 6.5475 (6.5728) Loss: 6.5475 (6.5728) | |
| 2025-04-29,12:25:58 | INFO | Train Epoch: 5 [1605632/1605632 (100%)] Data (t): 0.336 Batch (t): 2.405, 1729.43/s, 864.715/s/gpu LR: 0.000102 Logit Scale: 16.794 Contrastive_loss: 6.5236 (6.5629) Loss: 6.5236 (6.5629) | |
| 2025-04-29,12:26:00 | INFO | Start epoch 6 | |
| 2025-04-29,12:26:13 | INFO | Train Epoch: 6 [ 4096/1605632 (0%)] Data (t): 11.106 Batch (t): 13.199, 310.322/s, 155.161/s/gpu LR: 0.000101 Logit Scale: 16.795 Contrastive_loss: 6.4957 (6.4957) Loss: 6.4957 (6.4957) | |
| 2025-04-29,12:30:20 | INFO | Train Epoch: 6 [ 413696/1605632 (26%)] Data (t): 0.374 Batch (t): 2.472, 1508.63/s, 754.316/s/gpu LR: 0.000079 Logit Scale: 16.882 Contrastive_loss: 6.0769 (6.2863) Loss: 6.0769 (6.2863) | |
| 2025-04-29,12:34:22 | INFO | Train Epoch: 6 [ 823296/1605632 (51%)] Data (t): 0.286 Batch (t): 2.416, 1530.56/s, 765.282/s/gpu LR: 0.000058 Logit Scale: 16.958 Contrastive_loss: 6.2138 (6.2621) Loss: 6.2138 (6.2621) | |
| 2025-04-29,12:38:24 | INFO | Train Epoch: 6 [1232896/1605632 (77%)] Data (t): 0.277 Batch (t): 2.422, 1757.06/s, 878.529/s/gpu LR: 0.000040 Logit Scale: 17.014 Contrastive_loss: 6.1187 (6.2263) Loss: 6.1187 (6.2263) | |
| 2025-04-29,12:42:22 | INFO | Train Epoch: 6 [1605632/1605632 (100%)] Data (t): 0.538 Batch (t): 2.620, 1698.75/s, 849.373/s/gpu LR: 0.000027 Logit Scale: 17.051 Contrastive_loss: 6.2108 (6.2232) Loss: 6.2108 (6.2232) | |
| 2025-04-29,12:42:24 | INFO | Start epoch 7 | |
| 2025-04-29,12:42:38 | INFO | Train Epoch: 7 [ 4096/1605632 (0%)] Data (t): 11.138 Batch (t): 13.217, 309.905/s, 154.953/s/gpu LR: 0.000027 Logit Scale: 17.051 Contrastive_loss: 6.2225 (6.2225) Loss: 6.2225 (6.2225) | |
| 2025-04-29,12:46:42 | INFO | Train Epoch: 7 [ 413696/1605632 (26%)] Data (t): 0.329 Batch (t): 2.443, 1574.65/s, 787.324/s/gpu LR: 0.000015 Logit Scale: 17.075 Contrastive_loss: 6.2199 (6.2212) Loss: 6.2199 (6.2212) | |
| 2025-04-29,12:50:48 | INFO | Train Epoch: 7 [ 823296/1605632 (51%)] Data (t): 0.363 Batch (t): 2.459, 1682.10/s, 841.051/s/gpu LR: 0.000007 Logit Scale: 17.089 Contrastive_loss: 6.2006 (6.2143) Loss: 6.2006 (6.2143) | |
| 2025-04-29,12:54:50 | INFO | Train Epoch: 7 [1232896/1605632 (77%)] Data (t): 0.336 Batch (t): 2.418, 1614.35/s, 807.177/s/gpu LR: 0.000002 Logit Scale: 17.093 Contrastive_loss: 6.0595 (6.1756) Loss: 6.0595 (6.1756) | |
| 2025-04-29,12:58:34 | INFO | Train Epoch: 7 [1605632/1605632 (100%)] Data (t): 0.342 Batch (t): 2.461, 1719.27/s, 859.635/s/gpu LR: 0.000000 Logit Scale: 17.094 Contrastive_loss: 6.1999 (6.1805) Loss: 6.1999 (6.1805) | |
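The LR column throughout the run is consistent with the configured schedule (lr 0.0005, warmup 500, lr_scheduler cosine, 8 epochs of 392 optimizer steps): linear warmup to the base rate over 500 steps, then cosine decay to zero. A sketch of that schedule, written as an assumption about the scheduler's exact form but checked against the logged values:

```python
import math

# Schedule parameters from the params dump above.
base_lr = 5e-4
warmup = 500
steps_per_epoch = 392               # 1605632 samples / 4096 global batch
total_steps = steps_per_epoch * 8   # epochs: 8

def lr_at(step: int) -> float:
    """Linear warmup followed by cosine decay to lr_cooldown_end = 0
    (assumed form of the scheduler, matched against the logged LR column)."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    e = step - warmup
    es = total_steps - warmup
    return 0.5 * (1 + math.cos(math.pi * e / es)) * base_lr

# The log prints LR 0.000001 at step 0, 0.000101 at step 100,
# and 0.000498 at step 592 (epoch 1, 51%).
print(round(lr_at(0), 6), round(lr_at(100), 6), round(lr_at(592), 6))
```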