DINO ViT-B/16 Model

This is a DINO (self-distillation with no labels) Vision Transformer model trained on the DataComp-12M dataset.

Training Details

  • Architecture: ViT-B/16
  • Dataset: DataComp-12M
  • Batch size per GPU: 128
  • Number of GPUs: 4
  • Total epochs: 20
  • FP16 training: True
  • Image size: 224x224
  • Patch size: 16x16
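The hyperparameters above imply an effective batch size of 512 and a 197-token input sequence per image. A quick sanity check of these derived quantities:

```python
# Derived quantities from the training configuration above.
batch_per_gpu, num_gpus = 128, 4
image_size, patch_size = 224, 16

effective_batch = batch_per_gpu * num_gpus   # total images per optimizer step
patches_per_side = image_size // patch_size  # 224 / 16 = 14
num_patches = patches_per_side ** 2          # 14 x 14 = 196 patch tokens
seq_len = num_patches + 1                    # +1 for the [CLS] token

print(effective_batch, num_patches, seq_len)  # -> 512 196 197
```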

Training Command

torchrun \
--nproc_per_node=4 \
main_dino.py \
--arch vit_base \
--patch_size 16 \
--data_path /pasteur2/u/yuhuiz/yiming/datacomp_12m/processed_dataset \
--output_dir ./dino_output \
--batch_size_per_gpu 128 \
--epochs 20 \
--use_fp16 true \
--seed 0 \
--num_workers 4 \
--report-to-wandb \
--wandb-project dino
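For downstream use, the DINO training script saves checkpoints containing separate `student` and `teacher` state dicts, whose keys typically carry `module.` (from DistributedDataParallel) and `backbone.` (from DINO's multi-crop wrapper) prefixes. This is a minimal sketch of the key cleanup, using a plain dict as a stand-in for a real state dict; the exact key layout of this checkpoint may differ:

```python
# Sketch: strip wrapper prefixes from a DINO teacher state dict so the keys
# match a bare ViT. A real checkpoint would be loaded with something like
# torch.load("dino_output/checkpoint.pth", map_location="cpu")["teacher"].
def clean_state_dict(state_dict):
    cleaned = {}
    for key, value in state_dict.items():
        key = key.removeprefix("module.")    # DistributedDataParallel prefix
        key = key.removeprefix("backbone.")  # DINO MultiCropWrapper prefix
        cleaned[key] = value
    return cleaned

# Stand-in dict mimicking prefixed checkpoint keys (values elided).
sample = {
    "module.backbone.patch_embed.proj.weight": "...",
    "module.backbone.cls_token": "...",
}
print(sorted(clean_state_dict(sample)))
# -> ['cls_token', 'patch_embed.proj.weight']
```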