DINO ViT-B/16 Model

This is a DINO (self-distillation with no labels) Vision Transformer model trained on the DataComp-12M dataset.

Training Details

  • Architecture: ViT-B/16
  • Dataset: DataComp-12M
  • Batch size per GPU: 128
  • Number of GPUs: 4
  • Total epochs: 20
  • FP16 training: True
  • Image size: 224x224
  • Patch size: 16x16
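The hyperparameters above imply an effective batch size of 512 and a 197-token input sequence per image. A quick sanity check of these derived quantities:

```python
# Derived quantities from the training configuration above.
batch_per_gpu, num_gpus = 128, 4
image_size, patch_size = 224, 16

effective_batch = batch_per_gpu * num_gpus   # total images per optimizer step
patches_per_side = image_size // patch_size  # 224 / 16 = 14
num_patches = patches_per_side ** 2          # 14 x 14 = 196 patch tokens
seq_len = num_patches + 1                    # +1 for the [CLS] token

print(effective_batch, num_patches, seq_len)  # -> 512 196 197
```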

Training Command

torchrun \
--nproc_per_node=4 \
main_dino.py \
--arch vit_base \
--patch_size 16 \
--data_path /pasteur2/u/yuhuiz/yiming/datacomp_12m/processed_dataset \
--output_dir ./dino_output \
--batch_size_per_gpu 128 \
--epochs 20 \
--use_fp16 true \
--seed 0 \
--num_workers 4 \
--report-to-wandb \
--wandb-project dino
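For downstream use, the DINO training script saves checkpoints containing separate `student` and `teacher` state dicts, whose keys typically carry `module.` (from DistributedDataParallel) and `backbone.` (from DINO's multi-crop wrapper) prefixes. This is a minimal sketch of the key cleanup, using a plain dict as a stand-in for a real state dict; the exact key layout of this checkpoint may differ:

```python
# Sketch: strip wrapper prefixes from a DINO teacher state dict so the keys
# match a bare ViT. A real checkpoint would be loaded with something like
# torch.load("dino_output/checkpoint.pth", map_location="cpu")["teacher"].
def clean_state_dict(state_dict):
    cleaned = {}
    for key, value in state_dict.items():
        key = key.removeprefix("module.")    # DistributedDataParallel prefix
        key = key.removeprefix("backbone.")  # DINO MultiCropWrapper prefix
        cleaned[key] = value
    return cleaned

# Stand-in dict mimicking prefixed checkpoint keys (values elided).
sample = {
    "module.backbone.patch_embed.proj.weight": "...",
    "module.backbone.cls_token": "...",
}
print(sorted(clean_state_dict(sample)))
# -> ['cls_token', 'patch_embed.proj.weight']
```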