# DINO ViT-B/16 Model
This is a DINO (self-distillation with no labels) vision transformer trained on the DataComp-12M dataset.
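DINO trains the student network to match a teacher network's output distribution over unlabeled images, with the teacher output centered and sharpened to avoid collapse. The following is a minimal, pure-Python sketch of that objective for a single crop pair; the function names and the temperature defaults (0.1 student, 0.04 teacher) are taken from the DINO paper's defaults, not from this card:

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dino_loss(student_logits, teacher_logits, center,
              student_temp=0.1, teacher_temp=0.04):
    """Cross-entropy between the centered, sharpened teacher
    distribution and the student distribution for one crop pair.
    `center` is the running mean of teacher logits that DINO
    subtracts to prevent collapse. (Illustrative sketch, not the
    repo's batched implementation.)"""
    centered = [t - c for t, c in zip(teacher_logits, center)]
    p_t = softmax(centered, teacher_temp)   # sharp teacher target
    p_s = softmax(student_logits, student_temp)
    # H(p_t, p_s) = -sum_i p_t[i] * log p_s[i]
    return -sum(t * math.log(s) for t, s in zip(p_t, p_s))

loss = dino_loss([1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.0, 0.0, 0.0])
```

In the full method the teacher is an exponential moving average of the student, and the loss is averaged over multiple global/local crop pairs.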
## Training Details
- Architecture: ViT-B/16
- Dataset: DataComp-12M
- Batch size per GPU: 128
- Number of GPUs: 4
- Total epochs: 20
- FP16 training: True
- Image size: 224x224
- Patch size: 16x16
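A couple of quantities follow directly from the configuration above; a small sketch of the arithmetic (all inputs copied from this card):

```python
# Values copied from the training details above.
batch_per_gpu = 128
num_gpus = 4
image_size, patch_size = 224, 16

# Effective (global) batch size per optimizer step.
global_batch = batch_per_gpu * num_gpus        # 128 * 4 = 512

# Patch tokens per image for a ViT-B/16 at 224x224
# (excluding the [CLS] token).
tokens_per_image = (image_size // patch_size) ** 2  # 14 * 14 = 196

print(global_batch, tokens_per_image)
```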
## Training Command
```shell
torchrun \
  --nproc_per_node=4 \
  main_dino.py \
    --arch vit_base \
    --patch_size 16 \
    --data_path /pasteur2/u/yuhuiz/yiming/datacomp_12m/processed_dataset \
    --output_dir ./dino_output \
    --batch_size_per_gpu 128 \
    --epochs 20 \
    --use_fp16 true \
    --seed 0 \
    --num_workers 4 \
    --report-to-wandb \
    --wandb-project dino
```
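Checkpoints are written to `./dino_output`; the DINO repo saves a `checkpoint.pth` containing both `student` and `teacher` state dicts, and its evaluation scripts strip wrapper prefixes (e.g. `module.`, `backbone.`) from the keys before loading the backbone. A minimal sketch of that key remapping; the helper name and prefix list are assumptions based on the DINO repo, not something stated in this card:

```python
def strip_prefixes(state_dict, prefixes=("module.", "backbone.")):
    """Remove DistributedDataParallel / wrapper prefixes from
    checkpoint keys so they match a plain ViT backbone's state dict.
    (Illustrative helper; checkpoint layout assumed from the DINO
    repo, not from this card.)"""
    out = {}
    for key, value in state_dict.items():
        for prefix in prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
        out[key] = value
    return out

# Example: a DDP-wrapped teacher key maps to the plain backbone key.
remapped = strip_prefixes({"module.backbone.cls_token": 0})
```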