
DINO ViT-Small Custom Dataset

This model is a Vision Transformer (ViT) Small model trained using DINO (self-DIstillation with NO labels) on a custom dataset.

Model Details

  • Architecture: ViT-Small (patch size 16)
  • Pre-training Method: DINO
  • Training Epochs: 500
  • Output Dimension: 384
  • Dataset Size: ~3000 images
  • Base Model: WinKawaks/vit-small-patch16-224

Training Configuration

  • Batch Size: 32
  • Learning Rate: 0.005
  • Teacher Temperature: 0.07
  • Local Crops: 4
  • Weight Decay: 0.04 → 0.4
  • Optimizer: AdamW

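For reference, the teacher temperature above enters the DINO objective as the sharpening factor on the teacher's output distribution. The following is a generic sketch of that loss, not code from this repository; the student temperature of 0.1 and the centering term are assumptions based on the DINO paper's defaults.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center,
              student_temp=0.1, teacher_temp=0.07):
    """Cross-entropy between sharpened teacher and student distributions.

    teacher_temp=0.07 matches this card; student_temp and `center`
    (the running mean used to avoid collapse) are DINO-paper defaults.
    """
    # Teacher targets: centered, sharpened, and detached from the graph.
    targets = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    # Student log-probabilities.
    log_probs = F.log_softmax(student_out / student_temp, dim=-1)
    # Cross-entropy, averaged over the batch.
    return -(targets * log_probs).sum(dim=-1).mean()

torch.manual_seed(0)
student = torch.randn(8, 384, requires_grad=True)  # 384-dim outputs, as above
teacher = torch.randn(8, 384)
center = torch.zeros(384)
loss = dino_loss(student, teacher, center)
```

In the full method this loss is summed over all global/local crop pairs (4 local crops here), with the teacher updated as an EMA of the student.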
Training Results

  • Final Loss: 2.9609
  • Training Time: 5:16:13

Usage

from transformers import ViTModel
import torch

# Load the model
model = ViTModel.from_pretrained("odinson/dino-vit-small-custom")

# Use for feature extraction; `images` should be preprocessed pixel values
# of shape (batch, 3, 224, 224), e.g. produced by a ViTImageProcessor
model.eval()
with torch.no_grad():
    features = model(images).last_hidden_state  # (batch, 197, 384)
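Since the model outputs 384-dimensional embeddings rather than class labels, a common downstream use is nearest-neighbor retrieval on the CLS token. The sketch below is illustrative only: it uses random tensors in place of real model outputs (`last_hidden_state[:, 0]` would supply the CLS embeddings in practice).

```python
import torch
import torch.nn.functional as F

# Stand-in for CLS embeddings, i.e. last_hidden_state[:, 0]
# from a batch of 5 images (ViT-Small output dimension is 384).
torch.manual_seed(0)
cls_embeddings = torch.randn(5, 384)

# L2-normalize so that a matrix product gives cosine similarity.
normed = F.normalize(cls_embeddings, dim=-1)
similarity = normed @ normed.T  # (5, 5)

# Mask self-matches, then pick each image's nearest neighbor.
nearest = similarity.fill_diagonal_(-1).argmax(dim=1)
```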

Training Curves

See the training plots in the repository for the loss, learning-rate, and weight-decay curves.

Model Size

  • 21.8M parameters (F32, Safetensors)