
DINO ViT-Small Custom Dataset

This model is a Vision Transformer (ViT) Small model trained using DINO (self-DIstillation with NO labels) on a custom dataset.

Model Details

  • Architecture: ViT-Small (patch size 16)
  • Pre-training Method: DINO
  • Training Epochs: 500
  • Output Dimension: 384
  • Dataset Size: ~3000 images
  • Base Model: WinKawaks/vit-small-patch16-224

Training Configuration

  • Batch Size: 32
  • Learning Rate: 0.005
  • Teacher Temperature: 0.07
  • Local Crops: 4
  • Weight Decay: 0.04 → 0.4
  • Optimizer: AdamW

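For reference, the teacher temperature above enters the DINO objective as the sharpening factor on the teacher's output distribution. The following is a generic sketch of that loss, not code from this repository; the student temperature of 0.1 and the centering term are assumptions based on the DINO paper's defaults.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center,
              student_temp=0.1, teacher_temp=0.07):
    """Cross-entropy between sharpened teacher and student distributions.

    teacher_temp=0.07 matches this card; student_temp and `center`
    (the running mean used to avoid collapse) are DINO-paper defaults.
    """
    # Teacher targets: centered, sharpened, and detached from the graph.
    targets = F.softmax((teacher_out - center) / teacher_temp, dim=-1).detach()
    # Student log-probabilities.
    log_probs = F.log_softmax(student_out / student_temp, dim=-1)
    # Cross-entropy, averaged over the batch.
    return -(targets * log_probs).sum(dim=-1).mean()

torch.manual_seed(0)
student = torch.randn(8, 384, requires_grad=True)  # 384-dim outputs, as above
teacher = torch.randn(8, 384)
center = torch.zeros(384)
loss = dino_loss(student, teacher, center)
```

In the full method this loss is summed over all global/local crop pairs (4 local crops here), with the teacher updated as an EMA of the student.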
Training Results

  • Final Loss: 2.9609
  • Training Time: 5:16:13

Usage

from transformers import ViTModel
import torch

# Load the model
model = ViTModel.from_pretrained("odinson/dino-vit-small-custom")

# Use for feature extraction; `images` should be preprocessed pixel values
# of shape (batch, 3, 224, 224), e.g. produced by a ViTImageProcessor
model.eval()
with torch.no_grad():
    features = model(images).last_hidden_state  # (batch, 197, 384)
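Since the model outputs 384-dimensional embeddings rather than class labels, a common downstream use is nearest-neighbor retrieval on the CLS token. The sketch below is illustrative only: it uses random tensors in place of real model outputs (`last_hidden_state[:, 0]` would supply the CLS embeddings in practice).

```python
import torch
import torch.nn.functional as F

# Stand-in for CLS embeddings, i.e. last_hidden_state[:, 0]
# from a batch of 5 images (ViT-Small output dimension is 384).
torch.manual_seed(0)
cls_embeddings = torch.randn(5, 384)

# L2-normalize so that a matrix product gives cosine similarity.
normed = F.normalize(cls_embeddings, dim=-1)
similarity = normed @ normed.T  # (5, 5)

# Mask self-matches, then pick each image's nearest neighbor.
nearest = similarity.fill_diagonal_(-1).argmax(dim=1)
```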

Training Curves

See the training plots in the repository for the loss, learning-rate, and weight-decay curves.

Model Size

  • 21.8M parameters (F32, Safetensors)