# DINO ViT-Small Custom Dataset

This model is a ViT-Small Vision Transformer trained with DINO (self-DIstillation with NO labels) on a custom dataset.
## Model Details
- Architecture: ViT-Small (patch size 16)
- Pre-training Method: DINO
- Training Epochs: 500
- Output Dimension: 384
- Dataset Size: ~3000 images
- Base Model: WinKawaks/vit-small-patch16-224
## Training Configuration

- Batch Size: 32
- Learning Rate: 0.005
- Teacher Temperature: 0.07
- Local Crops: 4
- Weight Decay: 0.04 → 0.4
- Optimizer: AdamW
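The weight decay is ramped from 0.04 to 0.4 over training; in DINO this is typically done with a cosine schedule. A minimal sketch of such a schedule (the function name and step granularity are illustrative, not taken from this repository's training script):

```python
import math

def cosine_schedule(base: float, final: float, steps: int) -> list[float]:
    """Cosine ramp from `base` to `final` over `steps` steps,
    as used for DINO-style weight-decay scheduling (assumed here)."""
    return [
        final + 0.5 * (base - final) * (1 + math.cos(math.pi * i / (steps - 1)))
        for i in range(steps)
    ]

# One value per epoch for the 500-epoch run described above
wd = cosine_schedule(0.04, 0.4, 500)
```

The schedule starts at 0.04, increases monotonically, and ends at 0.4.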
## Training Results
- Final Loss: 2.9609
- Training Time: 5:16:13 (h:mm:ss)
## Usage

```python
from transformers import ViTModel
import torch

# Load the model
model = ViTModel.from_pretrained("odinson/dino-vit-small-custom")
model.eval()

# Use for feature extraction; `images` is a preprocessed pixel tensor
# of shape (batch, 3, 224, 224)
with torch.no_grad():
    features = model(images).last_hidden_state
```
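DINO features are commonly used for retrieval by comparing CLS-token embeddings (the first token of `last_hidden_state`, 384-dimensional per the model details above) with cosine similarity. A minimal sketch, using random tensors as stand-ins for real model outputs:

```python
import torch
import torch.nn.functional as F

# Placeholder CLS embeddings standing in for `features[:, 0]` from the
# model above (384-dim, matching the ViT-Small output dimension).
emb_a = torch.randn(1, 384)
emb_b = torch.randn(1, 384)

# Cosine similarity in [-1, 1]; higher means more similar images
sim = F.cosine_similarity(emb_a, emb_b, dim=-1)
```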
## Training Curves
See the training plots in the repository for loss, learning rate, and weight decay curves.