Upload DINO pre-trained ViT-Small model
- README.md +43 -43
- config.json +0 -0
- model.safetensors +1 -1
- training_curves.png +0 -0
README.md
CHANGED
# DINO ViT-Small Custom Dataset

This model is a Vision Transformer (ViT) Small trained with DINO (self-DIstillation with NO labels) on a custom dataset.

## Model Details

- **Architecture**: ViT-Small (patch size 16)
- **Pre-training Method**: DINO
- **Training Epochs**: 10
- **Output Dimension**: 384 (see the sanity check below)
- **Dataset Size**: ~3000 images
- **Base Model**: WinKawaks/vit-small-patch16-224

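The 384-dimensional output matches ViT-Small's hidden size. As a quick sanity check, assuming the repository's config.json follows the standard `ViTConfig` layout:

```python
from transformers import ViTConfig

# Load only the configuration (no weights) to confirm the architecture details
config = ViTConfig.from_pretrained("odinson/dino-vit-small-custom")
print(config.hidden_size)  # expected: 384
print(config.patch_size)   # expected: 16
```
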
## Training Configuration

- Batch Size: 32
- Learning Rate: 0.0003
- Teacher Temperature: 0.07
- Local Crops: 4
- Weight Decay: 0.04 → 0.4 (scheduled; see the sketch after this list)
- Optimizer: AdamW
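
The weight-decay entry above gives only the endpoints, 0.04 rising to 0.4 over training. The reference DINO recipe ramps weight decay with a cosine schedule; a minimal sketch under that assumption (the schedule shape is not stated in this card):

```python
import math

def weight_decay_at(step: int, total_steps: int,
                    wd_start: float = 0.04, wd_end: float = 0.4) -> float:
    """Cosine ramp from wd_start to wd_end, as in the reference DINO recipe (assumed)."""
    progress = step / max(1, total_steps)
    return wd_end + 0.5 * (wd_start - wd_end) * (1 + math.cos(math.pi * progress))

print(weight_decay_at(0, 100))    # 0.04 at the start
print(weight_decay_at(50, 100))   # ~0.22 at the midpoint
print(weight_decay_at(100, 100))  # 0.40 at the end
```
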
## Training Results

- Final Loss: 5.8926
- Training Time: 0:06:19

## Usage

```python
from transformers import ViTModel
import torch

# Load the model
model = ViTModel.from_pretrained("odinson/dino-vit-small-custom")

# Use for feature extraction; `images` is a preprocessed batch of pixel values
model.eval()
with torch.no_grad():
    features = model(images).last_hidden_state
```
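
The snippet above leaves `images` undefined; it is expected to be a batch of preprocessed pixel values. A minimal sketch of producing that batch with `ViTImageProcessor`, assuming the base model's 224×224 / patch-16 preprocessing (WinKawaks/vit-small-patch16-224) applies to this checkpoint:

```python
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

# Assumption: the base model's preprocessing matches this checkpoint
processor = ViTImageProcessor.from_pretrained("WinKawaks/vit-small-patch16-224")
model = ViTModel.from_pretrained("odinson/dino-vit-small-custom")
model.eval()

image = Image.open("example.jpg").convert("RGB")  # any RGB image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(pixel_values=inputs["pixel_values"])

# [CLS] embedding: one 384-dimensional vector per image
cls_embedding = outputs.last_hidden_state[:, 0]
```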

## Training Curves

See the training plots in the repository for loss, learning rate, and weight decay curves.

config.json
CHANGED
The diff for this file is too large to render; see the raw diff.
model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:3a0a9ecef85397f127ee22742d036a12db213b50d64a8511a7692af6eca24260
 size 87276144
training_curves.png
CHANGED