chitter99 committed commit dabb0b0 (verified), parent 6631263

Update README.md

README.md changed (+73 -3): the previous README, which held only the front matter `license: mit`, is replaced by the model card below, which switches the license to `apache-2.0`.
 
---
license: apache-2.0
tags:
- image-classification
- vision-transformer
- pytorch
- oxford-pets
library_name: torch
datasets:
- cvdl/oxford-pets
language: []
model-index:
- name: ViTPets
  results:
  - task:
      type: image-classification
    dataset:
      name: Oxford Pets
      type: cvdl/oxford-pets
    metrics:
    - type: accuracy
      value: 9
---

# ViTPets - Vision Transformer trained from scratch on Oxford Pets 🐶🐱

This model is a Vision Transformer (ViT) trained from scratch on the [Oxford Pets dataset](https://huggingface.co/datasets/cvdl/oxford-pets). It classifies images of cats and dogs into 37 breeds.

## Model Summary

- **Architecture**: Custom Vision Transformer (ViT)
- **Input resolution**: 128 × 128 pixels, 3 channels
- **Patch size**: 16 × 16
- **Embedding dimension**: 240
- **Number of Transformer blocks**: 12
- **Number of attention heads**: 4
- **MLP ratio**: 2.0
- **Dropout**: 10% on the attention and MLP layers
- **Framework**: PyTorch
- **Dataset**: Oxford Pets (via 🤗 `cvdl/oxford-pets`)
- **Loss**: Cross-entropy (`CrossEntropyLoss`)
- **Optimizer**: SGD, learning rate 0.00257
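
The hyperparameters in the summary determine most of the model's shapes. As a rough sketch, the hypothetical helper `vit_shapes` below derives them and estimates the parameter count. It assumes a standard ViT layout: a convolutional patch embedding, a learnable class token, learned position embeddings, pre-norm blocks with two LayerNorms each, a final LayerNorm, and a linear head. The custom `ViT` in `model.py` may be built differently, so treat the count as an estimate.

```python
def vit_shapes(img_size=128, patch_size=16, in_channels=3, embed_dim=240,
               n_blocks=12, n_heads=4, mlp_ratio=2.0, n_classes=37, qkv_bias=True):
    """Derive shapes and an approximate parameter count for a standard ViT.

    Hypothetical helper: assumes a class token, learned position embeddings,
    a conv patch embedding, pre-norm blocks and a final LayerNorm + linear head.
    """
    grid = img_size // patch_size          # patches per side
    n_patches = grid * grid                # non-overlapping patches per image
    seq_len = n_patches + 1                # +1 for the (assumed) class token
    head_dim = embed_dim // n_heads        # per-head attention width
    mlp_hidden = int(embed_dim * mlp_ratio)

    def linear(n_in, n_out, bias=True):
        # Weight matrix plus optional bias vector of an nn.Linear-style layer.
        return n_in * n_out + (n_out if bias else 0)

    layer_norm = 2 * embed_dim             # LayerNorm weight + bias
    block = (
        linear(embed_dim, 3 * embed_dim, qkv_bias)   # fused Q, K, V projection
        + linear(embed_dim, embed_dim)               # attention output projection
        + linear(embed_dim, mlp_hidden)              # MLP expansion
        + linear(mlp_hidden, embed_dim)              # MLP contraction
        + 2 * layer_norm                             # two LayerNorms per block
    )
    patch_embed = in_channels * patch_size * patch_size * embed_dim + embed_dim
    total = (
        patch_embed
        + embed_dim                        # class token
        + seq_len * embed_dim              # position embeddings
        + n_blocks * block
        + layer_norm                       # final LayerNorm
        + linear(embed_dim, n_classes)     # classification head
    )
    return {
        "n_patches": n_patches,
        "seq_len": seq_len,
        "head_dim": head_dim,
        "mlp_hidden": mlp_hidden,
        "params": total,
    }


if __name__ == "__main__":
    for key, value in vit_shapes().items():
        print(f"{key}: {value:,}")
```

To check the assumed layout against the real model, compare the `params` value with `sum(p.numel() for p in model.parameters())` after building `ViT` as shown in How to Use.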

## Training Setup

- **Device**: 4 GPUs
- **Batch size**: 256 in total (64 per GPU × 4 GPUs)
- **Early stopping**: patience 50, delta 1e-6
- **Logging**: TensorBoard
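
The card does not state which quantity early stopping monitors or what unit the patience counts. The hypothetical `EarlyStopping` class below is a minimal sketch. It assumes that validation loss is checked once per epoch and that `delta` is the smallest decrease that counts as an improvement. It is not taken from the project's training code.

```python
class EarlyStopping:
    """Hypothetical helper: stop when the monitored loss (lower is better) has
    not decreased by more than `delta` for `patience` consecutive checks."""

    def __init__(self, patience: int = 50, delta: float = 1e-6) -> None:
        self.patience = patience
        self.delta = delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if loss < self.best - self.delta:
            self.best = loss          # meaningful improvement: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1      # no decrease larger than delta
        return self.bad_checks >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopping(patience=3, delta=1e-6)
    losses = [1.0, 0.8, 0.7999999, 0.81, 0.79999995]
    for epoch, loss in enumerate(losses):
        if stopper.step(loss):
            print(f"stopping after epoch {epoch}")
            break
```

With the card's settings this would be `EarlyStopping(patience=50, delta=1e-6)`, which stops after 50 consecutive checks without a loss decrease larger than 1e-6.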

## How to Use

```python
import torch

# ViT is defined in this project's model.py, which must be on the import path.
from model import ViT

# The hyperparameters must match the ones used in training, or the weights will not load.
model = ViT(
    img_size=(128, 128),
    patch_size=16,
    in_channels=3,
    embed_dim=240,
    n_classes=37,
    n_blocks=12,
    n_heads=4,
    mlp_ratio=2.0,
    qkv_bias=True,
    block_drop_p=0.1,
    attn_drop_p=0.1,
)

# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load("ViTPets.pth", map_location="cpu"))
model.eval()  # disable dropout for inference
```