iamgarvit committed
Commit 8364d4e · verified · 1 Parent(s): 515c5bf

Training completed - Acc@1: 50.08%, Acc@5: 74.80%

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +97 -0
  3. final_model.pth +3 -0
  4. training_curves.png +3 -0
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ training_curves.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,97 @@
+ ---
+ tags:
+ - image-classification
+ - pyramid-vision-transformer
+ - pvt
+ - cifar100
+ library_name: pytorch
+ ---
+
+ # PVT-Tiny on CIFAR-100 @ 224×224
+
+ This model is PVT-Tiny (Pyramid Vision Transformer) trained from scratch on CIFAR-100 (upsampled to 224×224) as a baseline for Vision GNN research.
+
+ ## Model Description
+
+ - **Architecture**: PVT-Tiny (Pyramid Vision Transformer)
+ - **Dataset**: CIFAR-100 (32×32 upsampled to 224×224)
+ - **Training**: From scratch (no pretraining)
+ - **Purpose**: Transformer baseline for validating Vision GNN performance
+
+ ## Training Details
+
+ - **Optimizer**: AdamW (lr=5e-4, weight_decay=0.05)
+ - **Scheduler**: CosineAnnealingLR (min_lr=1e-5)
+ - **Epochs**: 100
+ - **Batch Size**: 128
+ - **Normalization**: CIFAR-100 statistics
+ - **Mixed Precision**: Enabled
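
The learning-rate schedule implied by the optimizer and scheduler settings above can be sketched in closed form. This is a minimal illustration, not the training script: it assumes the scheduler is stepped once per epoch with an annealing period of 100 epochs (the README does not state either), and the names `cosine_lr`, `BASE_LR`, `MIN_LR`, and `EPOCHS` are hypothetical. In PyTorch the floor is passed to `torch.optim.lr_scheduler.CosineAnnealingLR` as `eta_min`, which is presumably what the README calls `min_lr`.

```python
import math

BASE_LR = 5e-4   # peak learning rate from the Training Details list
MIN_LR = 1e-5    # floor of the schedule (the README's min_lr)
EPOCHS = 100     # assumed annealing period (T_max); not stated in the README


def cosine_lr(epoch: int, base_lr: float = BASE_LR, min_lr: float = MIN_LR,
              t_max: int = EPOCHS) -> float:
    """Closed-form cosine annealing: base_lr at epoch 0, min_lr at epoch t_max."""
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * epoch / t_max))


if __name__ == "__main__":
    # Print the learning rate at a few points along the schedule.
    for epoch in (0, 25, 50, 75, 100):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 5e-4, passes the midpoint between the two bounds at epoch 50, and reaches 1e-5 at epoch 100.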
+
+ ## Model Architecture
+
+ PVT-Tiny uses a pyramid structure with spatial reduction attention:
+ - **Patch Size**: 4×4
+ - **Embed Dims**: [64, 128, 320, 512]
+ - **Num Heads**: [1, 2, 5, 8]
+ - **Depths**: [2, 2, 2, 2]
+ - **SR Ratios**: [8, 4, 2, 1]
+ - **MLP Ratios**: [8, 8, 4, 4]
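
To make the pyramid concrete, the sketch below derives the feature-map size and token counts per stage for a 224×224 input from the configuration above. It assumes, following the PVT paper rather than this README, that stage 1 applies the 4×4 patch embedding and every later stage downsamples by a further factor of 2. It also ignores the class token in the final stage. `STAGE_STRIDES` and `stage_shapes` are hypothetical names used only for illustration.

```python
IMAGE_SIZE = 224
EMBED_DIMS = [64, 128, 320, 512]
SR_RATIOS = [8, 4, 2, 1]
# Assumption (from the PVT paper, not this README): stage 1 uses the 4x4 patch
# embedding and each later stage downsamples by a further factor of 2.
STAGE_STRIDES = [4, 2, 2, 2]


def stage_shapes(image_size=IMAGE_SIZE):
    """Return (side, query_tokens, kv_tokens, embed_dim) for each stage."""
    shapes = []
    side = image_size
    for stride, sr, dim in zip(STAGE_STRIDES, SR_RATIOS, EMBED_DIMS):
        side //= stride              # feature-map side length after patch embedding
        queries = side * side        # one query token per spatial position
        kv = (side // sr) ** 2       # keys/values left after spatial reduction
        shapes.append((side, queries, kv, dim))
    return shapes


if __name__ == "__main__":
    for i, (side, q, kv, dim) in enumerate(stage_shapes(), start=1):
        print(f"stage {i}: {side}x{side} map, {q} queries, {kv} keys/values, dim {dim}")
```

With these strides the SR ratios reduce the keys and values to a 7×7 grid in every stage, which is what keeps attention affordable on the 56×56 map of stage 1.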
+
+ ## Results
+
+ - **Best Test Acc@1**: 50.35%
+ - **Best Test Acc@5**: 75.69%
+ - **Final Test Acc@1**: 50.08%
+ - **Final Test Acc@5**: 74.80%
+ - **Training Time**: 3.02 hours
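
For readers unfamiliar with the Acc@1 and Acc@5 metrics reported above, here is a small pure-Python sketch of top-k accuracy on toy data. It illustrates the metric's definition only; it is not the evaluation code behind these results, and `topk_accuracy` is a hypothetical helper name.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


if __name__ == "__main__":
    # Three toy samples over three classes (made-up scores, not model outputs).
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(f"Acc@1: {topk_accuracy(scores, labels, k=1):.2f}%")
    print(f"Acc@2: {topk_accuracy(scores, labels, k=2):.2f}%")
```

Acc@5 on CIFAR-100 is the same computation with `k=5` over 100 class scores per image.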
+
+ ## Methodology
+
+ We follow the original PVT training protocol adapted for CIFAR-100 to ensure fair comparison with Vision GNN and CNN baselines.
+ All models in the comparison are trained under identical conditions:
+ - Same resolution (224×224)
+ - Same data augmentation
+ - No pretrained weights
+ - Same CIFAR-100 normalization
+
+ ## Available Checkpoints
+
+ - `best_model.pth` - Best performing checkpoint (50.35% Acc@1)
+ - `final_model.pth` - Final model after all epochs
+ - `checkpoint_epoch_X.pth` - Saved every 20 epochs
+
+ ## Usage
+
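+ ```python
+ import torch
+ import torch.nn as nn          # needed by the PVT model definition
+ from functools import partial  # needed by the PVT model definition
+
+ # `pvt_tiny` is not part of PyTorch: define or import it from your PVT
+ # implementation, using the pvt-tiny configuration listed above.
+
+ # Build the model with 100 output classes for CIFAR-100
+ model = pvt_tiny(num_classes=100)
+
+ # Load trained weights (mapped to CPU so this also works without a GPU)
+ checkpoint = torch.load('best_model.pth', map_location='cpu')
+ model.load_state_dict(checkpoint['model_state_dict'])
+ model.eval()  # switch to inference mode
+ ```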
+
+ ## Citation
+
+ This implementation is based on:
+
+ **Pyramid Vision Transformer:**
+ ```bibtex
+ @inproceedings{wang2021pyramid,
+ title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
+ author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
+ booktitle={ICCV},
+ year={2021}
+ }
+ ```
+
+ ## Training Protocol
+
+ Training follows the standard PVT protocol with AdamW optimizer and cosine annealing scheduler, ensuring reproducibility and fair comparison with other vision architectures.
final_model.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:80af27139a10a99fe8f5e318b867abbe040effdb2b4522415b0667d2090d64b7
+ size 51134873
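
The three lines added for `final_model.pth` are a Git LFS pointer, not the weights themselves: the repository stores only the spec version, the SHA-256 of the real file, and its size in bytes. As a rough sketch, a pointer like this can be read with a few lines of Python. `parse_lfs_pointer` is a hypothetical helper written for this note, not part of any Git LFS tooling.

```python
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:80af27139a10a99fe8f5e318b867abbe040effdb2b4522415b0667d2090d64b7
size 51134873
"""


def parse_lfs_pointer(text):
    """Turn the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["hash_algo"], pointer["digest"][:12], pointer["size_bytes"])
```

The `size` field shows that the checkpoint behind this pointer is roughly 51 MB.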
training_curves.png ADDED

Git LFS Details

  • SHA256: 51484784ffbdd9a5b998855401bd40dd9f7b4b5e270cf2855869a7835b9f4022
  • Pointer size: 131 Bytes
  • Size of remote file: 489 kB