Training completed - Acc@1: 50.08%, Acc@5: 74.80%


Files changed (4)

.gitattributes +1 -0
README.md +97 -0
final_model.pth +3 -0
training_curves.png +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+training_curves.png filter=lfs diff=lfs merge=lfs -text

README.md ADDED

	@@ -0,0 +1,97 @@

+---
+tags:
+- image-classification
+- pyramid-vision-transformer
+- pvt
+- cifar100
+library_name: pytorch
+---
+# PVT-Tiny on CIFAR-100 @ 224×224
+This model is PVT-Tiny (Pyramid Vision Transformer) trained from scratch on CIFAR-100 (upsampled to 224×224) as a baseline for Vision GNN research.
+## Model Description
+- **Architecture**: PVT-Tiny (Pyramid Vision Transformer)
+- **Dataset**: CIFAR-100 (32×32 upsampled to 224×224)
+- **Training**: From scratch (no pretraining)
+- **Purpose**: Transformer baseline for validating Vision GNN performance
+## Training Details
+- **Optimizer**: AdamW (lr=5e-4, weight_decay=0.05)
+- **Scheduler**: CosineAnnealingLR (eta_min=1e-5)
+- **Epochs**: 100
+- **Batch Size**: 128
+- **Normalization**: CIFAR-100 statistics
+- **Mixed Precision**: Enabled
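To make the schedule above concrete, here is a minimal sketch that evaluates the closed-form cosine annealing rule in plain Python. It assumes one scheduler step per epoch, a period equal to the 100 training epochs, and no warmup; the README states none of these, and the helper name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-4   # AdamW learning rate from the README
MIN_LR = 1e-5    # minimum learning rate from the README
EPOCHS = 100     # assumption: the cosine period equals the number of epochs


def cosine_lr(epoch, base_lr=BASE_LR, min_lr=MIN_LR, t_max=EPOCHS):
    """Cosine annealing: min_lr + (base_lr - min_lr) * (1 + cos(pi * t / T)) / 2."""
    cosine = math.cos(math.pi * epoch / t_max)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


if __name__ == "__main__":
    # Print the learning rate at a few points along the schedule.
    for epoch in (0, 25, 50, 75, 100):
        print(f"epoch {epoch:3d}: lr = {cosine_lr(epoch):.6f}")
```

Under these assumptions the rate starts at 5e-4, passes the midpoint of the two bounds at epoch 50, and ends at 1e-5.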
+## Model Architecture
+PVT-Tiny uses a pyramid structure with spatial reduction attention:
+- **Patch Size**: 4×4
+- **Embed Dims**: [64, 128, 320, 512]
+- **Num Heads**: [1, 2, 5, 8]
+- **Depths**: [2, 2, 2, 2]
+- **SR Ratios**: [8, 4, 2, 1]
+- **MLP Ratios**: [8, 8, 4, 4]
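The effect of the SR ratios is easier to see with the token counts worked out. The sketch below computes, for a 224×224 input, the feature-map side, the number of query tokens, and the number of key/value tokens left after spatial reduction in each stage. It assumes the usual PVT layout in which stages 2 to 4 each halve the resolution; the README does not state this, and `stage_shapes` is a hypothetical helper.

```python
IMG_SIZE = 224
PATCH_SIZE = 4                             # stage-1 patch size from the README
EMBED_DIMS = [64, 128, 320, 512]           # from the README
SR_RATIOS = [8, 4, 2, 1]                   # from the README
STAGE_DOWNSAMPLE = [PATCH_SIZE, 2, 2, 2]   # assumption: later stages halve the side


def stage_shapes(img_size=IMG_SIZE):
    """Return per-stage feature-map side, query tokens, and key/value tokens."""
    rows = []
    side = img_size
    for dim, sr, down in zip(EMBED_DIMS, SR_RATIOS, STAGE_DOWNSAMPLE):
        side //= down          # spatial side after this stage's patch embedding
        kv_side = side // sr   # spatial reduction shrinks keys/values per side
        rows.append({
            "dim": dim,
            "side": side,
            "query_tokens": side * side,
            "kv_tokens": kv_side * kv_side,
        })
    return rows


if __name__ == "__main__":
    for i, row in enumerate(stage_shapes(), start=1):
        print(f"stage {i}: {row}")
```

With these settings every stage attends over a 7×7 grid of keys and values, which keeps attention cost manageable at the high-resolution early stages.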
+## Results
+- **Best Test Acc@1**: 50.35%
+- **Best Test Acc@5**: 75.69%
+- **Final Test Acc@1**: 50.08%
+- **Final Test Acc@5**: 74.80%
+- **Training Time**: 3.02 hours
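For readers unfamiliar with the Acc@1 and Acc@5 figures above, the following is a small sketch of the standard top-k accuracy definition in plain Python. The evaluation code used for this model is not shown in the README, so this only illustrates the metric; `topk_accuracy` and the toy scores are hypothetical.

```python
def topk_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example with 4 samples and 3 classes (illustrative values only).
scores = [
    [0.10, 0.70, 0.20],
    [0.50, 0.30, 0.20],
    [0.20, 0.30, 0.50],
    [0.40, 0.35, 0.25],
]
labels = [1, 1, 0, 0]

if __name__ == "__main__":
    print(topk_accuracy(scores, labels, k=1))
    print(topk_accuracy(scores, labels, k=2))
```

Acc@5 can never be lower than Acc@1 on the same predictions, which matches the gap between the two figures reported above.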
+## Methodology
+We follow the original PVT training protocol, adapted for CIFAR-100, to ensure a fair comparison with the Vision GNN and CNN baselines.
+All models in the comparison are trained under identical conditions:
+- Same resolution (224×224)
+- Same data augmentation
+- No pretrained weights
+- Same CIFAR-100 normalization
+## Available Checkpoints
+- `best_model.pth` - Best performing checkpoint (50.35% Acc@1)
+- `final_model.pth` - Final model after all epochs
+- `checkpoint_epoch_X.pth` - Saved every 20 epochs
+## Usage
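+```python
+import torch
+import torch.nn as nn
+from functools import partial
+# `pvt_tiny` is not defined in this snippet: define or import it from your PVT
+# implementation, configured with the settings listed under "Model Architecture"
+model = pvt_tiny(num_classes=100)
+# Load trained weights on the CPU; move the model to a GPU afterwards if needed
+checkpoint = torch.load('best_model.pth', map_location='cpu')
+model.load_state_dict(checkpoint['model_state_dict'])
+model.eval()  # switch to inference mode (disables dropout)
+```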
+## Citation
+This implementation is based on:
+**Pyramid Vision Transformer:**
+```bibtex
+@inproceedings{wang2021pyramid,
+  title={Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions},
+  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
+  booktitle={ICCV},
+  year={2021}
+}
+```
+## Training Protocol
+Training uses AdamW with a cosine annealing schedule, as in the standard PVT protocol; the exact hyperparameters are listed under Training Details. The same setup is applied to every architecture in the comparison.

final_model.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:80af27139a10a99fe8f5e318b867abbe040effdb2b4522415b0667d2090d64b7
+size 51134873
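The pointer above records the SHA-256 digest and byte size of the real checkpoint, so a downloaded file can be checked against it. The sketch below shows one way to do that with the Python standard library; `matches_lfs_pointer` is a hypothetical helper, not part of Git LFS or any Hugging Face tooling.

```python
import hashlib
import os


def matches_lfs_pointer(path, expected_oid, expected_size, chunk_size=1 << 20):
    """Check a local file against the sha256 oid and size from a Git LFS pointer."""
    # Cheap check first: the byte size must match the pointer's `size` field.
    if os.path.getsize(path) != expected_size:
        return False
    # Hash the file in chunks so large checkpoints need not fit in memory.
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_oid


# Example call with the values from the pointer above (requires the downloaded file):
# matches_lfs_pointer(
#     "final_model.pth",
#     "80af27139a10a99fe8f5e318b867abbe040effdb2b4522415b0667d2090d64b7",
#     51134873,
# )
```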

training_curves.png ADDED

Git LFS Details

SHA256: 51484784ffbdd9a5b998855401bd40dd9f7b4b5e270cf2855869a7835b9f4022
Pointer size: 131 Bytes
Size of remote file: 489 kB