cvdl/oxford-pets
This model is a Vision Transformer (ViT) trained from scratch on the Oxford Pets dataset. It classifies images of cats and dogs into 37 different breeds.
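For orientation, the hyperparameters passed to `ViT` in this card's loading code (128×128 input, 16-pixel patches, 3 channels, embedding size 240, 4 heads, MLP ratio 2.0) imply the sizes computed below. This is only a sketch of the usual ViT arithmetic. The helper `vit_shapes` is hypothetical and not part of the repository. It also assumes the MLP hidden width is `embed_dim * mlp_ratio`, which is the common convention but is not confirmed by this card, and it does not count a class token, since the card does not say whether one is used.

```python
def vit_shapes(img_size=(128, 128), patch_size=16, in_channels=3,
               embed_dim=240, n_heads=4, mlp_ratio=2.0):
    """Derive basic tensor sizes from ViT hyperparameters (hypothetical helper)."""
    height, width = img_size
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    if embed_dim % n_heads:
        raise ValueError("embed_dim must be divisible by n_heads")
    return {
        # Number of non-overlapping patches the image is cut into.
        "n_patches": (height // patch_size) * (width // patch_size),
        # Length of one flattened patch before the linear projection.
        "patch_dim": patch_size * patch_size * in_channels,
        # Per-head width inside multi-head attention.
        "head_dim": embed_dim // n_heads,
        # Assumed hidden width of each block's MLP.
        "mlp_hidden": int(embed_dim * mlp_ratio),
    }


if __name__ == "__main__":
    print(vit_shapes())
```

With the values above, each image becomes 64 patches of 768 raw values, each projected to 240 dimensions, and every attention head works on 60 dimensions.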
from model import ViT
import torch
model = ViT(
    img_size=(128, 128),
    patch_size=16,
    in_channels=3,
    embed_dim=240,
    n_classes=37,
    n_blocks=12,
    n_heads=4,
    mlp_ratio=2.0,
    qkv_bias=True,
    block_drop_p=0.1,
    attn_drop_p=0.1,
)
# map_location="cpu" lets the checkpoint load on machines without a GPU.
model.load_state_dict(torch.load("ViTPets.pth", map_location="cpu"))
model.eval()
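Once the weights are loaded, a prediction step could look like the sketch below. It rests on assumptions this card does not confirm: that the model's forward pass returns raw logits of shape `(batch, 37)`, and that resizing to 128×128 is the only preprocessing needed (any normalization used during training is unknown and is omitted). `predict_topk` is a hypothetical helper, and the demo uses a stand-in linear classifier so that the snippet runs without `model.py` or the checkpoint.

```python
import torch
import torch.nn.functional as F


def predict_topk(model, images, k=3, img_size=(128, 128)):
    """Return (probabilities, class indices) of the k most likely classes.

    images: float tensor of shape (batch, 3, H, W).
    """
    # Resize to the resolution the model was configured with.
    images = F.interpolate(images, size=img_size, mode="bilinear",
                           align_corners=False)
    model.eval()  # disable dropout for inference
    with torch.no_grad():
        logits = model(images)          # assumed shape: (batch, n_classes)
        probs = logits.softmax(dim=-1)  # convert logits to probabilities
    return probs.topk(k, dim=-1)


if __name__ == "__main__":
    # Stand-in classifier with the same input and output sizes as the card's
    # configuration; replace it with the loaded ViT in practice.
    stand_in = torch.nn.Sequential(
        torch.nn.Flatten(),
        torch.nn.Linear(3 * 128 * 128, 37),
    )
    top_p, top_idx = predict_topk(stand_in, torch.rand(2, 3, 200, 160))
    print(top_p.shape, top_idx.shape)
```

The returned indices are class positions in the range 0 to 36. Mapping them to breed names depends on the label order used during training, which is not stated here.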