ViT (ImageNet-21k)
How to use cs-giung/vit-base-patch16-imagenet21k-augreg with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("image-classification", model="cs-giung/vit-base-patch16-imagenet21k-augreg")
pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")

# Load model directly
from transformers import AutoImageProcessor, AutoModelForImageClassification
processor = AutoImageProcessor.from_pretrained("cs-giung/vit-base-patch16-imagenet21k-augreg")
model = AutoModelForImageClassification.from_pretrained("cs-giung/vit-base-patch16-imagenet21k-augreg")

Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224x224. It was introduced in the paper "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" and further enhanced in the follow-up paper "How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers".
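Loading the processor and model directly leaves the forward pass to the caller. The following is a minimal sketch of one way to run it, not a reference implementation. The helper names `top_k_labels` and `classify` are hypothetical, and it assumes `torch` and `transformers` are installed and that the checkpoint's config carries a usable `id2label` mapping. It ranks raw logits and does not assume any particular probability normalization.

```python
def top_k_labels(logits, id2label, k=5):
    """Return the k highest-scoring (label, logit) pairs, best first.

    Hypothetical helper: `logits` is a flat list of floats and `id2label`
    maps class index -> label string. Indices missing from the mapping
    fall back to the index rendered as a string.
    """
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return [(id2label.get(i, str(i)), logits[i]) for i in ranked[:k]]


def classify(image, k=5, repo_id="cs-giung/vit-base-patch16-imagenet21k-augreg"):
    """Run one PIL image through the model and return its top-k classes.

    Heavy imports are kept inside the function so that the pure-Python
    helper above can be used without torch/transformers installed.
    """
    import torch
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    processor = AutoImageProcessor.from_pretrained(repo_id)
    model = AutoModelForImageClassification.from_pretrained(repo_id)
    model.eval()

    # The processor resizes and normalizes the image into a (1, 3, H, W) tensor.
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)

    return top_k_labels(logits[0].tolist(), model.config.id2label, k)
```

With Pillow installed, a call might look like `classify(Image.open("parrots.png"))`, where `parrots.png` stands in for any local image file.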
The weights were converted from the file B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0.npz, hosted in the GCS buckets listed in the original repository.
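The checkpoint filename appears to encode the training configuration. As a rough sketch, the fields can be pulled apart with a regular expression. The function name `parse_checkpoint_name` is hypothetical, and the reading of the fields (`lr` as learning rate, `aug` as augmentation setting, `wd` as weight decay, `do` as dropout, `sd` as stochastic depth) is an assumption based on the naming pattern, not something this card states.

```python
import re

# Decimal number such as "0.001" or "300".
_NUM = r"\d+(?:\.\d+)?"

# Assumed layout: <name>-<dataset>-<epochs>ep-lr_<lr>-aug_<aug>-wd_<wd>-do_<do>-sd_<sd>.npz
_PATTERN = re.compile(
    rf"(?P<name>[^-]+)-(?P<ds>[^-]+)-(?P<epochs>\d+)ep"
    rf"-lr_(?P<lr>{_NUM})-aug_(?P<aug>[^-]+)-wd_(?P<wd>{_NUM})"
    rf"-do_(?P<do>{_NUM})-sd_(?P<sd>{_NUM})\.npz"
)


def parse_checkpoint_name(filename):
    """Split a checkpoint filename into a dict of typed fields.

    Raises ValueError if the filename does not follow the assumed layout.
    """
    match = _PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {filename!r}")
    fields = match.groupdict()
    return {
        "name": fields["name"],
        "ds": fields["ds"],
        "epochs": int(fields["epochs"]),
        "lr": float(fields["lr"]),
        "aug": fields["aug"],
        "wd": float(fields["wd"]),
        "do": float(fields["do"]),
        "sd": float(fields["sd"]),
    }


config = parse_checkpoint_name(
    "B_16-i21k-300ep-lr_0.001-aug_medium1-wd_0.1-do_0.0-sd_0.0.npz"
)
print(config)
```

Under that reading, this checkpoint is a B/16 model trained on ImageNet-21k for 300 epochs with the other fields set as shown in the name.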