Commit b45bf7c (verified) by Divyasreepat · 1 parent: 609d98e

Update README.md with new model card content

Files changed (1): README.md (+90 −0)
library_name: keras-hub
---

### Model Overview
# Model Summary

Vision Transformer (ViT) adapts the Transformer architecture, originally designed for natural language processing, to computer vision. It treats an image as a sequence of patches, much as a Transformer treats a sentence as a sequence of words. It was introduced in the paper [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929).
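To make the patch-sequence view concrete: the sequence length a ViT sees is just the patch grid squared. A quick illustrative sketch (plain Python, not part of the KerasHub API):

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping patches a square image is split into."""
    grid = image_size // patch_size
    return grid * grid

# A 224x224 image with 16x16 patches gives a 14x14 grid, i.e. 196 patch tokens.
print(num_patches(224, 16))  # -> 196
# At 384x384 the same patch size gives a 24x24 grid, i.e. 576 tokens.
print(num_patches(384, 16))  # -> 576
```

This is why the 384-resolution presets below have slightly more parameters than their 224 counterparts: the longer patch sequence needs a larger position-embedding table.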

## Links

* [ViT Quickstart Notebook](https://www.kaggle.com/code/sineeli/vit-quickstart)
* ViT API Documentation (coming soon)
* [ViT Model Card](https://huggingface.co/google/vit-base-patch16-224)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:

```shell
pip install -U -q keras-hub
pip install -U -q keras
```

## Presets

Model ID | Image size | Top-1 Acc | Top-5 Acc | Parameters |
:--: | :--: | :--: | :--: | :--: |
**Base** | | | | |
vit_base_patch16_224_imagenet | 224 | - | - | 85,798,656 |
vit_base_patch16_224_imagenet21k | 224 | - | - | 85,798,656 |
vit_base_patch16_384_imagenet | 384 | - | - | 86,090,496 |
vit_base_patch32_224_imagenet21k | 224 | - | - | 87,455,232 |
vit_base_patch32_384_imagenet | 384 | - | - | 87,528,192 |
**Large** | | | | |
vit_large_patch16_224_imagenet | 224 | - | - | 303,301,632 |
vit_large_patch16_224_imagenet21k | 224 | - | - | 303,301,632 |
vit_large_patch16_384_imagenet | 384 | - | - | 303,690,752 |
vit_large_patch32_224_imagenet21k | 224 | - | - | 305,510,400 |
vit_large_patch32_384_imagenet | 384 | - | - | 305,607,680 |
**Huge** | | | | |
vit_huge_patch14_224_imagenet21k | 224 | - | - | 630,764,800 |

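The preset IDs follow a fixed pattern: family, model size, patch size, input resolution, pretraining data. A small helper (hypothetical, for illustration only, not part of KerasHub) shows how to unpack one:

```python
def parse_preset(preset_id: str) -> dict:
    """Split a preset ID like 'vit_base_patch16_384_imagenet' into its parts."""
    family, size, patch, resolution, weights = preset_id.split("_")
    return {
        "family": family,                        # "vit"
        "size": size,                            # "base" / "large" / "huge"
        "patch": int(patch.removeprefix("patch")),  # 16 / 32 / 14
        "resolution": int(resolution),           # 224 / 384
        "weights": weights,                      # "imagenet" / "imagenet21k"
    }

print(parse_preset("vit_base_patch16_384_imagenet"))
```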
## Example Usage

### Pretrained ViT model

```python
import numpy as np
import keras_hub

image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "vit_base_patch16_384_imagenet"
)

# A random batch of two 384x384 RGB images with values in [0, 1],
# matching the 384-resolution preset.
input_data = np.random.uniform(0, 1, size=(2, 384, 384, 3))
image_classifier(input_data)
```

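The random `input_data` above stands in for real images. A minimal sketch (NumPy only; the `raw_images` array is hypothetical placeholder data) of turning a batch of `uint8` pixel arrays into the float batch used above:

```python
import numpy as np

# Stand-in for two decoded 384x384 RGB photos as uint8 pixel values.
raw_images = np.random.randint(0, 256, size=(2, 384, 384, 3), dtype=np.uint8)

# Scale to float32 in [0, 1], the same range as the random input above.
input_data = raw_images.astype("float32") / 255.0
print(input_data.shape)  # -> (2, 384, 384, 3)
```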

### Load backbone weights and fine-tune on a custom dataset

```python
import keras_hub

backbone = keras_hub.models.Backbone.from_preset(
    "vit_base_patch16_384_imagenet"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "vit_base_patch16_384_imagenet"
)
# CLASSES is your dataset's list of class names.
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
```


## Example Usage with Hugging Face URI

### Pretrained ViT model

```python
import numpy as np
import keras_hub

image_classifier = keras_hub.models.ImageClassifier.from_preset(
    "hf://keras/vit_base_patch16_384_imagenet"
)

# A random batch of two 384x384 RGB images with values in [0, 1],
# matching the 384-resolution preset.
input_data = np.random.uniform(0, 1, size=(2, 384, 384, 3))
image_classifier(input_data)
```


### Load backbone weights and fine-tune on a custom dataset

```python
import keras_hub

backbone = keras_hub.models.Backbone.from_preset(
    "hf://keras/vit_base_patch16_384_imagenet"
)
preprocessor = keras_hub.models.ViTImageClassifierPreprocessor.from_preset(
    "hf://keras/vit_base_patch16_384_imagenet"
)
# CLASSES is your dataset's list of class names.
model = keras_hub.models.ViTImageClassifier(
    backbone=backbone,
    num_classes=len(CLASSES),
    preprocessor=preprocessor,
)
```