---
library_name: keras-hub
---
### Model Overview
# Swin Transformer

Instantiates the Swin Transformer architecture.

## Model Details

The Swin Transformer (Shifted Window Transformer) is a hierarchical vision transformer whose representation is computed with shifted windows. The shifted windowing scheme brings greater efficiency by limiting self-attention computation to non-overlapping local windows while also allowing for cross-window connection.
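
To make the windowing concrete, here is a minimal NumPy sketch of regular and shifted window partitioning on a single (H, W, C) feature map. The function names are hypothetical and this is not the KerasHub implementation. It follows the paper's cyclic-shift idea but leaves out the attention itself and the attention mask that stops tokens wrapped around by the shift from attending to each other.

```python
import numpy as np

# Hypothetical helpers for illustration; they are not part of KerasHub.


def window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping windows.

    Returns shape (num_windows, window_size * window_size, C): one row of
    tokens per window, which is the set self-attention runs over.
    """
    h, w, c = x.shape
    if h % window_size or w % window_size:
        raise ValueError("H and W must be divisible by window_size")
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    # Bring the two window-grid axes together, then flatten each window.
    x = x.transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, window_size * window_size, c)


def shifted_window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Cyclically shift the map toward the top-left by half a window, then partition.

    Tokens that sat in different windows under the regular partition now
    share a window, which is how information crosses window boundaries.
    """
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


# An 8x8 single-channel map whose values are the token indices 0..63.
fmap = np.arange(64).reshape(8, 8, 1)
print(window_partition(fmap, 4)[0, :, 0])
print(shifted_window_partition(fmap, 4)[0, :, 0])
```

With a 4x4 window, the first regular window holds tokens from rows 0-3 and columns 0-3, while the first shifted window holds tokens from rows 2-5 and columns 2-5, so it mixes tokens from all four regular windows.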

This hierarchical architecture has the flexibility to model at various scales and has linear computational complexity with respect to image size. These qualities make Swin Transformer compatible with a broad range of vision tasks, including image classification, object detection, and semantic segmentation.
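
The paper makes the complexity claim precise with two cost formulas for an h × w patch grid with C channels and M × M windows, ignoring the softmax. Global multi-head self-attention costs 4hwC² + 2(hw)²C, while window attention costs 4hwC² + 2M²hwC. The small sketch below evaluates both for the first stage of Swin-T on a 224×224 input (a 56×56 grid, C = 96, M = 7). The function names are illustrative only.

```python
def msa_cost(h: int, w: int, c: int) -> int:
    """Cost of global multi-head self-attention over an h x w patch grid."""
    # 4hwC^2 for the Q/K/V and output projections, 2(hw)^2 C for attention.
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c


def window_msa_cost(h: int, w: int, c: int, m: int) -> int:
    """Cost of self-attention restricted to non-overlapping m x m windows."""
    # Same projections; each token now attends to only m * m tokens.
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c


# First stage of Swin-T at 224x224: 224 / 4 = 56 patches per side, C = 96, M = 7.
global_cost = msa_cost(56, 56, 96)
window_cost = window_msa_cost(56, 56, 96, 7)
print(f"global: {global_cost:,}  windowed: {window_cost:,}  ratio: {global_cost / window_cost:.1f}x")
```

At this stage the windowed variant is about 13.8 times cheaper. Doubling h and w multiplies `window_msa_cost` by exactly 4 (linear in hw), but multiplies `msa_cost` by much more, because its second term grows with (hw)².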

Unlike the original Vision Transformer (ViT), which computes self-attention globally across all patches (so its cost grows quadratically with the number of patches), Swin Transformer computes self-attention within local, non-overlapping windows. Shifting the window partition between consecutive layers connects neighboring windows, so the model keeps linear computational complexity with respect to image size while still letting information flow across window boundaries.

### Reference
* [Swin Transformer: Hierarchical Vision Transformer using Shifted Windows](https://arxiv.org/abs/2103.14030)

### Links
* [Swin Transformer Quickstart Notebook](https://www.kaggle.com/code/prasadsachin/swin-transformer-quickstart-keras-hub) 
* [Swin Transformer API Documentation](https://keras.io/keras_hub/api/models/swin_transformer/)
* [KerasHub Beginner Guide](https://keras.io/guides/keras_hub/getting_started/)
* [KerasHub Model Publishing Guide](https://keras.io/guides/keras_hub/upload/)

## Installation

Keras and KerasHub can be installed with:
```bash
pip install -U -q keras-hub
pip install -U -q keras
```

JAX, TensorFlow, and PyTorch come preinstalled in Kaggle Notebooks. For instructions on installing them in another environment, see the [Keras Getting Started](https://keras.io/getting_started/) page.

## Presets

The following model checkpoints are provided by the Keras team. Weights have been ported from [Hugging Face Hub](https://huggingface.co/microsoft).

| Preset name | Parameters | Description |
| :--- | :--- | :--- |
| **swin_tiny_patch4_window7_224** | 28.29M | Tiny Swin Transformer model pre-trained on ImageNet-1k at a 224x224 resolution |
| **swin_small_patch4_window7_224** | 49.61M | Small Swin Transformer model pre-trained on ImageNet-1k at a 224x224 resolution |
| **swin_base_patch4_window7_224** | 87.77M | Base Swin Transformer model pre-trained on ImageNet-1k at a 224x224 resolution |
| **swin_base_patch4_window12_384** | 87.90M | Base Swin Transformer model pre-trained on ImageNet-1k at a 384x384 resolution |
| **swin_large_patch4_window7_224** | 196.53M | Large Swin Transformer model pre-trained on ImageNet-1k at a 224x224 resolution |
| **swin_large_patch4_window12_384** | 196.74M | Large Swin Transformer model pre-trained on ImageNet-1k at a 384x384 resolution |
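
Each preset name encodes the model variant, the patch size, the attention window size, and the input resolution. The hypothetical helper below (not a KerasHub API) decodes a name. Assuming the standard four-stage Swin layout in which each patch-merging step halves the grid, it also works out the patch grid and the number of windows per side at every stage.

```python
import re

# Hypothetical helper for illustration; it is not part of KerasHub.
PRESET_PATTERN = re.compile(r"swin_([a-z]+)_patch(\d+)_window(\d+)_(\d+)")


def describe_swin_preset(name: str) -> dict:
    """Decode a Swin preset name into its patch, window, and grid sizes."""
    match = PRESET_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"Unrecognized Swin preset name: {name!r}")
    variant, patch, window, image_size = match.groups()
    patch, window, image_size = int(patch), int(window), int(image_size)
    # Stage 1 sees an (image_size / patch) grid per side; each of the three
    # patch-merging layers that follow halves it.
    stage_grids = [image_size // patch // 2**i for i in range(4)]
    return {
        "variant": variant,
        "patch_size": patch,
        "window_size": window,
        "image_size": image_size,
        "stage_grids": stage_grids,
        # Number of windows along each side of the grid at every stage.
        "windows_per_side": [g // window for g in stage_grids],
    }


for preset in ("swin_tiny_patch4_window7_224", "swin_base_patch4_window12_384"):
    print(preset, describe_swin_preset(preset))
```

For both the 224 and 384 presets the grids divide evenly into windows, ending with a single window at the last stage (8, 4, 2, and 1 windows per side).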

## Example Use

```python
import numpy as np
import keras_hub

# Pretrained Swin Transformer backbone
model = keras_hub.models.SwinTransformerBackbone.from_preset("swin_tiny_patch4_window7_224")
input_data = np.random.uniform(0, 1, size=(2, 224, 224, 3))
model(input_data)

# Randomly initialized Swin Transformer backbone with custom config
model = keras_hub.models.SwinTransformerBackbone(
    image_shape=(224, 224, 3),
    embed_dim=96,
    depths=(2, 2, 6, 2),
    num_heads=(3, 6, 12, 24),
    window_size=7,
)
model(input_data)

# Use Swin Transformer for image classification task
classifier = keras_hub.models.SwinTransformerImageClassifier.from_preset(
    "swin_tiny_patch4_window7_224",
    num_classes=1000,
)

# Use Hugging Face presets directly for on-the-fly conversion
classifier = keras_hub.models.SwinTransformerImageClassifier.from_preset(
    "hf://microsoft/swin-tiny-patch4-window7-224"
)
```
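
The four preset sizes correspond to the Swin-T, Swin-S, Swin-B, and Swin-L configurations described in the paper. The dictionary below records those hyperparameters using the same keyword names as the custom-config example. Treat it as a reference sketch: it is an assumption that every KerasHub preset uses exactly these values.

```python
# Swin variant hyperparameters as given in the Swin Transformer paper.
# Keyword names mirror the custom-config example; whether KerasHub presets
# use exactly these values is an assumption.
SWIN_VARIANTS = {
    "tiny": {"embed_dim": 96, "depths": (2, 2, 6, 2), "num_heads": (3, 6, 12, 24)},
    "small": {"embed_dim": 96, "depths": (2, 2, 18, 2), "num_heads": (3, 6, 12, 24)},
    "base": {"embed_dim": 128, "depths": (2, 2, 18, 2), "num_heads": (4, 8, 16, 32)},
    "large": {"embed_dim": 192, "depths": (2, 2, 18, 2), "num_heads": (6, 12, 24, 48)},
}


def stage_channels(embed_dim: int) -> list[int]:
    """Channel width of each stage; each patch-merging layer between stages doubles it."""
    return [embed_dim * 2**i for i in range(4)]


def head_dims(config: dict) -> list[int]:
    """Per-head dimension at each stage (stage channels divided by heads)."""
    channels = stage_channels(config["embed_dim"])
    return [ch // heads for ch, heads in zip(channels, config["num_heads"])]


for name, config in SWIN_VARIANTS.items():
    print(
        name,
        "blocks:", sum(config["depths"]),
        "channels:", stage_channels(config["embed_dim"]),
        "head dims:", head_dims(config),
    )
```

Every variant keeps a per-head dimension of 32 and scales by widening channels and adding blocks in the third stage. Assuming `SwinTransformerBackbone` accepts these keyword arguments as in the example above, a randomly initialized Swin-S backbone could then be built with `keras_hub.models.SwinTransformerBackbone(image_shape=(224, 224, 3), window_size=7, **SWIN_VARIANTS["small"])`.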

## Example Usage
```python
import numpy as np
import keras_hub

# Top-5 ImageNet class decoding.
model = keras_hub.models.SwinTransformerImageClassifier.from_preset(
    "swin_tiny_patch4_window7_224"
)
# Random uint8 images at the preset's 224x224 input resolution.
images = np.random.randint(0, 256, size=(1, 224, 224, 3), dtype="uint8")
logits = model.predict(images, verbose=0)
print(keras_hub.utils.decode_imagenet_predictions(logits, top=5)[0])
```
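
`decode_imagenet_predictions` maps scores to class names. If you instead need raw class indices (for example, to compare against integer ImageNet labels), a top-k can be computed directly on the array returned by `model.predict`. The NumPy sketch below uses hypothetical helper names and is not how KerasHub implements decoding. Because softmax preserves order, it ranks classes the same way whether the model outputs logits or probabilities.

```python
import numpy as np

# Hypothetical helpers for illustration; they are not part of KerasHub.


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k_indices(scores: np.ndarray, k: int = 5) -> np.ndarray:
    """Indices of the k highest scores in each row of a 2D array, best first."""
    # argsort sorts ascending, so keep the last k columns and reverse them.
    return np.argsort(scores, axis=-1)[:, -k:][:, ::-1]


# Toy scores for one image over six classes, standing in for `logits` above.
scores = np.array([[0.1, 2.0, -1.0, 3.0, 0.5, 1.5]])
print(top_k_indices(scores, k=3))
print(top_k_indices(softmax(scores), k=3))
```

Both calls return the same ranking, `[[3 1 5]]`. With real model output you would pass `logits` and `k=5` instead of the toy array.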

## Example Usage with Hugging Face URI

```python
import numpy as np
import keras_hub

# Top-5 ImageNet class decoding.
model = keras_hub.models.SwinTransformerImageClassifier.from_preset(
    "hf://keras/swin_tiny_patch4_window7_224"
)
# Random uint8 images at the preset's 224x224 input resolution.
images = np.random.randint(0, 256, size=(1, 224, 224, 3), dtype="uint8")
logits = model.predict(images, verbose=0)
print(keras_hub.utils.decode_imagenet_predictions(logits, top=5)[0])
```