Swin Transformer V2: Scaling Up Capacity and Resolution
Paper: arXiv:2111.09883
Swin Transformer V2 extends the original Swin Transformer with scaled cosine attention and a log-spaced continuous relative position bias, enabling stable training at large model capacity and effective transfer to higher input resolutions on vision tasks.
This model uses the Swin Transformer V2-Tiny variant, a compact hierarchical transformer that applies shifted window self-attention for efficient computation. It is well suited for high-resolution image classification and as a backbone for dense vision tasks such as detection and segmentation.
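The shifted-window attention mentioned above alternates between regular and offset window partitions across consecutive blocks so that information flows between neighboring windows. A minimal NumPy sketch of that partitioning step (toy sizes and function names are illustrative, not the model's actual implementation):

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    """
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

def shifted_windows(x, window_size, shift):
    """Cyclically shift the feature map, then partition into windows.

    This mimics the shifted-window step between consecutive Swin blocks:
    rolling the map by -shift lets the same partition routine produce
    windows that straddle the previous layer's window boundaries.
    """
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)

# Toy 8x8 single-channel feature map, 4x4 windows, shift of 2.
feat = np.arange(64, dtype=np.float32).reshape(8, 8, 1)
regular = window_partition(feat, 4)    # 4 windows of shape (4, 4, 1)
shifted = shifted_windows(feat, 4, 2)  # windows offset by 2 pixels
print(regular.shape, shifted.shape)    # (4, 4, 4, 1) (4, 4, 4, 1)
```

Attention is then computed independently within each window, which keeps cost linear in image size rather than quadratic, the property that makes the architecture practical at high resolutions.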
Model Configuration:
| Model | Device | Compression | Model Link |
|---|---|---|---|
| SwinV2-Tiny | N1-655 | Amba_optimized | Model_Link |
| SwinV2-Tiny | N1-655 | Activation_fp16 | Model_Link |
| SwinV2-Tiny | CV7 | Amba_optimized | Model_Link |
| SwinV2-Tiny | CV7 | Activation_fp16 | Model_Link |
| SwinV2-Tiny | CV72 | Amba_optimized | Model_Link |
| SwinV2-Tiny | CV72 | Activation_fp16 | Model_Link |
| SwinV2-Tiny | CV75 | Amba_optimized | Model_Link |
| SwinV2-Tiny | CV75 | Activation_fp16 | Model_Link |