Swin Transformer V2: Scaling Up Capacity and Resolution
Paper: arXiv:2111.09883
Swin Transformer V2 extends the original Swin Transformer with scaled cosine attention and a log-spaced continuous relative position bias, enabling stable training at large model capacity and effective transfer to higher input resolutions on vision tasks.
This model uses the Swin Transformer V2-Tiny variant, a compact hierarchical transformer that applies shifted window self-attention for efficient computation. It is well suited for high-resolution image classification and as a backbone for dense vision tasks such as detection and segmentation.
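The shifted-window attention mentioned above alternates between regular and offset window partitions across consecutive blocks so that information flows between neighboring windows. A minimal NumPy sketch of that partitioning step (toy sizes and function names are illustrative, not the model's actual implementation):

```python
import numpy as np

def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size, window_size, C).
    """
    H, W, C = x.shape
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, window_size, window_size, C)

def shifted_windows(x, window_size, shift):
    """Cyclically shift the feature map, then partition into windows.

    This mimics the shifted-window step between consecutive Swin blocks:
    rolling the map by -shift lets the same partition routine produce
    windows that straddle the previous layer's window boundaries.
    """
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)

# Toy 8x8 single-channel feature map, 4x4 windows, shift of 2.
feat = np.arange(64, dtype=np.float32).reshape(8, 8, 1)
regular = window_partition(feat, 4)    # 4 windows of shape (4, 4, 1)
shifted = shifted_windows(feat, 4, 2)  # windows offset by 2 pixels
print(regular.shape, shifted.shape)    # (4, 4, 4, 1) (4, 4, 4, 1)
```

Attention is then computed independently within each window, which keeps cost linear in image size rather than quadratic, the property that makes the architecture practical at high resolutions.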
Model Configuration:
| Model | Device | Compression | Model Link |
|---|---|---|---|
| SwinV2-Tiny | N1-655 | Amba_optimized | Model_Link |
| SwinV2-Tiny | N1-655 | Activation_fp16 | Model_Link |
| SwinV2-Tiny | CV7 | Amba_optimized | Model_Link |
| SwinV2-Tiny | CV7 | Activation_fp16 | Model_Link |
| SwinV2-Tiny | CV72 | Amba_optimized | Model_Link |
| SwinV2-Tiny | CV72 | Activation_fp16 | Model_Link |
| SwinV2-Tiny | CV75 | Amba_optimized | Model_Link |
| SwinV2-Tiny | CV75 | Activation_fp16 | Model_Link |