---
datasets:
- imagenet-1k
library_name: transformers
pipeline_tag: image-classification
---

# SwiftFormer (swiftformer-l3)

## Model description

The SwiftFormer model was proposed in [SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications](https://arxiv.org/abs/2303.15446) by Abdelrahman Shaker, Muhammad Maaz, Hanoona Rasheed, Salman Khan, Ming-Hsuan Yang, and Fahad Shahbaz Khan.

The SwiftFormer paper introduces a novel efficient additive attention mechanism that replaces the quadratic matrix-multiplication operations of self-attention with linear element-wise multiplications, as sketched below. A series of models called 'SwiftFormer' is built on this mechanism, achieving state-of-the-art trade-offs between accuracy and mobile inference speed: even the small variant reaches 78.5% top-1 accuracy on ImageNet-1K with only 0.8 ms latency on an iPhone 14, making it both more accurate and 2× faster than MobileViT-v2.
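
To make the linear-complexity claim concrete, here is a minimal PyTorch sketch of an efficient additive attention block in the spirit of the paper. It is illustrative only: the class and parameter names (`EfficientAdditiveAttentionSketch`, `w_a`) are assumptions, not the implementation shipped in `transformers`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientAdditiveAttentionSketch(nn.Module):
    """Token mixing whose cost is linear in the number of tokens."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_query = nn.Linear(dim, dim)
        self.to_key = nn.Linear(dim, dim)
        self.w_a = nn.Parameter(torch.randn(dim))  # learnable attention vector
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim)
        query = F.normalize(self.to_query(x), dim=-1)
        key = F.normalize(self.to_key(x), dim=-1)

        # One scalar score per token from a single learned vector:
        # O(n * d) work instead of the O(n^2 * d) attention matrix.
        scores = (query @ self.w_a) * self.scale      # (batch, num_tokens)
        alpha = scores.softmax(dim=-1).unsqueeze(-1)  # (batch, num_tokens, 1)

        # Pool the queries into a single global query vector.
        global_query = (alpha * query).sum(dim=1, keepdim=True)  # (batch, 1, dim)

        # Element-wise query-key interaction replaces matrix multiplication.
        return self.proj(global_query * key) + query

x = torch.randn(1, 196, 64)  # e.g. 14 x 14 patch tokens of width 64
block = EfficientAdditiveAttentionSketch(64)
print(block(x).shape)  # torch.Size([1, 196, 64])
```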
## Intended uses & limitations

You can use this model for image classification. It is trained on ImageNet-1K, so it predicts one of the 1,000 ImageNet classes; other label sets require fine-tuning.
## How to use

The snippet below downloads a sample image from the COCO dataset and classifies it:

```python
import requests
import torch
from PIL import Image
from transformers import ViTImageProcessor
from transformers.models.swiftformer import SwiftFormerForImageClassification

# Download a sample image from the COCO validation set.
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image into a batch of pixel values.
processor = ViTImageProcessor.from_pretrained('shehan97/swiftformer-l3')
inputs = processor(images=image, return_tensors="pt")

# Load the model with its classification head and run inference.
model = SwiftFormerForImageClassification.from_pretrained('shehan97/swiftformer-l3')
with torch.no_grad():
    outputs = model(**inputs)

# The index of the largest logit is the predicted ImageNet-1K class.
predicted_class_idx = outputs.logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_idx])
```
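
Alternatively, the high-level `pipeline` API bundles the download, preprocessing, and label-decoding steps; a minimal sketch, assuming the checkpoint's preprocessing configuration resolves automatically:

```python
from transformers import pipeline

# The pipeline fetches the image, preprocesses it, runs the model,
# and maps the logits back to human-readable labels.
classifier = pipeline("image-classification", model="shehan97/swiftformer-l3")
predictions = classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
print(predictions)  # a list of {'label': ..., 'score': ...} entries, best first
```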
## Limitations and bias

## Training data

The classification model is trained on the ImageNet-1K dataset (about 1.28 million images across 1,000 classes).
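
If you want to inspect the training data, the dataset is available on the Hugging Face Hub; a minimal sketch, noting that `imagenet-1k` is gated and requires accepting its terms and authenticating first:

```python
from datasets import load_dataset

# Requires prior `huggingface-cli login` and accepting the dataset terms.
dataset = load_dataset("imagenet-1k", split="validation")
example = dataset[0]
print(example["image"].size, example["label"])
```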
## Training procedure

See the [paper](https://arxiv.org/abs/2303.15446) for the full training recipe and hyperparameters.

## Evaluation results

As reported in the paper, SwiftFormer-L3 achieves 83.0% top-1 accuracy on ImageNet-1K with 1.9 ms inference latency on iPhone 14.