|
|
--- |
|
|
datasets: |
|
|
- ILSVRC/imagenet-1k |
|
|
metrics: |
|
|
- accuracy |
|
|
--- |
|
|
CSATv2 |
|
|
|
|
|
CSATv2 is a lightweight high-resolution vision backbone designed to maximize throughput at 512×512 resolution. |
|
|
By applying frequency-domain compression at the input stage, the model suppresses redundant spatial information and achieves extremely fast inference. |
|
|
|
|
|
## Highlights |
|
|
|
|
|
- 🚀 **2,800 images/s at 512×512 resolution (A6000 1×GPU)** |
|
|
- ⚡ **Frequency-domain compression** for lightweight and efficient modeling |
|
|
- 🎯 **80.02%** ImageNet-1K Top-1 Accuracy |
|
|
- 🪶 Only **11M parameters** |
|
|
- 🧩 Suitable for **image classification** or as a **high-throughput detection backbone** |
|
|
|
|
|
This model is an improved version of the architecture used in the [paper](https://www.mdpi.com/2306-5354/10/11/1279) |
|
|
|
|
|
Special thanks to **Demino** for contributing ideas and feedback that greatly helped in lightweighting and optimizing the model. |
|
|
|
|
|
Model description |
|
|
|
|
|
 |
|
|
|
|
|
This model is designed primarily for image classification tasks and can also serve as a high-throughput backbone for object detection. |
|
|
```python |
|
|
import torch |
|
|
from datasets import load_dataset |
|
|
from transformers import AutoImageProcessor, AutoModelForImageClassification |
|
|
|
|
|
# 예시 데이터: 고양이 이미지 |
|
|
dataset = load_dataset("huggingface/cats-image") |
|
|
image = dataset["test"]["image"][0] |
|
|
|
|
|
# 👉 CSATv2 모델로 교체 |
|
|
model_name = "Hyunil/CSATv2" |
|
|
|
|
|
# Preprocessor + Model 로드 |
|
|
processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True) |
|
|
model = AutoModelForImageClassification.from_pretrained(model_name, trust_remote_code=True) |
|
|
|
|
|
# 전처리 |
|
|
inputs = processor(image, return_tensors="pt") |
|
|
|
|
|
# 추론 |
|
|
with torch.no_grad(): |
|
|
logits = model(**inputs).logits |
|
|
|
|
|
pred = logits.argmax(-1).item() |
|
|
print("Predicted label:", model.config.id2label[pred]) |
|
|
``` |