metadata
datasets:
  - ILSVRC/imagenet-1k
metrics:
  - accuracy

CSATv2

CSATv2 is a lightweight, high-resolution vision backbone designed to maximize throughput at 512×512 resolution. It applies frequency-domain compression at the input stage to suppress redundant spatial information, which keeps inference fast (see the throughput figure under Highlights).
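To make the idea of frequency-domain compression concrete, here is a minimal sketch. It assumes a JPEG-style 8×8 block DCT that keeps only the low-frequency coefficients of each block. The card does not say which transform, block size, or coefficient selection CSATv2 actually uses, so `dct_matrix`, `block_dct_compress`, `block`, and `keep` are illustrative names and values, not the model's implementation.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    x = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale
    return mat


def block_dct_compress(image: np.ndarray, block: int = 8, keep: int = 4) -> np.ndarray:
    """Block-wise 2-D DCT that keeps the keep x keep lowest frequencies.

    image: (H, W) single-channel array with H and W divisible by `block`.
    Returns an array of shape (H // block, W // block, keep * keep).
    Illustrative assumption only; not the transform used by CSATv2.
    """
    h, w = image.shape
    if h % block or w % block:
        raise ValueError("image height and width must be divisible by block")
    d = dct_matrix(block)
    # Split into non-overlapping blocks: (H/b, W/b, b, b)
    blocks = image.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # 2-D DCT of every block: D @ B @ D^T (matmul broadcasts over the grid)
    coeffs = d @ blocks @ d.T
    # Low frequencies sit in the top-left corner of each coefficient block
    low = coeffs[..., :keep, :keep]
    return low.reshape(h // block, w // block, keep * keep)
```

With these illustrative settings, a 512×512 single-channel input becomes a 64×64 grid with 16 coefficients per cell, so later layers see a 64×64 spatial map and 4× fewer values overall.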

Highlights

  • 🚀 2,800 images/s at 512×512 resolution (single A6000 GPU)
  • Frequency-domain compression for lightweight and efficient modeling
  • 🎯 80.02% ImageNet-1K Top-1 Accuracy
  • 🪶 Only 11M parameters
  • 🧩 Suitable for image classification or as a high-throughput detection backbone

This model is an improved version of the architecture used in the paper

Special thanks to Demino for contributing ideas and feedback that greatly helped in making the model lighter and better optimized.

Model description


This model is designed primarily for image classification tasks and can also serve as a high-throughput backbone for object detection.
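One common way to use a classifier as a detection backbone is to capture intermediate feature maps with forward hooks. The sketch below shows that pattern on a toy network. `ToyBackbone` and its `stage1` to `stage3` names are hypothetical stand-ins, because this card does not list CSATv2's internal module names. For the real model, you would inspect `model.named_modules()` and choose the layers yourself.

```python
import torch
from torch import nn


class ToyBackbone(nn.Module):
    """Hypothetical stand-in; CSATv2's real stage names are not given here."""

    def __init__(self) -> None:
        super().__init__()
        self.stage1 = nn.Conv2d(3, 8, 3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(8, 16, 3, stride=2, padding=1)
        self.stage3 = nn.Conv2d(16, 32, 3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.stage3(self.stage2(self.stage1(x)))


def collect_features(model: nn.Module, x: torch.Tensor, layer_names: list[str]) -> dict:
    """Run one forward pass and return {layer name: output tensor}."""
    feats: dict[str, torch.Tensor] = {}
    modules = dict(model.named_modules())

    def make_hook(name: str):
        def hook(_module, _inputs, output):
            feats[name] = output  # returning None leaves the output unchanged
        return hook

    handles = [modules[n].register_forward_hook(make_hook(n)) for n in layer_names]
    try:
        with torch.no_grad():
            model(x)
    finally:
        for handle in handles:
            handle.remove()  # always detach hooks, even if the forward fails
    return feats
```

A detection head such as an FPN would then consume the returned maps at their different strides.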

import torch
from datasets import load_dataset
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Example data: a cat image
dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

# Swap in the CSATv2 model
model_name = "Hyunil/CSATv2"

# Load the preprocessor and the model
processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForImageClassification.from_pretrained(model_name, trust_remote_code=True)

# Preprocess
inputs = processor(image, return_tensors="pt")

# Inference
with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(-1).item()
print("Predicted label:", model.config.id2label[pred])
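The snippet above prints only the single most likely class. If you also want the runner-up classes with their probabilities, a small helper along these lines may be useful. `top_k_labels` is an illustrative name rather than part of the model's API, and it assumes `logits` has shape `(1, num_classes)` and that `id2label` maps integer class indices to label strings, as `model.config.id2label` does in `transformers`.

```python
import torch


def top_k_labels(logits: torch.Tensor, id2label: dict, k: int = 5) -> list:
    """Return the k most likely (label, probability) pairs for one image.

    logits: tensor of shape (1, num_classes), as produced for a single input.
    """
    probs = torch.softmax(logits, dim=-1)[0]  # convert logits to probabilities
    k = min(k, probs.numel())                 # guard against k > num_classes
    scores, indices = probs.topk(k)           # sorted from most to least likely
    return [(id2label[i], s) for i, s in zip(indices.tolist(), scores.tolist())]
```

With the variables from the usage example, this would be called as `top_k_labels(logits, model.config.id2label)`.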