Hyunil
/

CSATv2

Model card Files Files and versions

CSATv2 / README.md

Hyunil's picture

Update README.md

dd4d5c3 verified about 1 month ago

|

1.92 kB

	---
	datasets:
	- ILSVRC/imagenet-1k
	metrics:
	- accuracy
	---
	CSATv2

	CSATv2 is a lightweight high-resolution vision backbone designed to maximize throughput at 512×512 resolution.
	By applying frequency-domain compression at the input stage, the model suppresses redundant spatial information and achieves extremely fast inference.

	## Highlights

	- 🚀 2,800 images/s at 512×512 resolution (A6000 1×GPU)
	- ⚡ Frequency-domain compression for lightweight and efficient modeling
	- 🎯 80.02% ImageNet-1K Top-1 Accuracy
	- 🪶 Only 11M parameters
	- 🧩 Suitable for image classification or as a high-throughput detection backbone

	This model is an improved version of the architecture used in the [paper](https://www.mdpi.com/2306-5354/10/11/1279)

	Special thanks to Demino for contributing ideas and feedback that greatly helped in lightweighting and optimizing the model.

	Model description

	![image](https://cdn-uploads.huggingface.co/production/uploads/633a801b7646c9f51a05cc92/pynK0OWbjH5WUlu8L7OTj.png)

	This model is designed primarily for image classification tasks and can also serve as a high-throughput backbone for object detection.
	```python
	import torch
	from datasets import load_dataset
	from transformers import AutoImageProcessor, AutoModelForImageClassification

	# 예시 데이터: 고양이 이미지
	dataset = load_dataset("huggingface/cats-image")
	image = dataset["test"]["image"][0]

	# 👉 CSATv2 모델로 교체
	model_name = "Hyunil/CSATv2"

	# Preprocessor + Model 로드
	processor = AutoImageProcessor.from_pretrained(model_name, trust_remote_code=True)
	model = AutoModelForImageClassification.from_pretrained(model_name, trust_remote_code=True)

	# 전처리
	inputs = processor(image, return_tensors="pt")

	# 추론
	with torch.no_grad():
	logits = model(**inputs).logits

	pred = logits.argmax(-1).item()
	print("Predicted label:", model.config.id2label[pred])
	```