	---
	license: apache-2.0
	---
	## FDViT: Improve the Hierarchical Architecture of Vision Transformer (ICCV 2023)

	Yixing Xu, Chao Li, Dong Li, Xiao Sheng, Fan Jiang, Lu Tian, Ashish Sirasao | [Paper](https://openaccess.thecvf.com/content/ICCV2023/papers/Xu_FDViT_Improve_the_Hierarchical_Architecture_of_Vision_Transformer_ICCV_2023_paper.pdf)

	Advanced Micro Devices, Inc.

	---

	## Dependencies

	```text
	torch == 1.13.1
	torchvision == 0.14.1
	timm == 0.6.12
	einops == 0.6.1
	```
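
To confirm that an environment matches the pinned versions above, a small check along the following lines may help. This is a minimal sketch that uses only the Python standard library; the helper names `installed_version` and `find_mismatches` are hypothetical and are not part of the FDViT code. It also assumes that local build tags such as `+cu117` can be ignored when comparing versions.

```python
from importlib import metadata

# Pinned versions copied from the dependency list above.
PINNED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
    "timm": "0.6.12",
    "einops": "0.6.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that deviates from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        # Assumption: a local build tag such as "+cu117" does not matter here.
        if found is None or found.split("+")[0] != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches(PINNED).items():
        print(f"{package}: expected {expected}, found {found}")
```

An empty result from `find_mismatches(PINNED)` means every listed package is installed at its pinned version.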

	## Model performance

	The table below reports the image classification results of the FDViT models on the ImageNet dataset.

	| Model | Parameters (M) | FLOPs (G) | Top-1 Accuracy (%) |
	|-|-|-|-|
	| FDViT-Ti | 4.6 | 0.6 | 73.74 |
	| FDViT-S | 21.6 | 2.8 | 81.45 |
	| FDViT-B | 68.1 | 11.9 | 82.39 |

	## Model Usage

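	```python
	import torch
	from transformers import AutoModelForImageClassification

	# trust_remote_code=True allows transformers to run the custom model code
	# that ships with the checkpoint.
	model = AutoModelForImageClassification.from_pretrained("FDViT_b", trust_remote_code=True)

	# Switch to inference mode (disables dropout and similar training-only behavior).
	model.eval()

	# Dummy batch: one 3-channel 224x224 image.
	inp = torch.ones(1, 3, 224, 224)

	# Gradients are not needed for inference.
	with torch.no_grad():
	    out = model(inp)
	```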
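
The snippet above stops at the raw model output. As a rough illustration of a possible next step, the sketch below turns a list of class scores into top-k predictions. It assumes the model yields one logit per class for each image; the exact return type of the remote model code is not documented here, so the way the scores are extracted from `out` is left to the reader. The helper name `top_k` is hypothetical, and the function uses plain Python so that it does not depend on any particular tensor API.

```python
import math


def top_k(logits, k=5):
    """Return the k highest-scoring (class_index, probability) pairs."""
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by descending probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]
```

If the scores for the first image are available as a one-dimensional tensor, converting it with `.tolist()` before calling `top_k` should be sufficient.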

	## Citation

	```bibtex
	@inproceedings{xu2023fdvit,
	  title={FDViT: Improve the Hierarchical Architecture of Vision Transformer},
	  author={Xu, Yixing and Li, Chao and Li, Dong and Sheng, Xiao and Jiang, Fan and Tian, Lu and Sirasao, Ashish},
	  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
	  pages={5950--5960},
	  year={2023}
	}
	```