narugo1992

Upload model 'animetimm/repvit_m2_3.dbv4-full', on 2026-05-28 18:37:20 HKT

	---
	tags:
	- image-classification
	- timm
	- transformers
	- animetimm
	- dghs-imgutils
	library_name: timm
	license: gpl-3.0
	datasets:
	- animetimm/danbooru-wdtagger-v4-w640-ws-full
	base_model:
	- timm/repvit_m2_3.dist_450e_in1k
	---

	# Anime Tagger repvit_m2_3.dbv4-full

	## Model Details

	- Model Type: Multilabel image classification / feature backbone
	- Model Stats:
	  - Params: 30.4M
	  - FLOPs / MACs: 26.8G / 13.3G
	  - Image size: train = 384 x 384, test = 384 x 384
	- Dataset: [animetimm/danbooru-wdtagger-v4-w640-ws-full](https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full)
	- Tags Count: 12476
	  - General (#0) Tags Count: 9225
	  - Character (#4) Tags Count: 3247
	  - Rating (#9) Tags Count: 4

	## Results

	| Split | Macro@0.40 (F1/MCC/P/R) | Micro@0.40 (F1/MCC/P/R) | Macro@Best (F1/P/R) |
	|:----------:|:-----------------------------:|:-----------------------------:|:---------------------:|
	| Validation | 0.473 / 0.478 / 0.483 / 0.488 | 0.637 / 0.636 / 0.627 / 0.647 | --- |
	| Test | 0.474 / 0.478 / 0.483 / 0.489 | 0.637 / 0.636 / 0.627 / 0.648 | 0.510 / 0.534 / 0.516 |

	* `Macro/Micro@0.40` means the macro- and micro-averaged metrics computed at a fixed threshold of 0.40 for every tag.
	* `Macro@Best` means the macro-averaged metrics when each tag uses its own tag-level threshold, chosen to give that tag its best F1 score.
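The difference between the macro and micro columns can be illustrated with a minimal sketch on made-up data. The helper names `prf` and `macro_micro_f1` are hypothetical and are not part of this repository or of any evaluation script used for the table above; the sketch only shows the usual definitions of the two averages.

```python
def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def macro_micro_f1(y_true, y_score, threshold=0.40):
    """y_true / y_score hold one row per image and one column per tag."""
    n_tags = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_tags)]  # per-tag [tp, fp, fn]
    for truth_row, score_row in zip(y_true, y_score):
        for j, (truth, score) in enumerate(zip(truth_row, score_row)):
            predicted = score >= threshold
            if predicted and truth:
                counts[j][0] += 1
            elif predicted:
                counts[j][1] += 1
            elif truth:
                counts[j][2] += 1
    # Macro: average the per-tag F1 scores, so a rare tag weighs as much as a common one.
    macro = sum(prf(*c)[2] for c in counts) / n_tags
    # Micro: pool the counts over all tags first, so frequent tags dominate.
    micro = prf(*(sum(col) for col in zip(*counts)))[2]
    return macro, micro


# Made-up toy data: tag 0 is common and easy, tag 1 is rare and missed.
y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]
y_score = [[0.9, 0.1], [0.8, 0.5], [0.7, 0.2], [0.6, 0.1]]
macro, micro = macro_micro_f1(y_true, y_score, threshold=0.40)
print(f"macro F1 = {macro:.2f}, micro F1 = {micro:.2f}")
```

In this toy case the rare tag pulls the macro average well below the micro average, which is one plausible reading of why the macro scores in the table are lower than the micro scores.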

	## Thresholds

	| Category | Name | Alpha | Threshold | Micro@Thr (F1/P/R) | Macro@0.40 (F1/P/R) | Macro@Best (F1/P/R) |
	|:----------:|:---------:|:-------:|:-----------:|:---------------------:|:---------------------:|:---------------------:|
	| 0 | general | 1 | 0.42 | 0.625 / 0.628 / 0.622 | 0.354 / 0.369 / 0.366 | 0.388 / 0.402 / 0.409 |
	| 4 | character | 1 | 0.64 | 0.864 / 0.916 / 0.817 | 0.816 / 0.805 / 0.837 | 0.857 / 0.909 / 0.817 |
	| 9 | rating | 1 | 0.39 | 0.807 / 0.758 / 0.863 | 0.813 / 0.778 / 0.855 | 0.817 / 0.789 / 0.850 |

	* `Micro@Thr` means the micro-averaged metrics at the suggested category-level thresholds listed in the table above.
	* `Macro@0.40` means the macro-averaged metrics at a fixed threshold of 0.40.
	* `Macro@Best` means the macro-averaged metrics when each tag uses its own tag-level threshold, chosen to give that tag its best F1 score.

	The tag-level thresholds are listed in [selected_tags.csv](https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/resolve/main/selected_tags.csv).
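As a rough illustration of how the two kinds of threshold differ in practice, the sketch below filters a few scores either by the category-level thresholds from the table or by a per-tag threshold. It assumes that `selected_tags.csv` has a `category` column holding the numeric codes 0, 4 and 9 alongside the `name` and `best_threshold` columns; the embedded CSV excerpt, its threshold values, the scores, and the helper `select_tags` are all made up for the example.

```python
import csv
import io

# Category-level thresholds copied from the Thresholds table.
CATEGORY_THRESHOLDS = {0: 0.42, 4: 0.64, 9: 0.39}

# Hypothetical excerpt in the shape assumed for selected_tags.csv.
# The best_threshold values here are invented, not the real ones.
SAMPLE_CSV = """name,category,best_threshold
sensitive,9,0.35
1girl,0,0.50
ponytail,0,0.30
quanxi_(chainsaw_man),4,0.60
"""


def select_tags(rows, scores, use_tag_level=False):
    """Keep the tags whose score reaches the chosen kind of threshold."""
    selected = {}
    for row, score in zip(rows, scores):
        if use_tag_level:
            threshold = float(row["best_threshold"])
        else:
            threshold = CATEGORY_THRESHOLDS[int(row["category"])]
        if score >= threshold:
            selected[row["name"]] = score
    return selected


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
scores = [0.87, 0.99, 0.37, 0.9999]  # made-up scores, one per CSV row

print(select_tags(rows, scores))                      # category-level thresholds
print(select_tags(rows, scores, use_tag_level=True))  # tag-level thresholds
```

With these invented numbers, `ponytail` is dropped by the general threshold of 0.42 but kept by its lower tag-level threshold, so the tag-level variant can surface tags that a single category cut-off would miss.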

	## How to Use

	A sample image for the code samples below is available [here](https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/blob/main/sample.webp).

	### Use TIMM And Torch

	Install [dghs-imgutils](https://github.com/deepghs/imgutils), [timm](https://github.com/huggingface/pytorch-image-models) and the other requirements with the following command:

	```shell
	pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow pandas
	```

	After that, you can load this model with the timm library and use it for training, validation, and testing with the following code:

	```python
	import json

	import pandas as pd
	import torch
	from huggingface_hub import hf_hub_download
	from imgutils.data import load_image
	from imgutils.preprocess import create_torchvision_transforms
	from timm import create_model

	repo_id = 'animetimm/repvit_m2_3.dbv4-full'
	model = create_model(f'hf-hub:{repo_id}', pretrained=True)
	model.eval()

	with open(hf_hub_download(repo_id=repo_id, repo_type='model', filename='preprocess.json'), 'r') as f:
	    preprocessor = create_torchvision_transforms(json.load(f)['test'])
	# Compose(
	#     PadToSize(size=(384, 384), interpolation=bilinear, background_color=white)
	#     Resize(size=384, interpolation=bicubic, max_size=None, antialias=True)
	#     CenterCrop(size=[384, 384])
	#     MaybeToTensor()
	#     Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
	# )

	image = load_image('https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/resolve/main/sample.webp')
	input_ = preprocessor(image).unsqueeze(0)
	# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
	with torch.no_grad():
	    output = model(input_)
	    prediction = torch.sigmoid(output)[0]
	# output, shape: torch.Size([1, 12476]), dtype: torch.float32
	# prediction, shape: torch.Size([12476]), dtype: torch.float32

	df_tags = pd.read_csv(
	    hf_hub_download(repo_id=repo_id, repo_type='model', filename='selected_tags.csv'),
	    keep_default_na=False
	)
	tags = df_tags['name']
	mask = prediction.numpy() >= df_tags['best_threshold']
	print(dict(zip(tags[mask].tolist(), prediction[mask].tolist())))
	# {'sensitive': 0.8662903904914856,
	# '1girl': 0.9911483526229858,
	# 'long_hair': 0.7848780155181885,
	# 'breasts': 0.6503047943115234,
	# 'shirt': 0.40559232234954834,
	# 'simple_background': 0.4496791362762451,
	# 'holding': 0.392380028963089,
	# 'white_background': 0.45867159962654114,
	# '1boy': 0.9859222173690796,
	# 'dress': 0.895875096321106,
	# 'jewelry': 0.512308657169342,
	# 'white_shirt': 0.6412474513053894,
	# 'ponytail': 0.36655429005622864,
	# 'grey_hair': 0.5069402456283569,
	# 'weapon': 0.7197086811065674,
	# 'earrings': 0.6887646317481995,
	# 'sleeveless': 0.5068082213401794,
	# 'barefoot': 0.6119781136512756,
	# 'hair_over_one_eye': 0.7765354514122009,
	# 'looking_to_the_side': 0.1008133813738823,
	# 'sleeveless_dress': 0.6517123579978943,
	# 'blood': 0.4474491775035858,
	# 'scar': 0.5092765688896179,
	# 'chinese_clothes': 0.8715230226516724,
	# 'mouth_hold': 0.4852164387702942,
	# 'leg_up': 0.21184596419334412,
	# 'eyepatch': 0.907089352607727,
	# 'china_dress': 0.9197738766670227,
	# 'carrying': 0.4297461211681366,
	# 'side_slit': 0.49013879895210266,
	# 'one_eye_covered': 0.47284403443336487,
	# 'cigarette': 0.9730519652366638,
	# 'smoking': 0.9113378524780273,
	# 'stitches': 0.5202369689941406,
	# 'tassel_earrings': 0.24812373518943787,
	# 'quanxi_(chainsaw_man)': 0.9999498128890991}
	```
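The dictionary printed above mixes rating, general and character tags in file order, whereas the ONNX helper in the next section returns one dictionary per category sorted by score. A minimal sketch of that regrouping follows. The helper `split_by_category` is hypothetical, the scores are rounded from the sample output above, and the tag-to-category mapping is written out by hand; with the real data it would presumably come from a `category` column in `selected_tags.csv`, which is an assumption here.

```python
# Category codes as listed under Model Details.
CATEGORY_NAMES = {0: "general", 4: "character", 9: "rating"}


def split_by_category(flat_scores, tag_categories):
    """Group a flat {tag: score} dict by category, highest score first."""
    grouped = {name: {} for name in CATEGORY_NAMES.values()}
    ranked = sorted(flat_scores.items(), key=lambda item: item[1], reverse=True)
    for tag, score in ranked:
        grouped[CATEGORY_NAMES[tag_categories[tag]]][tag] = score
    return grouped


# A few entries rounded from the sample output above.
flat = {
    "sensitive": 0.8663,
    "1girl": 0.9911,
    "long_hair": 0.7849,
    "1boy": 0.9859,
    "quanxi_(chainsaw_man)": 0.99995,
}
# Hand-written mapping; assumed to match the category codes in selected_tags.csv.
categories = {
    "sensitive": 9,
    "1girl": 0,
    "long_hair": 0,
    "1boy": 0,
    "quanxi_(chainsaw_man)": 4,
}

result = split_by_category(flat, categories)
print(result["general"])
print(result["character"])
print(result["rating"])
```

Sorting once before grouping keeps each per-category dictionary in descending score order, since Python dictionaries preserve insertion order.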

	### Use ONNX Model For Inference

	Install [dghs-imgutils](https://github.com/deepghs/imgutils) with the following command:

	```shell
	pip install 'dghs-imgutils>=0.19.0'
	```

	Then call the `multilabel_timm_predict` function as follows:

	```python
	from imgutils.generic import multilabel_timm_predict

	general, character, rating = multilabel_timm_predict(
	    'https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/resolve/main/sample.webp',
	    repo_id='animetimm/repvit_m2_3.dbv4-full',
	    fmt=('general', 'character', 'rating'),
	)

	print(general)
	# {'1girl': 0.9911484718322754,
	# '1boy': 0.9859222173690796,
	# 'cigarette': 0.9730523824691772,
	# 'china_dress': 0.9197739362716675,
	# 'smoking': 0.9113388061523438,
	# 'eyepatch': 0.9070903062820435,
	# 'dress': 0.8958752155303955,
	# 'chinese_clothes': 0.8715232014656067,
	# 'long_hair': 0.7848783731460571,
	# 'hair_over_one_eye': 0.7765361070632935,
	# 'weapon': 0.7197083830833435,
	# 'earrings': 0.6887662410736084,
	# 'sleeveless_dress': 0.6517126560211182,
	# 'breasts': 0.6503055095672607,
	# 'white_shirt': 0.6412477493286133,
	# 'barefoot': 0.6119793057441711,
	# 'stitches': 0.5202380418777466,
	# 'jewelry': 0.512310266494751,
	# 'scar': 0.5092771053314209,
	# 'grey_hair': 0.5069411993026733,
	# 'sleeveless': 0.5068085789680481,
	# 'side_slit': 0.49013906717300415,
	# 'mouth_hold': 0.48521846532821655,
	# 'one_eye_covered': 0.4728451669216156,
	# 'white_background': 0.4586714804172516,
	# 'simple_background': 0.449679434299469,
	# 'blood': 0.44745004177093506,
	# 'carrying': 0.4297464191913605,
	# 'shirt': 0.4055924117565155,
	# 'holding': 0.3923792541027069,
	# 'ponytail': 0.3665536940097809,
	# 'tassel_earrings': 0.24812528491020203,
	# 'leg_up': 0.21184629201889038,
	# 'looking_to_the_side': 0.10081371665000916}
	print(character)
	# {'quanxi_(chainsaw_man)': 0.9999498128890991}
	print(rating)
	# {'sensitive': 0.8662903308868408}
	```

	For further information, see [documentation of function multilabel_timm_predict](https://dghs-imgutils.deepghs.org/main/api_doc/generic/multilabel_timm.html#multilabel-timm-predict).

	## Citation

	```
	@misc{repvit_m2_3_dbv4_full,
	title = {Anime Tagger repvit_m2_3.dbv4-full},
	author = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
	year = {2026},
	howpublished = {\url{https://huggingface.co/animetimm/repvit_m2_3.dbv4-full}},
	note = {A large-scale anime-style image classification model based on repvit_m2_3 architecture for multi-label tagging with 12476 tags, trained on anime dataset dbv4-full (\url{https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full}). Model parameters: 30.4M, FLOPs: 26.8G, input resolution: 384×384.},
	license = {gpl-3.0}
	}
	```