---
tags:
- image-classification
- timm
- transformers
- animetimm
- dghs-imgutils
library_name: timm
license: gpl-3.0
datasets:
- animetimm/danbooru-wdtagger-v4-w640-ws-full
base_model:
- timm/repvit_m2_3.dist_450e_in1k
---
# Anime Tagger repvit_m2_3.dbv4-full

## Model Details

- **Model Type:** Multilabel image classification / feature backbone
- **Model Stats:**
    - Params: 30.4M
    - FLOPs / MACs: 26.8G / 13.3G
    - Image size: train = 384 x 384, test = 384 x 384
- **Dataset:** [animetimm/danbooru-wdtagger-v4-w640-ws-full](https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full)
    - Tags Count: 12476
    - General (#0) Tags Count: 9225
    - Character (#4) Tags Count: 3247
    - Rating (#9) Tags Count: 4
## Results

| Split      | Macro@0.40 (F1/MCC/P/R)       | Micro@0.40 (F1/MCC/P/R)       | Macro@Best (F1/P/R)   |
|:----------:|:-----------------------------:|:-----------------------------:|:---------------------:|
| Validation | 0.473 / 0.478 / 0.483 / 0.488 | 0.637 / 0.636 / 0.627 / 0.647 | ---                   |
| Test       | 0.474 / 0.478 / 0.483 / 0.489 | 0.637 / 0.636 / 0.627 / 0.648 | 0.510 / 0.534 / 0.516 |

* `Macro/Micro@0.40`: macro- or micro-averaged metrics using a single global threshold of 0.40.
* `Macro@Best`: macro-averaged metrics where each tag uses its own threshold, chosen to give that tag the best F1 score.
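The sketch below shows how the macro- and micro-averaged numbers above can differ. It is written in plain Python on toy data and assumes binary ground-truth labels and one global threshold. The function names are illustrative, and this is not the evaluation code used to produce the tables.

```python
# Minimal sketch: macro- vs micro-averaged F1 for multilabel predictions
# at a single global threshold. Toy data only, not the authors' evaluation code.

def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(scores, labels, threshold=0.40):
    """scores/labels: one row per image, one column per tag."""
    n_tags = len(scores[0])
    counts = [[0, 0, 0] for _ in range(n_tags)]  # [tp, fp, fn] per tag
    for row_scores, row_labels in zip(scores, labels):
        for j, (s, y) in enumerate(zip(row_scores, row_labels)):
            pred = s >= threshold
            if pred and y:
                counts[j][0] += 1
            elif pred and not y:
                counts[j][1] += 1
            elif not pred and y:
                counts[j][2] += 1
    # Macro: average per-tag F1, so every tag weighs the same regardless of frequency.
    macro = sum(f1(*c) for c in counts) / n_tags
    # Micro: pool the counts over all tags first, so frequent tags dominate.
    tp, fp, fn = (sum(c[k] for c in counts) for k in range(3))
    micro = f1(tp, fp, fn)
    return macro, micro


scores = [[0.9, 0.1], [0.8, 0.5], [0.7, 0.2], [0.3, 0.6]]
labels = [[1, 1], [1, 0], [1, 0], [0, 1]]
print(*macro_micro_f1(scores, labels))
# 0.75 0.8
```

Here the second tag has one miss and one false alarm. That pulls the macro average further down than the micro average, which mirrors the gap between the Macro and Micro columns above.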
## Thresholds

| Category | Name      | Alpha | Threshold | Micro@Thr (F1/P/R)    | Macro@0.40 (F1/P/R)   | Macro@Best (F1/P/R)   |
|:--------:|:---------:|:-----:|:---------:|:---------------------:|:---------------------:|:---------------------:|
| 0        | general   | 1     | 0.42      | 0.625 / 0.628 / 0.622 | 0.354 / 0.369 / 0.366 | 0.388 / 0.402 / 0.409 |
| 4        | character | 1     | 0.64      | 0.864 / 0.916 / 0.817 | 0.816 / 0.805 / 0.837 | 0.857 / 0.909 / 0.817 |
| 9        | rating    | 1     | 0.39      | 0.807 / 0.758 / 0.863 | 0.813 / 0.778 / 0.855 | 0.817 / 0.789 / 0.850 |

* `Micro@Thr`: micro-averaged metrics at the suggested category-level thresholds listed in the `Threshold` column.
* `Macro@0.40`: macro-averaged metrics using a global threshold of 0.40.
* `Macro@Best`: macro-averaged metrics where each tag uses its own threshold, chosen to give that tag the best F1 score.

The tag-level thresholds are listed in the `best_threshold` column of [selected_tags.csv](https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/resolve/main/selected_tags.csv).
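A per-tag threshold "with the best F1" can be found with a simple grid sweep over labelled validation scores, sketched below in plain Python. The 0.01-step grid and the tie-breaking rule (keep the lowest threshold) are assumptions. This card does not say how the published `best_threshold` values were actually searched.

```python
# Minimal sketch: pick the threshold that maximizes F1 for a single tag by
# sweeping a fixed grid. The grid and tie-breaking are assumptions.

def best_threshold(scores, labels, grid=None):
    """Return (threshold, f1) with the highest F1 for one tag."""
    if grid is None:
        grid = [i / 100 for i in range(1, 100)]  # 0.01 .. 0.99
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:  # strict '>' keeps the lowest threshold on ties
            best_t, best_f1 = t, f1
    return best_t, best_f1


print(best_threshold([0.9, 0.35, 0.3, 0.2], [1, 1, 0, 0]))
# (0.31, 1.0)
```

Running this sweep independently for every tag and macro-averaging the resulting scores corresponds to the idea behind the `Macro@Best` column.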
## How to Use

A sample image for the code examples is available [here](https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/blob/main/sample.webp).

### Use TIMM And Torch

Install [dghs-imgutils](https://github.com/deepghs/imgutils), [timm](https://github.com/huggingface/pytorch-image-models), and the other required packages with the following command:
```shell
pip install 'dghs-imgutils>=0.19.0' torch huggingface_hub timm pillow pandas
```
You can then load the model with the timm library and use it for training, validation, and testing, as in the following code:
```python
import json

import pandas as pd
import torch
from huggingface_hub import hf_hub_download
from imgutils.data import load_image
from imgutils.preprocess import create_torchvision_transforms
from timm import create_model

repo_id = 'animetimm/repvit_m2_3.dbv4-full'
model = create_model(f'hf-hub:{repo_id}', pretrained=True)
model.eval()

with open(hf_hub_download(repo_id=repo_id, repo_type='model', filename='preprocess.json'), 'r') as f:
    preprocessor = create_torchvision_transforms(json.load(f)['test'])
# Compose(
#     PadToSize(size=(384, 384), interpolation=bilinear, background_color=white)
#     Resize(size=384, interpolation=bicubic, max_size=None, antialias=True)
#     CenterCrop(size=[384, 384])
#     MaybeToTensor()
#     Normalize(mean=tensor([0.4850, 0.4560, 0.4060]), std=tensor([0.2290, 0.2240, 0.2250]))
# )

image = load_image('https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/resolve/main/sample.webp')
input_ = preprocessor(image).unsqueeze(0)
# input_, shape: torch.Size([1, 3, 384, 384]), dtype: torch.float32
with torch.no_grad():
    output = model(input_)
    prediction = torch.sigmoid(output)[0]
# output, shape: torch.Size([1, 12476]), dtype: torch.float32
# prediction, shape: torch.Size([12476]), dtype: torch.float32

df_tags = pd.read_csv(
    hf_hub_download(repo_id=repo_id, repo_type='model', filename='selected_tags.csv'),
    keep_default_na=False
)
tags = df_tags['name']
mask = prediction.numpy() >= df_tags['best_threshold']
print(dict(zip(tags[mask].tolist(), prediction[mask].tolist())))
# {'sensitive': 0.8662903904914856,
#  '1girl': 0.9911483526229858,
#  'long_hair': 0.7848780155181885,
#  'breasts': 0.6503047943115234,
#  'shirt': 0.40559232234954834,
#  'simple_background': 0.4496791362762451,
#  'holding': 0.392380028963089,
#  'white_background': 0.45867159962654114,
#  '1boy': 0.9859222173690796,
#  'dress': 0.895875096321106,
#  'jewelry': 0.512308657169342,
#  'white_shirt': 0.6412474513053894,
#  'ponytail': 0.36655429005622864,
#  'grey_hair': 0.5069402456283569,
#  'weapon': 0.7197086811065674,
#  'earrings': 0.6887646317481995,
#  'sleeveless': 0.5068082213401794,
#  'barefoot': 0.6119781136512756,
#  'hair_over_one_eye': 0.7765354514122009,
#  'looking_to_the_side': 0.1008133813738823,
#  'sleeveless_dress': 0.6517123579978943,
#  'blood': 0.4474491775035858,
#  'scar': 0.5092765688896179,
#  'chinese_clothes': 0.8715230226516724,
#  'mouth_hold': 0.4852164387702942,
#  'leg_up': 0.21184596419334412,
#  'eyepatch': 0.907089352607727,
#  'china_dress': 0.9197738766670227,
#  'carrying': 0.4297461211681366,
#  'side_slit': 0.49013879895210266,
#  'one_eye_covered': 0.47284403443336487,
#  'cigarette': 0.9730519652366638,
#  'smoking': 0.9113378524780273,
#  'stitches': 0.5202369689941406,
#  'tassel_earrings': 0.24812373518943787,
#  'quanxi_(chainsaw_man)': 0.9999498128890991}
```
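The code above mixes general, character, and rating tags in one dict. The sketch below shows one way to split them by category, similar to what the ONNX route returns. It assumes that `selected_tags.csv` also has a `category` column using the category IDs from the thresholds table (0 = general, 4 = character, 9 = rating). `split_by_category` is a hypothetical helper. By default it applies the category-level thresholds from that table instead of the per-tag `best_threshold` values.

```python
# Minimal sketch; assumes a `category` column with IDs 0 / 4 / 9 in
# selected_tags.csv. `split_by_category` is a hypothetical helper.

CATEGORY_NAMES = {0: 'general', 4: 'character', 9: 'rating'}
# Category-level suggested thresholds copied from the Thresholds table.
CATEGORY_THRESHOLDS = {0: 0.42, 4: 0.64, 9: 0.39}


def split_by_category(names, categories, scores, tag_thresholds=None):
    """Group tags whose score passes its threshold into one dict per category.

    If `tag_thresholds` is given (e.g. the `best_threshold` column), it is
    used per tag; otherwise the category-level threshold is applied.
    """
    result = {cat_name: {} for cat_name in CATEGORY_NAMES.values()}
    for i, (name, cat, score) in enumerate(zip(names, categories, scores)):
        thr = tag_thresholds[i] if tag_thresholds is not None else CATEGORY_THRESHOLDS[cat]
        if score >= thr:
            result[CATEGORY_NAMES[cat]][name] = score
    return result


# Toy input; with the variables from the code above this would be
# df_tags['name'].tolist(), df_tags['category'].tolist(), prediction.tolist().
print(split_by_category(['1girl', 'solo', 'hatsune_miku', 'sensitive'],
                        [0, 0, 4, 9], [0.99, 0.40, 0.70, 0.5]))
# {'general': {'1girl': 0.99}, 'character': {'hatsune_miku': 0.7}, 'rating': {'sensitive': 0.5}}
```

To use the tag-level thresholds instead, pass `tag_thresholds=df_tags['best_threshold'].tolist()`.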
### Use ONNX Model For Inference

Install [dghs-imgutils](https://github.com/deepghs/imgutils) with the following command:
```shell
pip install 'dghs-imgutils>=0.19.0'
```
Then run inference with the `multilabel_timm_predict` function:
```python
from imgutils.generic import multilabel_timm_predict

general, character, rating = multilabel_timm_predict(
    'https://huggingface.co/animetimm/repvit_m2_3.dbv4-full/resolve/main/sample.webp',
    repo_id='animetimm/repvit_m2_3.dbv4-full',
    fmt=('general', 'character', 'rating'),
)
print(general)
# {'1girl': 0.9911484718322754,
#  '1boy': 0.9859222173690796,
#  'cigarette': 0.9730523824691772,
#  'china_dress': 0.9197739362716675,
#  'smoking': 0.9113388061523438,
#  'eyepatch': 0.9070903062820435,
#  'dress': 0.8958752155303955,
#  'chinese_clothes': 0.8715232014656067,
#  'long_hair': 0.7848783731460571,
#  'hair_over_one_eye': 0.7765361070632935,
#  'weapon': 0.7197083830833435,
#  'earrings': 0.6887662410736084,
#  'sleeveless_dress': 0.6517126560211182,
#  'breasts': 0.6503055095672607,
#  'white_shirt': 0.6412477493286133,
#  'barefoot': 0.6119793057441711,
#  'stitches': 0.5202380418777466,
#  'jewelry': 0.512310266494751,
#  'scar': 0.5092771053314209,
#  'grey_hair': 0.5069411993026733,
#  'sleeveless': 0.5068085789680481,
#  'side_slit': 0.49013906717300415,
#  'mouth_hold': 0.48521846532821655,
#  'one_eye_covered': 0.4728451669216156,
#  'white_background': 0.4586714804172516,
#  'simple_background': 0.449679434299469,
#  'blood': 0.44745004177093506,
#  'carrying': 0.4297464191913605,
#  'shirt': 0.4055924117565155,
#  'holding': 0.3923792541027069,
#  'ponytail': 0.3665536940097809,
#  'tassel_earrings': 0.24812528491020203,
#  'leg_up': 0.21184629201889038,
#  'looking_to_the_side': 0.10081371665000916}
print(character)
# {'quanxi_(chainsaw_man)': 0.9999498128890991}
print(rating)
# {'sensitive': 0.8662903308868408}
```
For more details, see the [documentation of `multilabel_timm_predict`](https://dghs-imgutils.deepghs.org/main/api_doc/generic/multilabel_timm.html#multilabel-timm-predict).
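If a downstream tool expects a single comma-separated tag string, the three dicts can be merged by hand. The sketch below uses a hypothetical helper that is not part of dghs-imgutils. Putting character tags first, sorting each group by score, and replacing underscores with spaces are assumed conventions, so adjust them to whatever your tool expects.

```python
# Hypothetical helper, not part of dghs-imgutils: merge the dicts returned by
# multilabel_timm_predict into one comma-separated string.

def to_tag_string(general, character, rating=None, replace_underscores=True):
    groups = [character, general] + ([rating] if rating else [])
    ordered = []
    for group in groups:
        # Within each group, list tags from highest to lowest score.
        ordered.extend(sorted(group, key=group.get, reverse=True))
    if replace_underscores:
        ordered = [tag.replace('_', ' ') for tag in ordered]
    return ', '.join(ordered)


print(to_tag_string({'1girl': 0.99, 'long_hair': 0.78, 'smoking': 0.91},
                    {'quanxi_(chainsaw_man)': 0.9999},
                    {'sensitive': 0.87}))
# quanxi (chainsaw man), 1girl, smoking, long hair, sensitive
```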
## Citation

```
@misc{repvit_m2_3_dbv4_full,
  title = {Anime Tagger repvit_m2_3.dbv4-full},
  author = {narugo1992 and Deep Generative anime Hobbyist Syndicate (DeepGHS)},
  year = {2026},
  howpublished = {\url{https://huggingface.co/animetimm/repvit_m2_3.dbv4-full}},
  note = {A large-scale anime-style image classification model based on repvit_m2_3 architecture for multi-label tagging with 12476 tags, trained on anime dataset dbv4-full (\url{https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full}). Model parameters: 30.4M, FLOPs: 26.8G, input resolution: 384×384.},
  license = {gpl-3.0}
}
```