Upload model 'animetimm/eva02_large_patch14_448.dbv4-full', on 2025-09-05 02:02:46 UTC
| tags: | |
| - image-classification | |
| - timm | |
| - transformers | |
| - animetimm | |
| - dghs-imgutils | |
| library_name: timm | |
| license: gpl-3.0 | |
| datasets: | |
| - animetimm/danbooru-wdtagger-v4-w640-ws-full | |
| base_model: | |
| - timm/eva02_large_patch14_448.mim_m38m_ft_in22k_in1k | |
| # Anime Tagger eva02_large_patch14_448.dbv4-full | |
| ## Model Details | |
| - **Model Type:** Multilabel Image classification / feature backbone | |
| - **Model Stats:** | |
|   - Params: 316.8M | |
|   - FLOPs / MACs: 620.9G / 310.1G | |
|   - Image size: train = 448 x 448, test = 448 x 448 | |
| - **Dataset:** [animetimm/danbooru-wdtagger-v4-w640-ws-full](https://huggingface.co/datasets/animetimm/danbooru-wdtagger-v4-w640-ws-full) | |
|   - Tags Count: 12476 | |
|   - General (#0) Tags Count: 9225 | |
|   - Character (#4) Tags Count: 3247 | |
|   - Rating (#9) Tags Count: 4 | |
| ## Results | |
| | # | Macro@0.40 (F1/MCC/P/R) | Micro@0.40 (F1/MCC/P/R) | Macro@Best (F1/P/R) | | |
| |:----------:|:-----------------------------:|:-----------------------------:|:---------------------:| | |
| | Validation | 0.570 / 0.573 / 0.600 / 0.557 | 0.693 / 0.692 / 0.690 / 0.696 | --- | | |
| | Test | 0.569 / 0.573 / 0.600 / 0.556 | 0.693 / 0.693 / 0.691 / 0.696 | 0.599 / 0.600 / 0.618 | | |
| * `Macro/Micro@0.40` reports the metrics at a fixed threshold of 0.40 applied to every tag. | |
| * `Macro@Best` reports the mean metrics when each tag uses its own tag-level threshold, i.e. the threshold selected to give that tag its best F1 score. | |
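To make the difference between the macro and micro columns concrete, here is a minimal sketch of both averages at a fixed threshold. It illustrates the standard definitions and is not the evaluation code used to produce the table; the function name `f1_scores` and the toy labels and scores are made up for this example.

```python
import numpy as np


def f1_scores(y_true, y_prob, threshold=0.40):
    """Return (macro_f1, micro_f1) for multilabel scores at one fixed threshold.

    Hypothetical helper for illustration; rows are images, columns are tags.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_prob, dtype=float) >= threshold

    # Per-tag confusion counts (summed over images).
    tp = (y_true & y_pred).sum(axis=0).astype(float)
    fp = (~y_true & y_pred).sum(axis=0).astype(float)
    fn = (y_true & ~y_pred).sum(axis=0).astype(float)

    # Macro: compute F1 for each tag, then average, so rare tags weigh
    # as much as frequent ones. Tags with no positives at all score 0 here.
    denom = 2 * tp + fp + fn
    per_tag_f1 = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    macro_f1 = float(per_tag_f1.mean())

    # Micro: pool the counts over all tags first, so frequent tags dominate.
    total = 2 * tp.sum() + fp.sum() + fn.sum()
    micro_f1 = float(2 * tp.sum() / total) if total > 0 else 0.0
    return macro_f1, micro_f1


# Toy data: 4 images, 3 tags.
y_true = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
y_prob = [[0.9, 0.1, 0.3], [0.8, 0.5, 0.2], [0.7, 0.6, 0.1], [0.2, 0.1, 0.1]]
macro_f1, micro_f1 = f1_scores(y_true, y_prob)
print(f'macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}')
# macro F1 = 0.556, micro F1 = 0.800
```

In the toy data the rare third tag is missed entirely, which pulls the macro score well below the micro score. The same effect is a plausible reason why the macro numbers in the table are lower than the micro ones.
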
| ## Thresholds | |
| | Category | Name | Alpha | Threshold | Micro@Thr (F1/P/R) | Macro@0.40 (F1/P/R) | Macro@Best (F1/P/R) | | |
| |:----------:|:---------:|:-------:|:-----------:|:---------------------:|:---------------------:|:---------------------:| | |
| | 0 | general | 1 | 0.39 | 0.681 / 0.674 / 0.688 | 0.445 / 0.485 / 0.427 | 0.480 / 0.476 / 0.510 | | |
| | 4 | character | 1 | 0.61 | 0.943 / 0.961 / 0.925 | 0.921 / 0.925 / 0.920 | 0.938 / 0.954 / 0.924 | | |
| | 9 | rating | 1 | 0.38 | 0.832 / 0.801 / 0.865 | 0.838 / 0.817 / 0.860 | 0.839 / 0.819 / 0.861 | | |
| * `Micro@Thr` reports the metrics at the suggested category-level thresholds listed in the `Threshold` column above. | |
| * `Macro@0.40` reports the metrics at a fixed threshold of 0.40. | |
| * `Macro@Best` reports the metrics when each tag uses its own tag-level threshold, i.e. the threshold selected to give that tag its best F1 score. | |
| For tag-level thresholds, you can find them in [selected_tags.csv](https://huggingface.co/animetimm/eva02_large_patch14_448.dbv4-full/resolve/main/selected_tags.csv). | |
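The two thresholding strategies can be compared with a short sketch like the one below. It assumes that `selected_tags.csv` has a numeric `category` column (0, 4, 9) next to the `name` and `best_threshold` columns used later in this card; the `category` column name is an assumption, and the helper `select_tags`, the toy rows, and the scores are invented for illustration.

```python
import numpy as np
import pandas as pd

# Category-level thresholds copied from the table above (category id -> threshold).
CATEGORY_THRESHOLDS = {0: 0.39, 4: 0.61, 9: 0.38}


def select_tags(df_tags, prediction, use_tag_level=False):
    """Return {tag name: score} for tags whose score reaches its threshold.

    Hypothetical helper. Assumes df_tags has 'name', 'category' and
    'best_threshold' columns, with rows in the same order as `prediction`.
    """
    scores = np.asarray(prediction, dtype=float)
    if use_tag_level:
        # One threshold per tag, read from the CSV.
        thresholds = df_tags['best_threshold'].to_numpy(dtype=float)
    else:
        # One threshold per category, looked up from the table.
        thresholds = df_tags['category'].map(CATEGORY_THRESHOLDS).to_numpy(dtype=float)
    mask = scores >= thresholds
    return dict(zip(df_tags['name'][mask].tolist(), scores[mask].tolist()))


# Toy stand-in for selected_tags.csv; these threshold values are made up.
df_tags = pd.DataFrame({
    'name': ['1girl', 'long_sleeves', 'some_character', 'sensitive'],
    'category': [0, 0, 4, 9],
    'best_threshold': [0.45, 0.35, 0.50, 0.40],
})
prediction = [0.99, 0.38, 0.55, 0.39]

by_category = select_tags(df_tags, prediction)
by_tag = select_tags(df_tags, prediction, use_tag_level=True)
print(by_category)  # {'1girl': 0.99, 'sensitive': 0.39}
print(by_tag)       # {'1girl': 0.99, 'long_sleeves': 0.38, 'some_character': 0.55}
```

Category-level thresholds are simpler to reason about and match the `Micro@Thr` column, while tag-level thresholds correspond to the `Macro@Best` column and can keep or drop borderline tags differently, as the toy output shows.
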
| ## How to Use | |
| We provide a sample image for the code samples below; you can find it [here](https://huggingface.co/animetimm/eva02_large_patch14_448.dbv4-full/blob/main/sample.webp). | |
| ### Use TIMM And Torch | |
| Install [dghs-imgutils](https://github.com/deepghs/imgutils), [timm](https://github.com/huggingface/pytorch-image-models), and the other required packages with the following command: | |
| ```shell | |
| pip install 'dghs-imgutils>=0.17.0' torch huggingface_hub timm pillow pandas | |
| ``` | |
| After that, you can load this model with the timm library and use it for training, validation, and testing with the following code: | |
| ```python | |
| import json | |
| import pandas as pd | |
| import torch | |
| from huggingface_hub import hf_hub_download | |
| from imgutils.data import load_image | |
| from imgutils.preprocess import create_torchvision_transforms | |
| from timm import create_model | |
| repo_id = 'animetimm/eva02_large_patch14_448.dbv4-full' | |
| model = create_model(f'hf-hub:{repo_id}', pretrained=True) | |
| model.eval() | |
| with open(hf_hub_download(repo_id=repo_id, repo_type='model', filename='preprocess.json'), 'r') as f: | |
|     preprocessor = create_torchvision_transforms(json.load(f)['test']) | |
| # Compose( | |
| # PadToSize(size=(512, 512), interpolation=bilinear, background_color=white) | |
| # Resize(size=(448, 448), interpolation=bicubic, max_size=None, antialias=True) | |
| # CenterCrop(size=[448, 448]) | |
| # MaybeToTensor() | |
| # Normalize(mean=tensor([0.4815, 0.4578, 0.4082]), std=tensor([0.2686, 0.2613, 0.2758])) | |
| # ) | |
| image = load_image('https://huggingface.co/animetimm/eva02_large_patch14_448.dbv4-full/resolve/main/sample.webp') | |
| input_ = preprocessor(image).unsqueeze(0) | |
| # input_, shape: torch.Size([1, 3, 448, 448]), dtype: torch.float32 | |
| with torch.no_grad(): | |
|     output = model(input_) | |
|     prediction = torch.sigmoid(output)[0] | |
| # output, shape: torch.Size([1, 12476]), dtype: torch.float32 | |
| # prediction, shape: torch.Size([12476]), dtype: torch.float32 | |
| df_tags = pd.read_csv( | |
|     hf_hub_download(repo_id=repo_id, repo_type='model', filename='selected_tags.csv'), | |
|     keep_default_na=False | |
| ) | |
| tags = df_tags['name'] | |
| mask = prediction.numpy() >= df_tags['best_threshold'] | |
| print(dict(zip(tags[mask].tolist(), prediction[mask].tolist()))) | |
| # {'sensitive': 0.9555495381355286, | |
| # '1girl': 0.9977720379829407, | |
| # 'solo': 0.9800751209259033, | |
| # 'looking_at_viewer': 0.7236320972442627, | |
| # 'blush': 0.7710952758789062, | |
| # 'smile': 0.8856169581413269, | |
| # 'short_hair': 0.803878128528595, | |
| # 'long_sleeves': 0.3804128170013428, | |
| # 'brown_hair': 0.6562796831130981, | |
| # 'dress': 0.5758444666862488, | |
| # 'sitting': 0.7712022066116333, | |
| # 'purple_eyes': 0.5440564751625061, | |
| # 'flower': 0.9287881851196289, | |
| # 'braid': 0.8394284844398499, | |
| # 'tears': 0.778815746307373, | |
| # 'floral_print': 0.43895024061203003, | |
| # 'plant': 0.6179906725883484, | |
| # 'blue_flower': 0.30160021781921387, | |
| # 'crown_braid': 0.40592360496520996, | |
| # 'potted_plant': 0.5879666209220886, | |
| # 'flower_pot': 0.49822214245796204, | |
| # 'wiping_tears': 0.4761575758457184} | |
| ``` | |
| ### Use ONNX Model For Inference | |
| Install [dghs-imgutils](https://github.com/deepghs/imgutils) with the following command: | |
| ```shell | |
| pip install 'dghs-imgutils>=0.17.0' | |
| ``` | |
| Then call the `multilabel_timm_predict` function as follows: | |
| ```python | |
| from imgutils.generic import multilabel_timm_predict | |
| general, character, rating = multilabel_timm_predict( | |
|     'https://huggingface.co/animetimm/eva02_large_patch14_448.dbv4-full/resolve/main/sample.webp', | |
|     repo_id='animetimm/eva02_large_patch14_448.dbv4-full', | |
|     fmt=('general', 'character', 'rating'), | |
| ) | |
| print(general) | |
| # {'1girl': 0.9977719783782959, | |
| # 'solo': 0.9800750613212585, | |
| # 'flower': 0.9287877082824707, | |
| # 'smile': 0.8856177926063538, | |
| # 'braid': 0.8394323587417603, | |
| # 'short_hair': 0.8038788437843323, | |
| # 'tears': 0.7787976264953613, | |
| # 'sitting': 0.7712044715881348, | |
| # 'blush': 0.7710968255996704, | |
| # 'looking_at_viewer': 0.7236329317092896, | |
| # 'brown_hair': 0.6562790870666504, | |
| # 'plant': 0.6180056929588318, | |
| # 'potted_plant': 0.5879812836647034, | |
| # 'dress': 0.5758441686630249, | |
| # 'purple_eyes': 0.5440553426742554, | |
| # 'flower_pot': 0.4982312321662903, | |
| # 'wiping_tears': 0.47614389657974243, | |
| # 'floral_print': 0.43895548582077026, | |
| # 'crown_braid': 0.40593117475509644, | |
| # 'long_sleeves': 0.3804135322570801, | |
| # 'blue_flower': 0.3015919327735901} | |
| print(character) | |
| # {} | |
| print(rating) | |
| # {'sensitive': 0.9555498361587524} | |
| ``` | |
| For further information, see the [documentation of the `multilabel_timm_predict` function](https://dghs-imgutils.deepghs.org/main/api_doc/generic/multilabel_timm.html#multilabel-timm-predict). | |