---
license: mit
datasets:
- deepghs/monochrome_danbooru
metrics:
- accuracy
pipeline_tag: image-classification
tags:
- art
---

These models classify whether an anime image is monochrome. All of them were trained at an input size of 384.

| Model | FLOPs | Accuracy | Confusion Matrix | Description |
|:--------------------------------:|:------:|:--------:|:----------------:|-------------|
| caformer_s36 | 22.10G | 95.63% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36/plot_confusion.png) | Model: caformer_s36 from timm |
| caformer_s36_safe2 | 22.10G | 95.52% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36_safe2/plot_confusion.png) | Model: caformer_s36 from timm; higher precision and lower recall than caformer_s36 |
| caformer_s36_plus | 22.10G | 97.31% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36_plus/plot_confusion.png) | Model: caformer_s36.sail_in22k_ft_in1k_384, pretrained, from timm |
| caformer_s36_plus_safe2 | 22.10G | 97.09% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/caformer_s36_plus_safe2/plot_confusion.png) | Model: caformer_s36.sail_in22k_ft_in1k_384, pretrained, from timm; higher precision and lower recall than caformer_s36_plus |
| mobilenetv3_large_100 | 0.63G | 95.40% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100/plot_confusion.png) | Model: mobilenetv3_large_100 from timm |
| mobilenetv3_large_100_dist | 0.63G | 96.30% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100_dist/plot_confusion.png) | Distilled from caformer_s36_plus, using mobilenetv3_large_100 |
| mobilenetv3_large_100_safe2 | 0.63G | 94.62% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100_safe2/plot_confusion.png) | Model: mobilenetv3_large_100 from timm; higher precision and lower recall than mobilenetv3_large_100 |
| mobilenetv3_large_100_dist_safe2 | 0.63G | 95.85% | [Confusion Matrix](https://huggingface.co/deepghs/monochrome_detect/blob/main/mobilenetv3_large_100_dist_safe2/plot_confusion.png) | Distilled from caformer_s36_plus_safe2, using mobilenetv3_large_100 |

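The `safe2` variants in the table trade recall for precision while keeping accuracy close to the base models. The sketch below illustrates how that trade-off shows up in a binary confusion matrix. It assumes "monochrome" is the positive class, and the counts are hypothetical, chosen only for illustration; they are not read from the linked confusion matrix plots. The `BinaryConfusion` class is likewise an illustrative helper, not part of any released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryConfusion:
    """Counts for a binary classifier, treating 'monochrome' as the positive class (an assumption)."""
    tp: int  # monochrome images predicted as monochrome
    fp: int  # non-monochrome images predicted as monochrome
    fn: int  # monochrome images predicted as non-monochrome
    tn: int  # non-monochrome images predicted as non-monochrome

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def precision(self) -> float:
        # Of everything flagged as monochrome, how much really is.
        return self.tp / (self.tp + self.fp)

    @property
    def recall(self) -> float:
        # Of all truly monochrome images, how many were flagged.
        return self.tp / (self.tp + self.fn)


# Hypothetical counts, NOT taken from the models in the table.
base = BinaryConfusion(tp=480, fp=25, fn=20, tn=475)
safe = BinaryConfusion(tp=465, fp=10, fn=35, tn=490)

for name, cm in (("base", base), ("safe", safe)):
    print(f"{name}: accuracy={cm.accuracy:.4f} "
          f"precision={cm.precision:.4f} recall={cm.recall:.4f}")
```

In this made-up example both classifiers have the same accuracy, but the "safe" one produces fewer false positives at the cost of missing more monochrome images. A variant like that is preferable when wrongly labeling a color image as monochrome is the more costly mistake.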