narugo1992 committed on
Commit 65a6a2e · verified · 1 Parent(s): 60009d1

Auto-update README.md via abstractor, on 2025-11-17 21:14:41 CST

Files changed (1)
  1. README.md +178 -1
README.md CHANGED
@@ -1,3 +1,180 @@
1
  ---
 
2
  license: mit
3
- ---
1
  ---
2
+ pipeline_tag: image-classification
3
  license: mit
4
+ tags:
5
+ - image-classification
6
+ - onnx
7
+ - anime
8
+ - tagging
9
+ - danbooru
10
+ - deep-learning
11
+ - computer-vision
12
+ ---
13
+ # ML-Danbooru ONNX Models
14
+
15
+ ## Summary
16
+
17
+ This repository provides **ONNX** exports of the **ML-Danbooru** image tagging models, originally developed by 7eu7d7. ML-Danbooru is a **deep learning** system for **automated tagging** of anime-style images, performing multi-label classification across thousands of Danbooru-style tags. The models in this repository have been converted to ONNX format for portable inference and cross-platform compatibility.
18
+
19
+ The primary models use **Caformer** backbones, a MetaFormer-family architecture that mixes tokens with convolutions in its early stages and with self-attention in its later stages. This hybrid design pairs the local feature extraction of convolutional networks with the global receptive field of attention, so the models can capture both fine-grained details and global context in anime artwork. TResnet-based checkpoints are included as well. The variants were trained with different configurations and for different numbers of epochs, giving users a choice between faster inference and higher accuracy.
20
+
21
+ The models are intended to recognize common anime character attributes, clothing items, accessories, backgrounds, and compositional elements, such as hair colors, eye colors, clothing types, character poses, and scene settings. Each predicted tag carries a confidence score between 0 and 1, and only tags that clear a configurable threshold (0.7 by default) are reported. Images are resized to 448x448 by default; with `keep_ratio=True`, the shorter edge is scaled to the target size and both edges are rounded down to a multiple of 4, which preserves the aspect ratio.
22
+
23
+ ## Usage
24
+
25
+ The models in this repository are designed to be used with the `dghs-imgutils` library, which provides a comprehensive interface for image tagging tasks.
26
+
27
+ ### Installation
28
+
29
+ ```bash
30
+ pip install dghs-imgutils
31
+ ```
32
+
33
+ ### Basic Usage
34
+
35
+ ```python
36
+ from imgutils.tagging import get_mldanbooru_tags
37
+
38
+ # Tag an image with default settings
39
+ tags = get_mldanbooru_tags('your_image.jpg')
40
+ print(tags)
41
+
42
+ # Tag with custom threshold and settings
43
+ tags_custom = get_mldanbooru_tags(
44
+     'your_image.jpg',
45
+     threshold=0.5,
46
+     size=448,
47
+     keep_ratio=True,
48
+     drop_overlap=True,
49
+     use_real_name=False,
50
+ )
51
+ print(tags_custom)
52
+ ```
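The printed result above is a mapping from tag name to confidence score. As a minimal sketch of post-processing such a mapping, the helpers below rank the tags and join them into a comma-separated string. The names `top_tags` and `to_prompt` are hypothetical and not part of `dghs-imgutils`, and the `example` dictionary is made-up data standing in for a real result.

```python
from typing import List, Mapping, Tuple


def top_tags(tags: Mapping[str, float], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (tag, score) pairs, best first.

    Ties are broken alphabetically so the order is deterministic.
    """
    ranked = sorted(tags.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


def to_prompt(tags: Mapping[str, float]) -> str:
    """Join all tag names, best first, replacing underscores with spaces."""
    ranked = top_tags(tags, k=len(tags))
    return ", ".join(name.replace("_", " ") for name, _ in ranked)


# Made-up scores standing in for get_mldanbooru_tags('your_image.jpg')
example = {"1girl": 0.99, "long_hair": 0.93, "solo": 0.97, "ribbon": 0.71}

print(top_tags(example, k=2))
print(to_prompt(example))
```

Sorting on the negated score first and the tag name second keeps the output stable between runs, which is convenient when tag strings are stored or compared.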
53
+
54
+ ### Advanced Usage with Model Selection
55
+
56
+ ```python
57
+ from huggingface_hub import hf_hub_download
58
+ from imgutils.utils import open_onnx_model
59
+ from PIL import Image
60
+ import numpy as np
61
+
62
+ # Load a specific model from this repository
63
+ model_path = hf_hub_download('deepghs/ml-danbooru-onnx', 'ml_caformer_m36_dec-5-97527.onnx')
64
+ model = open_onnx_model(model_path)
65
+
66
+ # Manual preprocessing and inference
67
+ def preprocess_image(image_path, size=448):
68
+     image = Image.open(image_path).convert('RGB')
69
+     # Scale the shorter edge to `size`, then round both edges down to a multiple of 4
70
+     min_edge = min(image.size)
71
+     target_size = (
72
+         int(image.size[0] / min_edge * size),
73
+         int(image.size[1] / min_edge * size),
74
+     )
75
+     target_size = (
76
+         (target_size[0] // 4) * 4,
77
+         (target_size[1] // 4) * 4,
78
+     )
79
+     image = image.resize(target_size, resample=Image.BILINEAR)
80
+
81
+     # Convert HWC pixels to a CHW float32 array scaled to [0, 1], then add a batch axis
82
+     img_array = np.array(image, dtype=np.float32).transpose(2, 0, 1) / 255.0
83
+     return img_array.reshape(1, *img_array.shape)
84
+
85
+ # Run inference
86
+ input_tensor = preprocess_image('your_image.jpg')
87
+ output = model.run(['output'], {'input': input_tensor})[0]
88
+ probabilities = 1 / (1 + np.exp(-output)).reshape(-1)
89
+
90
+ # Process results (you would need to load the tag labels)
91
+ # tags = process_probabilities(probabilities, threshold=0.7)
92
+ ```
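The commented-out `process_probabilities` call above is not defined anywhere in this README. The following is one possible sketch of it, not the library's implementation. It assumes a `labels` sequence loaded separately (for example from the tag files in this repository) whose order matches the model output; that ordering is an assumption. It is written in pure Python so it accepts either a list or a flattened NumPy array.

```python
import math
from typing import Dict, Iterable, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def process_probabilities(
    probabilities: Iterable[float],
    labels: Sequence[str],
    threshold: float = 0.7,
) -> Dict[str, float]:
    """Keep labels whose score reaches the threshold, highest score first.

    Hypothetical helper: `labels` must be in the same order as the model output.
    """
    scores = [float(p) for p in probabilities]
    if len(scores) != len(labels):
        raise ValueError("labels and probabilities must have the same length")
    kept = [(label, s) for label, s in zip(labels, scores) if s >= threshold]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(kept)


# Made-up labels and logits for illustration only
labels = ["1girl", "solo", "mask"]
logits = [4.0, 1.0, -2.0]
probs = [sigmoid(v) for v in logits]
print(process_probabilities(probs, labels))
```

Raising the threshold trades recall for precision: with the made-up logits above, a threshold of 0.9 keeps only the first label.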
93
+
94
+ ## Model Variants
95
+
96
+ This repository contains multiple ML-Danbooru model variants:
97
+
98
+ - **ml_caformer_m36_dec-5-97527.onnx**: Primary model with Caformer-M36 architecture
99
+ - **ml_caformer_m36_dec-3-80000.onnx**: Alternative checkpoint with different training
100
+ - **TResnet-D-FLq_ema_2-40000.onnx**: TResnet-based variant
101
+ - **TResnet-D-FLq_ema_4-10000.onnx**: Lightweight TResnet variant
102
+ - **TResnet-D-FLq_ema_6-10000.onnx**: Additional TResnet checkpoint
103
+ - **TResnet-D-FLq_ema_6-30000.onnx**: Extended training TResnet variant
104
+ - **caformer_m36-3-80000.onnx**: Base Caformer model
105
+
106
+ ## Tag Information
107
+
108
+ The repository includes comprehensive tag information:
109
+
110
+ - **classes.json**: Contains 1,527 simplified tag names for common anime attributes
111
+ - **tags.csv**: Complete tag database with 12,547 entries including:
112
+   - Original tag names
113
+   - Root forms for morphological variations
114
+   - Part-of-speech classifications
115
+   - Usage frequency counts
116
+
117
+ ## Performance Characteristics
118
+
119
+ - **Input Size**: Default 448x448 pixels (configurable)
120
+ - **Tag Count**: 12,547 possible tags
121
+ - **Threshold**: Default 0.7 (configurable)
122
+ - **Supported Tags**: Character attributes, clothing, accessories, backgrounds, compositions
123
+ - **Architecture**: Caformer-M36 and TResnet variants
124
+ - **Format**: ONNX for optimized inference
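To make the input-size setting concrete, the sketch below reproduces only the size arithmetic from the manual preprocessing in the Advanced Usage example, for the case where the aspect ratio is kept. The function name `aligned_size` is hypothetical. The `align=4` default mirrors the `// 4 * 4` rounding in that example.

```python
from typing import Tuple


def aligned_size(width: int, height: int, size: int = 448, align: int = 4) -> Tuple[int, int]:
    """Scale the shorter edge to `size`, then round both edges down to a multiple of `align`."""
    min_edge = min(width, height)
    scaled = (
        int(width / min_edge * size),
        int(height / min_edge * size),
    )
    return (
        (scaled[0] // align) * align,
        (scaled[1] // align) * align,
    )


print(aligned_size(1000, 1500))  # portrait image
print(aligned_size(1920, 1080))  # landscape image
print(aligned_size(512, 512))    # square image
```

Under these assumptions a 1000x1500 portrait image becomes 448x672, so the longer edge grows with the aspect ratio while the shorter edge stays at the configured size.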
125
+
126
+ ## Original Content
127
+
128
+ ### Tag Database Structure
129
+
130
+ The repository includes a comprehensive tag database with the following structure:
131
+
132
+ ```json
133
+ // Sample from classes.json (simplified tags)
134
+ [
135
+   "1girl",
136
+   "bangs",
137
+   "blunt_bangs",
138
+   "brown_hair",
139
+   "hair_bun",
140
+   "hime_cut",
141
+   "long_hair",
142
+   "mask",
143
+   "ribbon",
144
+   "solo",
145
+   "yellow_eyes",
146
+   // ... 1,527 tags total
147
+ ]
148
+ ```
149
+
150
+ ```csv
151
+ # Sample from tags.csv
152
+ tag,root,pos,count
153
+ 1girl,girl,NOUN,4317542
154
+ bangs,bang,NOUN,1576060
155
+ blunt_bangs,bang,NOUN,178797
156
+ brown_hair,hair,NOUN,1092727
157
+ hair_bun,bun,NOUN,157335
158
+ ```
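The sample rows above can be parsed with the standard `csv` module. This is a minimal sketch that assumes the real `tags.csv` uses the same four columns as the sample (`tag`, `root`, `pos`, `count`). The helper names `load_tags` and `group_by_root` are hypothetical.

```python
import csv
import io

# The sample rows shown above, embedded so the example runs without the real file
SAMPLE = """tag,root,pos,count
1girl,girl,NOUN,4317542
bangs,bang,NOUN,1576060
blunt_bangs,bang,NOUN,178797
brown_hair,hair,NOUN,1092727
hair_bun,bun,NOUN,157335
"""


def load_tags(fileobj):
    """Read rows into dictionaries, converting the count column to int."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["count"] = int(row["count"])
        rows.append(row)
    return rows


def group_by_root(rows):
    """Map each root form to the list of tags that share it."""
    groups = {}
    for row in rows:
        groups.setdefault(row["root"], []).append(row["tag"])
    return groups


rows = load_tags(io.StringIO(SAMPLE))
print(group_by_root(rows))
print(max(rows, key=lambda r: r["count"])["tag"])
```

For a downloaded copy of the file, the same function can be called as `load_tags(open('tags.csv', newline='', encoding='utf-8'))`, provided the column layout matches the sample.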
159
+
160
+ ### Model Architecture Details
161
+
162
+ The ML-Danbooru models in this repository use two backbone families:
163
+
164
+ - **Caformer-M36**: A MetaFormer-family backbone that mixes tokens with convolutions in its early stages and self-attention in its later stages
165
+ - **TResnet-D**: A TResNet-style convolutional backbone (a ResNet derivative designed for GPU throughput rather than a transformer), with focal loss optimization
166
+ - **ONNX Export**: Models are exported to ONNX so they can run on ONNX-compatible runtimes across different hardware platforms
167
+
168
+ ## Citation
169
+
170
+ ```bibtex
171
+ @misc{deepghs_ml_danbooru_onnx,
172
+   title = {{ML-Danbooru ONNX Models: Optimized Anime Image Tagging}},
173
+   author = {7eu7d7 and DeepGHS Contributors},
174
+   howpublished = {\url{https://huggingface.co/deepghs/ml-danbooru-onnx}},
175
+   year = {2023},
176
+   note = {ONNX-optimized implementations of ML-Danbooru models for efficient anime image tagging with transformer-based architectures},
177
+   abstract = {This repository provides ONNX-optimized implementations of the ML-Danbooru image tagging models, originally developed by 7eu7d7. ML-Danbooru is a deep learning system designed for automated tagging of anime-style images across thousands of Danbooru-style tags. The models employ Caformer (a MetaFormer-family architecture that combines convolutional and self-attention token mixers) and TResnet backbones, capturing both fine-grained details and global contextual information in anime artwork.},
178
+   keywords = {image-classification, anime, tagging, danbooru, transformer, onnx}
179
+ }
180
+ ```