 ---
+pipeline_tag: image-classification
 license: mit
+tags:
+- image-classification
+- onnx
+- anime
+- tagging
+- danbooru
+- deep-learning
+- computer-vision
 ---
+# ML-Danbooru ONNX Models
+## Summary
+This repository provides **ONNX** exports of the **ML-Danbooru** image tagging models, originally developed by 7eu7d7. ML-Danbooru is a **deep learning** system for **automated tagging** of anime-style images: given an image, it performs multi-label classification over thousands of Danbooru-style tags. The models in this repository have been converted to ONNX format for efficient inference and cross-platform compatibility.
+The primary models use the **Caformer** (CAFormer) architecture, a MetaFormer-style hybrid that uses convolution-based token mixing in its early stages and self-attention in its later stages. This combines the local feature extraction of convolutional networks with the global receptive field of attention, so the models can capture both fine-grained details and global context in anime artwork. The repository also includes TResnet-based variants and several checkpoints trained with different configurations and epochs, giving users options that range from faster inference to higher accuracy.
+The models are intended to recognize common anime character attributes, clothing items, accessories, backgrounds, and compositional elements, such as hair colors, eye colors, clothing types, character poses, and scene settings. Each tag is returned with a confidence score between 0 and 1; clearly visible features typically score above the default threshold of 0.7. The input tensor carries a leading batch dimension, and images of various aspect ratios are handled by scaling the shorter edge to the target size, optionally preserving the aspect ratio.
+## Usage
+The models in this repository are designed to be used with the `dghs-imgutils` library, which provides a comprehensive interface for image tagging tasks.
+### Installation
+```bash
+pip install dghs-imgutils
+```
+### Basic Usage
+```python
+from imgutils.tagging import get_mldanbooru_tags
+# Tag an image with default settings
+tags = get_mldanbooru_tags('your_image.jpg')
+print(tags)
+# Tag with custom threshold and settings
+tags_custom = get_mldanbooru_tags(
+    'your_image.jpg',
+    threshold=0.5,
+    size=448,
+    keep_ratio=True,
+    drop_overlap=True,
+    use_real_name=False
+)
+print(tags_custom)
+```
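+The result of `get_mldanbooru_tags` can be post-processed like any dictionary. The sketch below assumes the return value is a mapping from tag name to confidence score; `format_tags` is a hypothetical helper (not part of `dghs-imgutils`), and the scores in `example` are made up for illustration.

```python
from typing import Mapping, Optional


def format_tags(tags: Mapping[str, float], top_k: Optional[int] = None) -> str:
    """Join tags into a comma-separated string, highest confidence first."""
    # Sort by descending score; ties are broken alphabetically for determinism.
    ranked = sorted(tags.items(), key=lambda item: (-item[1], item[0]))
    if top_k is not None:
        ranked = ranked[:top_k]
    return ", ".join(name for name, _ in ranked)


# Hypothetical scores standing in for a real get_mldanbooru_tags() result.
example = {"solo": 0.95, "1girl": 0.99, "long_hair": 0.81}
print(format_tags(example, top_k=2))  # prints "1girl, solo"
```

+Sorting explicitly keeps the output stable even if the tagging call returns tags in a different order.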
+### Advanced Usage with Model Selection
+```python
+from huggingface_hub import hf_hub_download
+from imgutils.utils import open_onnx_model
+from PIL import Image
+import numpy as np
+# Load a specific model from this repository
+model_path = hf_hub_download('deepghs/ml-danbooru-onnx', 'ml_caformer_m36_dec-5-97527.onnx')
+model = open_onnx_model(model_path)
+# Manual preprocessing and inference
+def preprocess_image(image_path, size=448):
+    image = Image.open(image_path).convert('RGB')
+    # Scale the shorter edge to `size`, then round both edges down to a multiple of 4
+    min_edge = min(image.size)
+    target_size = (
+        int(image.size[0] / min_edge * size),
+        int(image.size[1] / min_edge * size),
+    )
+    target_size = (
+        (target_size[0] // 4) * 4,
+        (target_size[1] // 4) * 4,
+    )
+    image = image.resize(target_size, resample=Image.BILINEAR)
+    # HWC -> CHW float32 scaled to [0, 1]; a leading batch dimension is added on return
+    img_array = np.array(image, dtype=np.float32).transpose(2, 0, 1) / 255.0
+    return img_array.reshape(1, *img_array.shape)
+# Run inference
+input_tensor = preprocess_image('your_image.jpg')
+output = model.run(['output'], {'input': input_tensor})[0]
+# Apply a sigmoid to the raw outputs, then flatten to one probability per tag
+probabilities = (1 / (1 + np.exp(-output))).reshape(-1)
+# Process results (you would need to load the tag labels)
+# tags = process_probabilities(probabilities, threshold=0.7)
+```
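+The `process_probabilities` placeholder in the last comment is not defined anywhere in this README. One possible shape for it is sketched below. This is an assumption, not the library's implementation: it presumes you have a `labels` list with one tag name per model output, in the same order as the output vector, and the labels and scores in the example are hypothetical.

```python
from typing import Dict, Iterable, Sequence


def process_probabilities(
    probabilities: Iterable[float],
    labels: Sequence[str],
    threshold: float = 0.7,
) -> Dict[str, float]:
    """Map per-class probabilities to {tag: score}, keeping scores >= threshold."""
    # Works for plain lists and 1-D NumPy arrays alike.
    scores = [float(p) for p in probabilities]
    if len(scores) != len(labels):
        raise ValueError("labels must have one entry per model output")
    kept = [(tag, score) for tag, score in zip(labels, scores) if score >= threshold]
    # Highest-confidence tags first (dicts preserve insertion order).
    kept.sort(key=lambda pair: -pair[1])
    return dict(kept)


# Hypothetical labels and scores for illustration only.
labels = ["1girl", "solo", "mask"]
print(process_probabilities([0.98, 0.91, 0.12], labels))
```

+The length check guards against pairing the output vector with a label file from a different checkpoint.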
+## Model Variants
+This repository contains multiple ML-Danbooru model variants:
+- **ml_caformer_m36_dec-5-97527.onnx**: Primary model with Caformer-M36 architecture
+- **ml_caformer_m36_dec-3-80000.onnx**: Alternative checkpoint with different training
+- **TResnet-D-FLq_ema_2-40000.onnx**: TResnet-based variant
+- **TResnet-D-FLq_ema_4-10000.onnx**: Lightweight TResnet variant
+- **TResnet-D-FLq_ema_6-10000.onnx**: Additional TResnet checkpoint
+- **TResnet-D-FLq_ema_6-30000.onnx**: Extended training TResnet variant
+- **caformer_m36-3-80000.onnx**: Base Caformer model
+## Tag Information
+The repository includes comprehensive tag information:
+- **classes.json**: Contains 1,527 simplified tag names for common anime attributes
+- **tags.csv**: Complete tag database with 12,547 entries including:
+  - Original tag names
+  - Root forms for morphological variations
+  - Part-of-speech classifications
+  - Usage frequency counts
+## Performance Characteristics
+- **Input Size**: Default 448x448 pixels (configurable)
+- **Tag Count**: 12,547 possible tags
+- **Threshold**: Default 0.7 (configurable)
+- **Supported Tags**: Character attributes, clothing, accessories, backgrounds, compositions
+- **Architecture**: Caformer-M36 and TResnet variants
+- **Format**: ONNX for optimized inference
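+When the aspect ratio is kept, as in `preprocess_image` in the Advanced Usage section, the input size applies to the shorter edge and both edges are rounded down to a multiple of 4. The sketch below reproduces only that arithmetic; `aligned_target_size` and its `align` parameter are hypothetical names introduced here for illustration.

```python
from typing import Tuple


def aligned_target_size(width: int, height: int, size: int = 448, align: int = 4) -> Tuple[int, int]:
    """Scale the shorter edge to `size`, then round both edges down to a multiple of `align`."""
    min_edge = min(width, height)
    scaled = (int(width / min_edge * size), int(height / min_edge * size))
    return (scaled[0] // align * align, scaled[1] // align * align)


print(aligned_target_size(1000, 1333))  # prints (448, 596)
print(aligned_target_size(1920, 1080))  # prints (796, 448)
```

+For a 1000x1333 image, the longer edge scales to 597 pixels and is then rounded down to 596, so the tensor passed to the model is not necessarily square.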
+## Original Content
+### Tag Database Structure
+The repository includes a comprehensive tag database with the following structure.
+An excerpt from `classes.json` (simplified tags; the full file lists 1,527 tags):
+```json
+[
+    "1girl",
+    "bangs",
+    "blunt_bangs",
+    "brown_hair",
+    "hair_bun",
+    "hime_cut",
+    "long_hair",
+    "mask",
+    "ribbon",
+    "solo",
+    "yellow_eyes"
+]
+```
+An excerpt from `tags.csv`:
+```csv
+tag,root,pos,count
+1girl,girl,NOUN,4317542
+bangs,bang,NOUN,1576060
+blunt_bangs,bang,NOUN,178797
+brown_hair,hair,NOUN,1092727
+hair_bun,bun,NOUN,157335
+```
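+The `root` column makes it possible to group related tags. The sketch below parses the sample rows shown above with Python's standard `csv` module and groups tags by root. `group_by_root` is a hypothetical helper, and the inline `SAMPLE` string stands in for the real file.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the tags.csv sample; a real file would be opened instead.
SAMPLE = """tag,root,pos,count
1girl,girl,NOUN,4317542
bangs,bang,NOUN,1576060
blunt_bangs,bang,NOUN,178797
brown_hair,hair,NOUN,1092727
hair_bun,bun,NOUN,157335
"""


def group_by_root(csv_text):
    """Return {root: [(tag, count), ...]} with the most frequent tag first."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["root"]].append((row["tag"], int(row["count"])))
    for entries in groups.values():
        entries.sort(key=lambda entry: -entry[1])
    return dict(groups)


groups = group_by_root(SAMPLE)
print(groups["bang"])  # prints [('bangs', 1576060), ('blunt_bangs', 178797)]
```

+Converting `count` to `int` while reading allows tags to be ranked by usage frequency without further parsing.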
+### Model Architecture Details
+The ML-Danbooru models use two backbone families:
+- **Caformer-M36**: A MetaFormer-style hybrid (CAFormer) that uses convolutional token mixers in its early stages and self-attention in its later stages for efficient feature extraction
+- **TResnet-D**: A TResNet-based convolutional backbone (a ResNet-derived design aimed at high GPU throughput), described here as trained with focal loss optimization
+- **ONNX Export**: Models are exported to ONNX so that they can run on ONNX-compatible runtimes across different hardware platforms
+## Citation
+```bibtex
+@misc{deepghs_ml_danbooru_onnx,
+  title        = {{ML-Danbooru ONNX Models: Optimized Anime Image Tagging}},
+  author       = {7eu7d7 and DeepGHS Contributors},
+  howpublished = {\url{https://huggingface.co/deepghs/ml-danbooru-onnx}},
+  year         = {2023},
+  note         = {ONNX-optimized implementations of ML-Danbooru models for efficient anime image tagging with transformer-based architectures},
+  abstract     = {This repository provides ONNX exports of the ML-Danbooru image tagging models, originally developed by 7eu7d7. ML-Danbooru is a deep learning system for automated tagging of anime-style images, performing multi-label classification across thousands of Danbooru-style tags. The primary models use the Caformer (CAFormer) architecture, a MetaFormer-style hybrid that combines the local feature extraction of convolutional token mixers with the global receptive field of self-attention, enabling effective capture of both fine-grained details and global contextual information in anime artwork.},
+  keywords     = {image-classification, anime, tagging, danbooru, transformer, onnx}
+}
+```