---
language:
- "en"
- "zh"
tags:
- mlx
- DeepDanbooru
- danbooru
- Image-Clip
- image-interrogate
- image-to-text
- captioning
license: "mit"
base_model: "hazhu/mlx-DeepDanbooru"
---

# mlx-DeepDanbooru

Pure MLX implementation of the DeepDanbooru neural network for __Apple Silicon__ chips: M1, M2, M3, M4. `mlx-DeepDanbooru` runs on MacBook Pro / Air, Mac mini, and iMac.

## Usage

Image-to-text, captioning, and CLIP-style image interrogation using the [DeepDanbooru model](https://github.com/KichangKim/DeepDanbooru) on Apple devices.

## MLX DeepDanbooru Model

This `mlx-DeepDanbooru` implementation is inspired by the PyTorch port [AUTOMATIC1111/TorchDeepDanbooru](https://github.com/AUTOMATIC1111/TorchDeepDanbooru).

## Installation

```
conda create -n mlx026 python=3.12
conda activate mlx026

pip install numpy
pip install pillow
```

MLX is available on [PyPI](https://pypi.org/project/mlx/). To install the Python API, run:

```
pip install mlx
```

`mlx-DeepDanbooru` is based on `mlx` version `0.26.1`.

## Inference

```
python infer.py
```

Image interrogation:

```python
import time

import numpy as np
from PIL import Image

# using Apple Silicon's MLX, not PyTorch
import mlx.core as mx
from mlxDeepDanBooru.mlx_deep_danbooru_model import mlxDeepDanBooruModel


model_path = "models/model-resnet_custom_v3_mlx.npz"
tags_path = "models/tags-resnet_custom_v3_mlx.npy"

mlx_dan = mlxDeepDanBooruModel()
mlx_dan.load_weights(model_path)
mx.eval(mlx_dan.parameters())


model_tags = np.load(tags_path)
print(f"total tags: {len(model_tags)}")

def danbooru_tags(fpath, threshold=0.5):
    # threshold can be tuned between 0.0 and 1.0:
    # lower it for more tags, raise it for fewer, higher-confidence tags
    tags = []
    pic = Image.open(fpath).convert("RGB").resize((512, 512))
    a = np.expand_dims(np.array(pic, dtype=np.float32), 0) / 255

    x = mx.array(a)
    y = mlx_dan(x)[0]

    for i, p in enumerate(y):
        if p >= threshold:
            tags.append(model_tags[i].item())

    return tags

image_count = 0
def image_infer(fpath):
    global image_count
    tags = danbooru_tags(fpath)
    image_count += 1
    return tags


t1 = time.time()
tags_1 = image_infer("example/1.png")
tags_2 = image_infer("example/2.png")
t2 = time.time()

print(tags_1)
# will show tags: ['1girl', 'beach', 'black_hair', 'blurry', 'blurry_background', 'blurry_foreground', 'building', 'bush', 'christmas_tree', 'day', 'depth_of_field', 'field', 'grass', 'lake', 'looking_at_viewer', 'mountain', 'nature', 'outdoors', 'palm_leaf', 'palm_tree', 'park', 'park_bench', 'path', 'photo_background', 'plant', 'river', 'road', 'skirt', 'sky', 'smile', 'striped', 'striped_dress', 'striped_shirt', 'tree', 'vertical-striped_shirt', 'vertical_stripes', 'rating:safe']

print(tags_2)
# will show tags: ['1girl', '3d', 'blurry', 'blurry_background', 'blurry_foreground', 'brown_eyes', 'brown_hair', 'bush', 'christmas_tree', 'cosplay_photo', 'day', 'depth_of_field', 'field', 'floral_print', 'foliage', 'forest', 'garden', 'grass', 'jungle', 'lips', 'long_hair', 'long_sleeves', 'looking_at_viewer', 'nature', 'on_grass', 'outdoors', 'palm_tree', 'park', 'path', 'plant', 'potted_plant', 'realistic', 'smile', 'solo', 'tree', 'upper_body', 'white_dress', 'rating:safe']

print("-----------")
print(f"infer speed (with MLX): {(t2 - t1) / image_count} seconds per image")
```
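The 0.5 cutoff in `danbooru_tags` is just a confidence threshold over the per-tag probabilities. A minimal NumPy-only sketch of that filtering step, using made-up tags and probabilities purely for illustration:

```python
import numpy as np

# hypothetical tag names and per-tag probabilities, standing in for
# model_tags and the model output y in danbooru_tags above
tags = np.array(["1girl", "sky", "tree", "rating:safe"])
probs = np.array([0.92, 0.10, 0.61, 0.55])

threshold = 0.5  # raise for fewer, higher-confidence tags; lower for more
selected = tags[probs >= threshold].tolist()
print(selected)  # ['1girl', 'tree', 'rating:safe']
```

Vectorized boolean masking like this is also a drop-in replacement for the Python-level `for i, p in enumerate(y)` loop once `y` is converted to a NumPy array.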

## Performance

For the 1024x1024 images in the `example` folder, `mlx-DeepDanbooru` inference speed on a Mac mini M4:

```
1.7 seconds per image
```


On a Mac mini M4, __MPS + PyTorch__ inference speed: `0.8 seconds per image`

On a Mac mini M4, CPU + PyTorch inference speed: `2.5 seconds per image`
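
From the three Mac mini M4 numbers above, the relative speeds work out as follows:

```python
# per-image times quoted above (Mac mini M4)
mlx_s, mps_s, cpu_s = 1.7, 0.8, 2.5

print(f"MPS+PyTorch vs MLX: {mlx_s / mps_s:.2f}x faster")  # roughly 2.1x
print(f"MLX vs CPU+PyTorch: {cpu_s / mlx_s:.2f}x faster")  # roughly 1.5x
```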
| |
|
| | ## CURRENTLY |
| |
|
| | the speed of __MPS + Pytorch__ > MLX. |
| |
|
| |  |
| |
|
| | ## Bench: 351 images, 720x1280 and 540x720: |
| |
|
| | In Windows 11, Nvidia RTX 4070 Ti, CUDA+Pytorch: |
| |
|
| | ``` |
| | SPEED: 0.3 seconds per image |
| | Power Consumption: 260 ~ 300 Watt |
| | ``` |
| |
|
| | In Mac mini M4, `mlx-DeepDanBooru`: |
| |
|
| | ``` |
| | SPEED: 1.68 seconds per image |
| | Power Consumption: 8 ~ 12 Watt |
| | ``` |
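
The raw speed numbers hide efficiency. A rough back-of-the-envelope estimate of energy per image, taking the midpoint of each quoted power range:

```python
# approximate energy-per-image from the benchmark figures above,
# using the midpoint of each quoted power range
rtx_j = 0.3 * 280   # RTX 4070 Ti: 0.3 s/image at ~280 W
m4_j = 1.68 * 10    # Mac mini M4: 1.68 s/image at ~10 W

print(f"RTX 4070 Ti: ~{rtx_j:.0f} J/image")
print(f"Mac mini M4: ~{m4_j:.1f} J/image")
```

By this estimate the Mac mini M4 uses roughly five times less energy per image, despite being slower per image.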

On a Mac mini M4, `mlx-DeepDanbooru` with multiprocessing (i.e., run `infer_multiprocessing.py`):

```
SPEED: 0.42 seconds per image
```
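
`infer_multiprocessing.py` ships with the repo and is not reproduced here. A minimal sketch of the general approach it names, fanning the image list out across worker processes, with a placeholder worker standing in for `danbooru_tags`:

```python
from multiprocessing import Pool

def tag_image(fpath):
    # placeholder: a real worker would load its own model copy once
    # (e.g. via a Pool initializer) and call danbooru_tags(fpath)
    return (fpath, ["rating:safe"])

def run(paths, workers=4):
    # each process pulls file paths from the pool's shared work queue,
    # so per-image latency overlaps across cores
    with Pool(processes=workers) as pool:
        return pool.map(tag_image, paths)

if __name__ == "__main__":
    paths = [f"example/{i}.png" for i in range(1, 9)]
    for fpath, tags in run(paths):
        print(fpath, tags)
```

Note that each worker pays the model-load cost once, so the speedup only shows up over a batch of images, which matches the 351-image benchmark above.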