| --- |
| license: cc0-1.0 |
| tags: |
| - art |
| - computer vision |
| - Image segmentation |
| --- |
| |
| # DeepLabV3+ ResNet50 for human body parts segmentation |
|
|
| This is a very simple ONNX model that can segment human body parts. |
|
|
| ## Why this model |
|
|
| This model is an ONNX conversion of [keras-io/deeplabv3p-resnet50](https://huggingface.co/keras-io/deeplabv3p-resnet50), |
| a model that can segment human body parts. All the other models that I found were trained on |
| city-scene segmentation. |
|
|
| The original model was built for an old version of Keras and cannot be used with recent versions of TensorFlow, |
| so I converted it to the ONNX format. |
|
|
| ## Usage |
|
|
| Get the `deeplabv3p-resnet50-human.onnx` file and use it with the ONNX Runtime package (`onnxruntime`). |
|
|
| `model.run` returns a list holding one array, so the result as a whole has shape `(1, 1, 512, 512, 20)`: |
|
|
| - 1: number of outputs (you can squeeze it, or take `result[0]`) |
| - 1: batch size (you can squeeze it) |
| - 512, 512: the size of the image (fixed) |
| - 20: number of classes, so you can take the `argmax` over this axis to get the class of each pixel |
|
|
| ```python |
| import sys |
| |
| import numpy as np |
| import onnxruntime |
| from PIL import Image |
| |
| model = onnxruntime.InferenceSession("deeplabv3p-resnet50-human.onnx") |
| |
| img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg") |
| # force 3 channels and the fixed 512x512 input size |
| img = img.convert("RGB").resize((512, 512)) |
| # scale pixel values from [0, 255] to [-1, 1] |
| img = np.array(img).astype(np.float32) / 127.5 - 1 |
| # add the batch dimension: (512, 512, 3) -> (1, 512, 512, 3) |
| img = np.expand_dims(img, axis=0) |
| |
| # infer |
| input_name = model.get_inputs()[0].name |
| output_name = model.get_outputs()[0].name |
| result = model.run([output_name], {input_name: img}) |
| |
| # model.run returns a list; its single array has shape (1, 512, 512, 20) |
| result = np.array(result[0]) |
| # argmax over the class axis, then remove the batch dimension |
| result = result.argmax(axis=3).squeeze(0) |
| |
| # get the masks |
| for i in range(20): |
|     detected = result == i  # 512x512 boolean array for class i |
|     mask = np.zeros(result.shape, dtype=np.uint8)  # single-channel 8-bit mask |
|     mask[detected] = 255 |
|     Image.fromarray(mask).show()  # or save, or return the mask... |
| ``` |
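
To look at all classes at once, the per-pixel class map can be turned into a single color image instead of 20 separate masks. The following is a minimal sketch and not part of the original model: the `make_palette` and `colorize` helpers are hypothetical, the colors are arbitrary (drawn from a fixed random seed), and a random class map stands in for the `result` array computed in the usage example.

```python
import numpy as np

NUM_CLASSES = 20  # number of classes produced by the model


def make_palette(num_classes=NUM_CLASSES, seed=0):
    """Build an arbitrary (num_classes, 3) uint8 color table; class 0 stays black."""
    rng = np.random.default_rng(seed)
    palette = rng.integers(64, 256, size=(num_classes, 3), dtype=np.uint8)
    palette[0] = 0  # background
    return palette


def colorize(class_map, palette):
    """Map an (H, W) array of class indices to an (H, W, 3) uint8 RGB image."""
    return palette[class_map]


# Stand-in for the (512, 512) `result` array from the usage example
class_map = np.random.default_rng(1).integers(0, NUM_CLASSES, size=(512, 512))
color_image = colorize(class_map, make_palette())
print(color_image.shape, color_image.dtype)
```

With Pillow installed, `Image.fromarray(color_image).show()` displays the colored segmentation.
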
|
|
| ## Classes index |
|
|
| This is the list of classes that the model can detect (some classes are not specifically identified, see below): |
|
|
| - 0: "background", |
| - 1: "unknown", |
| - 2: "hair", |
| - 3: "unknown", |
| - 4: "glasses", |
| - 5: "top-clothes", |
| - 6: "unknown", |
| - 7: "unknown", |
| - 8: "unknown", |
| - 9: "bottom-clothes", |
| - 10: "torso-skin", |
| - 11: "unknown", |
| - 12: "unknown", |
| - 13: "face", |
| - 14: "left-arm", |
| - 15: "right-arm", |
| - 16: "left-leg", |
| - 17: "right-leg", |
| - 18: "left-foot", |
| - 19: "right-foot", |
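
For use in code, the list above can be written as a lookup table. This is a small sketch; the names `CLASS_NAMES` and `class_indices` are illustrative and are not provided by the model.

```python
# Index -> label, copied from the list above ("unknown" marks unidentified classes)
CLASS_NAMES = [
    "background", "unknown", "hair", "unknown", "glasses",
    "top-clothes", "unknown", "unknown", "unknown", "bottom-clothes",
    "torso-skin", "unknown", "unknown", "face", "left-arm",
    "right-arm", "left-leg", "right-leg", "left-foot", "right-foot",
]


def class_indices(*names):
    """Return the class indices whose label matches one of the given names."""
    return [i for i, name in enumerate(CLASS_NAMES) if name in names]


print(class_indices("left-arm", "right-arm"))  # [14, 15]
print(class_indices("unknown"))  # [1, 3, 6, 7, 8, 11, 12]
```

A mask covering several classes can then be built with NumPy, for example `np.isin(result, class_indices("left-arm", "right-arm"))`, where `result` is the 512x512 class map from the usage example.
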
|
|
| ## Known limitations |
|
|
| - The model may fail on portrait images, because it was trained on "full body" images. |
| - I could not identify some of the classes, and I can't find the original list of classes (help!). |
| - The model is not perfect and can fail on some images. I'm not the author of the model, so I can't fix it. |
|
|
| ## License |
|
|
| The [original model card](https://huggingface.co/keras-io/deeplabv3p-resnet50/blob/main/README.md) specifies the "CC0-1.0" |
| license. I don't know if it's the right license for the model, but I kept it. |
|
|
| > Anyway, thanks to the authors of the model for sharing it and for leaving it open to use. |
|
|
| This means that you may use, share, modify, and distribute the model without any restriction. |