---
license: cc0-1.0
tags:
- art
- computer vision
- Image segmentation
---

# DeepLabV3+ ResNet50 for human body parts segmentation

This is a very simple ONNX model that can segment human body parts.

## Why this model

This model is an ONNX conversion of [keras-io/deeplabv3p-resnet50](https://huggingface.co/keras-io/deeplabv3p-resnet50),
whose provided model can segment human body parts. All the other models that I found were trained on
city segmentation.

The original model was built for an old version of Keras and cannot be used with recent versions of TensorFlow,
so I converted it to ONNX format.

## Usage

Get the `deeplabv3p-resnet50-human.onnx` file and use it with the ONNXRuntime package.

The result of `model.run` is a list containing a single array, which gives a `(1, 1, 512, 512, 20)` tensor overall:

- 1: number of outputs (you can squeeze it)
- 1: batch size (you can squeeze it)
- 512, 512: the size of the image (fixed)
- 20: number of classes, so you can take the `argmax` of the tensor to get the class of each pixel

```python
import sys

import numpy as np
import onnxruntime
from PIL import Image

model = onnxruntime.InferenceSession("deeplabv3p-resnet50-human.onnx")

# load the image, force 3 channels and resize it to the fixed input size
img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg")
img = img.convert("RGB").resize((512, 512))
# scale the pixels to [-1, 1]
img = np.array(img).astype(np.float32) / 127.5 - 1
# add the batch dimension (the model works on batches of images)
img = np.expand_dims(img, axis=0)

# infer
input_name = model.get_inputs()[0].name
output_name = model.get_outputs()[0].name
result = model.run([output_name], {input_name: img})

# model.run returns a list of outputs, take the first (and only) one
result = np.array(result[0])
# argmax the classes, remove the batch size
result = result.argmax(axis=3).squeeze(0)

# get the masks
for i in range(20):
    detected = result == i  # get the detected pixels for the class i
    # detected is a 512, 512 boolean array
    mask = np.zeros((512, 512), dtype=np.uint8)
    mask[detected] = 255
    Image.fromarray(mask).show()  # or save, or return the mask...
```
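
The loop above shows one mask per class. To look at all classes at once, the label map can be turned into a single palette image and scaled back to the size of the original picture. This is only a minimal sketch and not part of the original model card: the function name `label_map_to_image` and the random palette are assumptions made for illustration. Nearest-neighbour resampling is used so that class indices are never blended into values that do not exist.

```python
import numpy as np
from PIL import Image

NUM_CLASSES = 20  # number of classes listed in this model card


def label_map_to_image(label_map, size=None, seed=0):
    """Convert a (H, W) array of class indices into a palette ("P" mode) image.

    `size` is an optional (width, height) to scale the result to, for example
    the size of the original picture.
    """
    # Hypothetical palette: one random RGB color per class, black for background.
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(NUM_CLASSES, 3)).astype(np.uint8)
    palette[0] = 0

    # A uint8 2D array gives an "L" image; putpalette switches it to "P" mode.
    out = Image.fromarray(label_map.astype(np.uint8))
    out.putpalette(palette.flatten().tolist())
    if size is not None:
        # NEAREST keeps every pixel value a valid class index.
        out = out.resize(size, resample=Image.NEAREST)
    return out


# Example with a fake label map standing in for the model output.
fake = np.zeros((512, 512), dtype=np.int64)
fake[100:200, 150:300] = 13  # pretend this region is "face"
preview = label_map_to_image(fake, size=(640, 480))
```

With the real output, `fake` would be replaced by the `(512, 512)` array obtained after `argmax`, and `size` by the size of the image before it was resized.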

## Classes index

This is the list of classes that the model can detect (some classes are not specifically identified, see below):

- 0: "background"
- 1: "unknown"
- 2: "hair"
- 3: "unknown"
- 4: "glasses"
- 5: "top-clothes"
- 6: "unknown"
- 7: "unknown"
- 8: "unknown"
- 9: "bottom-clothes"
- 10: "torso-skin"
- 11: "unknown"
- 12: "unknown"
- 13: "face"
- 14: "left-arm"
- 15: "right-arm"
- 16: "left-leg"
- 17: "right-leg"
- 18: "left-foot"
- 19: "right-foot"
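
For use in code, the list above can be written as a lookup table. The following is a minimal sketch that assumes the index-to-name mapping given in this section; the names `CLASS_NAMES` and `mask_for` are illustrative and belong neither to the model nor to ONNXRuntime.

```python
import numpy as np

# Index-to-name mapping copied from the list above.
CLASS_NAMES = {
    0: "background", 1: "unknown", 2: "hair", 3: "unknown",
    4: "glasses", 5: "top-clothes", 6: "unknown", 7: "unknown",
    8: "unknown", 9: "bottom-clothes", 10: "torso-skin", 11: "unknown",
    12: "unknown", 13: "face", 14: "left-arm", 15: "right-arm",
    16: "left-leg", 17: "right-leg", 18: "left-foot", 19: "right-foot",
}


def mask_for(label_map, name):
    """Return a boolean mask of the pixels whose class is called `name`.

    Several indices can share one name (for example "unknown"), so every
    matching index is included in the mask.
    """
    indices = [i for i, n in CLASS_NAMES.items() if n == name]
    if not indices:
        raise ValueError(f"unknown class name: {name!r}")
    return np.isin(label_map, indices)


# Example with a fake label map standing in for the model output.
label_map = np.zeros((512, 512), dtype=np.int64)
label_map[:50, :50] = 2  # pretend the top-left corner is "hair"
hair = mask_for(label_map, "hair")
```

Selecting by name rather than by number keeps calling code readable, and it needs no change if one of the "unknown" entries is identified later.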

## Known limitations

- The model may fail on portrait images because it was trained on "full body" images.
- I don't know what some of the classes are, and I can't find the list of classes (help!).
- The model is not perfect and can fail on some images. I'm not the author of the model, so I can't fix it.

## License

The [original model card](https://huggingface.co/keras-io/deeplabv3p-resnet50/blob/main/README.md) proposes the "CC0-1.0"
license. I don't know if it's the right license for the model, but I keep it.

> Anyway, thanks to the authors of the model for sharing it and for leaving it open to use.

The CC0-1.0 license means that you may use, share, modify, and distribute the model without any restriction.