yichengup/segmentation-models-collection


yichengup committed on Nov 8, 2025 (commit a35386c, verified, 1 parent: 258c224)

Upload deeplabv3p-resnet50-human read.md


---
license: cc0-1.0
tags:
- art
- computer vision
- Image segmentation
---
# DeepLabV3+ ResNet50 for human body parts segmentation

This is a simple ONNX model that can segment human body parts.
## Why this model

This model is an ONNX conversion of [keras-io/deeplabv3p-resnet50](https://huggingface.co/keras-io/deeplabv3p-resnet50),
which can segment human body parts. All the other models that I found were trained for city-scene segmentation.
The original model was built for an old version of Keras and cannot be used with recent versions of TensorFlow,
so I converted it to ONNX format.
## Usage

Download the `deeplabv3p-resnet50-human.onnx` file and run it with the ONNX Runtime (`onnxruntime`) package.
The result of `model.run` is a list of output arrays which, taken as a whole, has the shape `(1, 1, 512, 512, 20)`:

- 1: number of outputs (you can squeeze it)
- 1: batch size (you can squeeze it)
- 512, 512: height and width of the image (fixed)
- 20: number of classes, so you can take the `argmax` over this last axis to get the class of each pixel
```python
import sys

import numpy as np
import onnxruntime
from PIL import Image

model = onnxruntime.InferenceSession(
    "deeplabv3p-resnet50-human.onnx", providers=["CPUExecutionProvider"]
)

# load the image as RGB (3 channels) and resize it to the fixed 512x512 input size
img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg").convert("RGB")
img = img.resize((512, 512))
# scale the pixels to [-1, 1] and add the batch dimension: (1, 512, 512, 3)
img = np.array(img).astype(np.float32) / 127.5 - 1
img = np.expand_dims(img, axis=0)

# infer
input_name = model.get_inputs()[0].name
output_name = model.get_outputs()[0].name
result = model.run([output_name], {input_name: img})

# result[0] is the (1, 512, 512, 20) output of the model
result = np.array(result[0])
# argmax the classes, remove the batch size: (512, 512) array of class indices
result = result.argmax(axis=3).squeeze(0)

# get the masks
for i in range(20):
    detected = result == i  # (512, 512) boolean array of the pixels of class i
    if not detected.any():
        continue  # class i is not present in the image
    mask = np.zeros(detected.shape, dtype=np.uint8)  # PIL needs uint8 for a grayscale image
    mask[detected] = 255
    Image.fromarray(mask).show()  # or save, or return the mask...
```
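
The output is always at the fixed 512x512 resolution, so a mask computed as above does not line up with an input image of a different size. A minimal sketch of one way to map the class indices back to the original size, assuming nearest-neighbour resizing with Pillow is acceptable (the helper name `to_class_map` is hypothetical, not part of the model or of ONNX Runtime):

```python
import numpy as np
from PIL import Image


def to_class_map(outputs, original_size):
    """Turn the list returned by `InferenceSession.run` into a per-pixel class map.

    Assumes `outputs` holds a single (1, 512, 512, 20) array and that `original_size`
    is the (width, height) of the input image before it was resized to 512x512.
    """
    scores = np.asarray(outputs[0])        # (1, 512, 512, 20)
    class_map = scores.argmax(axis=-1)[0]  # (512, 512), batch axis dropped
    # Nearest-neighbour resampling keeps every pixel a valid class index
    resized = Image.fromarray(class_map.astype(np.uint8)).resize(
        original_size, resample=Image.NEAREST
    )
    return np.array(resized)               # (height, width) uint8 class indices
```

For example, `to_class_map(model.run([output_name], {input_name: img}), Image.open(path).size)` reuses the session and inputs from the usage example. Nearest-neighbour resampling is chosen because bilinear interpolation would blend neighbouring class indices into meaningless in-between values.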
## Classes index

This is the list of classes that the model can detect (some indices are not identified and are marked "unknown", see below):

- 0: "background"
- 1: "unknown"
- 2: "hair"
- 3: "unknown"
- 4: "glasses"
- 5: "top-clothes"
- 6: "unknown"
- 7: "unknown"
- 8: "unknown"
- 9: "bottom-clothes"
- 10: "torso-skin"
- 11: "unknown"
- 12: "unknown"
- 13: "face"
- 14: "left-arm"
- 15: "right-arm"
- 16: "left-leg"
- 17: "right-leg"
- 18: "left-foot"
- 19: "right-foot"
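
A minimal sketch, assuming you want to pick masks by name instead of by index: the dictionary copies the known labels from the list above and deliberately omits the "unknown" indices. The helpers `masks_by_label` and `person_mask` are hypothetical names; they work on a `(height, width)` class-index map such as the `argmax` result from the usage example.

```python
import numpy as np

# Known labels from the list above; "unknown" indices are intentionally left out
LABELS = {
    0: "background",
    2: "hair",
    4: "glasses",
    5: "top-clothes",
    9: "bottom-clothes",
    10: "torso-skin",
    13: "face",
    14: "left-arm",
    15: "right-arm",
    16: "left-leg",
    17: "right-leg",
    18: "left-foot",
    19: "right-foot",
}
INDEX = {name: idx for idx, name in LABELS.items()}


def masks_by_label(class_map, names=None):
    """Return {label: uint8 mask with values 0/255} for the requested labels present in class_map."""
    if names is None:
        names = list(INDEX)
    masks = {}
    for name in names:
        detected = class_map == INDEX[name]  # boolean array, same shape as class_map
        if detected.any():
            masks[name] = detected.astype(np.uint8) * 255
    return masks


def person_mask(class_map):
    """0/255 mask of every non-background pixel (assumption: unknown indices belong to the person)."""
    return (class_map != 0).astype(np.uint8) * 255
```

For instance, `masks_by_label(result, ["face", "hair"])` returns at most two masks that can be passed directly to `Image.fromarray`. Counting the unknown indices as part of the person in `person_mask` is only an assumption, since their meaning is not documented.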

## Known limitations

- The model may fail on portrait images, because it was trained on full-body images.
- Some class indices are unknown: I could not find the list of classes used for training (help welcome!).
- The model is not perfect and can fail on some images. I am not the author of the original model, so I cannot fix it.

## License

The [original model card](https://huggingface.co/keras-io/deeplabv3p-resnet50/blob/main/README.md) proposes the "CC0-1.0"
license. I am not sure it is the right license for the model, but I have kept it.
This means that you may use, share, modify, and distribute the model without any restriction.

> Anyway, thanks to the authors of the model for sharing it and for leaving it open to use.