Metal3d/deeplabv3p-resnet50-human


	---
	license: cc0-1.0
	tags:
	- art
	- computer vision
	- Image segmentation
	---

	# DeepLabV3+ ResNet50 for human body parts segmentation

	This is a very simple ONNX model that can segment human body parts.

	## Why this model

	This model is an ONNX conversion of [keras-io/deeplabv3p-resnet50](https://huggingface.co/keras-io/deeplabv3p-resnet50),
	whose provided model can segment human body parts. All the other models that I found were trained on
	city segmentation.

	The original model was built for an old version of Keras and cannot be used with recent versions of TensorFlow,
	so I converted it to the ONNX format.

	## Usage

	Download the `deeplabv3p-resnet50-human.onnx` file and load it with the ONNX Runtime package (`onnxruntime`).

	The result of `model.run`, converted to an array, is a `(1, 1, 512, 512, 20)` tensor:

	- 1: number of outputs (`model.run` returns a list with one array per output; you can squeeze it)
	- 1: batch size (you can squeeze it)
	- 512, 512: the size of the image (fixed)
	- 20: number of classes, so you can take the `argmax` over this axis to get the class of each pixel

	```python
	import sys

	import numpy as np
	import onnxruntime
	from PIL import Image

	model = onnxruntime.InferenceSession("deeplabv3p-resnet50-human.onnx")

	img = Image.open(sys.argv[1] if len(sys.argv) > 1 else "image.jpg")
	# force 3 channels, then resize to the fixed input size
	img = img.convert("RGB").resize((512, 512))
	# scale pixel values from [0, 255] to [-1, 1]
	img = np.array(img).astype(np.float32) / 127.5 - 1
	# add the batch dimension (assumed input shape: 1, 512, 512, 3)
	img = np.expand_dims(img, axis=0)

	# infer
	input_name = model.get_inputs()[0].name
	output_name = model.get_outputs()[0].name
	result = model.run([output_name], {input_name: img})

	# model.run returns a list with one array per requested output
	result = np.array(result[0])
	# argmax the classes, remove the batch size
	result = result.argmax(axis=3).squeeze(0)

	# get the masks
	for i in range(20):
	    detected = result == i  # get the detected pixels for the class i
	    # detected is a 512, 512 boolean array
	    mask = np.zeros((512, 512), dtype=np.uint8)
	    mask[detected] = 255
	    Image.fromarray(mask).show()  # or save, or return the mask...
	```
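
The loop above shows one mask per class. As a minimal sketch, the per-pixel class map (`result` after the `argmax`) can also be rendered as a single color-coded image and scaled back to the size of the original picture. The names `make_palette`, `colorize` and `resize_nearest`, as well as the random palette, are illustrative assumptions and not part of the model or of ONNX Runtime. Nearest-neighbor sampling is used so that class indices are never blended. A synthetic class map stands in for the real model output here.

```python
import numpy as np

NUM_CLASSES = 20  # size of the last output axis, as described above


def make_palette(num_classes=NUM_CLASSES, seed=0):
    """Build an arbitrary (hypothetical) palette: one RGB color per class."""
    rng = np.random.default_rng(seed)
    palette = rng.integers(0, 256, size=(num_classes, 3)).astype(np.uint8)
    palette[0] = 0  # keep the background (class 0) black
    return palette


def colorize(class_map, palette):
    """Turn a (H, W) array of class indices into a (H, W, 3) uint8 image."""
    return palette[class_map]


def resize_nearest(class_map, height, width):
    """Scale a class map to (height, width) with nearest-neighbor sampling."""
    src_h, src_w = class_map.shape
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    return class_map[rows[:, None], cols]


# Stand-in for the real class map: a synthetic 512x512 array of class indices
class_map = np.zeros((512, 512), dtype=np.int64)
class_map[100:300, 200:320] = 5  # pretend "top-clothes" region
class_map[300:450, 200:320] = 9  # pretend "bottom-clothes" region

# Pretend the original picture was 1024 pixels wide and 768 pixels high
full_size = resize_nearest(class_map, 768, 1024)
colored = colorize(full_size, make_palette())
print(colored.shape, colored.dtype)
```

With the real output, `Image.fromarray(colored)` gives a PIL image that can be shown or saved like the masks above.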

	## Classes index

	This is the list of classes that the model can detect (some classes are not specifically identified, see below):

	- 0: "background",
	- 1: "unknown",
	- 2: "hair",
	- 3: "unknown",
	- 4: "glasses",
	- 5: "top-clothes",
	- 6: "unknown",
	- 7: "unknown",
	- 8: "unknown",
	- 9: "bottom-clothes",
	- 10: "torso-skin",
	- 11: "unknown",
	- 12: "unknown",
	- 13: "face",
	- 14: "left-arm",
	- 15: "right-arm",
	- 16: "left-leg",
	- 17: "right-leg",
	- 18: "left-foot",
	- 19: "right-foot",
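
For use in code, the list above can be written as a lookup table. This is only a sketch: `CLASS_NAMES` copies the indices listed here (including the unidentified ones), while `class_pixel_counts` and `mask_for` are hypothetical helper names. Counting the pixels of each class on a given image may also help when trying to work out what an "unknown" index corresponds to. A synthetic class map is used in place of the real model output.

```python
import numpy as np

# Index -> name, copied from the list above ("unknown" = not identified)
CLASS_NAMES = {
    0: "background", 1: "unknown", 2: "hair", 3: "unknown",
    4: "glasses", 5: "top-clothes", 6: "unknown", 7: "unknown",
    8: "unknown", 9: "bottom-clothes", 10: "torso-skin", 11: "unknown",
    12: "unknown", 13: "face", 14: "left-arm", 15: "right-arm",
    16: "left-leg", 17: "right-leg", 18: "left-foot", 19: "right-foot",
}


def class_pixel_counts(class_map):
    """Return {index: pixel count} for every class present in a (H, W) class map."""
    counts = np.bincount(class_map.ravel(), minlength=len(CLASS_NAMES))
    return {int(i): int(n) for i, n in enumerate(counts) if n > 0}


def mask_for(class_map, *names):
    """Boolean mask of the pixels whose class name is one of `names`."""
    indices = [i for i, name in CLASS_NAMES.items() if name in names]
    return np.isin(class_map, indices)


# Stand-in for the real class map: a synthetic 512x512 array of class indices
class_map = np.zeros((512, 512), dtype=np.int64)
class_map[50:100, 200:300] = 2    # hair
class_map[100:200, 200:300] = 13  # face
class_map[200:400, 150:200] = 14  # left-arm
class_map[200:400, 300:350] = 15  # right-arm

counts = class_pixel_counts(class_map)
for index, count in counts.items():
    print(index, CLASS_NAMES[index], count)

# Merge several classes into one mask, for example both arms
arms = mask_for(class_map, "left-arm", "right-arm")
```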

	## Known limitations

	- The model may fail on portrait images because it was trained on "full body" images.
	- I could not identify some of the classes, and I can't find the original list of classes (help!).
	- The model is not perfect and can fail on some images. I'm not the author of the model, so I can't fix it.

	## License

	The [original model card](https://huggingface.co/keras-io/deeplabv3p-resnet50/blob/main/README.md) specifies the "CC0-1.0"
	license. I don't know if it's the right license for the model, but I kept it.

	> Anyway, thanks to the authors of the model for sharing it and leaving it open to use.

	This means that you may use, share, modify, and distribute the model without any restriction.