	---
	language: en
	license: mit
	tags:
	- vision
	- image-segmentation
	- semantic-segmentation
	- human-parsing
	- body-parts
	- pytorch
	- onnx
	datasets:
	- pascal-person-part
	pipeline_tag: image-segmentation
	---

	# SCHP — Self-Correction Human Parsing (Pascal Person Part, 7 classes)

	SCHP (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
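	This checkpoint is trained on the Pascal Person Part dataset and packaged for loading through the 🤗 Transformers Auto classes (`AutoModelForSemanticSegmentation` and `AutoImageProcessor`) with `trust_remote_code=True`.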

	> Original repository: [PeikeLi/Self-Correction-Human-Parsing](https://github.com/PeikeLi/Self-Correction-Human-Parsing)

	| Source image | Segmentation result |
	|:---:|:---:|
	| ![demo](./assets/demo.jpg) | ![demo-pascal](./assets/demo_pascal.png) |

	Use cases:
	- 🏃 Body part segmentation — segment coarse body regions (head, torso, arms, legs) for pose-aware applications
	- 🎮 Avatar rigging — generate body part masks as a preprocessing step for AR/VR avatars
	- 🏥 Medical / ergonomics — coarse body region detection for posture analysis or wearable device placement
	- 📐 Body proportion estimation — measure relative areas of body segments in 2D images

	## Dataset — Pascal Person Part

	Pascal Person Part is a single-person human parsing dataset with more than 3,000 images, focused on body part segmentation.

	- mIoU on Pascal Person Part validation: 71.46%
	- 7 coarse labels covering body regions

	## Labels

	| ID | Label |
	|----|-------|
	| 0 | Background |
	| 1 | Head |
	| 2 | Torso |
	| 3 | Upper Arms |
	| 4 | Lower Arms |
	| 5 | Upper Legs |
	| 6 | Lower Legs |
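
For a quick visual check, the label IDs above can be turned into an RGB image by indexing a colour table. The sketch below is only an illustration: the `PALETTE` colours and the `colorize` helper are hypothetical and not part of this repository, and the tiny `demo` array stands in for a real prediction.

```python
import numpy as np

# Label names copied from the table above, indexed by class ID.
LABELS = ["Background", "Head", "Torso", "Upper Arms",
          "Lower Arms", "Upper Legs", "Lower Legs"]

# Hypothetical RGB palette, chosen only for this example (one row per class ID).
PALETTE = np.array([
    [0, 0, 0],
    [128, 0, 0],
    [0, 128, 0],
    [128, 128, 0],
    [0, 0, 128],
    [128, 0, 128],
    [0, 128, 128],
], dtype=np.uint8)


def colorize(seg_map: np.ndarray) -> np.ndarray:
    """Map an (H, W) array of class IDs to an (H, W, 3) uint8 RGB image."""
    return PALETTE[seg_map]


# Tiny synthetic label map standing in for a real prediction.
demo = np.array([[0, 1], [2, 6]])
rgb = colorize(demo)
print(rgb.shape)  # (2, 2, 3)
```

The resulting array can be passed to `PIL.Image.fromarray` to save or display it.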

	## Usage — PyTorch

	```python
	from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
	from PIL import Image
	import torch

	model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
	processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

	image = Image.open("photo.jpg").convert("RGB")
	inputs = processor(images=image, return_tensors="pt")

	with torch.no_grad():
	    outputs = model(**inputs)

	# outputs.logits — (1, 7, 512, 512) raw logits
	# outputs.parsing_logits — (1, 7, 512, 512) refined parsing logits
	# outputs.edge_logits — (1, 1, 512, 512) edge prediction logits
	seg_map = outputs.logits.argmax(dim=1).squeeze().numpy() # (H, W), values in [0, 6]
	```
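
The logits come out at the 512 × 512 network resolution, so the label map usually needs to be resized before it can be overlaid on the original photo. The following is a minimal sketch of a nearest-neighbour resize in NumPy. It assumes the processor simply stretches the whole image to 512 × 512 with no cropping or padding, which this card does not confirm; if the processor uses an aspect-preserving transform, the mapping back would need to invert that transform instead. `resize_labels_nearest` is a hypothetical helper name.

```python
import numpy as np


def resize_labels_nearest(seg_map: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) label map to (out_h, out_w).

    Assumption: the source map covers the full original image
    (plain resize, no crop or padding).
    """
    in_h, in_w = seg_map.shape
    # For each output pixel centre, pick the source pixel that contains it.
    rows = np.minimum(((np.arange(out_h) + 0.5) * in_h / out_h).astype(int), in_h - 1)
    cols = np.minimum(((np.arange(out_w) + 0.5) * in_w / out_w).astype(int), in_w - 1)
    return seg_map[rows[:, None], cols[None, :]]


# Tiny synthetic example: upscale a 2 x 2 label map to 4 x 6.
small = np.array([[0, 1], [2, 3]])
big = resize_labels_nearest(small, 4, 6)
print(big.shape)  # (4, 6)
```

With the variables from the snippet above, a call would look like `resize_labels_nearest(seg_map, image.height, image.width)`. Nearest-neighbour sampling is used because interpolating between class IDs would produce meaningless labels.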

	Each pixel in `seg_map` is a label ID. To map IDs back to names:

	```python
	id2label = model.config.id2label
	print(id2label[1]) # → "Head"
	```
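
Combining `seg_map` with `id2label` gives a simple per-label summary, for example the share of the image covered by each body region. This is a sketch rather than a utility shipped with the model: `class_area_fractions` is a hypothetical helper, and the small `id2label` dictionary and label map below are synthetic stand-ins for `model.config.id2label` and a real prediction.

```python
import numpy as np


def class_area_fractions(seg_map: np.ndarray, id2label: dict) -> dict:
    """Return {label name: fraction of pixels} for an (H, W) map of class IDs."""
    num_labels = len(id2label)
    # Count the pixels per class ID; minlength keeps absent classes at zero.
    counts = np.bincount(seg_map.ravel(), minlength=num_labels)
    return {id2label[i]: float(counts[i]) / seg_map.size for i in range(num_labels)}


# Synthetic stand-ins for model.config.id2label and a predicted seg_map.
demo_id2label = {0: "Background", 1: "Head", 2: "Torso"}
demo_map = np.array([[0, 1], [1, 2]])
fractions = class_area_fractions(demo_map, demo_id2label)
print(fractions)  # {'Background': 0.25, 'Head': 0.5, 'Torso': 0.25}
```

Dividing by the number of non-background pixels instead of `seg_map.size` would give proportions relative to the person rather than to the whole image.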

	## Usage — ONNX Runtime

	Optimized ONNX files are available in the `onnx/` folder of this repo:

	| File | Size | Notes |
	|------|------|-------|
	| `onnx/schp-pascal-7.onnx` + `.onnx.data` | ~257 MB | FP32, dynamic batch |
	| `onnx/schp-pascal-7-int8-static.onnx` | ~66 MB | INT8 static, 99.77% pixel agreement |

	```python
	import onnxruntime as ort
	import numpy as np
	from huggingface_hub import hf_hub_download
	from transformers import AutoImageProcessor
	from PIL import Image

	model_path = hf_hub_download("pirocheto/schp-pascal-7", "onnx/schp-pascal-7-int8-static.onnx")
	processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

	sess_opts = ort.SessionOptions()
	sess_opts.intra_op_num_threads = 8
	sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])

	image = Image.open("photo.jpg").convert("RGB")
	inputs = processor(images=image, return_tensors="np")
	logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
	seg_map = logits.argmax(axis=1).squeeze() # (H, W)
	```
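
Because the ONNX session returns plain NumPy logits, a per-pixel confidence score can be derived with a softmax over the class axis. The sketch below assumes logits of shape `(N, C, H, W)` as in the example above; `confidence_map` is a hypothetical helper, and the scores are uncalibrated softmax maxima rather than true probabilities.

```python
import numpy as np


def confidence_map(logits: np.ndarray) -> np.ndarray:
    """Return the (N, H, W) maximum softmax score for (N, C, H, W) logits."""
    # Subtract the per-pixel maximum so that exp() cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs = probs / probs.sum(axis=1, keepdims=True)
    return probs.max(axis=1)


# Synthetic logits: all classes tie, so every pixel scores 1/7.
demo_logits = np.zeros((1, 7, 2, 2), dtype=np.float32)
conf = confidence_map(demo_logits)
print(conf.shape)  # (1, 2, 2)
```

Low-confidence pixels tend to cluster around part boundaries, so thresholding this map is one way to flag regions worth inspecting.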

	## Performance

	Benchmarked on a 16-core CPU with 8 ONNX Runtime threads (`intra_op_num_threads=8`):

	| Backend | Latency | Speedup | Size |
	|---------|---------|---------|------|
	| PyTorch FP32 | ~424 ms | 1× | 255 MB |
	| ONNX FP32 | ~296 ms | 1.44× | 256 MB |
	| ONNX INT8 static | ~218 ms | 1.94× | 66 MB |
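
The card does not describe how these latencies were measured. One common approach, sketched below, is to run a few warm-up calls and then report the median wall-clock time over repeated runs. `benchmark_ms` and `speedup` are hypothetical helpers, and the lambda is a stand-in workload that would be replaced by a real inference call such as `lambda: sess.run(...)`.

```python
import statistics
import time


def benchmark_ms(fn, warmup: int = 3, runs: int = 20) -> float:
    """Return the median latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of the candidate relative to the baseline latency."""
    return baseline_ms / candidate_ms


# Stand-in workload; swap in a real inference call to measure a backend.
latency = benchmark_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f} ms")

# Speedup recomputed from the rounded latencies in the table above.
print(round(speedup(424, 218), 2))  # 1.94
```

Results depend heavily on hardware, thread settings, and input size, so numbers measured this way are only comparable within the same setup.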

	INT8 static quantization achieves 99.77% pixel-level agreement with the FP32 model.
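
The card does not state exactly how this agreement figure was computed. A natural reading is the fraction of pixels on which the two models predict the same label, which can be sketched as follows. `pixel_agreement` is a hypothetical helper, and the two small arrays stand in for the FP32 and INT8 label maps.

```python
import numpy as np


def pixel_agreement(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Fraction of pixels with identical labels in two same-shaped label maps."""
    if seg_a.shape != seg_b.shape:
        raise ValueError("label maps must have the same shape")
    return float((seg_a == seg_b).mean())


# Synthetic stand-ins for FP32 and INT8 predictions on the same image.
fp32_map = np.array([[0, 1], [2, 3]])
int8_map = np.array([[0, 1], [2, 4]])
print(pixel_agreement(fp32_map, int8_map))  # 0.75
```

Averaging this value over a set of representative images gives a single agreement score for a quantized model.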

	## Model Details

	| Property | Value |
	|----------|-------|
	| Architecture | ResNet-101 + SCHP self-correction |
	| Input size | 512 × 512 |
	| Output | 3 heads: logits, parsing_logits, edge_logits |
	| num_labels | 7 |
	| Dataset | Pascal Person Part |
	| Original mIoU | 71.46% |