---
language: en
license: mit
tags:
- vision
- image-segmentation
- semantic-segmentation
- human-parsing
- body-parts
- pytorch
- onnx
datasets:
- pascal-person-part
pipeline_tag: image-segmentation
---

# SCHP: Self-Correction Human Parsing (Pascal Person Part, 7 classes)

**SCHP** (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
This checkpoint is trained on the **Pascal Person Part** dataset and packaged for the 🤗 Transformers `AutoModel` API.

> Original repository: [PeikeLi/Self-Correction-Human-Parsing](https://github.com/PeikeLi/Self-Correction-Human-Parsing)

*Figure: source image (left) and segmentation result (right).*

**Use cases:**
- **Body part segmentation**: segment coarse body regions (head, torso, arms, legs) for pose-aware applications
- **Avatar rigging**: generate body part masks as a preprocessing step for AR/VR avatars
- **Medical / ergonomics**: coarse body region detection for posture analysis or wearable device placement
- **Body proportion estimation**: measure relative areas of body segments in 2D images

## Dataset: Pascal Person Part

Pascal Person Part is a single-person human parsing dataset with 3,000+ images focused on **body part segmentation**.

- **mIoU on Pascal Person Part validation: 71.46%**
- 7 coarse labels: background plus 6 body regions

## Labels

| ID | Label |
|----|-------|
| 0 | Background |
| 1 | Head |
| 2 | Torso |
| 3 | Upper Arms |
| 4 | Lower Arms |
| 5 | Upper Legs |
| 6 | Lower Legs |

## Usage: PyTorch

```python
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
from PIL import Image
import torch

# trust_remote_code=True lets Transformers load the custom model code shipped in the repo.
model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Inference only, so skip gradient tracking.
with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits:         (1, 7, 512, 512) raw logits
# outputs.parsing_logits: (1, 7, 512, 512) refined parsing logits
# outputs.edge_logits:    (1, 1, 512, 512) edge prediction logits
seg_map = outputs.logits.argmax(dim=1).squeeze().numpy()  # (H, W), values in [0, 6]
```

Each pixel in `seg_map` is a label ID. To map IDs back to names:

```python
id2label = model.config.id2label
print(id2label[1])  # -> "Head"
```

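
For the body proportion use case, one way to turn a label map into relative areas is to count pixels per label and normalise by the number of non-background pixels. The sketch below is illustrative only: `body_part_fractions` is a hypothetical helper, not part of this repo, and the `ID2LABEL` dictionary is copied by hand from the Labels table rather than read from the model config. A toy 4×4 array stands in for the real `seg_map`.

```python
import numpy as np

# Label table copied by hand from the "Labels" section (an assumption of this
# sketch; in real use, model.config.id2label could be passed instead).
ID2LABEL = {
    0: "Background",
    1: "Head",
    2: "Torso",
    3: "Upper Arms",
    4: "Lower Arms",
    5: "Upper Legs",
    6: "Lower Legs",
}


def body_part_fractions(seg_map, id2label=ID2LABEL, background_id=0):
    """Return {label: share of non-background pixels} for a (H, W) label map."""
    seg_map = np.asarray(seg_map)
    # Count how many pixels carry each label ID.
    counts = np.bincount(seg_map.ravel(), minlength=len(id2label))
    person_pixels = counts.sum() - counts[background_id]
    if person_pixels == 0:
        return {}  # no person found in the map
    return {
        id2label[i]: counts[i] / person_pixels
        for i in range(len(id2label))
        if i != background_id
    }


# Toy 4x4 map standing in for the model output.
toy_map = np.array([
    [0, 1, 1, 0],
    [0, 2, 2, 0],
    [3, 2, 2, 3],
    [0, 5, 5, 0],
])
fractions = body_part_fractions(toy_map)
head_mask = toy_map == 1  # boolean mask for a single label
```

These are 2D pixel shares, so they depend on pose, camera angle, and occlusion; they are not anatomical measurements.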
## Usage: ONNX Runtime

Optimized ONNX files are available in the `onnx/` folder of this repo:

| File | Size | Notes |
|------|------|-------|
| `onnx/schp-pascal-7.onnx` + `.onnx.data` | ~257 MB | FP32, dynamic batch |
| `onnx/schp-pascal-7-int8-static.onnx` | ~66 MB | INT8 static, 99.77% pixel agreement |

```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor
from PIL import Image

# Download the INT8 model file from the Hub and reuse the repo's image processor.
model_path = hf_hub_download("pirocheto/schp-pascal-7", "onnx/schp-pascal-7-int8-static.onnx")
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

# Use 8 intra-op threads, matching the benchmark setup in the Performance section.
sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8
sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")
logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
seg_map = logits.argmax(axis=1).squeeze()  # (H, W)
```

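
The label map comes out at the network resolution (512 × 512 according to this card), so overlaying it on the original photo needs an upsampling step. Nearest-neighbour resizing is the usual choice for label maps because interpolation would blend label IDs into meaningless values. The sketch below is a NumPy-only illustration; `resize_labels_nearest` is a hypothetical helper, and it assumes the processor simply stretches the image to the model input size. If the processor pads, crops, or applies an affine transform, that transform would have to be inverted instead.

```python
import numpy as np


def resize_labels_nearest(seg_map, out_height, out_width):
    """Nearest-neighbour resize of a (H, W) integer label map."""
    seg_map = np.asarray(seg_map)
    in_height, in_width = seg_map.shape
    # For each output pixel, pick the source pixel that contains its centre.
    rows = ((np.arange(out_height) + 0.5) * in_height / out_height).astype(np.int64)
    cols = ((np.arange(out_width) + 0.5) * in_width / out_width).astype(np.int64)
    rows = np.clip(rows, 0, in_height - 1)
    cols = np.clip(cols, 0, in_width - 1)
    # Fancy indexing broadcasts rows x cols into an (out_height, out_width) grid.
    return seg_map[rows[:, None], cols[None, :]]


# Toy 2x2 map upsampled to 4x6; only the original label IDs appear in the result.
small = np.array([[0, 1],
                  [2, 6]])
large = resize_labels_nearest(small, 4, 6)
```

With a real image, the target size would be `image.height` and `image.width` from the PIL image opened earlier.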
## Performance

Benchmarked on a 16-core CPU with 8 ONNX Runtime threads (`intra_op_num_threads=8`):

| Backend | Latency | Speedup | Size |
|---------|---------|---------|------|
| PyTorch FP32 | ~424 ms | 1× | 255 MB |
| ONNX FP32 | ~296 ms | 1.44× | 256 MB |
| ONNX INT8 static | ~218 ms | **1.94×** | **66 MB** |

INT8 static quantization achieves **99.77% pixel-level agreement** with the FP32 model.

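
This card does not say how the agreement figure was computed or on which images. One plausible reading is the fraction of pixels where the INT8 and FP32 label maps (after `argmax`) pick the same class. The sketch below shows that definition on toy arrays; `pixel_agreement` is a hypothetical helper, and the two small arrays are stand-ins for real model outputs.

```python
import numpy as np


def pixel_agreement(seg_a, seg_b):
    """Fraction of pixels on which two label maps of equal shape agree."""
    seg_a = np.asarray(seg_a)
    seg_b = np.asarray(seg_b)
    if seg_a.shape != seg_b.shape:
        raise ValueError("label maps must have the same shape")
    return float((seg_a == seg_b).mean())


# Toy stand-ins for the FP32 and INT8 label maps of one image.
fp32_map = np.array([[0, 1, 1], [2, 2, 0]])
int8_map = np.array([[0, 1, 2], [2, 2, 0]])
agreement = pixel_agreement(fp32_map, int8_map)  # 5 of 6 pixels match
```

To check a quantized model on your own data, the same function could be averaged over label maps produced by the FP32 and INT8 sessions on a held-out set of images.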
## Model Details

| Property | Value |
|----------|-------|
| Architecture | ResNet-101 + SCHP self-correction |
| Input size | 512 × 512 |
| Output | 3 heads: logits, parsing_logits, edge_logits |
| num_labels | 7 |
| Dataset | Pascal Person Part |
| Original mIoU | 71.46% |