---
language: en
license: mit
tags:
  - vision
  - image-segmentation
  - semantic-segmentation
  - human-parsing
  - body-parts
  - pytorch
  - onnx
datasets:
  - pascal-person-part
pipeline_tag: image-segmentation
---

# SCHP – Self-Correction Human Parsing (Pascal Person Part, 7 classes)

**SCHP** (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
This checkpoint is trained on the **Pascal Person Part** dataset and packaged for the 🤗 Transformers `AutoModel` API.

> Original repository: [PeikeLi/Self-Correction-Human-Parsing](https://github.com/PeikeLi/Self-Correction-Human-Parsing)

| Source image | Segmentation result |
|:---:|:---:|
| ![demo](./assets/demo.jpg) | ![demo-pascal](./assets/demo_pascal.png) |

**Use cases:**
- ๐Ÿƒ **Body part segmentation** โ€” segment coarse body regions (head, torso, arms, legs) for pose-aware applications
- ๐ŸŽฎ **Avatar rigging** โ€” generate body part masks as a preprocessing step for AR/VR avatars
- ๐Ÿฅ **Medical / ergonomics** โ€” coarse body region detection for posture analysis or wearable device placement
- ๐Ÿ“ **Body proportion estimation** โ€” measure relative areas of body segments in 2D images

## Dataset – Pascal Person Part

Pascal Person Part is a single-person human parsing dataset with 3,000+ images focused on **body part segmentation**.

- **mIoU on Pascal Person Part validation: 71.46%**
- 7 coarse labels covering body regions

## Labels

| ID | Label |
|----|-------|
| 0 | Background |
| 1 | Head |
| 2 | Torso |
| 3 | Upper Arms |
| 4 | Lower Arms |
| 5 | Upper Legs |
| 6 | Lower Legs |

## Usage – PyTorch

```python
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
from PIL import Image
import torch

model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits         – (1, 7, 512, 512) raw logits
# outputs.parsing_logits – (1, 7, 512, 512) refined parsing logits
# outputs.edge_logits    – (1, 1, 512, 512) edge prediction logits
seg_map = outputs.logits.argmax(dim=1).squeeze().numpy()  # (H, W), values in [0, 6]
```
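The logits above are at the model's 512 × 512 working resolution. To align the mask with an image of a different size, one option is to upsample the logits before taking the argmax. A minimal sketch (`postprocess` is a helper defined here, not part of the repo):

```python
import torch
import torch.nn.functional as F

def postprocess(logits: torch.Tensor, size: tuple) -> torch.Tensor:
    """Upsample (1, C, h, w) logits to `size` = (H, W), then take the per-pixel argmax."""
    logits = F.interpolate(logits, size=size, mode="bilinear", align_corners=False)
    return logits.argmax(dim=1).squeeze(0)  # (H, W) label map

# seg_map_full = postprocess(outputs.logits, (image.height, image.width))
```

Upsampling the logits (rather than the hard label map) gives smoother class boundaries.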

Each pixel in `seg_map` is a label ID. To map IDs back to names:

```python
id2label = model.config.id2label
print(id2label[1])  # → "Head"
```
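For the body-proportion use case listed above, relative class areas fall straight out of `seg_map` via a bincount. A minimal sketch (`class_areas` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def class_areas(seg_map: np.ndarray, num_labels: int = 7) -> dict:
    """Fraction of image pixels assigned to each label ID."""
    counts = np.bincount(seg_map.ravel(), minlength=num_labels)
    return {i: count / seg_map.size for i, count in enumerate(counts)}
```

Combine the result with `id2label` to report areas by body part name.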

## Usage – ONNX Runtime

Optimized ONNX files are available in the `onnx/` folder of this repo:

| File | Size | Notes |
|------|------|-------|
| `onnx/schp-pascal-7.onnx` + `.onnx.data` | ~257 MB | FP32, dynamic batch |
| `onnx/schp-pascal-7-int8-static.onnx` | ~66 MB | INT8 static, 99.77% pixel agreement |

```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor
from PIL import Image

model_path = hf_hub_download("pirocheto/schp-pascal-7", "onnx/schp-pascal-7-int8-static.onnx")
processor  = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8
sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])

image  = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")
logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
seg_map = logits.argmax(axis=1).squeeze()  # (H, W)
```
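To visualize a predicted mask, a paletted PIL image keeps the output file small. The colors below are arbitrary choices for illustration, not an official palette:

```python
import numpy as np
from PIL import Image

# Arbitrary demo colors for the 7 Pascal Person Part labels
PALETTE = [
    (0, 0, 0),        # 0 Background
    (255, 0, 0),      # 1 Head
    (0, 255, 0),      # 2 Torso
    (0, 0, 255),      # 3 Upper Arms
    (255, 255, 0),    # 4 Lower Arms
    (255, 0, 255),    # 5 Upper Legs
    (0, 255, 255),    # 6 Lower Legs
]

def colorize(seg_map: np.ndarray) -> Image.Image:
    """Turn an (H, W) label map into a paletted image."""
    mask = Image.fromarray(seg_map.astype(np.uint8), mode="P")
    mask.putpalette([v for rgb in PALETTE for v in rgb])
    return mask

# colorize(seg_map).save("mask.png")
```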

## Performance

Benchmarked on a 16-core CPU with `intra_op_num_threads=8`:

| Backend | Latency | Speedup | Size |
|---------|---------|---------|------|
| PyTorch FP32 | ~424 ms | 1× | 255 MB |
| ONNX FP32 | ~296 ms | 1.44× | 256 MB |
| ONNX INT8 static | ~218 ms | **1.94×** | **66 MB** |

INT8 static quantization achieves **99.77% pixel-level agreement** with the FP32 model.
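Pixel agreement is simply the fraction of pixels where the INT8 and FP32 label maps predict the same class, so it is easy to re-check on your own images (a sketch; `pixel_agreement` is a hypothetical name):

```python
import numpy as np

def pixel_agreement(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixels where two (H, W) label maps predict the same class."""
    assert a.shape == b.shape
    return float((a == b).mean())
```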

## Model Details

| Property | Value |
|----------|-------|
| Architecture | ResNet-101 + SCHP self-correction |
| Input size | 512 × 512 |
| Output | 3 heads: logits, parsing_logits, edge_logits |
| num_labels | 7 |
| Dataset | Pascal Person Part |
| Original mIoU | 71.46% |