---
language: en
license: mit
tags:
  - vision
  - image-segmentation
  - semantic-segmentation
  - human-parsing
  - body-parts
  - pytorch
  - onnx
datasets:
  - pascal-person-part
pipeline_tag: image-segmentation
---

# SCHP – Self-Correction Human Parsing (Pascal Person Part, 7 classes)

**SCHP** (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
This checkpoint is trained on the **Pascal Person Part** dataset and packaged for the 🤗 Transformers `AutoModel` API.

> Original repository: [PeikeLi/Self-Correction-Human-Parsing](https://github.com/PeikeLi/Self-Correction-Human-Parsing)

| Source image | Segmentation result |
|:---:|:---:|
| ![demo](./assets/demo.jpg) | ![demo-pascal](./assets/demo_pascal.png) |

**Use cases:**
- ๐Ÿƒ **Body part segmentation** โ€” segment coarse body regions (head, torso, arms, legs) for pose-aware applications
- ๐ŸŽฎ **Avatar rigging** โ€” generate body part masks as a preprocessing step for AR/VR avatars
- ๐Ÿฅ **Medical / ergonomics** โ€” coarse body region detection for posture analysis or wearable device placement
- ๐Ÿ“ **Body proportion estimation** โ€” measure relative areas of body segments in 2D images

## Dataset – Pascal Person Part

Pascal Person Part is a single-person human parsing dataset with 3,000+ images focused on **body part segmentation**.

- **mIoU on Pascal Person Part validation: 71.46%**
- 7 coarse labels covering body regions

## Labels

| ID | Label |
|----|-------|
| 0 | Background |
| 1 | Head |
| 2 | Torso |
| 3 | Upper Arms |
| 4 | Lower Arms |
| 5 | Upper Legs |
| 6 | Lower Legs |

## Usage – PyTorch

```python
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
from PIL import Image
import torch

model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits         – (1, 7, 512, 512) raw logits
# outputs.parsing_logits – (1, 7, 512, 512) refined parsing logits
# outputs.edge_logits    – (1, 1, 512, 512) edge prediction logits
seg_map = outputs.logits.argmax(dim=1).squeeze().numpy()  # (H, W), values in [0, 6]
```
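The logits above are at the model's 512 × 512 working resolution. To align the mask with an image of a different size, one option is to upsample the logits before taking the argmax. A minimal sketch (`postprocess` is a helper defined here, not part of the repo):

```python
import torch
import torch.nn.functional as F

def postprocess(logits: torch.Tensor, size: tuple) -> torch.Tensor:
    """Upsample (1, C, h, w) logits to `size` = (H, W), then take the per-pixel argmax."""
    logits = F.interpolate(logits, size=size, mode="bilinear", align_corners=False)
    return logits.argmax(dim=1).squeeze(0)  # (H, W) label map

# seg_map_full = postprocess(outputs.logits, (image.height, image.width))
```

Upsampling the logits (rather than the hard label map) gives smoother class boundaries.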

Each pixel in `seg_map` is a label ID. To map IDs back to names:

```python
id2label = model.config.id2label
print(id2label[1])  # → "Head"
```
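For the body-proportion use case listed above, relative class areas fall straight out of `seg_map` via a bincount. A minimal sketch (`class_areas` is a hypothetical helper, not part of the repo):

```python
import numpy as np

def class_areas(seg_map: np.ndarray, num_labels: int = 7) -> dict:
    """Fraction of image pixels assigned to each label ID."""
    counts = np.bincount(seg_map.ravel(), minlength=num_labels)
    return {i: count / seg_map.size for i, count in enumerate(counts)}
```

Combine the result with `id2label` to report areas by body part name.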

## Usage – ONNX Runtime

Optimized ONNX files are available in the `onnx/` folder of this repo:

| File | Size | Notes |
|------|------|-------|
| `onnx/schp-pascal-7.onnx` + `.onnx.data` | ~257 MB | FP32, dynamic batch |
| `onnx/schp-pascal-7-int8-static.onnx` | ~66 MB | INT8 static, 99.77% pixel agreement |

```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor
from PIL import Image

model_path = hf_hub_download("pirocheto/schp-pascal-7", "onnx/schp-pascal-7-int8-static.onnx")
processor  = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)

sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8
sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])

image  = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")
logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
seg_map = logits.argmax(axis=1).squeeze()  # (H, W)
```
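To visualize a predicted mask, a paletted PIL image keeps the output file small. The colors below are arbitrary choices for illustration, not an official palette:

```python
import numpy as np
from PIL import Image

# Arbitrary demo colors for the 7 Pascal Person Part labels
PALETTE = [
    (0, 0, 0),        # 0 Background
    (255, 0, 0),      # 1 Head
    (0, 255, 0),      # 2 Torso
    (0, 0, 255),      # 3 Upper Arms
    (255, 255, 0),    # 4 Lower Arms
    (255, 0, 255),    # 5 Upper Legs
    (0, 255, 255),    # 6 Lower Legs
]

def colorize(seg_map: np.ndarray) -> Image.Image:
    """Turn an (H, W) label map into a paletted image."""
    mask = Image.fromarray(seg_map.astype(np.uint8), mode="P")
    mask.putpalette([v for rgb in PALETTE for v in rgb])
    return mask

# colorize(seg_map).save("mask.png")
```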

## Performance

Benchmarked on a 16-core CPU with `intra_op_num_threads=8`:

| Backend | Latency | Speedup | Size |
|---------|---------|---------|------|
| PyTorch FP32 | ~424 ms | 1× | 255 MB |
| ONNX FP32 | ~296 ms | 1.44× | 256 MB |
| ONNX INT8 static | ~218 ms | **1.94×** | **66 MB** |

INT8 static quantization achieves **99.77% pixel-level agreement** with the FP32 model.
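Pixel agreement is simply the fraction of pixels where the INT8 and FP32 label maps predict the same class, so it is easy to re-check on your own images (a sketch; `pixel_agreement` is a hypothetical name):

```python
import numpy as np

def pixel_agreement(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixels where two (H, W) label maps predict the same class."""
    assert a.shape == b.shape
    return float((a == b).mean())
```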

## Model Details

| Property | Value |
|----------|-------|
| Architecture | ResNet-101 + SCHP self-correction |
| Input size | 512 × 512 |
| Output | 3 heads: logits, parsing_logits, edge_logits |
| num_labels | 7 |
| Dataset | Pascal Person Part |
| Original mIoU | 71.46% |