---
language: en
license: mit
tags:
- vision
- image-segmentation
- semantic-segmentation
- human-parsing
- body-parts
- pytorch
- onnx
datasets:
- pascal-person-part
pipeline_tag: image-segmentation
---
# SCHP: Self-Correction Human Parsing (Pascal Person Part, 7 classes)
**SCHP** (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
This checkpoint is trained on the **Pascal Person Part** dataset and packaged for the 🤗 Transformers `AutoModel` API.
> Original repository: [PeikeLi/Self-Correction-Human-Parsing](https://github.com/PeikeLi/Self-Correction-Human-Parsing)
| Source image | Segmentation result |
|:---:|:---:|
| ![demo](./assets/demo.jpg) | ![demo-pascal](./assets/demo_pascal.png) |
**Use cases:**
- ๐Ÿƒ **Body part segmentation** โ€” segment coarse body regions (head, torso, arms, legs) for pose-aware applications
- ๐ŸŽฎ **Avatar rigging** โ€” generate body part masks as a preprocessing step for AR/VR avatars
- ๐Ÿฅ **Medical / ergonomics** โ€” coarse body region detection for posture analysis or wearable device placement
- ๐Ÿ“ **Body proportion estimation** โ€” measure relative areas of body segments in 2D images
## Dataset: Pascal Person Part
Pascal Person Part is a single-person human parsing dataset with 3,000+ images focused on **body part segmentation**.
- **mIoU on Pascal Person Part validation: 71.46%**
- 7 coarse labels covering body regions
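For readers unfamiliar with the metric, here is a minimal sketch of how a mean IoU over the 7 labels can be computed from a predicted and a ground-truth label map. The function name `mean_iou` and the toy arrays are illustrative assumptions; the evaluation protocol behind the reported 71.46% (for example, how results are accumulated across the validation set) is not described here and may differ.

```python
import numpy as np

def mean_iou(pred, target, num_classes=7):
    """Mean IoU over the classes that occur in the prediction or the ground truth."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    # Confusion matrix: rows are ground-truth labels, columns are predicted labels.
    cm = np.bincount(target * num_classes + pred, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0  # skip classes absent from both maps
    return float((intersection[valid] / union[valid]).mean())

# Toy 2x3 label maps (assumed data, not taken from the dataset).
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
score = mean_iou(pred, target)
print(f"mIoU: {score:.2f}")  # per-class IoUs are 1/3, 2/3 and 1/2, so the mean is 0.50
```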
## Labels
| ID | Label |
|----|-------|
| 0 | Background |
| 1 | Head |
| 2 | Torso |
| 3 | Upper Arms |
| 4 | Lower Arms |
| 5 | Upper Legs |
| 6 | Lower Legs |
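As a rough illustration of the body proportion use case, the sketch below counts pixels per label in a segmentation map and reports each body part's share of the person area. The `ID2LABEL` dictionary is copied by hand from the table above, and `body_part_fractions` and the toy `seg_map` are hypothetical names and data used only for this example.

```python
import numpy as np

# Label names copied from the table above.
ID2LABEL = {
    0: "Background", 1: "Head", 2: "Torso", 3: "Upper Arms",
    4: "Lower Arms", 5: "Upper Legs", 6: "Lower Legs",
}

def body_part_fractions(seg_map):
    """Fraction of person (non-background) pixels covered by each body part."""
    counts = np.bincount(np.asarray(seg_map).ravel(), minlength=len(ID2LABEL))
    person_pixels = counts[1:].sum()
    if person_pixels == 0:
        return {}  # no person detected
    return {ID2LABEL[i]: float(counts[i] / person_pixels)
            for i in range(1, len(ID2LABEL))}

# Toy 4x4 map standing in for a real prediction.
seg_map = np.zeros((4, 4), dtype=np.int64)
seg_map[0, :2] = 1   # 2 head pixels
seg_map[1:3, :] = 2  # 8 torso pixels
seg_map[3, :2] = 5   # 2 upper-leg pixels
fractions = body_part_fractions(seg_map)
for name, share in fractions.items():
    print(f"{name}: {share:.1%}")
```

These are 2D pixel shares, so they depend on pose and camera angle and should not be read as physical measurements.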
## Usage: PyTorch
```python
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
from PIL import Image
import torch
model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# outputs.logits:         (1, 7, 512, 512) raw logits
# outputs.parsing_logits: (1, 7, 512, 512) refined parsing logits
# outputs.edge_logits:    (1, 1, 512, 512) edge prediction logits
seg_map = outputs.logits.argmax(dim=1).squeeze(0).numpy()  # (H, W), values in [0, 6]
```
Each pixel in `seg_map` is a label ID. To map IDs back to names:
```python
id2label = model.config.id2label
print(id2label[1])  # → "Head"
```
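To inspect a prediction visually, the label IDs can be mapped to colors. The sketch below uses only NumPy; the `PALETTE` colors are arbitrary choices made for this example and are not shipped with the model, and the small `seg_map` stands in for a real prediction.

```python
import numpy as np

# Hypothetical palette: one RGB color per label ID (0..6), chosen arbitrarily.
PALETTE = np.array([
    [0, 0, 0],       # 0 Background
    [255, 0, 0],     # 1 Head
    [0, 255, 0],     # 2 Torso
    [0, 0, 255],     # 3 Upper Arms
    [255, 255, 0],   # 4 Lower Arms
    [255, 0, 255],   # 5 Upper Legs
    [0, 255, 255],   # 6 Lower Legs
], dtype=np.uint8)

def colorize(seg_map):
    """Map an (H, W) array of label IDs to an (H, W, 3) uint8 RGB image."""
    return PALETTE[np.asarray(seg_map, dtype=np.int64)]

seg_map = np.array([[0, 1], [2, 6]])  # stand-in for a real prediction
color_mask = colorize(seg_map)
print(color_mask.shape)  # (2, 2, 3)
```

The resulting array can be passed to `PIL.Image.fromarray` to save it or blend it with the source image.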
## Usage: ONNX Runtime
Optimized ONNX files are available in the `onnx/` folder of this repo:
| File | Size | Notes |
|------|------|-------|
| `onnx/schp-pascal-7.onnx` + `.onnx.data` | ~257 MB | FP32, dynamic batch |
| `onnx/schp-pascal-7-int8-static.onnx` | ~66 MB | INT8 static, 99.77% pixel agreement |
```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor
from PIL import Image
model_path = hf_hub_download("pirocheto/schp-pascal-7", "onnx/schp-pascal-7-int8-static.onnx")
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8
sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])
image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")
logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
seg_map = logits.argmax(axis=1).squeeze() # (H, W)
```
## Performance
Benchmarked on CPU (16-core, 8 ORT threads, `intra_op_num_threads=8`):
| Backend | Latency | Speedup | Size |
|---------|---------|---------|------|
| PyTorch FP32 | ~424 ms | 1× | 255 MB |
| ONNX FP32 | ~296 ms | 1.44× | 256 MB |
| ONNX INT8 static | ~218 ms | **1.94×** | **66 MB** |
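To reproduce this kind of comparison on other hardware, a simple timing helper along the following lines may be enough. This is a sketch using only the standard library; the name `median_latency_ms`, the warm-up and run counts, and the stand-in workload are assumptions, and the methodology behind the figures in the table is not documented here.

```python
import statistics
import time

def median_latency_ms(fn, warmup=3, runs=20):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)

# Stand-in workload; replace it with a call that runs one inference,
# for example a lambda wrapping sess.run(...) or model(**inputs).
latency = median_latency_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f} ms")
```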
INT8 static quantization achieves **99.77% pixel-level agreement** with the FP32 model.
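Pixel-level agreement can be understood as the share of pixels where two segmentation maps predict the same label. The sketch below shows one way to compute it; `pixel_agreement` and the toy maps are illustrative, and the images and exact procedure used to obtain the 99.77% figure are not specified here.

```python
import numpy as np

def pixel_agreement(seg_a, seg_b):
    """Fraction of pixels on which two label maps of the same shape agree."""
    seg_a = np.asarray(seg_a)
    seg_b = np.asarray(seg_b)
    if seg_a.shape != seg_b.shape:
        raise ValueError("segmentation maps must have the same shape")
    return float((seg_a == seg_b).mean())

# Toy maps standing in for FP32 and INT8 predictions on the same image.
fp32_map = np.array([[0, 1], [2, 2]])
int8_map = np.array([[0, 1], [2, 3]])
agreement = pixel_agreement(fp32_map, int8_map)
print(f"{agreement:.2%}")  # 3 of 4 pixels match: 75.00%
```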
## Model Details
| Property | Value |
|----------|-------|
| Architecture | ResNet-101 + SCHP self-correction |
| Input size | 512 × 512 |
| Output | 3 heads: logits, parsing_logits, edge_logits |
| num_labels | 7 |
| Dataset | Pascal Person Part |
| Original mIoU | 71.46% |