# SCHP — Self-Correction for Human Parsing (LIP, 20 classes)
SCHP (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
This checkpoint is trained on the LIP dataset and packaged for the 🤗 Transformers AutoModel API.
Original repository: PeikeLi/Self-Correction-Human-Parsing
Use cases:

- Outfit palette extraction — mask each clothing region, then run color clustering to extract dominant colors per garment
- Product tagging for e-commerce — automatically label photos with clothing categories
- Virtual try-on pre-processing — generate garment masks as input to try-on models such as VITON or LaDI-VTON
- Dataset annotation — accelerate labeling pipelines by using predicted masks as initial annotations
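The palette-extraction use case can be sketched with a tiny NumPy k-means over the pixels of one parsed region. This is a minimal illustration, not part of the model: `dominant_colors` is a hypothetical helper, `image` is an RGB array, and `seg_map` is a label map as produced in the usage sections below.

```python
import numpy as np

def dominant_colors(image: np.ndarray, seg_map: np.ndarray, label_id: int, k: int = 3) -> np.ndarray:
    """Return up to k dominant RGB colors for one parsed region (tiny Lloyd k-means sketch)."""
    pixels = image[seg_map == label_id].astype(np.float64)  # (N, 3) pixels of that region
    if len(pixels) == 0:
        return np.empty((0, 3), dtype=np.uint8)
    rng = np.random.default_rng(0)
    centers = pixels[rng.choice(len(pixels), size=min(k, len(pixels)), replace=False)]
    for _ in range(10):  # a few iterations are enough for a palette
        dists = np.linalg.norm(pixels[:, None] - centers[None], axis=2)  # (N, k)
        assign = dists.argmin(axis=1)
        for j in range(len(centers)):
            if (assign == j).any():
                centers[j] = pixels[assign == j].mean(axis=0)
    return centers.round().astype(np.uint8)
```

For example, calling it with `label_id=5` (Upper-clothes) yields the dominant shirt colors.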
## Dataset — LIP
LIP (Look into Person) is the largest single-person human parsing dataset, with 50,000+ images focused on real-world scenarios.
- mIoU on LIP validation: 59.36%
- 20 labels covering clothing items, body parts, and accessories
### Labels
| ID | Label | ID | Label | ID | Label |
|---|---|---|---|---|---|
| 0 | Background | 7 | Coat | 14 | Left-arm |
| 1 | Hat | 8 | Socks | 15 | Right-arm |
| 2 | Hair | 9 | Pants | 16 | Left-leg |
| 3 | Glove | 10 | Jumpsuits | 17 | Right-leg |
| 4 | Sunglasses | 11 | Scarf | 18 | Left-shoe |
| 5 | Upper-clothes | 12 | Skirt | 19 | Right-shoe |
| 6 | Dress | 13 | Face | | |
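For quick visual inspection, the label IDs above can be mapped to colors. This is a hypothetical helper with an arbitrary deterministic palette, not an official LIP colormap:

```python
import numpy as np

def lip_palette(num_labels: int = 20) -> np.ndarray:
    """Arbitrary deterministic (num_labels, 3) RGB palette; background stays black."""
    rng = np.random.default_rng(42)
    palette = rng.integers(0, 256, size=(num_labels, 3), dtype=np.uint8)
    palette[0] = 0  # label 0 = Background
    return palette

def colorize(seg_map: np.ndarray) -> np.ndarray:
    """Map an (H, W) label map to an (H, W, 3) RGB image for visualization."""
    return lip_palette()[seg_map]
```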
## Usage — PyTorch
```python
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
from PIL import Image
import torch

model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits         → (1, 20, 473, 473) raw logits
# outputs.parsing_logits → (1, 20, 473, 473) refined parsing logits
# outputs.edge_logits    → (1, 2, 473, 473)  edge prediction logits

seg_map = outputs.logits.argmax(dim=1).squeeze().numpy()  # (H, W), values in [0, 19]
```
Each pixel in seg_map is a label ID. To map IDs back to names:
```python
id2label = model.config.id2label
print(id2label[5])  # → "Upper-clothes"
```
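The model predicts at 473 × 473, so the raw argmax map does not match the input photo's resolution. One way to get a full-resolution mask is to upsample the logits before the argmax; a sketch, assuming `outputs` and `image` from the snippet above (`logits_to_segmap` is a hypothetical helper):

```python
import torch
import torch.nn.functional as F

def logits_to_segmap(logits: torch.Tensor, target_size: tuple[int, int]) -> torch.Tensor:
    """Upsample (1, C, h, w) logits to target_size=(H, W), then take the per-pixel argmax."""
    up = F.interpolate(logits, size=target_size, mode="bilinear", align_corners=False)
    return up.argmax(dim=1).squeeze(0)  # (H, W) label map
```

Usage: `seg_map = logits_to_segmap(outputs.logits, image.size[::-1]).numpy()` (PIL's `image.size` is `(W, H)`, hence the reversal).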
## Usage — ONNX Runtime
```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor
from PIL import Image

model_path = hf_hub_download("pirocheto/schp-lip-20", "onnx/schp-lip-20-int8-static.onnx")
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)

sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8
sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")

logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
seg_map = logits.argmax(axis=1).squeeze()  # (H, W)
```
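For try-on pre-processing, a binary garment mask can be derived from `seg_map` by selecting clothing label IDs from the table above. A sketch (`garment_mask` and the exact ID set are assumptions; which labels count as "garment" depends on the downstream model):

```python
import numpy as np

# Clothing label IDs from the LIP label table:
# 5 Upper-clothes, 6 Dress, 7 Coat, 9 Pants, 10 Jumpsuits, 11 Scarf, 12 Skirt
GARMENT_IDS = [5, 6, 7, 9, 10, 11, 12]

def garment_mask(seg_map: np.ndarray, label_ids=GARMENT_IDS) -> np.ndarray:
    """Binary uint8 mask (0 / 255) covering the selected clothing labels."""
    return (np.isin(seg_map, label_ids) * 255).astype(np.uint8)
```

The resulting 0/255 mask can be saved with PIL and fed directly to mask-conditioned try-on pipelines.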
## Performance

Benchmarked on a 16-core CPU with `intra_op_num_threads=8` for ONNX Runtime:
| Backend | Latency | Speedup |
|---|---|---|
| PyTorch FP32 | ~360 ms | 1× |
| ONNX FP32 | ~243 ms | 1.5× |
| ONNX INT8 static | ~189 ms | 1.9× |
INT8 static quantization achieves 99.09% pixel-level agreement with the FP32 model.
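Pixel-level agreement here means the fraction of pixels for which the INT8 and FP32 models predict the same label. It can be computed with a one-liner over two precomputed label maps (a minimal sketch; `pixel_agreement` is a hypothetical helper):

```python
import numpy as np

def pixel_agreement(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Fraction of pixels where two segmentation maps assign the same label."""
    assert seg_a.shape == seg_b.shape, "label maps must have the same shape"
    return float((seg_a == seg_b).mean())
```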
## Model Details
| Property | Value |
|---|---|
| Architecture | ResNet-101 + SCHP self-correction |
| Input size | 473 × 473 |
| Output | 3 heads: logits, parsing_logits, edge_logits |
| Training dataset | LIP |
| Number of classes | 20 |
| Framework | PyTorch / Transformers |
## Citation

```bibtex
@article{li2020self,
  title={Self-Correction for Human Parsing},
  author={Li, Peike and Xu, Yunqiu and Wei, Yunchao and Yang, Yi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2020},
  doi={10.1109/TPAMI.2020.3048039}
}
```