# SCHP — Self-Correction for Human Parsing (LIP, 20 classes)
SCHP (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
This checkpoint is trained on the LIP dataset and packaged for the 🤗 Transformers AutoModel API.
Original repository: PeikeLi/Self-Correction-Human-Parsing
Use cases:

- Outfit palette extraction — mask each clothing region, then run color clustering to extract dominant colors per garment
- Product tagging for e-commerce — automatically label photos with clothing categories
- Virtual try-on pre-processing — generate garment masks as input to try-on models such as VITON or LaDI-VTON
- Dataset annotation — accelerate labeling pipelines by using predicted masks as initial annotations
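The palette-extraction use case can be sketched with a tiny NumPy k-means over the pixels of one parsed region. This is a minimal illustration, not part of the model: `dominant_colors` is a hypothetical helper, `image` is an RGB array, and `seg_map` is a label map as produced in the usage sections below.

```python
import numpy as np

def dominant_colors(image: np.ndarray, seg_map: np.ndarray, label_id: int, k: int = 3) -> np.ndarray:
    """Return up to k dominant RGB colors for one parsed region (tiny Lloyd k-means sketch)."""
    pixels = image[seg_map == label_id].astype(np.float64)  # (N, 3) pixels of that region
    if len(pixels) == 0:
        return np.empty((0, 3), dtype=np.uint8)
    rng = np.random.default_rng(0)
    centers = pixels[rng.choice(len(pixels), size=min(k, len(pixels)), replace=False)]
    for _ in range(10):  # a few iterations are enough for a palette
        dists = np.linalg.norm(pixels[:, None] - centers[None], axis=2)  # (N, k)
        assign = dists.argmin(axis=1)
        for j in range(len(centers)):
            if (assign == j).any():
                centers[j] = pixels[assign == j].mean(axis=0)
    return centers.round().astype(np.uint8)
```

For example, calling it with `label_id=5` (Upper-clothes) yields the dominant shirt colors.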
## Dataset — LIP
LIP (Look into Person) is the largest single-person human parsing dataset, with 50,000+ images focused on real-world scenarios.
- mIoU on LIP validation: 59.36%
- 20 labels covering clothing items, body parts, and accessories
### Labels
| ID | Label | ID | Label | ID | Label |
|---|---|---|---|---|---|
| 0 | Background | 7 | Coat | 14 | Left-arm |
| 1 | Hat | 8 | Socks | 15 | Right-arm |
| 2 | Hair | 9 | Pants | 16 | Left-leg |
| 3 | Glove | 10 | Jumpsuits | 17 | Right-leg |
| 4 | Sunglasses | 11 | Scarf | 18 | Left-shoe |
| 5 | Upper-clothes | 12 | Skirt | 19 | Right-shoe |
| 6 | Dress | 13 | Face | | |
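For quick visual inspection, the label IDs above can be mapped to colors. This is a hypothetical helper with an arbitrary deterministic palette, not an official LIP colormap:

```python
import numpy as np

def lip_palette(num_labels: int = 20) -> np.ndarray:
    """Arbitrary deterministic (num_labels, 3) RGB palette; background stays black."""
    rng = np.random.default_rng(42)
    palette = rng.integers(0, 256, size=(num_labels, 3), dtype=np.uint8)
    palette[0] = 0  # label 0 = Background
    return palette

def colorize(seg_map: np.ndarray) -> np.ndarray:
    """Map an (H, W) label map to an (H, W, 3) RGB image for visualization."""
    return lip_palette()[seg_map]
```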
## Usage — PyTorch
```python
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
from PIL import Image
import torch

model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits         → (1, 20, 473, 473) raw logits
# outputs.parsing_logits → (1, 20, 473, 473) refined parsing logits
# outputs.edge_logits    → (1, 2, 473, 473)  edge prediction logits

seg_map = outputs.logits.argmax(dim=1).squeeze().numpy()  # (H, W), values in [0, 19]
```
Each pixel in seg_map is a label ID. To map IDs back to names:
```python
id2label = model.config.id2label
print(id2label[5])  # → "Upper-clothes"
```
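The model predicts at 473 × 473, so the raw argmax map does not match the input photo's resolution. One way to get a full-resolution mask is to upsample the logits before the argmax; a sketch, assuming `outputs` and `image` from the snippet above (`logits_to_segmap` is a hypothetical helper):

```python
import torch
import torch.nn.functional as F

def logits_to_segmap(logits: torch.Tensor, target_size: tuple[int, int]) -> torch.Tensor:
    """Upsample (1, C, h, w) logits to target_size=(H, W), then take the per-pixel argmax."""
    up = F.interpolate(logits, size=target_size, mode="bilinear", align_corners=False)
    return up.argmax(dim=1).squeeze(0)  # (H, W) label map
```

Usage: `seg_map = logits_to_segmap(outputs.logits, image.size[::-1]).numpy()` (PIL's `image.size` is `(W, H)`, hence the reversal).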
## Usage — ONNX Runtime
```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor
from PIL import Image

model_path = hf_hub_download("pirocheto/schp-lip-20", "onnx/schp-lip-20-int8-static.onnx")
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)

sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8
sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")

logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
seg_map = logits.argmax(axis=1).squeeze()  # (H, W)
```
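For try-on pre-processing, a binary garment mask can be derived from `seg_map` by selecting clothing label IDs from the table above. A sketch (`garment_mask` and the exact ID set are assumptions; which labels count as "garment" depends on the downstream model):

```python
import numpy as np

# Clothing label IDs from the LIP label table:
# 5 Upper-clothes, 6 Dress, 7 Coat, 9 Pants, 10 Jumpsuits, 11 Scarf, 12 Skirt
GARMENT_IDS = [5, 6, 7, 9, 10, 11, 12]

def garment_mask(seg_map: np.ndarray, label_ids=GARMENT_IDS) -> np.ndarray:
    """Binary uint8 mask (0 / 255) covering the selected clothing labels."""
    return (np.isin(seg_map, label_ids) * 255).astype(np.uint8)
```

The resulting 0/255 mask can be saved with PIL and fed directly to mask-conditioned try-on pipelines.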
## Performance

Benchmarked on a 16-core CPU with `intra_op_num_threads=8` for ONNX Runtime:
| Backend | Latency | Speedup |
|---|---|---|
| PyTorch FP32 | ~360 ms | 1× |
| ONNX FP32 | ~243 ms | 1.5× |
| ONNX INT8 static | ~189 ms | 1.9× |
INT8 static quantization achieves 99.09% pixel-level agreement with the FP32 model.
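Pixel-level agreement here means the fraction of pixels for which the INT8 and FP32 models predict the same label. It can be computed with a one-liner over two precomputed label maps (a minimal sketch; `pixel_agreement` is a hypothetical helper):

```python
import numpy as np

def pixel_agreement(seg_a: np.ndarray, seg_b: np.ndarray) -> float:
    """Fraction of pixels where two segmentation maps assign the same label."""
    assert seg_a.shape == seg_b.shape, "label maps must have the same shape"
    return float((seg_a == seg_b).mean())
```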
## Model Details
| Property | Value |
|---|---|
| Architecture | ResNet-101 + SCHP self-correction |
| Input size | 473 × 473 |
| Output | 3 heads: logits, parsing_logits, edge_logits |
| Training dataset | LIP |
| Number of classes | 20 |
| Framework | PyTorch / Transformers |
## Citation

```bibtex
@article{li2020self,
  title={Self-Correction for Human Parsing},
  author={Li, Peike and Xu, Yunqiu and Wei, Yunchao and Yang, Yi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2020},
  doi={10.1109/TPAMI.2020.3048039}
}
```