SCHP — Self-Correction Human Parsing (LIP, 20 classes)

SCHP (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone. This checkpoint is trained on the LIP dataset and packaged for the 🤗 Transformers AutoModel API.

Original repository: PeikeLi/Self-Correction-Human-Parsing

(Demo: a source photo shown side by side with its LIP segmentation result; see the original repository for the images.)

Use cases:

  • 🎨 Outfit palette extraction — mask each clothing region, then run color clustering to extract the dominant colors per garment
  • 🏷️ Product tagging for e-commerce — automatically label photos with clothing categories
  • 👚 Virtual try-on pre-processing — generate garment masks as input to try-on models such as VITON or LaDI-VTON
  • ✏️ Dataset annotation — accelerate labeling pipelines by using predicted masks as initial annotations
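The palette-extraction use case above can be sketched without running the model: given an RGB image and a parsing map of the same spatial size, collect the pixels assigned to one garment label and take their dominant quantized color. The quantization scheme and the tiny synthetic inputs below are illustrative assumptions, not part of the model.

```python
import numpy as np

# Fabricated stand-ins for a real photo and its parsing map.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[:2] = (200, 30, 30)    # top half: reddish "upper-clothes"
image[2:] = (30, 30, 200)    # bottom half: bluish "pants"
seg_map = np.zeros((4, 4), dtype=np.int64)
seg_map[:2] = 5              # label 5: Upper-clothes
seg_map[2:] = 9              # label 9: Pants

def dominant_color(image, seg_map, label_id, bins=8):
    """Dominant RGB color of the pixels assigned to `label_id`:
    quantize each channel into `bins` buckets and take the mode."""
    pixels = image[seg_map == label_id]               # (N, 3)
    if pixels.size == 0:
        return None
    step = 256 // bins
    quantized = (pixels // step) * step + step // 2   # bucket centers
    colors, counts = np.unique(quantized, axis=0, return_counts=True)
    return tuple(int(c) for c in colors[counts.argmax()])

print(dominant_color(image, seg_map, 5))  # dominant upper-clothes color
print(dominant_color(image, seg_map, 9))  # dominant pants color
```

For real photos, a k-means clustering over the masked pixels gives a fuller palette; the mode-of-buckets trick above is just the cheapest version of the same idea.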

Dataset — LIP

LIP is the largest single-person human parsing dataset, with 50,000+ images focused on real-world scenarios.

  • mIoU on LIP validation: 59.36%
  • 20 labels covering clothing items, body parts, and accessories

Labels

| ID | Label         | ID | Label     | ID | Label      |
|----|---------------|----|-----------|----|------------|
| 0  | Background    | 7  | Coat      | 14 | Left-arm   |
| 1  | Hat           | 8  | Socks     | 15 | Right-arm  |
| 2  | Hair          | 9  | Pants     | 16 | Left-leg   |
| 3  | Glove         | 10 | Jumpsuits | 17 | Right-leg  |
| 4  | Sunglasses    | 11 | Scarf     | 18 | Left-shoe  |
| 5  | Upper-clothes | 12 | Skirt     | 19 | Right-shoe |
| 6  | Dress         | 13 | Face      |    |            |
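Downstream tasks such as try-on pre-processing often collapse these per-part labels into a single clothing mask. A minimal sketch, where the particular grouping of label IDs is a design choice of this example rather than anything prescribed by the model:

```python
import numpy as np

# Illustrative grouping: garments and accessories, excluding skin/hair/face.
CLOTHING_IDS = {1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 18, 19}

seg_map = np.array([[0, 5, 5],
                    [13, 6, 9],
                    [0, 2, 19]])          # toy parsing map (0=bg, 2=hair, 13=face)

# Boolean mask: True wherever the pixel belongs to any clothing label.
clothing_mask = np.isin(seg_map, list(CLOTHING_IDS))
print(clothing_mask.astype(np.uint8))
```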

Usage — PyTorch

```python
from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
from PIL import Image
import torch

model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)

image = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.logits         — (1, 20, 473, 473) raw logits
# outputs.parsing_logits — (1, 20, 473, 473) refined parsing logits
# outputs.edge_logits    — (1,  2, 473, 473) edge prediction logits
seg_map = outputs.logits.argmax(dim=1).squeeze().numpy()  # (473, 473), values in [0, 19]
```

Each pixel in seg_map is a label ID. To map IDs back to names:

```python
id2label = model.config.id2label
print(id2label[5])  # → "Upper-clothes"
```
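For quick visual inspection, a label map like seg_map can be turned into an RGB image by indexing a 20-entry palette; the random palette below is an arbitrary choice for illustration.

```python
import numpy as np

# Any 20 visually distinct colors work; here we draw them from a seeded RNG.
rng = np.random.default_rng(0)
palette = rng.integers(0, 256, size=(20, 3), dtype=np.uint8)
palette[0] = (0, 0, 0)                 # keep background black

seg_map = np.array([[0, 5], [9, 13]])  # toy (H, W) label map
color_map = palette[seg_map]           # (H, W, 3) via fancy indexing
print(color_map.shape)
```

Wrapping color_map with PIL's Image.fromarray then gives a savable preview image.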

Usage — ONNX Runtime

```python
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoImageProcessor
from PIL import Image

model_path = hf_hub_download("pirocheto/schp-lip-20", "onnx/schp-lip-20-int8-static.onnx")
processor  = AutoImageProcessor.from_pretrained("pirocheto/schp-lip-20", trust_remote_code=True)

sess_opts = ort.SessionOptions()
sess_opts.intra_op_num_threads = 8
sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])

image  = Image.open("photo.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="np")
logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
seg_map = logits.argmax(axis=1).squeeze()  # (473, 473) label map
```
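Both backends predict at the network's fixed 473 × 473 resolution. A common post-processing step (not shown in the snippets above) is resizing the label map back to the original photo size with nearest-neighbor interpolation, which preserves integer label IDs; the toy prediction and the assumed 640 × 480 photo size below are for illustration only.

```python
import numpy as np
from PIL import Image

seg_map = np.zeros((473, 473), dtype=np.uint8)
seg_map[:, 236:] = 5                    # toy prediction: right half is label 5

orig_w, orig_h = 640, 480               # assumed original photo size
resized = Image.fromarray(seg_map).resize((orig_w, orig_h), Image.NEAREST)
seg_full = np.array(resized)            # label map at the photo's resolution
print(seg_full.shape)
```

Bilinear or bicubic resampling would blend neighboring label IDs into meaningless values, which is why nearest-neighbor is used here.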

Performance

Benchmarked on a 16-core CPU (ONNX Runtime with intra_op_num_threads=8):

| Backend          | Latency | Speedup |
|------------------|---------|---------|
| PyTorch FP32     | ~360 ms | 1×      |
| ONNX FP32        | ~243 ms | 1.5×    |
| ONNX INT8 static | ~189 ms | 1.9×    |

INT8 static quantization achieves 99.09% pixel-level agreement with the FP32 model.
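The agreement figure above is simply the fraction of pixels where two parsing maps assign the same label. A sketch of how such a number can be reproduced, using two tiny hand-made maps:

```python
import numpy as np

def pixel_agreement(a, b):
    """Fraction of positions where label maps `a` and `b` agree."""
    a, b = np.asarray(a), np.asarray(b)
    return float((a == b).mean())

fp32_map = np.array([[5, 5, 9], [9, 0, 13]])
int8_map = np.array([[5, 5, 9], [2, 0, 13]])   # one pixel differs
print(pixel_agreement(fp32_map, int8_map))     # 5 of 6 pixels agree
```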

Model Details

| Property          | Value                                        |
|-------------------|----------------------------------------------|
| Architecture      | ResNet-101 + SCHP self-correction            |
| Input size        | 473 × 473                                    |
| Output            | 3 heads: logits, parsing_logits, edge_logits |
| Parameters        | 66.8M (F32)                                  |
| Training dataset  | LIP                                          |
| Number of classes | 20                                           |
| Framework         | PyTorch / Transformers                       |

Citation

```bibtex
@article{li2020self,
  title={Self-Correction for Human Parsing},
  author={Li, Peike and Xu, Yunqiu and Wei, Yunchao and Yang, Yi},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2020},
  doi={10.1109/TPAMI.2020.3048039}
}
```