pirocheto committed
Commit e97480b · 0 Parent(s)

feat: initial release — Pascal Person Part 7-class SCHP model
.gitattributes ADDED
@@ -0,0 +1,5 @@
+ *.onnx.data filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,14 @@
+ __pycache__/
+ *.pyc
+ *.pyo
+
+ # Temporary files from ONNX quantization pre-processing
+ onnx/*-preprocessed.onnx
+ onnx/*-preprocessed.onnx.data
+ onnx/*.data
+
+ # Keep named ONNX files
+ !onnx/schp-pascal-7.onnx
+ !onnx/schp-pascal-7.onnx.data
+ !onnx/schp-pascal-7-int8-static.onnx
+ !onnx/schp-pascal-7-int8-dynamic.onnx
README.md ADDED
@@ -0,0 +1,128 @@
+ ---
+ language: en
+ license: mit
+ tags:
+ - vision
+ - image-segmentation
+ - semantic-segmentation
+ - human-parsing
+ - body-parts
+ - pytorch
+ - onnx
+ datasets:
+ - pascal-person-part
+ pipeline_tag: image-segmentation
+ ---
+
+ # SCHP — Self-Correction Human Parsing (Pascal Person Part, 7 classes)
+
+ **SCHP** (Self-Correction for Human Parsing) is a state-of-the-art human parsing model based on a ResNet-101 backbone.
+ This checkpoint is trained on the **Pascal Person Part** dataset and packaged for the 🤗 Transformers `AutoModelForSemanticSegmentation` API.
+
+ > Original repository: [PeikeLi/Self-Correction-Human-Parsing](https://github.com/PeikeLi/Self-Correction-Human-Parsing)
+
+ **Use cases:**
+ - 🏃 **Body part segmentation** — segment coarse body regions (head, torso, arms, legs) for pose-aware applications
+ - 🎮 **Avatar rigging** — generate body part masks as a preprocessing step for AR/VR avatars
+ - 🏥 **Medical / ergonomics** — coarse body region detection for posture analysis or wearable device placement
+ - 📐 **Body proportion estimation** — measure relative areas of body segments in 2D images
+
+ ## Dataset — Pascal Person Part
+
+ Pascal Person Part is a single-person human parsing dataset with 3,000+ images focused on **body part segmentation**.
+
+ - **mIoU on Pascal Person Part validation: 71.46%**
+ - 7 coarse labels covering body regions
+
+ ## Labels
+
+ | ID | Label |
+ |----|-------|
+ | 0 | Background |
+ | 1 | Head |
+ | 2 | Torso |
+ | 3 | Upper Arms |
+ | 4 | Lower Arms |
+ | 5 | Upper Legs |
+ | 6 | Lower Legs |
+
+ ## Usage — PyTorch
+
+ ```python
+ from transformers import AutoImageProcessor, AutoModelForSemanticSegmentation
+ from PIL import Image
+ import torch
+
+ model = AutoModelForSemanticSegmentation.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
+ processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
+
+ image = Image.open("photo.jpg").convert("RGB")
+ inputs = processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+
+ # outputs.logits — (1, 7, 512, 512) final fused logits
+ # outputs.parsing_logits — (1, 7, 512, 512) decoder-branch logits before fusion
+ # outputs.edge_logits — (1, 2, 512, 512) edge prediction logits
+ seg_map = outputs.logits.argmax(dim=1).squeeze().numpy()  # (H, W), values in [0, 6]
+ ```
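The `argmax(dim=1)` step above simply keeps, for every pixel, the index of the class with the highest logit. A minimal NumPy sketch of that reduction on made-up toy logits (3 classes instead of 7, 2×2 instead of 512×512):

```python
import numpy as np

# toy logits standing in for outputs.logits: shape (batch=1, classes=3, 2, 2)
logits = np.array([[[[0.1, 2.0], [0.3, 0.2]],
                    [[1.5, 0.1], [0.2, 0.9]],
                    [[0.2, 0.4], [2.5, 0.1]]]])

# per pixel, keep the index of the largest value along the class axis
seg_map = logits.argmax(axis=1).squeeze()  # shape (2, 2)
```

Here `seg_map` comes out as `[[1, 0], [2, 1]]` — e.g. the top-left pixel gets class 1 because its class-1 logit (1.5) beats the others.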
+
+ Each pixel in `seg_map` is a label ID. To map IDs back to names:
+
+ ```python
+ id2label = model.config.id2label
+ print(id2label[1])  # → "Head"
+ ```
+
+ ## Usage — ONNX Runtime
+
+ Optimized ONNX files are available in the `onnx/` folder of this repo:
+
+ | File | Size | Notes |
+ |------|------|-------|
+ | `onnx/schp-pascal-7.onnx` + `.onnx.data` | ~257 MB | FP32, dynamic batch |
+ | `onnx/schp-pascal-7-int8-static.onnx` | ~66 MB | INT8 static, 99.77% pixel agreement |
+
+ ```python
+ import onnxruntime as ort
+ from huggingface_hub import hf_hub_download
+ from transformers import AutoImageProcessor
+ from PIL import Image
+
+ model_path = hf_hub_download("pirocheto/schp-pascal-7", "onnx/schp-pascal-7-int8-static.onnx")
+ processor = AutoImageProcessor.from_pretrained("pirocheto/schp-pascal-7", trust_remote_code=True)
+
+ sess_opts = ort.SessionOptions()
+ sess_opts.intra_op_num_threads = 8
+ sess = ort.InferenceSession(model_path, sess_opts, providers=["CPUExecutionProvider"])
+
+ image = Image.open("photo.jpg").convert("RGB")
+ inputs = processor(images=image, return_tensors="np")
+ logits = sess.run(["logits"], {"pixel_values": inputs["pixel_values"]})[0]
+ seg_map = logits.argmax(axis=1).squeeze()  # (H, W)
+ ```
+
+ ## Performance
+
+ Benchmarked on a 16-core CPU with `intra_op_num_threads=8`:
+
+ | Backend | Latency | Speedup | Size |
+ |---------|---------|---------|------|
+ | PyTorch FP32 | ~424 ms | 1× | 255 MB |
+ | ONNX FP32 | ~296 ms | 1.43× | 256 MB |
+ | ONNX INT8 static | ~218 ms | **1.94×** | **66 MB** |
+
+ INT8 static quantization achieves **99.77% pixel-level agreement** with the FP32 model.
+
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | Architecture | ResNet-101 + SCHP self-correction |
+ | Input size | 512 × 512 |
+ | Output | 3 heads: logits, parsing_logits, edge_logits |
+ | num_labels | 7 |
+ | Dataset | Pascal Person Part |
+ | Original mIoU | 71.46% |
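The body-proportion use case listed in the README reduces to counting pixels per label in `seg_map`. A sketch with a synthetic 4×4 map (made up for illustration; a real `seg_map` would come from the model):

```python
import numpy as np

# label map for the 7 Pascal Person Part classes
id2label = {0: "Background", 1: "Head", 2: "Torso", 3: "Upper Arms",
            4: "Lower Arms", 5: "Upper Legs", 6: "Lower Legs"}

# toy 4x4 segmentation map standing in for a real model output
seg_map = np.array([[0, 0, 1, 1],
                    [0, 2, 2, 2],
                    [0, 2, 2, 2],
                    [5, 5, 6, 6]])

counts = np.bincount(seg_map.ravel(), minlength=7)  # pixels per class ID
fractions = counts / seg_map.size                   # share of the image area
report = {id2label[i]: round(float(f), 4) for i, f in enumerate(fractions) if f > 0}
```

For this toy map, `report` contains e.g. `"Torso": 0.375` (6 of 16 pixels); relative proportions between body parts follow the same way.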
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "architectures": [
+     "SCHPForSemanticSegmentation"
+   ],
+   "auto_map": {
+     "AutoConfig": "configuration_schp.SCHPConfig",
+     "AutoModelForSemanticSegmentation": "modeling_schp.SCHPForSemanticSegmentation"
+   },
+   "backbone": "resnet101",
+   "dtype": "float32",
+   "id2label": {
+     "0": "Background",
+     "1": "Head",
+     "2": "Torso",
+     "3": "Upper Arms",
+     "4": "Lower Arms",
+     "5": "Upper Legs",
+     "6": "Lower Legs"
+   },
+   "input_size": 512,
+   "label2id": {
+     "Background": "0",
+     "Head": "1",
+     "Lower Arms": "4",
+     "Lower Legs": "6",
+     "Torso": "2",
+     "Upper Arms": "3",
+     "Upper Legs": "5"
+   },
+   "model_type": "schp",
+   "transformers_version": "5.5.0"
+ }
configuration_schp.py ADDED
@@ -0,0 +1,48 @@
+ from transformers import PretrainedConfig
+
+ _PASCAL_LABELS = [
+     "Background",
+     "Head",
+     "Torso",
+     "Upper Arms",
+     "Lower Arms",
+     "Upper Legs",
+     "Lower Legs",
+ ]
+
+
+ class SCHPConfig(PretrainedConfig):
+     r"""
+     Configuration for **Self-Correction-Human-Parsing (SCHP)**.
+
+     Args:
+         num_labels (`int`, *optional*, defaults to 7):
+             Number of segmentation classes (7 for the Pascal Person Part dataset).
+         input_size (`int`, *optional*, defaults to 512):
+             Spatial resolution the model expects (height = width).
+         backbone (`str`, *optional*, defaults to `"resnet101"`):
+             Backbone architecture name. Only `"resnet101"` is supported.
+     """
+
+     model_type = "schp"
+
+     def __init__(
+         self,
+         num_labels: int = 7,
+         input_size: int = 512,
+         backbone: str = "resnet101",
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+         self.num_labels = num_labels
+         self.input_size = input_size
+         self.backbone = backbone
+
+         if "id2label" not in kwargs:
+             self.id2label = {
+                 str(i): lbl for i, lbl in enumerate(_PASCAL_LABELS[:num_labels])
+             }
+         if "label2id" not in kwargs:
+             self.label2id = {
+                 lbl: str(i) for i, lbl in enumerate(_PASCAL_LABELS[:num_labels])
+             }
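The config's default label maps come from truncating the Pascal label list to `num_labels`. A standalone sketch of that fallback (mirroring, not importing, `SCHPConfig` — `default_id2label` is a hypothetical helper name):

```python
_PASCAL_LABELS = ["Background", "Head", "Torso", "Upper Arms",
                  "Lower Arms", "Upper Legs", "Lower Legs"]

def default_id2label(num_labels):
    # mirrors SCHPConfig's fallback: first num_labels Pascal names, string keys
    return {str(i): lbl for i, lbl in enumerate(_PASCAL_LABELS[:num_labels])}
```

With `num_labels=7` this reproduces the full map in `config.json`; a smaller `num_labels` simply takes a prefix of the list.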
image_processing_schp.py ADDED
@@ -0,0 +1,95 @@
+ """
+ SCHPImageProcessor — preprocessing for SCHPForSemanticSegmentation.
+
+ Resizes images to the model's expected input size and normalises with the
+ SCHP BGR-indexed mean/std convention (channels are RGB in the tensor but
+ the normalisation constants come from a BGR-trained ResNet-101).
+ """
+
+ from typing import Dict, List, Optional, Union
+
+ import numpy as np
+ import torch
+ import torchvision.transforms.functional as TF
+ from PIL import Image
+ from transformers import BaseImageProcessor
+ from transformers.image_processing_utils import BatchFeature
+
+
+ class SCHPImageProcessor(BaseImageProcessor):
+     """
+     Image processor for SCHP (Self-Correction Human Parsing).
+
+     Args:
+         size (`dict`, *optional*, defaults to ``{"height": 512, "width": 512}``):
+             Target output height and width. The model was trained at 512×512.
+         image_mean (`list[float]`):
+             Per-channel mean in **RGB channel order** using BGR-indexed values:
+             ``[0.406, 0.456, 0.485]``.
+         image_std (`list[float]`):
+             Per-channel std in **RGB channel order** using BGR-indexed values:
+             ``[0.225, 0.224, 0.229]``.
+     """
+
+     model_input_names = ["pixel_values"]
+
+     def __init__(
+         self,
+         size: Optional[Dict[str, int]] = None,
+         image_mean: Optional[List[float]] = None,
+         image_std: Optional[List[float]] = None,
+         **kwargs,
+     ):
+         super().__init__(**kwargs)
+         self.size = size or {"height": 512, "width": 512}
+         # BGR-indexed normalisation constants used during SCHP training
+         self.image_mean = image_mean or [0.406, 0.456, 0.485]
+         self.image_std = image_std or [0.225, 0.224, 0.229]
+
+     def preprocess(
+         self,
+         images: Union[
+             Image.Image,
+             np.ndarray,
+             torch.Tensor,
+             List[Union[Image.Image, np.ndarray, torch.Tensor]],
+         ],
+         return_tensors: Optional[str] = "pt",
+         **kwargs,
+     ) -> BatchFeature:
+         """
+         Pre-process one or more images.
+
+         Returns a :class:`BatchFeature` with a ``pixel_values`` key of shape
+         ``(batch, 3, H, W)`` as a ``torch.Tensor`` (when ``return_tensors="pt"``).
+         """
+         if not isinstance(images, (list, tuple)):
+             images = [images]
+
+         h = self.size["height"]
+         w = self.size["width"]
+         mean = self.image_mean
+         std = self.image_std
+
+         tensors = []
+         for img in images:
+             # --- normalise input type to PIL RGB ---
+             pil: Image.Image
+             if isinstance(img, torch.Tensor):
+                 # (C, H, W) float tensor in [0, 1]
+                 pil = TF.to_pil_image(img.cpu())
+             elif isinstance(img, np.ndarray):
+                 pil = Image.fromarray(np.asarray(img, dtype=np.uint8))
+             else:
+                 assert isinstance(img, Image.Image)
+                 pil = img
+             pil = pil.convert("RGB")
+
+             # --- resize → tensor → normalise ---
+             pil = pil.resize((w, h), resample=Image.Resampling.BILINEAR)
+             t = TF.to_tensor(pil)  # float32 in [0, 1], shape (3, H, W)
+             t = TF.normalize(t, mean=mean, std=std)
+             tensors.append(t)
+
+         pixel_values = torch.stack(tensors)  # (B, 3, H, W)
+         return BatchFeature({"pixel_values": pixel_values}, tensor_type=return_tensors)
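The "BGR-indexed" constants above are not new numbers: they are the standard ImageNet normalisation constants read in reverse (BGR) channel order, matching the OpenCV/BGR pipeline the original SCHP checkpoints were trained with. A small self-check:

```python
# Standard ImageNet normalisation constants in RGB channel order ...
imagenet_mean_rgb = [0.485, 0.456, 0.406]
imagenet_std_rgb = [0.229, 0.224, 0.225]

# ... and the values shipped in preprocessor_config.json
schp_mean = [0.406, 0.456, 0.485]
schp_std = [0.225, 0.224, 0.229]

# The SCHP constants are exactly the ImageNet constants reversed (RGB -> BGR)
same_mean = schp_mean == imagenet_mean_rgb[::-1]
same_std = schp_std == imagenet_std_rgb[::-1]
```

`TF.normalize` then applies the usual per-channel `(x - mean) / std` with these reversed constants.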
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:50f185b34ce14e92bccf809c9d8a369e9beaa4b999ef15fab0e2a8c3475560c6
+ size 267399112
modeling_schp.py ADDED
@@ -0,0 +1,428 @@
+ """
+ SCHP (Self-Correction Human Parsing) — Transformers-compatible implementation.
+
+ Architecture inlined from https://github.com/GoGoDuck912/Self-Correction-Human-Parsing
+ (networks/AugmentCE2P.py) with the CUDA-only InPlaceABNSync replaced by a pure-PyTorch
+ drop-in, making the model fully runnable on CPU.
+ """
+
+ import functools
+ from dataclasses import dataclass
+ from typing import Optional, Tuple, Union
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from transformers import PreTrainedModel
+ from transformers.utils import ModelOutput
+
+ from .configuration_schp import SCHPConfig
+
+
+ # ── Pure-PyTorch InPlaceABNSync shim ──────────────────────────────────────────
+ class InPlaceABNSync(nn.BatchNorm2d):
+     """CPU-compatible drop-in for InPlaceABNSync.
+
+     Subclasses ``nn.BatchNorm2d`` directly so that state-dict keys
+     (weight, bias, running_mean, running_var) match the original SCHP
+     checkpoints without any nesting.
+     """
+
+     def __init__(self, num_features, activation="leaky_relu", slope=0.01, **kwargs):
+         bn_kwargs = {
+             k: v
+             for k, v in kwargs.items()
+             if k in ("eps", "momentum", "affine", "track_running_stats")
+         }
+         super().__init__(num_features, **bn_kwargs)
+         self.activation = activation
+         self.slope = slope
+
+     def forward(self, input: torch.Tensor) -> torch.Tensor:  # type: ignore[override]
+         input = super().forward(input)
+         if self.activation == "leaky_relu":
+             return F.leaky_relu(input, negative_slope=self.slope, inplace=True)
+         elif self.activation == "elu":
+             return F.elu(input, inplace=True)
+         return input
+
+
+ # BatchNorm2d with no activation (activation="none")
+ BatchNorm2d = functools.partial(InPlaceABNSync, activation="none")
+ affine_par = True
+
+
+ # ── Model architecture (inlined from AugmentCE2P.py) ─────────────────────────
+ def _conv3x3(in_planes, out_planes, stride=1):
+     return nn.Conv2d(
+         in_planes, out_planes, kernel_size=3, stride=stride, padding=1, bias=False
+     )
+
+
+ class _Bottleneck(nn.Module):
+     expansion = 4
+
+     def __init__(
+         self, inplanes, planes, stride=1, dilation=1, downsample=None, multi_grid=1
+     ):
+         super().__init__()
+         self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
+         self.bn1 = BatchNorm2d(planes)
+         self.conv2 = nn.Conv2d(
+             planes,
+             planes,
+             kernel_size=3,
+             stride=stride,
+             padding=dilation * multi_grid,
+             dilation=dilation * multi_grid,
+             bias=False,
+         )
+         self.bn2 = BatchNorm2d(planes)
+         self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
+         self.bn3 = BatchNorm2d(planes * 4)
+         self.relu = nn.ReLU(inplace=False)
+         self.relu_inplace = nn.ReLU(inplace=True)
+         self.downsample = downsample
+         self.dilation = dilation
+         self.stride = stride
+
+     def forward(self, x):
+         residual = x
+         out = self.relu(self.bn1(self.conv1(x)))
+         out = self.relu(self.bn2(self.conv2(out)))
+         out = self.bn3(self.conv3(out))
+         if self.downsample is not None:
+             residual = self.downsample(x)
+         return self.relu_inplace(out + residual)
+
+
+ class _PSPModule(nn.Module):
+     def __init__(self, features, out_features=512, sizes=(1, 2, 3, 6)):
+         super().__init__()
+         self.stages = nn.ModuleList(
+             [
+                 nn.Sequential(
+                     nn.AdaptiveAvgPool2d(size),
+                     nn.Conv2d(features, out_features, kernel_size=1, bias=False),
+                     InPlaceABNSync(out_features),
+                 )
+                 for size in sizes
+             ]
+         )
+         self.bottleneck = nn.Sequential(
+             nn.Conv2d(
+                 features + len(sizes) * out_features,
+                 out_features,
+                 kernel_size=3,
+                 padding=1,
+                 dilation=1,
+                 bias=False,
+             ),
+             InPlaceABNSync(out_features),
+         )
+
+     def forward(self, feats):
+         h, w = feats.size(2), feats.size(3)
+         priors = [
+             F.interpolate(
+                 stage(feats), size=(h, w), mode="bilinear", align_corners=True
+             )
+             for stage in self.stages
+         ] + [feats]
+         return self.bottleneck(torch.cat(priors, dim=1))
+
+
+ class _Edge_Module(nn.Module):
+     def __init__(self, in_fea=(256, 512, 1024), mid_fea=256, out_fea=2):
+         super().__init__()
+         self.conv1 = nn.Sequential(
+             nn.Conv2d(in_fea[0], mid_fea, kernel_size=1, bias=False),
+             InPlaceABNSync(mid_fea),
+         )
+         self.conv2 = nn.Sequential(
+             nn.Conv2d(in_fea[1], mid_fea, kernel_size=1, bias=False),
+             InPlaceABNSync(mid_fea),
+         )
+         self.conv3 = nn.Sequential(
+             nn.Conv2d(in_fea[2], mid_fea, kernel_size=1, bias=False),
+             InPlaceABNSync(mid_fea),
+         )
+         self.conv4 = nn.Conv2d(mid_fea, out_fea, kernel_size=3, padding=1, bias=True)
+         self.conv5 = nn.Conv2d(out_fea * 3, out_fea, kernel_size=1, bias=True)
+
+     def forward(self, x1, x2, x3):
+         _, _, h, w = x1.size()
+         ef1 = self.conv1(x1)
+         ef2 = self.conv2(x2)
+         ef3 = self.conv3(x3)
+         e1 = self.conv4(ef1)
+         e2 = F.interpolate(
+             self.conv4(ef2), size=(h, w), mode="bilinear", align_corners=True
+         )
+         e3 = F.interpolate(
+             self.conv4(ef3), size=(h, w), mode="bilinear", align_corners=True
+         )
+         ef2 = F.interpolate(ef2, size=(h, w), mode="bilinear", align_corners=True)
+         ef3 = F.interpolate(ef3, size=(h, w), mode="bilinear", align_corners=True)
+         edge = self.conv5(torch.cat([e1, e2, e3], dim=1))
+         edge_fea = torch.cat([ef1, ef2, ef3], dim=1)
+         return edge, edge_fea
+
+
+ class _Decoder_Module(nn.Module):
+     def __init__(self, num_classes):
+         super().__init__()
+         self.conv1 = nn.Sequential(
+             nn.Conv2d(512, 256, kernel_size=1, bias=False),
+             InPlaceABNSync(256),
+         )
+         self.conv2 = nn.Sequential(
+             nn.Conv2d(256, 48, kernel_size=1, bias=False),
+             InPlaceABNSync(48),
+         )
+         self.conv3 = nn.Sequential(
+             nn.Conv2d(304, 256, kernel_size=1, bias=False),
+             InPlaceABNSync(256),
+             nn.Conv2d(256, 256, kernel_size=1, bias=False),
+             InPlaceABNSync(256),
+         )
+         self.conv4 = nn.Conv2d(256, num_classes, kernel_size=1, bias=True)
+
+     def forward(self, xt, xl):
+         _, _, h, w = xl.size()
+         xt = F.interpolate(
+             self.conv1(xt), size=(h, w), mode="bilinear", align_corners=True
+         )
+         xl = self.conv2(xl)
+         x = self.conv3(torch.cat([xt, xl], dim=1))
+         return self.conv4(x), x
+
+
+ class _SCHPResNet(nn.Module):
+     """SCHP ResNet-101 backbone + decoder (reproduced from AugmentCE2P.py)."""
+
+     def __init__(self, num_classes: int):
+         self.inplanes = 128
+         super().__init__()
+         # Three-layer stem
+         self.conv1 = _conv3x3(3, 64, stride=2)
+         self.bn1 = BatchNorm2d(64)
+         self.relu1 = nn.ReLU(inplace=False)
+         self.conv2 = _conv3x3(64, 64)
+         self.bn2 = BatchNorm2d(64)
+         self.relu2 = nn.ReLU(inplace=False)
+         self.conv3 = _conv3x3(64, 128)
+         self.bn3 = BatchNorm2d(128)
+         self.relu3 = nn.ReLU(inplace=False)
+         self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
+         # ResNet stages
+         self.layer1 = self._make_layer(_Bottleneck, 64, 3)
+         self.layer2 = self._make_layer(_Bottleneck, 128, 4, stride=2)
+         self.layer3 = self._make_layer(_Bottleneck, 256, 23, stride=2)
+         self.layer4 = self._make_layer(
+             _Bottleneck, 512, 3, stride=1, dilation=2, multi_grid=(1, 1, 1)
+         )
+         # Head modules
+         self.context_encoding = _PSPModule(2048, 512)
+         self.edge = _Edge_Module()
+         self.decoder = _Decoder_Module(num_classes)
+         # NB: "fushion" spelling kept to match the original checkpoint keys
+         self.fushion = nn.Sequential(
+             nn.Conv2d(1024, 256, kernel_size=1, bias=False),
+             InPlaceABNSync(256),
+             nn.Dropout2d(0.1),
+             nn.Conv2d(256, num_classes, kernel_size=1, bias=True),
+         )
+
+     def _make_layer(self, block, planes, blocks, stride=1, dilation=1, multi_grid=1):
+         downsample = None
+         if stride != 1 or self.inplanes != planes * block.expansion:
+             downsample = nn.Sequential(
+                 nn.Conv2d(
+                     self.inplanes,
+                     planes * block.expansion,
+                     kernel_size=1,
+                     stride=stride,
+                     bias=False,
+                 ),
+                 BatchNorm2d(planes * block.expansion, affine=affine_par),
+             )
+
+         def _grid(i, g):
+             return g[i % len(g)] if isinstance(g, tuple) else 1
+
+         layers = [
+             block(
+                 self.inplanes,
+                 planes,
+                 stride,
+                 dilation=dilation,
+                 downsample=downsample,
+                 multi_grid=_grid(0, multi_grid),
+             )
+         ]
+         self.inplanes = planes * block.expansion
+         for i in range(1, blocks):
+             layers.append(
+                 block(
+                     self.inplanes,
+                     planes,
+                     dilation=dilation,
+                     multi_grid=_grid(i, multi_grid),
+                 )
+             )
+         return nn.Sequential(*layers)
+
+     def forward(self, x):
+         x = self.relu1(self.bn1(self.conv1(x)))
+         x = self.relu2(self.bn2(self.conv2(x)))
+         x = self.relu3(self.bn3(self.conv3(x)))
+         x = self.maxpool(x)
+         x2 = self.layer1(x)
+         x3 = self.layer2(x2)
+         x4 = self.layer3(x3)
+         x5 = self.layer4(x4)
+         context = self.context_encoding(x5)
+         parsing_result, parsing_fea = self.decoder(context, x2)
+         edge_result, edge_fea = self.edge(x2, x3, x4)
+         fusion_result = self.fushion(torch.cat([parsing_fea, edge_fea], dim=1))
+         # Return format mirrors the original: [[parsing, fusion], [edge]]
+         return [[parsing_result, fusion_result], [edge_result]]
+
+
+ # ── Transformers output dataclass ────────────────────────────────────────────
+ @dataclass
+ class SCHPSemanticSegmenterOutput(ModelOutput):
+     """
+     Output type for :class:`SCHPForSemanticSegmentation`.
+
+     Args:
+         loss: Cross-entropy loss (only when ``labels`` is provided).
+         logits: Final fusion logits, shape ``(batch, num_labels, H, W)``,
+             upsampled to the input image resolution.
+         parsing_logits: Decoder-branch logits before fusion,
+             shape ``(batch, num_labels, H, W)``.
+         edge_logits: Edge-branch logits, shape ``(batch, 2, H, W)``.
+     """
+
+     loss: Optional[torch.Tensor] = None
+     logits: Optional[torch.Tensor] = None
+     parsing_logits: Optional[torch.Tensor] = None
+     edge_logits: Optional[torch.Tensor] = None
+
+
+ # ── PreTrainedModel wrapper ───────────────────────────────────────────────────
+ class SCHPForSemanticSegmentation(PreTrainedModel):
+     """
+     SCHP ResNet-101 for human parsing / semantic segmentation.
+
+     Usage — loading from an original SCHP ``.pth`` checkpoint::
+
+         model = SCHPForSemanticSegmentation.from_schp_checkpoint(
+             "checkpoints/schp/exp-schp-201908301523-atr.pth"
+         )
+
+     Usage — loading after :meth:`save_pretrained`::
+
+         model = SCHPForSemanticSegmentation.from_pretrained(
+             "./my-schp-model", trust_remote_code=True
+         )
+     """
+
+     config_class = SCHPConfig
+     # num_batches_tracked is not stored in the original SCHP checkpoints
+     _keys_to_ignore_on_load_missing = [r"\.num_batches_tracked$"]
+
+     def __init__(self, config: SCHPConfig):
+         super().__init__(config)
+         self.model = _SCHPResNet(num_classes=config.num_labels)
+         self.post_init()
+
+     def forward(
+         self,
+         pixel_values: torch.Tensor,
+         labels: Optional[torch.LongTensor] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[SCHPSemanticSegmenterOutput, Tuple]:
+         """
+         Args:
+             pixel_values: ``(batch, 3, H, W)`` — normalised with SCHP BGR-indexed means.
+             labels: ``(batch, H, W)`` integer class map for computing CE loss.
+             return_dict: Whether to return a :class:`SCHPSemanticSegmenterOutput`
+                 (the default) instead of a plain tuple.
+         """
+         return_dict = return_dict if return_dict is not None else True
+
+         h, w = pixel_values.shape[-2:]
+         raw = self.model(pixel_values)
+         # raw = [[parsing_result, fusion_result], [edge_result]]
+
+         logits = F.interpolate(
+             raw[0][1], size=(h, w), mode="bilinear", align_corners=True
+         )
+         parsing_logits = F.interpolate(
+             raw[0][0], size=(h, w), mode="bilinear", align_corners=True
+         )
+         edge_logits = F.interpolate(
+             raw[1][0], size=(h, w), mode="bilinear", align_corners=True
+         )
+
+         loss = None
+         if labels is not None:
+             loss = F.cross_entropy(logits, labels.long())
+
+         if not return_dict:
+             return (loss, logits) if loss is not None else (logits,)
+
+         return SCHPSemanticSegmenterOutput(
+             loss=loss,
+             logits=logits,
+             parsing_logits=parsing_logits,
+             edge_logits=edge_logits,
+         )
+
+     @classmethod
+     def from_schp_checkpoint(
+         cls,
+         checkpoint_path: str,
+         config: Optional[SCHPConfig] = None,
+         map_location: str = "cpu",
+     ) -> "SCHPForSemanticSegmentation":
+         """
+         Load from an original SCHP ``.pth`` checkpoint.
+
+         Handles the ``module.`` prefix added by ``DataParallel`` training and
+         remaps keys to the ``model.*`` namespace used by this wrapper.
+
+         Args:
+             checkpoint_path: Path to the ``.pth`` file.
+             config: :class:`SCHPConfig` instance. Defaults to the 7-class
+                 Pascal Person Part config.
+             map_location: PyTorch device string (``"cpu"`` or ``"cuda"``).
+         """
+         if config is None:
+             config = SCHPConfig()
+
+         model = cls(config)
+
+         raw = torch.load(checkpoint_path, map_location=map_location)
+         state_dict = raw.get("state_dict", raw)
+
+         # Strip DataParallel module. prefix if present
+         if all(k.startswith("module.") for k in state_dict):
+             state_dict = {k[len("module.") :]: v for k, v in state_dict.items()}
+
+         # Remap to model.* namespace (self.model = _SCHPResNet)
+         state_dict = {"model." + k: v for k, v in state_dict.items()}
+
+         missing, unexpected = model.load_state_dict(state_dict, strict=False)
+         real_missing = [k for k in missing if "num_batches_tracked" not in k]
+         if real_missing:
+             raise RuntimeError(
+                 f"Missing keys when loading SCHP checkpoint ({len(real_missing)} total): "
+                 f"{real_missing[:5]}"
+             )
+         if unexpected:
+             raise RuntimeError(
+                 f"Unexpected keys when loading SCHP checkpoint ({len(unexpected)} total): "
+                 f"{unexpected[:5]}"
+             )
+
+         return model
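The key remapping inside `from_schp_checkpoint` can be exercised on a plain dict without touching PyTorch. A sketch with hypothetical keys shaped like a `DataParallel`-saved SCHP checkpoint:

```python
# toy state dict with keys shaped like a DataParallel-saved SCHP checkpoint
state_dict = {
    "module.conv1.weight": "w0",
    "module.layer1.0.bn1.weight": "w1",
}

# 1) strip the DataParallel "module." prefix when every key carries it
if all(k.startswith("module.") for k in state_dict):
    state_dict = {k[len("module."):]: v for k, v in state_dict.items()}

# 2) move everything under the wrapper's "model." namespace
#    (the wrapper stores the network as self.model = _SCHPResNet)
state_dict = {"model." + k: v for k, v in state_dict.items()}
```

After both steps the keys line up with `SCHPForSemanticSegmentation`'s own `state_dict()` names, e.g. `model.conv1.weight`.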
onnx/schp-pascal-7-int8-static.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66b12766d7f1ddbc3de972e67e8626be727507e7feeeca34e1b23b6f45e756d2
+ size 69148800
onnx/schp-pascal-7.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f8ce1f038ed6cb429f0a4a2f146064afd6398520226569988275e13e2847fd0
+ size 1489921
onnx/schp-pascal-7.onnx.data ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67cfa8f2399a68d7e0e955fb70a0ff57ddf79063cd1d7d5130a3859601f8ef04
+ size 266665984
preprocessor_config.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "auto_map": {
+     "AutoImageProcessor": "image_processing_schp.SCHPImageProcessor"
+   },
+   "image_mean": [
+     0.406,
+     0.456,
+     0.485
+   ],
+   "image_processor_type": "SCHPImageProcessor",
+   "image_std": [
+     0.225,
+     0.224,
+     0.229
+   ],
+   "size": {
+     "height": 512,
+     "width": 512
+   }
+ }