shriarul5273 committed
Commit
1e97e9e
·
1 Parent(s): 34a90ad

Added segmentation models for pruning and quantization

README.md CHANGED
@@ -11,19 +11,29 @@ pinned: false
11
 
12
  # Model Optimization Lab
13
 
14
- Interactive Gradio playground for comparing pruning and quantization on ImageNet-classification backbones. Upload any image and observe how latency, confidence, and model size change when applying different compression recipes. Pretrained weights are loaded by default; set `MODEL_OPT_PRETRAINED=0` if you want random initialization for experimentation.
15
 
16
  ## Features
17
- - Baseline FP32 inference using cached backbones (ResNet-50, MobileNetV3, EfficientNet-B0, etc.).
18
- - Pruning tab: structured/unstructured pruning with configurable sparsity and size/latency comparison.
19
- - Quantization tab: dynamic, weight-only INT8, and FP16 passes with CPU-safe fallbacks for unsupported kernels.
20
- - Automated metric tables and Top-5 bar charts to visualize confidence shifts between optimized variants.
21
  - Lightweight CLI mode for quick experiments without launching the UI.
22
 
23
  ## Requirements
24
  - Python 3.9+
25
  - PyTorch with CPU support (GPU optional but recommended for FP16 experiments).
26
- - The packages listed in `requirements.txt` or installed via `pip install -r requirements.txt` (create the file if missing with entries like `torch`, `timm`, `gradio`, `pandas`, `torchvision`).
27
 
28
  ## Quick Start
29
  1. Clone the repository:
@@ -43,32 +53,42 @@ Interactive Gradio playground for comparing pruning and quantization on ImageNet
43
  5. Open the local Gradio URL (printed in the terminal) in your browser.
44
 
45
  ## Using the App
46
- 1. **Upload an image** or pick one of the provided examples.
47
- 2. Choose the **Base Model** dropdown (ResNet-50, MobileNetV3-Large, EfficientNet-B0, ConvNeXt-Tiny, ViT-B/16, RegNetY-016, EfficientNet-Lite0).
48
  3. Pick a **Hardware Preset** or keep `custom`:
49
   - Edge CPU — CPU, channels-last off, dynamic quantization, 30% pruning.
50
   - Datacenter GPU — CUDA, channels-last on, `torch.compile`, FP16 quantization, 20% pruning.
51
   - Apple MPS — MPS, FP16 quantization, 20% pruning.
52
- 4. Pick a tab and set options, then click **Run**.
53
 
54
- ### Pruning tab options
55
  - `Pruning Method`: `structured` (LN-structured) or `unstructured` (L1). Applied to Conv2d weights before export.
56
  - `Pruning Amount`: 0.1–0.9 sparsity. Higher numbers zero more weights; latency impact depends on kernel support.
57
  - `Device`: `auto` picks CUDA → MPS → CPU. Channels-last is only honored on CUDA.
58
  - `Channels-last input (CUDA)`: Converts tensors to channels-last for better CUDA kernel throughput.
59
  - `Mixed precision (AMP)`: Enables CUDA autocast for FP16/FP32 mixes.
60
  - `Torch compile (PyTorch 2)`: Wraps the pruned model in `torch.compile` when available.
61
- - Exports: TorchScript (`pruned_model.ts`), ONNX (`pruned_model.onnx`), JSON report, always saves `pruned_state_dict.pth`.
62
- - Outputs: comparison metrics, Top-5 bar chart, per-layer sparsity table, download list of artifacts.
63
 
64
- ### Quantization tab options
65
  - `Quantization Type`: `dynamic`/`weight_only` (INT8 linear layers on CPU), or `fp16` (casts model to half precision).
66
  - `Device`: `auto` picks CUDA → MPS → CPU; dynamic/weight-only runs force CPU execution for kernel support.
67
  - `Channels-last input (CUDA)`: Same as pruning; ignored on CPU.
68
  - `Mixed precision (AMP)`: Applies CUDA autocast to the quantized forward pass.
69
  - `Torch compile (PyTorch 2)`: Compiles the quantized model when available.
70
- - Exports: TorchScript (`quantized_model.ts`), ONNX (`quantized_model.onnx`), JSON report, always saves `quantized_state_dict.pth`.
71
- - Outputs: comparison metrics, Top-5 bar chart, download list of artifacts.
72
 
73
  ### What gets exported
74
  - Artifacts are written to `exports/`. JSON reports include the chosen options, metrics, and Top-5 results for both the baseline and optimized variants.
@@ -76,7 +96,10 @@ Interactive Gradio playground for comparing pruning and quantization on ImageNet
76
  - State dicts are always saved for reproducibility; disable or prune them manually if you are embedding this module elsewhere.
77
 
78
  ### Output Interpretation Tips
79
- - **Top-1 Prediction**: Labels come from ImageNet synsets, so some entries include multiple comma-separated synonyms (e.g., `chambered nautilus, pearly nautilus`).
80
  - **Latency (ms)**: The measured inference latency for each pass. Large numbers for quantized runs may indicate preprocessing overhead rather than faster model execution; see [Performance Notes](#performance-notes).
81
  - **Model Size (MB)**: Serialized state dictionary size after saving to disk.
82
 
@@ -87,10 +110,11 @@ Interactive Gradio playground for comparing pruning and quantization on ImageNet
87
  - FP16 inference is beneficial on GPUs. On CPU, PyTorch often casts half tensors back to float32, introducing overhead.
88
 
89
  ## Extending the Lab
90
- - Swap in different architectures by changing the `timm.create_model` call in `app.py`.
 
91
  - Add calibration data and static INT8 quantization to include convolution layers.
92
  - Cache optimized models to avoid recomputation across requests.
93
- - Integrate evaluation datasets to quantify accuracy drop beyond top-1 confidence.
94
 
95
  ## CLI Mode
96
  - Run without the UI: `python app.py --cli --image path/to/img.jpg --mode prune --model resnet50 --device auto`
 
11
 
12
  # Model Optimization Lab
13
 
14
+ Interactive Gradio playground for comparing pruning and quantization on both ImageNet-classification and ADE20K-segmentation models. Upload any image and observe how latency, confidence, model size, and segmentation quality change when applying different compression recipes. Pretrained weights are loaded by default; set `MODEL_OPT_PRETRAINED=0` if you want random initialization for experimentation.
15
 
16
  ## Features
17
+ - **Classification Tasks**: Baseline FP32 inference using cached backbones (ResNet-50, MobileNetV3, EfficientNet-B0, ConvNeXt-Tiny, ViT-B/16, RegNetY-016, EfficientNet-Lite0).
18
+ - **Segmentation Tasks**: Pretrained ADE20K models (SegFormer B0/B4, DPT Large, UPerNet ConvNeXt-Tiny) with 150-class semantic segmentation.
19
+ - **Pruning tabs**: Structured/unstructured pruning with configurable sparsity and comprehensive size/latency comparison for both classification and segmentation.
20
+ - **Quantization tabs**: Dynamic, weight-only INT8, and FP16 passes with CPU-safe fallbacks for unsupported kernels, available for both task types.
21
+ - **Visual Comparisons**:
22
+ - Classification: Automated metric tables and Top-5 bar charts to visualize confidence shifts.
23
+ - Segmentation: Image sliders for overlay/mask comparisons, class distribution tables, and mask agreement metrics.
24
+ - **Export Options**: TorchScript, ONNX, JSON reports, and state dictionaries for all optimization variants.
25
  - Lightweight CLI mode for quick experiments without launching the UI.
26
 
27
  ## Requirements
28
  - Python 3.9+
29
  - PyTorch with CPU support (GPU optional but recommended for FP16 experiments).
30
+ - The packages listed in `requirements.txt`:
31
+ - `torch`, `torchvision` - Core PyTorch framework
32
+ - `timm` - Classification model architectures
33
+ - `segmentation-models-pytorch` - Segmentation model architectures
34
+ - `albumentations` - Image preprocessing for segmentation models
35
+ - `gradio` - Web UI framework
36
+ - `pandas`, `matplotlib`, `numpy`, `pillow` - Data processing and visualization
37
 
38
  ## Quick Start
39
  1. Clone the repository:
 
53
  5. Open the local Gradio URL (printed in the terminal) in your browser.
54
 
55
  ## Using the App
56
+ 1. **Upload an image** or pick one of the provided examples (ImageNet samples for classification, ADE20K validation images for segmentation).
57
+ 2. Choose the **Base Model** dropdown:
58
+ - **Classification**: ResNet-50, MobileNetV3-Large, EfficientNet-B0, ConvNeXt-Tiny, ViT-B/16, RegNetY-016, EfficientNet-Lite0
59
+ - **Segmentation**: SegFormer B0/B4 (ADE20K 512x512), DPT Large (ADE20K), UPerNet ConvNeXt-Tiny (ADE20K)
60
  3. Pick a **Hardware Preset** or keep `custom` (a preset sketch follows this list):
61
   - Edge CPU — CPU, channels-last off, dynamic quantization, 30% pruning.
62
   - Datacenter GPU — CUDA, channels-last on, `torch.compile`, FP16 quantization, 20% pruning.
63
   - Apple MPS — MPS, FP16 quantization, 20% pruning.
64
+ 4. Select a tab (Pruning-Classification, Quantization-Classification, Pruning-Segmentation, or Quantization-Segmentation), configure options, then click **Run**.
65
 
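The presets map onto a small configuration table in `app.py`. The sketch below is a hypothetical approximation based on the keys the run handlers read (`device`, `channels_last`, `compile`, `amp`, `quant`, `prune_amount`); the exact values live in the `PRESETS` dict in the repository.

```python
# Hypothetical preset table; keys mirror what the run handlers read from PRESETS,
# values only approximate the descriptions above and may differ from app.py.
PRESETS = {
    "Edge CPU": {
        "device": "cpu", "channels_last": False, "compile": False,
        "amp": False, "quant": "dynamic", "prune_amount": 0.3,
    },
    "Datacenter GPU": {
        "device": "cuda", "channels_last": True, "compile": True,
        "amp": True, "quant": "fp16", "prune_amount": 0.2,
    },
    "Apple MPS": {
        "device": "mps", "channels_last": False, "compile": False,
        "amp": False, "quant": "fp16", "prune_amount": 0.2,
    },
}
```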
66
+ ### Pruning tab options (Classification & Segmentation)
67
  - `Pruning Method`: `structured` (LN-structured) or `unstructured` (L1). Applied to Conv2d weights before export (a minimal sketch follows this list).
68
  - `Pruning Amount`: 0.1–0.9 sparsity. Higher numbers zero more weights; latency impact depends on kernel support.
69
  - `Device`: `auto` picks CUDA → MPS → CPU. Channels-last is only honored on CUDA.
70
  - `Channels-last input (CUDA)`: Converts tensors to channels-last for better CUDA kernel throughput.
71
  - `Mixed precision (AMP)`: Enables CUDA autocast for FP16/FP32 mixes.
72
  - `Torch compile (PyTorch 2)`: Wraps the pruned model in `torch.compile` when available.
73
+ - **Exports**:
74
+ - Classification: `pruned_model.ts`, `pruned_model.onnx`, `pruned_report.json`, `pruned_state_dict.pth`
75
+ - Segmentation: `pruned_seg_model.ts`, `pruned_seg_model.onnx`, `pruned_seg_report.json`, `pruned_seg_state_dict.pth`
76
+ - **Outputs**:
77
+ - Classification: Comparison metrics, Top-5 bar chart, per-layer sparsity table, download list
78
+ - Segmentation: Comparison metrics, class distribution table, overlay/mask sliders, per-layer sparsity table, download list
79
 
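A minimal sketch of the two pruning recipes described above, using `torch.nn.utils.prune` on Conv2d weights only; the app's own `apply_pruning` helper may differ in details such as which layers it touches or when masks are removed.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune


def prune_conv_weights(model: nn.Module, amount: float = 0.3, method: str = "unstructured") -> nn.Module:
    """Zero a fraction of Conv2d weights before export (illustrative only)."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            if method == "structured":
                # LN-structured pruning drops whole output channels (dim=0) by L2 norm.
                prune.ln_structured(module, name="weight", amount=amount, n=2, dim=0)
            else:
                # Unstructured L1 pruning zeroes the smallest-magnitude individual weights.
                prune.l1_unstructured(module, name="weight", amount=amount)
            # Bake the mask into the weight tensor so exports and size metrics see the sparsity.
            prune.remove(module, "weight")
    return model
```

Note that pruning does not shrink tensor shapes, so latency gains still depend on kernels that can exploit the zeroed weights.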
80
+ ### Quantization tab options (Classification & Segmentation)
81
  - `Quantization Type`: `dynamic`/`weight_only` (INT8 linear layers on CPU), or `fp16` (casts model to half precision); a minimal sketch follows this list.
82
  - `Device`: `auto` picks CUDA → MPS → CPU; dynamic/weight-only runs force CPU execution for kernel support.
83
  - `Channels-last input (CUDA)`: Same as pruning; ignored on CPU.
84
  - `Mixed precision (AMP)`: Applies CUDA autocast to the quantized forward pass.
85
  - `Torch compile (PyTorch 2)`: Compiles the quantized model when available.
86
+ - **Exports**:
87
+ - Classification: `quantized_model.ts`, `quantized_model.onnx`, `quant_report.json`, `quantized_state_dict.pth`
88
+ - Segmentation: `quant_seg_model.ts`, `quant_seg_model.onnx`, `quant_seg_report.json`, `quant_seg_state_dict.pth`
89
+ - **Outputs**:
90
+ - Classification: Comparison metrics, Top-5 bar chart, download list
91
+ - Segmentation: Comparison metrics, class distribution table, overlay/mask sliders, download list
92
 
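A minimal sketch of the three recipes named above, assuming the standard PyTorch quantization APIs; the app's `apply_quantization` helper may implement `weight_only` differently.

```python
import torch
import torch.nn as nn


def quantize_model(model: nn.Module, q_type: str) -> nn.Module:
    """Illustrative mapping of the dropdown values to PyTorch quantization calls."""
    if q_type in ("dynamic", "weight_only"):
        # Post-training INT8 on nn.Linear modules; PyTorch ships CPU kernels only,
        # which is why these modes force CPU execution. weight_only is shown with the
        # same call as a stand-in; the app may keep FP32 activations a different way.
        return torch.ao.quantization.quantize_dynamic(model.cpu(), {nn.Linear}, dtype=torch.qint8)
    if q_type == "fp16":
        # Cast parameters and buffers to half precision; worthwhile on CUDA/MPS, not on CPU.
        return model.half()
    return model
```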
93
  ### What gets exported
94
  - Artifacts are written to `exports/`. JSON reports include the chosen options, metrics, and Top-5 results for both the baseline and optimized variants (a sample report shape is sketched below).
 
96
  - State dicts are always saved for reproducibility; disable or prune them manually if you are embedding this module elsewhere.
97
 
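For reference, the segmentation pruning report written to `exports/` follows roughly this shape. Field names are taken from the export code in `app.py`; classification reports carry Top-5 results instead of a class distribution, and the values shown here are illustrative.

```python
# Approximate shape of exports/pruned_seg_report.json.
report = {
    "model": "SegFormer B0 (ADE20K 512x512)",
    "checkpoint": "smp-hub/segformer-b0-512x512-ade-160k",
    "dataset": "ADE20K",
    "pruning": {"method": "structured", "amount": 0.4},
    "metrics": {},             # filled from metrics_df.to_dict()
    "class_distribution": {},  # filled from class_df.to_dict()
}
```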
98
  ### Output Interpretation Tips
99
+ - **Top-1 Prediction (Classification)**: Labels come from ImageNet synsets, so some entries include multiple comma-separated synonyms (e.g., `chambered nautilus, pearly nautilus`).
100
+ - **Mask Agreement (Segmentation)**: Percentage of pixels where the original and optimized models predict the same class. 100% means identical masks; lower values indicate divergence (see the sketch after this list).
101
+ - **Class Distribution (Segmentation)**: Shows the top 25 most prevalent classes by pixel coverage, with percentages and counts for both models.
102
+ - **Image Sliders (Segmentation)**: Drag the slider to compare original vs. optimized overlays or raw masks side-by-side.
103
  - **Latency (ms)**: The measured inference latency for each pass. Large numbers for quantized runs may indicate preprocessing overhead rather than faster model execution; see [Performance Notes](#performance-notes).
104
  - **Model Size (MB)**: Serialized state dictionary size after saving to disk.
105
 
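For the segmentation metrics, mask agreement is simply the fraction of pixels assigned the same class by both models, mirroring the computation in `app.py`:

```python
import numpy as np


def mask_agreement(mask_original: np.ndarray, mask_optimized: np.ndarray) -> float:
    """Percentage of pixels where both HxW class maps agree; 100.0 means identical masks."""
    return float((mask_original == mask_optimized).mean() * 100.0)
```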
 
110
  - FP16 inference is beneficial on GPUs. On CPU, PyTorch often casts half tensors back to float32, introducing overhead.
111
 
112
  ## Extending the Lab
113
+ - **Classification**: Swap in different architectures by changing the `timm.create_model` call in `app.py` (see the sketch after this list).
114
+ - **Segmentation**: Add new models from the [smp-hub](https://huggingface.co/smp-hub) collection by adding entries to `SEGMENTATION_MODEL_CONFIGS`.
115
  - Add calibration data and static INT8 quantization to include convolution layers.
116
  - Cache optimized models to avoid recomputation across requests.
117
+ - Integrate evaluation datasets to quantify accuracy drop (classification: top-1/top-5, segmentation: mIoU, pixel accuracy).
118
 
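Both extension points boil down to a one-line change in `app.py`. The sketch below is hedged: `resnet34` and the SegFormer B2 checkpoint are only example names, and the dataclass mirrors the `SegmentationModelConfig` already defined in `app.py`.

```python
from dataclasses import dataclass

import timm

# Classification: any timm architecture can replace the defaults used in app.py.
model = timm.create_model("resnet34", pretrained=True)


# Segmentation: this dataclass mirrors the one in app.py; appending a new instance
# to SEGMENTATION_MODEL_CONFIGS there adds the model to the dropdown.
@dataclass(frozen=True)
class SegmentationModelConfig:
    name: str
    checkpoint: str
    classes: int = 150
    dataset: str = "ADE20K"


# Example entry only; check the smp-hub collection for real checkpoint IDs.
new_entry = SegmentationModelConfig(
    "SegFormer B2 (ADE20K 512x512)", "smp-hub/segformer-b2-512x512-ade-160k"
)
```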
119
  ## CLI Mode
120
  - Run without the UI: `python app.py --cli --image path/to/img.jpg --mode prune --model resnet50 --device auto`
app.py CHANGED
@@ -4,6 +4,7 @@ import json
4
  import os
5
  import time
6
  from pathlib import Path
 
7
 
8
  import matplotlib.pyplot as plt
9
  import gradio as gr
@@ -13,8 +14,13 @@ import timm
13
  import torch
14
  import torch.nn as nn
15
  import torch.nn.utils.prune as prune
16
- from PIL import Image
 
17
  from torchvision import transforms
18
 
19
 
20
  # ---------------------------------------------
@@ -60,6 +66,103 @@ PRESETS = {
60
  _MODEL_CACHE: dict[str, torch.nn.Module] = {}
61
  _TRANSFORM_CACHE: dict[str, transforms.Compose] = {}
62
 
63
 
64
  def select_device(device_str: str) -> torch.device:
65
  """Return a valid torch.device based on user selection."""
@@ -116,6 +219,267 @@ def clone_model(model_name: str):
116
  return fresh
117
 
118
 
119
  # ---------------------------------------------
120
  # Image Preprocess
121
  # ---------------------------------------------
@@ -514,10 +878,314 @@ def run_quantized(
514
 
515
  print("=== RUN QUANTIZED COMPLETE ===")
516
  return metrics_df, chart_fig, downloads
517
  # ---------------------------------------------
518
  # GRADIO UI
519
  # ---------------------------------------------
520
  examples = [["examples/cat.jpg"], ["examples/dog.jpg"], ["examples/bird.jpg"], ["examples/car.jpg"], ["examples/elephant.jpg"]]
 
521
 
522
 
523
  def create_demo():
@@ -530,10 +1198,11 @@ def create_demo():
530
  if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
531
  device_opts.append("mps")
532
  preset_opts = list(PRESETS.keys()) + ["custom"]
 
533
 
534
  with gr.Tabs():
535
  # ---- PRUNING TAB ----
536
- with gr.Tab("Pruning"):
537
  with gr.Row():
538
  with gr.Column():
539
  img_p = gr.Image(label="Upload Image")
@@ -551,13 +1220,34 @@ def create_demo():
551
  btn_p = gr.Button("Run Pruned Model")
552
  gr.Examples(examples=examples, inputs=img_p)
553
  gr.Markdown(
554
- "**Option Guide**\n"
555
- "- Base Model: select the timm architecture to optimize (pretrained when available).\n"
556
- "- Hardware Preset: load device, precision, and pruning defaults for common targets; choose custom to tweak manually.\n"
557
- "- Pruning Method/Amount: set structured vs unstructured pruning and the fraction of weights removed.\n"
558
- "- Device & CUDA Toggles: force CPU/CUDA/MPS and optionally enable channels-last or AMP for CUDA speedups.\n"
559
- "- Torch compile: wrap the model with torch.compile (PyTorch 2) to experiment with graph optimizations.\n"
560
- "- Export options: drop TorchScript, ONNX, and JSON reports into the `exports/` directory."
561
  )
562
 
563
  with gr.Column():
@@ -587,7 +1277,7 @@ def create_demo():
587
  )
588
 
589
  # ---- QUANTIZATION TAB ----
590
- with gr.Tab("Quantization"):
591
  with gr.Row():
592
  with gr.Column():
593
  img_q = gr.Image(label="Upload Image")
@@ -604,12 +1294,30 @@ def create_demo():
604
  btn_q = gr.Button("Run Quantized Model")
605
  gr.Examples(examples=examples, inputs=img_q)
606
  gr.Markdown(
607
- "**Option Guide**\n"
608
- "- Base Model & Preset: pick the architecture and optional hardware profile to prefill device and quant settings.\n"
609
- "- Quantization Type: `dynamic` applies post-training int8 to linear layers (forces CPU kernels), `weight_only` stores int8 weights with fp32 activations for a lighter CPU model, while `fp16` casts the full network to half precision for GPUs with native fp16 support.\n"
610
- "- Device & CUDA Toggles: run on CPU/CUDA/MPS; channels-last and AMP only benefit CUDA workloads.\n"
611
- "- Torch compile: try PyTorch 2 compile for extra speed when supported.\n"
612
- "- Export options: generate TorchScript, ONNX, and JSON artifacts inside `exports/`."
613
  )
614
 
615
 
@@ -637,6 +1345,192 @@ def create_demo():
637
  outputs=[metrics_q, chart_q, downloads_q],
638
  )
639
 
640
  return demo
641
 
642
 
 
4
  import os
5
  import time
6
  from pathlib import Path
7
+ from dataclasses import dataclass
8
 
9
  import matplotlib.pyplot as plt
10
  import gradio as gr
 
14
  import torch
15
  import torch.nn as nn
16
  import torch.nn.utils.prune as prune
17
+ import segmentation_models_pytorch as smp
18
+ from PIL import Image, ImageDraw, ImageFont
19
  from torchvision import transforms
20
+ try:
21
+ import albumentations as A
22
+ except ModuleNotFoundError: # pragma: no cover - optional dependency
23
+ A = None
24
 
25
 
26
  # ---------------------------------------------
 
66
  _MODEL_CACHE: dict[str, torch.nn.Module] = {}
67
  _TRANSFORM_CACHE: dict[str, transforms.Compose] = {}
68
 
69
+ @dataclass(frozen=True)
70
+ class SegmentationModelConfig:
71
+ name: str
72
+ checkpoint: str
73
+ classes: int = 150
74
+ dataset: str = "ADE20K"
75
+
76
+
77
+ SEGMENTATION_MODEL_CONFIGS: tuple[SegmentationModelConfig, ...] = (
78
+ SegmentationModelConfig("SegFormer B0 (ADE20K 512x512)", "smp-hub/segformer-b0-512x512-ade-160k"),
79
+ SegmentationModelConfig("SegFormer B4 (ADE20K 512x512)", "smp-hub/segformer-b4-512x512-ade-160k"),
80
+ SegmentationModelConfig("DPT Large (ADE20K)", "smp-hub/dpt-large-ade20k"),
81
+ SegmentationModelConfig("UPerNet ConvNeXt-Tiny (ADE20K)", "smp-hub/upernet-convnext-tiny"),
82
+ )
83
+ SEGMENTATION_MODEL_MAP = {cfg.name: cfg for cfg in SEGMENTATION_MODEL_CONFIGS}
84
+
85
+ _SEG_BASE_PALETTE = np.array(
86
+ [
87
+ [0, 0, 0],
88
+ [0, 114, 189],
89
+ [217, 83, 25],
90
+ [237, 177, 32],
91
+ [126, 47, 142],
92
+ [119, 172, 48],
93
+ [77, 190, 238],
94
+ [162, 20, 47],
95
+ [163, 200, 236],
96
+ [255, 127, 14],
97
+ [255, 188, 121],
98
+ [111, 118, 207],
99
+ [204, 121, 167],
100
+ [148, 103, 189],
101
+ [44, 160, 44],
102
+ [23, 190, 207],
103
+ [31, 119, 180],
104
+ [255, 152, 150],
105
+ [214, 39, 40],
106
+ [188, 189, 34],
107
+ ],
108
+ dtype=np.uint8,
109
+ )
110
+
111
+ _SEG_MODEL_CACHE: dict[str, torch.nn.Module] = {}
112
+ _SEG_TRANSFORM_CACHE: dict[str, object] = {}
113
+ _SEG_PALETTE_CACHE: dict[int, np.ndarray] = {}
114
+
115
+ ADE20K_CLASS_NAMES = [
116
+ "wall", "building", "sky", "floor", "tree", "ceiling", "road", "bed", "windowpane", "grass",
117
+ "cabinet", "sidewalk", "person", "earth", "door", "table", "mountain", "plant", "curtain", "chair",
118
+ "car", "water", "painting", "sofa", "shelf", "house", "sea", "mirror", "rug", "field",
119
+ "armchair", "seat", "fence", "desk", "rock", "wardrobe", "lamp", "bathtub", "railing", "cushion",
120
+ "base", "box", "column", "signboard", "chest of drawers", "counter", "sand", "sink", "skyscraper", "fireplace",
121
+ "refrigerator", "grandstand", "path", "stairs", "runway", "case", "pool table", "pillow", "screen door", "stairway",
122
+ "river", "bridge", "bookcase", "blind", "coffee table", "toilet", "flower", "book", "hill", "bench",
123
+ "countertop", "stove", "palm", "kitchen island", "computer", "swivel chair", "boat", "bar", "arcade machine", "hovel",
124
+ "bus", "towel", "light", "truck", "tower", "chandelier", "awning", "streetlight", "booth", "television receiver",
125
+ "airplane", "dirt track", "apparel", "pole", "land", "bannister", "escalator", "ottoman", "bottle", "buffet",
126
+ "poster", "stage", "van", "ship", "fountain", "conveyer belt", "canopy", "washer", "plaything", "swimming pool",
127
+ "stool", "barrel", "basket", "waterfall", "tent", "bag", "minibike", "cradle", "oven", "ball",
128
+ "food", "step", "tank", "trade name", "microwave", "pot", "animal", "bicycle", "lake", "dishwasher",
129
+ "screen", "blanket", "sculpture", "hood", "sconce", "vase", "traffic light", "tray", "ashcan", "fan",
130
+ "pier", "crt screen", "plate", "monitor", "bulletin board", "shower", "radiator", "glass", "clock", "flag"
131
+ ]
132
+
133
+
134
+ def add_image_label(img: Image.Image, label: str) -> Image.Image:
135
+ """Add a text label at the top of an image."""
136
+ img_array = np.array(img)
137
+ h, w = img_array.shape[:2]
138
+
139
+ # Create canvas with extra space at top for label
140
+ canvas = np.ones((h + 40, w, 3), dtype=np.uint8) * 255
141
+ canvas[40:, :] = img_array
142
+
143
+ # Convert back to PIL for text drawing
144
+ canvas_img = Image.fromarray(canvas)
145
+ draw = ImageDraw.Draw(canvas_img)
146
+
147
+ # Try to use a nice font, fall back to default if not available
148
+ try:
149
+ font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans-Bold.ttf", 20)
150
+ except:
151
+ try:
152
+ font = ImageFont.truetype("/System/Library/Fonts/Helvetica.ttc", 20)
153
+ except:
154
+ font = ImageFont.load_default()
155
+
156
+ # Get text size and center it
157
+ bbox = draw.textbbox((0, 0), label, font=font)
158
+ text_width = bbox[2] - bbox[0]
159
+ text_x = (w - text_width) // 2
160
+
161
+ # Draw text
162
+ draw.text((text_x, 10), label, fill=(0, 0, 0), font=font)
163
+
164
+ return canvas_img
165
+
166
 
167
  def select_device(device_str: str) -> torch.device:
168
  """Return a valid torch.device based on user selection."""
 
219
  return fresh
220
 
221
 
222
+ # ---------------------------------------------
223
+ # Segmentation Utilities
224
+ # ---------------------------------------------
225
+ def _require_albumentations():
226
+ if A is None:
227
+ raise RuntimeError(
228
+ "Albumentations is required for pretrained segmentation models. "
229
+ "Install it with `pip install albumentations` or add it to your environment."
230
+ )
231
+
232
+
233
+ def get_segmentation_model(config: SegmentationModelConfig) -> nn.Module:
234
+ key = config.checkpoint
235
+ if key not in _SEG_MODEL_CACHE:
236
+ model = smp.from_pretrained(config.checkpoint).eval()
237
+ _SEG_MODEL_CACHE[key] = model
238
+ return _SEG_MODEL_CACHE[key]
239
+
240
+
241
+ def clone_segmentation_model(config: SegmentationModelConfig) -> nn.Module:
242
+ base = get_segmentation_model(config)
243
+ fresh = smp.from_pretrained(config.checkpoint).eval()
244
+ fresh.load_state_dict(base.state_dict())
245
+ return fresh
246
+
247
+
248
+ def get_segmentation_transform(config: SegmentationModelConfig):
249
+ key = config.checkpoint
250
+ if key in _SEG_TRANSFORM_CACHE:
251
+ return _SEG_TRANSFORM_CACHE[key]
252
+
253
+ _require_albumentations()
254
+ try:
255
+ preprocessing = A.Compose.from_pretrained(config.checkpoint)
256
+ except Exception as exc: # pragma: no cover - depends on network availability
257
+ raise RuntimeError(f"Failed to load preprocessing pipeline for {config.checkpoint}: {exc}") from exc
258
+
259
+ def _transform(image):
260
+ if image is None:
261
+ raise ValueError("No image provided")
262
+ if not isinstance(image, Image.Image):
263
+ if isinstance(image, np.ndarray):
264
+ array = image
265
+ if array.dtype != np.uint8:
266
+ array = (np.clip(array, 0, 1) * 255).astype(np.uint8)
267
+ image_rgb = Image.fromarray(array)
268
+ else:
269
+ raise ValueError(f"Unsupported image type: {type(image)}")
270
+ else:
271
+ image_rgb = image
272
+
273
+ image_rgb = image_rgb.convert("RGB")
274
+ np_image = np.array(image_rgb)
275
+ processed = preprocessing(image=np_image)["image"]
276
+ if isinstance(processed, torch.Tensor):
277
+ processed_np = processed.detach().cpu().numpy()
278
+ else:
279
+ processed_np = np.asarray(processed, dtype=np.float32)
280
+ tensor = torch.from_numpy(processed_np.transpose(2, 0, 1)).float()
281
+ return tensor, image_rgb
282
+
283
+ _SEG_TRANSFORM_CACHE[key] = _transform
284
+ return _transform
285
+
286
+
287
+ def get_segmentation_palette(class_count: int) -> np.ndarray:
288
+ if class_count in _SEG_PALETTE_CACHE:
289
+ return _SEG_PALETTE_CACHE[class_count]
290
+
291
+ base_len = len(_SEG_BASE_PALETTE)
292
+ if class_count <= base_len:
293
+ palette = _SEG_BASE_PALETTE[:class_count]
294
+ else:
295
+ palette = np.zeros((class_count, 3), dtype=np.uint8)
296
+ palette[:base_len] = _SEG_BASE_PALETTE
297
+ rng = np.random.default_rng(1337)
298
+ palette[base_len:] = rng.integers(0, 256, size=(class_count - base_len, 3), endpoint=False, dtype=np.uint8)
299
+ palette[:, 0] |= 1 # ensure colors are not pure black except index 0
300
+ palette[0] = np.array([0, 0, 0], dtype=np.uint8)
301
+
302
+ _SEG_PALETTE_CACHE[class_count] = palette
303
+ return palette
304
+
305
+
306
+ def colorize_mask(mask: np.ndarray, class_count: int) -> Image.Image:
307
+ if mask.ndim != 2:
308
+ raise ValueError("Mask must be 2D for colorization")
309
+ palette = get_segmentation_palette(class_count)
310
+ indexed = np.mod(mask, class_count)
311
+ colored = palette[indexed]
312
+ return Image.fromarray(colored.astype(np.uint8))
313
+
314
+
315
+ def overlay_mask(image: Image.Image, mask_image: Image.Image, alpha: float = 0.5) -> Image.Image:
316
+ base = np.array(image.convert("RGB"), dtype=np.float32)
317
+ mask_resized = mask_image.resize(image.size, Image.NEAREST)
318
+ mask_arr = np.array(mask_resized, dtype=np.float32)
319
+ blended = (1.0 - alpha) * base + alpha * mask_arr
320
+ return Image.fromarray(np.clip(blended, 0, 255).astype(np.uint8))
321
+
322
+
323
+ def summarize_mask(mask: np.ndarray, class_count: int) -> list[dict[str, float]]:
324
+ flat = mask.reshape(-1)
325
+ counts = np.bincount(flat, minlength=class_count)
326
+ total = float(flat.size)
327
+ summary = []
328
+ for idx in range(class_count):
329
+ count = int(counts[idx])
330
+ percent = (count / total * 100.0) if total else 0.0
331
+ summary.append({"index": idx, "count": count, "percent": percent})
332
+ return summary
333
+
334
+
335
+ def get_class_labels(config: SegmentationModelConfig) -> list[str]:
336
+ # Try to get labels from model metadata first
337
+ model = get_segmentation_model(config)
338
+ meta = getattr(model, "meta", {}) or {}
339
+ dataset_meta = meta.get("dataset", {}) or {}
340
+ labels = dataset_meta.get("class_names") or dataset_meta.get("classes_names")
341
+
342
+ # If not in metadata, use dataset-specific labels
343
+ if not labels:
344
+ if config.dataset == "ADE20K" and config.classes == 150:
345
+ labels = ADE20K_CLASS_NAMES
346
+ else:
347
+ labels = [f"Class {idx}" for idx in range(config.classes)]
348
+ else:
349
+ labels = list(labels)
350
+
351
+ # Ensure we have the right number of labels
352
+ if len(labels) < config.classes:
353
+ labels.extend(f"Class {len(labels) + i}" for i in range(config.classes - len(labels)))
354
+ return labels[: config.classes]
355
+
356
+
357
+ def run_segmentation_inference(
358
+ model: nn.Module,
359
+ image,
360
+ device: torch.device,
361
+ transform_fn,
362
+ channels_last: bool,
363
+ warmup: bool,
364
+ use_amp: bool,
365
+ class_count: int,
366
+ ):
367
+ tensor, original_image = transform_fn(image)
368
+
369
+ model = model.to(device)
370
+ input_tensor = tensor.unsqueeze(0).to(device)
371
+
372
+ if channels_last and device.type == "cuda":
373
+ input_tensor = input_tensor.to(memory_format=torch.channels_last)
374
+
375
+ if next(model.parameters()).dtype == torch.float16:
376
+ input_tensor = input_tensor.half()
377
+
378
+ if warmup:
379
+ with torch.no_grad():
380
+ model(input_tensor)
381
+
382
+ amp_ctx = torch.cuda.amp.autocast(enabled=use_amp and device.type == "cuda")
383
+ start = time.time()
384
+ with torch.no_grad(), amp_ctx:
385
+ logits = model(input_tensor)
386
+ latency = (time.time() - start) * 1000
387
+
388
+ if isinstance(logits, (list, tuple)):
389
+ logits = logits[0]
390
+
391
+ logits = logits.detach().cpu()
392
+ probs = torch.softmax(logits, dim=1)
393
+ mask_tensor = torch.argmax(probs, dim=1)[0]
394
+ mask_processed = mask_tensor.cpu().numpy().astype(np.int64)
395
+
396
+ mean_conf = float(probs.max(dim=1)[0].mean().item())
397
+
398
+ mask_processed_image = colorize_mask(mask_processed, class_count)
399
+ mask_original_l = Image.fromarray(mask_processed.astype(np.uint8), mode="L").resize(original_image.size, Image.NEAREST)
400
+ mask_original_np = np.array(mask_original_l, dtype=np.int64)
401
+ mask_original_image = colorize_mask(mask_original_np, class_count)
402
+ overlay_original = overlay_mask(original_image, mask_original_image)
403
+ class_summary = summarize_mask(mask_original_np, class_count)
404
+
405
+ return {
406
+ "latency": latency,
407
+ "mask_processed": mask_processed,
408
+ "mask_original": mask_original_np,
409
+ "mask_image_processed": mask_processed_image,
410
+ "mask_image_original": mask_original_image,
411
+ "overlay_original": overlay_original,
412
+ "mean_confidence": mean_conf,
413
+ "class_summary": class_summary,
414
+ }
415
+
416
+
417
+ def build_segmentation_metrics(
418
+ original_result: dict,
419
+ optimized_result: dict,
420
+ size_original: float,
421
+ size_optimized: float,
422
+ optimized_label: str,
423
+ ) -> pd.DataFrame:
424
+ mask_original = original_result["mask_original"]
425
+ mask_optimized = optimized_result["mask_original"]
426
+ agreement = float((mask_original == mask_optimized).mean() * 100.0)
427
+
428
+ metrics_df = pd.DataFrame(
429
+ {
430
+ "Metric": [
431
+ "Latency (ms)",
432
+ "Mean Confidence",
433
+ "Model Size (MB)",
434
+ "Mask Agreement (%)",
435
+ ],
436
+ "Original Model": [
437
+ f"{original_result['latency']:.2f}",
438
+ f"{original_result['mean_confidence']:.4f}",
439
+ f"{size_original:.2f}",
440
+ "100.00",
441
+ ],
442
+ optimized_label: [
443
+ f"{optimized_result['latency']:.2f}",
444
+ f"{optimized_result['mean_confidence']:.4f}",
445
+ f"{size_optimized:.2f}",
446
+ f"{agreement:.2f}",
447
+ ],
448
+ }
449
+ )
450
+ return metrics_df
451
+
452
+
453
+ def build_class_distribution_df(
454
+ original_summary: list[dict[str, float]],
455
+ optimized_summary: list[dict[str, float]],
456
+ labels: list[str],
457
+ optimized_label: str,
458
+ max_rows: int = 25,
459
+ ) -> pd.DataFrame:
460
+ rows = []
461
+ for idx, label in enumerate(labels):
462
+ orig_entry = original_summary[idx]
463
+ opt_entry = optimized_summary[idx]
464
+ if orig_entry["count"] == 0 and opt_entry["count"] == 0:
465
+ continue
466
+ rows.append(
467
+ {
468
+ "Class": label,
469
+ "Original %": round(orig_entry["percent"], 2),
470
+ f"{optimized_label} %": round(opt_entry["percent"], 2),
471
+ "Original Pixels": orig_entry["count"],
472
+ f"{optimized_label} Pixels": opt_entry["count"],
473
+ }
474
+ )
475
+
476
+ rows.sort(key=lambda item: max(item["Original %"], item[f"{optimized_label} %"]), reverse=True)
477
+ if max_rows and len(rows) > max_rows:
478
+ rows = rows[:max_rows]
479
+
480
+ return pd.DataFrame(rows)
481
+
482
+
483
  # ---------------------------------------------
484
  # Image Preprocess
485
  # ---------------------------------------------
 
878
 
879
  print("=== RUN QUANTIZED COMPLETE ===")
880
  return metrics_df, chart_fig, downloads
881
+
882
+
883
+ def run_pruned_segmentation(
884
+ img,
885
+ model_choice,
886
+ method,
887
+ amount,
888
+ device_choice="auto",
889
+ channels_last=False,
890
+ use_compile=False,
891
+ use_amp=False,
892
+ export_ts=False,
893
+ export_onnx=False,
894
+ export_report=False,
895
+ export_state=True,
896
+ preset=None,
897
+ ):
898
+ print("\n=== RUN SEGMENTATION PRUNED CALLED ===")
899
+ if img is None:
900
+ print("ERROR: Image is None")
901
+ empty_metrics = pd.DataFrame({"Metric": ["Error"], "Original Model": ["No image"], "Pruned Model": [""]})
902
+ empty_dist = pd.DataFrame({"Class": [], "Original %": [], "Pruned %": []})
903
+ return empty_metrics, empty_dist, None, None, pd.DataFrame(), []
904
+
905
+ config = SEGMENTATION_MODEL_MAP.get(model_choice, SEGMENTATION_MODEL_CONFIGS[0])
906
+
907
+ if preset in PRESETS:
908
+ preset_cfg = PRESETS[preset]
909
+ device_choice = preset_cfg["device"]
910
+ channels_last = preset_cfg["channels_last"]
911
+ use_compile = preset_cfg["compile"]
912
+ use_amp = preset_cfg.get("amp", use_amp)
913
+ amount = preset_cfg.get("prune_amount", amount)
914
+
915
+ device = select_device(device_choice)
916
+
917
+ base_model = get_segmentation_model(config)
918
+ transform_fn = get_segmentation_transform(config)
919
+ class_labels = get_class_labels(config)
920
+ class_count = config.classes
921
+
922
+ original_result = run_segmentation_inference(
923
+ base_model,
924
+ img,
925
+ device,
926
+ transform_fn,
927
+ channels_last=channels_last,
928
+ warmup=True,
929
+ use_amp=use_amp,
930
+ class_count=class_count,
931
+ )
932
+
933
+ fresh_model = clone_segmentation_model(config)
934
+ pruned_model = apply_pruning(fresh_model, amount=float(amount), method=method)
935
+ pruned_model = maybe_compile(pruned_model, use_compile)
936
+ pruned_result = run_segmentation_inference(
937
+ pruned_model,
938
+ img,
939
+ device,
940
+ transform_fn,
941
+ channels_last=channels_last,
942
+ warmup=True,
943
+ use_amp=use_amp,
944
+ class_count=class_count,
945
+ )
946
+
947
+ size_orig = get_state_dict_size_mb(base_model)
948
+ size_pruned = get_state_dict_size_mb(pruned_model)
949
+ metrics_df = build_segmentation_metrics(original_result, pruned_result, size_orig, size_pruned, "Pruned Model")
950
+ class_df = build_class_distribution_df(
951
+ original_result["class_summary"],
952
+ pruned_result["class_summary"],
953
+ class_labels,
954
+ "Pruned",
955
+ )
956
+
957
+ # Add labels to images for slider comparison
958
+ overlay_orig_labeled = add_image_label(original_result["overlay_original"], "Original Model")
959
+ overlay_pruned_labeled = add_image_label(pruned_result["overlay_original"], "Pruned Model")
960
+ mask_orig_labeled = add_image_label(original_result["mask_image_original"], "Original Mask")
961
+ mask_pruned_labeled = add_image_label(pruned_result["mask_image_original"], "Pruned Mask")
962
+
963
+ overlay_slider_value = (
964
+ overlay_orig_labeled,
965
+ overlay_pruned_labeled,
966
+ )
967
+ mask_slider_value = (
968
+ mask_orig_labeled,
969
+ mask_pruned_labeled,
970
+ )
971
+ sparsity_df = compute_sparsity(pruned_model.cpu())
972
+
973
+ downloads: list[str] = []
974
+ export_dir = Path("exports")
975
+ export_dir.mkdir(exist_ok=True)
976
+
977
+ if export_report:
978
+ report_path = export_dir / "pruned_seg_report.json"
979
+ report = {
980
+ "model": config.name,
981
+ "checkpoint": config.checkpoint,
982
+ "dataset": config.dataset,
983
+ "pruning": {"method": method, "amount": float(amount)},
984
+ "metrics": metrics_df.to_dict(),
985
+ "class_distribution": class_df.to_dict(),
986
+ }
987
+ report_path.write_text(json.dumps(report, indent=2))
988
+ downloads.append(str(report_path))
989
+
990
+ if export_state:
991
+ state_path = export_dir / "pruned_seg_state_dict.pth"
992
+ torch.save(pruned_model.state_dict(), state_path)
993
+ downloads.append(str(state_path))
994
+
995
+ sample_tensor, _ = transform_fn(img)
996
+ sample_batch = sample_tensor.unsqueeze(0)
997
+
998
+ if export_ts:
999
+ ts_path = export_dir / "pruned_seg_model.ts"
1000
+ try:
1001
+ scripted = torch.jit.trace(pruned_model.cpu(), sample_batch)
1002
+ scripted.save(ts_path)
1003
+ downloads.append(str(ts_path))
1004
+ except Exception as exc: # pragma: no cover - export best effort
1005
+ print(f"TorchScript export failed: {exc}")
1006
+
1007
+ if export_onnx:
1008
+ onnx_path = export_dir / "pruned_seg_model.onnx"
1009
+ try:
1010
+ torch.onnx.export(
1011
+ pruned_model.cpu(),
1012
+ sample_batch,
1013
+ onnx_path,
1014
+ input_names=["input"],
1015
+ output_names=["mask"],
1016
+ opset_version=13,
1017
+ dynamic_axes={"input": {0: "batch"}, "mask": {0: "batch"}},
1018
+ )
1019
+ downloads.append(str(onnx_path))
1020
+ except Exception as exc: # pragma: no cover - export best effort
1021
+ print(f"ONNX export failed: {exc}")
1022
+
1023
+ return (
1024
+ metrics_df,
1025
+ class_df,
1026
+ overlay_slider_value,
1027
+ mask_slider_value,
1028
+ sparsity_df,
1029
+ downloads,
1030
+ )
1031
+
1032
+
1033
+ def run_quantized_segmentation(
1034
+ img,
1035
+ model_choice,
1036
+ q_type,
1037
+ device_choice="auto",
1038
+ channels_last=False,
1039
+ use_compile=False,
1040
+ use_amp=False,
1041
+ export_ts=False,
1042
+ export_onnx=False,
1043
+ export_report=False,
1044
+ export_state=True,
1045
+ preset=None,
1046
+ ):
1047
+ print("\n=== RUN SEGMENTATION QUANTIZED CALLED ===")
1048
+ if img is None:
1049
+ print("ERROR: Image is None")
1050
+ empty_metrics = pd.DataFrame({"Metric": ["Error"], "Original Model": ["No image"], "Quantized Model": [""]})
1051
+ empty_dist = pd.DataFrame({"Class": [], "Original %": [], "Quantized %": []})
1052
+ return empty_metrics, empty_dist, None, None, []
1053
+
1054
+ config = SEGMENTATION_MODEL_MAP.get(model_choice, SEGMENTATION_MODEL_CONFIGS[0])
1055
+
1056
+ if preset in PRESETS:
1057
+ preset_cfg = PRESETS[preset]
1058
+ device_choice = preset_cfg["device"]
1059
+ channels_last = preset_cfg["channels_last"]
1060
+ use_compile = preset_cfg["compile"]
1061
+ use_amp = preset_cfg.get("amp", use_amp)
1062
+ q_type = preset_cfg.get("quant", q_type)
1063
+
1064
+ device = select_device(device_choice)
1065
+ if q_type in {"dynamic", "weight_only"} and device.type != "cpu":
1066
+ print("Dynamic quantization runs on CPU; switching device to CPU.")
1067
+ device = torch.device("cpu")
1068
+ channels_last = False
1069
+ use_amp = False
1070
+
1071
+ base_model = get_segmentation_model(config)
1072
+ transform_fn = get_segmentation_transform(config)
1073
+ class_labels = get_class_labels(config)
1074
+ class_count = config.classes
1075
+
1076
+ original_result = run_segmentation_inference(
1077
+ base_model,
1078
+ img,
1079
+ device,
1080
+ transform_fn,
1081
+ channels_last=channels_last,
1082
+ warmup=True,
1083
+ use_amp=use_amp,
1084
+ class_count=class_count,
1085
+ )
1086
+
1087
+ fresh_model = clone_segmentation_model(config)
1088
+ quant_model = apply_quantization(fresh_model, q_type)
1089
+ quant_model = maybe_compile(quant_model, use_compile)
1090
+
1091
+ quant_result = run_segmentation_inference(
1092
+ quant_model,
1093
+ img,
1094
+ device,
1095
+ transform_fn,
1096
+ channels_last=channels_last,
1097
+ warmup=True,
1098
+ use_amp=use_amp,
1099
+ class_count=class_count,
1100
+ )
1101
+
1102
+ size_orig = get_state_dict_size_mb(base_model)
1103
+ size_quant = get_state_dict_size_mb(quant_model)
1104
+ metrics_df = build_segmentation_metrics(original_result, quant_result, size_orig, size_quant, "Quantized Model")
1105
+ class_df = build_class_distribution_df(
1106
+ original_result["class_summary"],
1107
+ quant_result["class_summary"],
1108
+ class_labels,
1109
+ "Quantized",
1110
+ )
1111
+
1112
+ # Add labels to images for slider comparison
1113
+ overlay_orig_labeled = add_image_label(original_result["overlay_original"], "Original Model")
1114
+ overlay_quant_labeled = add_image_label(quant_result["overlay_original"], "Quantized Model")
1115
+ mask_orig_labeled = add_image_label(original_result["mask_image_original"], "Original Mask")
1116
+ mask_quant_labeled = add_image_label(quant_result["mask_image_original"], "Quantized Mask")
1117
+
1118
+ overlay_slider_value = (
1119
+ overlay_orig_labeled,
1120
+ overlay_quant_labeled,
1121
+ )
1122
+ mask_slider_value = (
1123
+ mask_orig_labeled,
1124
+ mask_quant_labeled,
1125
+ )
1126
+
1127
+ downloads: list[str] = []
1128
+ export_dir = Path("exports")
1129
+ export_dir.mkdir(exist_ok=True)
1130
+
1131
+ if export_report:
1132
+ report_path = export_dir / "quant_seg_report.json"
1133
+ report = {
1134
+ "model": config.name,
1135
+ "checkpoint": config.checkpoint,
1136
+ "dataset": config.dataset,
1137
+ "quantization": q_type,
1138
+ "metrics": metrics_df.to_dict(),
1139
+ "class_distribution": class_df.to_dict(),
1140
+ }
1141
+ report_path.write_text(json.dumps(report, indent=2))
1142
+ downloads.append(str(report_path))
1143
+
1144
+ if export_state:
1145
+ state_path = export_dir / "quant_seg_state_dict.pth"
1146
+ torch.save(quant_model.state_dict(), state_path)
1147
+ downloads.append(str(state_path))
1148
+
1149
+ sample_tensor, _ = transform_fn(img)
1150
+ sample_batch = sample_tensor.unsqueeze(0)
1151
+
1152
+ if export_ts:
1153
+ ts_path = export_dir / "quant_seg_model.ts"
1154
+ try:
1155
+ scripted = torch.jit.trace(quant_model.cpu(), sample_batch)
1156
+ scripted.save(ts_path)
1157
+ downloads.append(str(ts_path))
1158
+ except Exception as exc: # pragma: no cover - export best effort
1159
+ print(f"TorchScript export failed: {exc}")
1160
+
1161
+ if export_onnx:
1162
+ onnx_path = export_dir / "quant_seg_model.onnx"
1163
+ try:
1164
+ torch.onnx.export(
1165
+ quant_model.cpu(),
1166
+ sample_batch,
1167
+ onnx_path,
1168
+ input_names=["input"],
1169
+ output_names=["mask"],
1170
+ opset_version=13,
1171
+ dynamic_axes={"input": {0: "batch"}, "mask": {0: "batch"}},
1172
+ )
1173
+ downloads.append(str(onnx_path))
1174
+ except Exception as exc: # pragma: no cover - export best effort
1175
+ print(f"ONNX export failed: {exc}")
1176
+
1177
+ return (
1178
+ metrics_df,
1179
+ class_df,
1180
+ overlay_slider_value,
1181
+ mask_slider_value,
1182
+ downloads,
1183
+ )
1184
  # ---------------------------------------------
1185
  # GRADIO UI
1186
  # ---------------------------------------------
1187
  examples = [["examples/cat.jpg"], ["examples/dog.jpg"], ["examples/bird.jpg"], ["examples/car.jpg"], ["examples/elephant.jpg"]]
1188
+ ade_examples = [["examples/ADE_val_00000001.jpg"], ["examples/ADE_val_00000002.jpg"], ["examples/ADE_val_00001001.jpg"], ["examples/ADE_val_00001842.jpg"]]
1189
 
1190
 
1191
  def create_demo():
 
1198
  if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
1199
  device_opts.append("mps")
1200
  preset_opts = list(PRESETS.keys()) + ["custom"]
1201
+ seg_model_options = [cfg.name for cfg in SEGMENTATION_MODEL_CONFIGS]
1202
 
1203
  with gr.Tabs():
1204
  # ---- PRUNING TAB ----
1205
+ with gr.Tab("Pruning-Classification"):
1206
  with gr.Row():
1207
  with gr.Column():
1208
  img_p = gr.Image(label="Upload Image")
 
1220
  btn_p = gr.Button("Run Pruned Model")
1221
  gr.Examples(examples=examples, inputs=img_p)
1222
  gr.Markdown(
1223
+ "### 📚 Classification Pruning Guide\n\n"
1224
+ "**What is Pruning?**\n"
1225
+ "Pruning removes less important weights from neural networks to reduce model size and potentially improve inference speed. "
1226
+ "This tab applies pruning to ImageNet classification models.\n\n"
1227
+ "**Options Explained:**\n"
1228
+ "- **Base Model**: Select from 7 pretrained architectures (ResNet-50, MobileNetV3, EfficientNet-B0, ConvNeXt-Tiny, ViT-Base, RegNetY-016, EfficientNet-Lite0). Each has different size/accuracy tradeoffs.\n"
1229
+ "- **Hardware Preset**: Quick configurations for common deployment scenarios:\n"
1230
+ " - *Edge CPU*: Optimized for resource-constrained devices (CPU-only, 30% pruning, dynamic quantization)\n"
1231
+ " - *Datacenter GPU*: Maximum performance on modern GPUs (CUDA, channels-last, compile, 20% pruning)\n"
1232
+ " - *Apple MPS*: Tuned for Apple Silicon (M1/M2/M3 chips with Metal Performance Shaders)\n"
1233
+ " - *Custom*: Manual control over all settings\n"
1234
+ "- **Pruning Method**:\n"
1235
+ " - *Structured*: Removes entire filters/channels; better hardware support and actual speedups\n"
1236
+ " - *Unstructured*: Zeros individual weights; higher compression but needs specialized sparse kernels for speedup\n"
1237
+ "- **Pruning Amount**: Percentage of weights to remove (0.1 = 10%, 0.9 = 90%). Higher values = smaller model but potential accuracy loss.\n"
1238
+ "- **Device**: Inference hardware (auto-detects best available: CUDA → MPS → CPU)\n"
1239
+ "- **Channels-last (CUDA only)**: Memory layout optimization for faster convolution operations on NVIDIA GPUs\n"
1240
+ "- **Mixed Precision (AMP)**: Uses FP16 where safe, FP32 where needed; faster on modern GPUs with Tensor Cores\n"
1241
+ "- **Torch Compile**: PyTorch 2.0+ graph optimization; can provide 20-40% speedup but adds compilation overhead\n\n"
1242
+ "**Export Options:**\n"
1243
+ "- *TorchScript*: Serialized model for C++ deployment or production serving\n"
1244
+ "- *ONNX*: Cross-framework format (TensorRT, OpenVINO, ONNX Runtime, CoreML)\n"
1245
+ "- *JSON Report*: Detailed metrics, settings, and Top-5 predictions for both models\n"
1246
+ "- *State Dict*: Always saved; PyTorch checkpoint for loading pruned weights later\n\n"
1247
+ "**Reading the Results:**\n"
1248
+ "- *Comparison Metrics*: Side-by-side accuracy, speed, and size\n"
1249
+ "- *Top-5 Chart*: Visual comparison of prediction confidence across models\n"
1250
+ "- *Layer Sparsity*: Per-layer breakdown showing which parts were pruned most"
1251
  )
1252
 
1253
  with gr.Column():
 
1277
  )
1278
 
1279
  # ---- QUANTIZATION TAB ----
1280
+ with gr.Tab("Quantization-Classification"):
1281
  with gr.Row():
1282
  with gr.Column():
1283
  img_q = gr.Image(label="Upload Image")
 
1294
  btn_q = gr.Button("Run Quantized Model")
1295
  gr.Examples(examples=examples, inputs=img_q)
1296
  gr.Markdown(
1297
+ "### 📚 Classification Quantization Guide\n\n"
1298
+ "**What is Quantization?**\n"
1299
+ "Quantization reduces model precision from 32-bit floats to lower bit-widths (INT8, FP16), decreasing memory usage and "
1300
+ "enabling faster inference on hardware with specialized low-precision instructions.\n\n"
1301
+ "**Options Explained:**\n"
1302
+ "- **Base Model**: Choose from 7 pretrained ImageNet classifiers with varying complexity.\n"
1303
+ "- **Hardware Preset**: Same presets as pruning tab, but with quantization-specific defaults.\n"
1304
+ "- **Quantization Type**:\n"
1305
+ " - *Dynamic*: Post-training INT8 quantization on linear layers; activations quantized dynamically at runtime. **Forces CPU** (PyTorch's INT8 kernels are CPU-only). Best for transformers and MLP-heavy models.\n"
1306
+ " - *Weight-only*: Stores weights as INT8, computes in FP32. Reduces memory bandwidth, smaller model files. **CPU-optimized**.\n"
1307
+ " - *FP16*: Half-precision floating point; requires GPU with FP16 support (CUDA, MPS). Minimal accuracy loss, ~2x speedup on modern GPUs.\n"
1308
+ "- **Device**: Hardware target (dynamic/weight-only auto-switch to CPU for kernel compatibility)\n"
1309
+ "- **Channels-last**: CUDA memory layout optimization (ignored on CPU)\n"
1310
+ "- **Mixed Precision (AMP)**: Can combine with FP16 quantization on GPUs\n"
1311
+ "- **Torch Compile**: Graph-level optimizations from PyTorch 2.0+\n\n"
1312
+ "**Export Options:** Same as pruning (TorchScript, ONNX, JSON report, state dict)\n\n"
1313
+ "**Important Notes:**\n"
1314
+ "⚠️ Dynamic/weight-only quantization automatically uses CPU even if GPU is selected (PyTorch limitation)\n"
1315
+ "⚠️ ResNet-50 and similar CNN-heavy models see modest INT8 speedups because only linear layers are quantized\n"
1316
+ "⚠️ FP16 on CPU often reverts to FP32 internally, adding overhead instead of speedup\n\n"
1317
+ "**Reading the Results:**\n"
1318
+ "- *Latency*: Dynamic quantization may show higher latency due to runtime overhead; production deployments should use cached models\n"
1319
+ "- *Model Size*: FP16 ≈ 50% reduction, INT8 dynamic ≈ 75% reduction (varies by architecture)\n"
1320
+ "- *Accuracy*: Watch for confidence drops; quantization can shift predictions slightly"
1321
  )
1322
 
1323
 
 
1345
  outputs=[metrics_q, chart_q, downloads_q],
1346
  )
1347
 
1348
+ # ---- SEGMENTATION PRUNING TAB ----
1349
+ with gr.Tab("Pruning-Segmentation"):
1350
+ with gr.Row():
1351
+ with gr.Column():
1352
+ img_sp = gr.Image(label="Upload Image")
1353
+ model_sp = gr.Dropdown(seg_model_options, value=seg_model_options[0], label="Pretrained ADE20K Model")
1354
+ preset_sp = gr.Dropdown(preset_opts, value="custom", label="Hardware Preset")
1355
+ method_sp = gr.Dropdown(["unstructured", "structured"], value="structured", label="Pruning Method")
1356
+ amount_sp = gr.Slider(minimum=0.1, maximum=0.9, step=0.1, value=0.4, label="Pruning Amount")
1357
+ device_sp = gr.Dropdown(device_opts, value=device_opts[0], label="Device")
1358
+ channels_last_sp = gr.Checkbox(label="Channels-last input (CUDA)", value=True)
1359
+ compile_sp = gr.Checkbox(label="Torch compile (PyTorch 2)")
1360
+ amp_sp = gr.Checkbox(label="Mixed precision (AMP)", value=True)
1361
+ export_ts_sp = gr.Checkbox(label="Export TorchScript")
1362
+ export_onnx_sp = gr.Checkbox(label="Export ONNX")
1363
+ export_report_sp = gr.Checkbox(label="Export JSON report", value=True)
1364
+ btn_sp = gr.Button("Run Segmentation Pruning")
1365
+ gr.Examples(examples=ade_examples, inputs=img_sp, label="ADE20K Samples")
1366
+ gr.Markdown(
1367
+ "### 🎨 Segmentation Pruning Guide\n\n"
1368
+ "**What is Semantic Segmentation?**\n"
1369
+ "Semantic segmentation assigns a class label to every pixel in an image (e.g., sky, road, person, car). "
1370
+ "This tab uses ADE20K-pretrained models that recognize 150 scene categories.\n\n"
1371
+ "**Available Models:**\n"
1372
+ "- **SegFormer B0** (512x512): Lightweight transformer-based segmenter; efficient for edge deployment\n"
1373
+ "- **SegFormer B4** (512x512): Larger variant with better accuracy; ~4x B0 parameters\n"
1374
+ "- **DPT Large**: Vision-transformer-based dense prediction; state-of-the-art accuracy but slower\n"
1375
+ "- **UPerNet ConvNeXt-Tiny**: Unified perceptual parsing with modern CNN backbone; balanced speed/accuracy\n\n"
1376
+ "**Segmentation-Specific Options:**\n"
1377
+ "- All pruning/device/compile options work the same as classification\n"
1378
+ "- Models use [smp-hub](https://huggingface.co/smp-hub) pretrained checkpoints via `segmentation-models-pytorch`\n"
1379
+ "- Preprocessing pipelines are model-specific (loaded from Hugging Face metadata)\n"
1380
+ "- Images are resized based on model training resolution (usually 512x512 or 640x640)\n\n"
1381
+ "**Understanding Segmentation Outputs:**\n"
1382
+ "1. **Comparison Metrics Table**:\n"
1383
+ " - *Latency*: Inference time for full-image segmentation\n"
1384
+ " - *Mean Confidence*: Average softmax probability across all pixels\n"
1385
+ " - *Model Size*: State dict size in MB\n"
1386
+ " - *Mask Agreement*: % of pixels with identical class predictions (100% = perfect match)\n"
1387
+ "2. **Class Distribution Table**:\n"
1388
+ " - Top 25 most prevalent classes by pixel coverage\n"
1389
+ " - Shows percentage and pixel counts for both models\n"
1390
+ " - Helps identify which objects dominate the scene\n"
1391
+ "3. **Overlay Comparison Slider**:\n"
1392
+ " - Original image blended with colored segmentation masks\n"
1393
+ " - Drag slider to compare original vs. pruned predictions\n"
1394
+ " - Colors map to specific ADE20K classes (150 categories)\n"
1395
+ "4. **Mask Comparison Slider**:\n"
1396
+ " - Raw segmentation masks without image overlay\n"
1397
+ " - Easier to spot subtle prediction differences\n"
1398
+ "5. **Layer Sparsity Table**:\n"
1399
+ " - Per-layer pruning statistics showing compression levels\n\n"
1400
+ "**Export Options:**\n"
1401
+ "Files saved with `_seg` suffix: `pruned_seg_model.ts`, `pruned_seg_report.json`, etc.\n\n"
1402
+ "**Tips:**\n"
1403
+ "- Use ADE20K validation images (provided examples) for meaningful class diversity\n"
1404
+ "- High mask agreement (>95%) indicates pruning preserved segmentation quality\n"
1405
+ "- Check class distribution to ensure dominant objects aren't misclassified\n"
1406
+ "- Structured pruning typically maintains better segmentation quality than unstructured"
1407
+ )
1408
+
1409
+ with gr.Column():
1410
+ metrics_sp = gr.Dataframe(label="📊 Comparison Metrics")
1411
+ class_sp = gr.Dataframe(label="📈 Class Distribution")
1412
+ overlay_slider_sp = gr.ImageSlider(label="Overlay Comparison", type="pil")
1413
+ mask_slider_sp = gr.ImageSlider(label="Mask Comparison", type="pil")
1414
+ sparsity_sp = gr.Dataframe(label="Layer sparsity (%)")
1415
+ downloads_sp = gr.Files(label="Exports (state_dict / TorchScript / ONNX / report)")
1416
+
1417
+ btn_sp.click(
1418
+ fn=run_pruned_segmentation,
1419
+ inputs=[
1420
+ img_sp,
1421
+ model_sp,
1422
+ method_sp,
1423
+ amount_sp,
1424
+ device_sp,
1425
+ channels_last_sp,
1426
+ compile_sp,
1427
+ amp_sp,
1428
+ export_ts_sp,
1429
+ export_onnx_sp,
1430
+ export_report_sp,
1431
+ gr.State(True),
1432
+ preset_sp,
1433
+ ],
1434
+ outputs=[
1435
+ metrics_sp,
1436
+ class_sp,
1437
+ overlay_slider_sp,
1438
+ mask_slider_sp,
1439
+ sparsity_sp,
1440
+ downloads_sp,
1441
+ ],
1442
+ )
1443
+
1444
+ # ---- SEGMENTATION QUANTIZATION TAB ----
1445
+ with gr.Tab("Quantization-Segmentation"):
1446
+ with gr.Row():
1447
+ with gr.Column():
1448
+ img_sq = gr.Image(label="Upload Image")
1449
+ model_sq = gr.Dropdown(seg_model_options, value=seg_model_options[0], label="Pretrained ADE20K Model")
1450
+ preset_sq = gr.Dropdown(preset_opts, value="custom", label="Hardware Preset")
1451
+ q_type_sq = gr.Dropdown(["dynamic", "weight_only", "fp16"], value="dynamic", label="Quantization Type")
1452
+ device_sq = gr.Dropdown(device_opts, value=device_opts[0], label="Device")
1453
+ channels_last_sq = gr.Checkbox(label="Channels-last input (CUDA)", value=True)
1454
+ compile_sq = gr.Checkbox(label="Torch compile (PyTorch 2)")
1455
+ amp_sq = gr.Checkbox(label="Mixed precision (AMP)", value=True)
1456
+ export_ts_sq = gr.Checkbox(label="Export TorchScript")
1457
+ export_onnx_sq = gr.Checkbox(label="Export ONNX")
1458
+ export_report_sq = gr.Checkbox(label="Export JSON report", value=True)
1459
+ btn_sq = gr.Button("Run Segmentation Quantization")
1460
+ gr.Examples(examples=ade_examples, inputs=img_sq, label="ADE20K Samples")
1461
+ gr.Markdown(
1462
+ "### 🎨 Segmentation Quantization Guide\n\n"
1463
+ "**Quantization for Dense Prediction:**\n"
1464
+ "Semantic segmentation models are typically larger and slower than classifiers, making quantization especially valuable. "
1465
+ "This tab applies the same quantization techniques as classification but evaluates pixel-level prediction quality.\n\n"
1466
+ "**Available Models & Quantization:**\n"
1467
+ "- **SegFormer B0/B4**: Transformer-based; dynamic quantization helps with attention/MLP layers (CPU-only)\n"
1468
+ "- **DPT Large**: Vision-transformer backbone; benefits significantly from FP16 on GPU (~2x speedup)\n"
1469
+ "- **UPerNet ConvNeXt-Tiny**: CNN-based; FP16 quantization provides best GPU acceleration\n\n"
1470
+ "**Quantization Type Selection:**\n"
1471
+ "- **Dynamic/Weight-only**: ⚠️ Automatically uses CPU (PyTorch INT8 limitation). Best for: \n"
1472
+ " - Transformer-heavy models (SegFormer, DPT)\n"
1473
+ " - CPU-only deployment scenarios\n"
1474
+ " - Memory-constrained environments\n"
1475
+ "- **FP16**: Recommended for GPU deployment (CUDA, MPS). Provides:\n"
1476
+ " - ~2x inference speedup on modern GPUs\n"
1477
+ " - 50% memory reduction\n"
1478
+ " - Minimal segmentation quality loss (<1% mIoU typically)\n\n"
1479
+ "**Segmentation-Specific Metrics:**\n"
1480
+ "1. **Mask Agreement**: Critical metric for segmentation; >95% is good, >98% is excellent\n"
1481
+ "2. **Mean Confidence**: Should remain similar; large drops indicate quantization instability\n"
1482
+ "3. **Class Distribution**: Compare pixel percentages; mismatches show which objects are affected\n\n"
1483
+ "**Understanding the Outputs:**\n"
1484
+ "- **Overlay Slider**: Drag to compare original vs. quantized predictions on the actual image\n"
1485
+ "- **Mask Slider**: Raw segmentation masks for detailed comparison\n"
1486
+ "- **Class Distribution**: Top 25 classes help identify systematic errors (e.g., 'road' β†’ 'sidewalk' confusion)\n\n"
1487
+ "**Performance Expectations:**\n"
1488
+ "- **FP16 on CUDA**: Expect 1.5-2x speedup with <1% accuracy loss\n"
1489
+ "- **Dynamic on CPU**: Model size ↓ 75%, latency may increase (first-run overhead)\n"
1490
+ "- **Weight-only on CPU**: Model size ↓ 50%, latency similar to FP32\n\n"
1491
+ "**Export Options:**\n"
1492
+ "Files saved with `_seg` suffix: `quant_seg_model.onnx`, `quant_seg_state_dict.pth`, etc.\n\n"
1493
+ "**Best Practices:**\n"
1494
+ "βœ“ Use FP16 for GPU deployment (CUDA, MPS)\n"
1495
+ "βœ“ Use dynamic quantization for CPU-bound transformer models\n"
1496
+ "βœ“ Check mask agreement before deploying; <90% needs investigation\n"
1497
+ "βœ“ Validate on multiple images; some scenes may be more sensitive to quantization\n"
1498
+ "βœ— Avoid FP16 on CPU (performance penalty, not benefit)\n"
1499
+ "βœ— Don't expect large speedups from dynamic quantization on CNN-heavy models (most layers are Conv2d, not Linear)"
1500
+ )
1501
+
1502
+ with gr.Column():
1503
+ metrics_sq = gr.Dataframe(label="📊 Comparison Metrics")
1504
+ class_sq = gr.Dataframe(label="📈 Class Distribution")
1505
+ overlay_slider_sq = gr.ImageSlider(label="Overlay Comparison", type="pil")
1506
+ mask_slider_sq = gr.ImageSlider(label="Mask Comparison", type="pil")
1507
+ downloads_sq = gr.Files(label="Exports (state_dict / TorchScript / ONNX / report)")
1508
+
1509
+ btn_sq.click(
1510
+ fn=run_quantized_segmentation,
1511
+ inputs=[
1512
+ img_sq,
1513
+ model_sq,
1514
+ q_type_sq,
1515
+ device_sq,
1516
+ channels_last_sq,
1517
+ compile_sq,
1518
+ amp_sq,
1519
+ export_ts_sq,
1520
+ export_onnx_sq,
1521
+ export_report_sq,
1522
+ gr.State(True),
1523
+ preset_sq,
1524
+ ],
1525
+ outputs=[
1526
+ metrics_sq,
1527
+ class_sq,
1528
+ overlay_slider_sq,
1529
+ mask_slider_sq,
1530
+ downloads_sq,
1531
+ ],
1532
+ )
1533
+
1534
  return demo
1535
 
1536
 
examples/ADE_val_00000001.jpg ADDED
examples/ADE_val_00000002.jpg ADDED
requirements.txt CHANGED
@@ -2,6 +2,9 @@
2
  torch>=2.2.0
3
  torchvision>=0.17.0
4
  timm>=0.9.12
5
 
6
  # UI
7
  gradio>=4.19.2
 
2
  torch>=2.2.0
3
  torchvision>=0.17.0
4
  timm>=0.9.12
5
+ segmentation-models-pytorch>=0.3.3
6
+ huggingface-hub>=0.23.0
7
+ albumentations>=1.4.8
8
 
9
  # UI
10
  gradio>=4.19.2