---
library_name: transformers
tags:
- image-segmentation
- pathology
- dpt
pipeline_tag: image-segmentation
---

# Lung Structure Segmentation (DPT)

Semantic segmentation of lung structures (blood vessels and airways) in histopathology images.
- Encoder (frozen): H-optimus-0 ViT backbone (pretrained on histopathology data).
- Decoder (trained): custom DPT head with multi-scale feature fusion.

## Usage

The model expects a normalized `(B, 3, H, W)` float tensor as `pixel_values`.
Normalize with the ImageNet mean and standard deviation. These are the same
statistics applied at training time (matches the H-optimus-0 backbone's
expected input distribution).

Input image: 224x224 pixels at 1.5 microns per pixel (MPP).
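
Slides are often scanned at a finer resolution than 1.5 MPP, so a tile usually has to be read at a larger pixel size and then downsampled. The sketch below shows only the arithmetic. The helper name `native_region_size` and the scanner resolutions in the loop are hypothetical and not part of this repository.

```python
TARGET_MPP = 1.5   # resolution stated in this card
TILE_SIZE = 224    # model input size stated in this card


def native_region_size(native_mpp: float,
                       tile_size: int = TILE_SIZE,
                       target_mpp: float = TARGET_MPP) -> int:
    """Side length, in native-resolution pixels, of the region covering one model tile."""
    if native_mpp <= 0:
        raise ValueError("native_mpp must be positive")
    # One tile spans tile_size * target_mpp microns; divide by the native pixel size.
    return round(tile_size * target_mpp / native_mpp)


# Hypothetical scanner resolutions, for illustration only.
for native_mpp in (0.25, 0.5, 1.5):
    side = native_region_size(native_mpp)
    print(f"{native_mpp} MPP: read {side}x{side} px, then resize to {TILE_SIZE}x{TILE_SIZE}")
```

Each tile covers 224 x 1.5 = 336 microns per side, so a slide scanned at 0.5 MPP needs a 672x672 pixel region per tile.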

```python
import torch
from PIL import Image
from torchvision.transforms import ToTensor, Normalize, Resize, Compose
from transformers import AutoModel

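# The architecture is defined by custom code in the model repo, hence trust_remote_code=True.
model = AutoModel.from_pretrained("RendeiroLab/MetPredict-lung-structure-segmentation", trust_remote_code=True).eval()
# Use whichever device the weights were loaded on (CPU unless the model is moved).
device = next(model.parameters()).device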

transform = Compose([
    ToTensor(),
    Resize((224, 224)),
    Normalize(
        mean=[0.485, 0.456, 0.406], 
        std=[0.229, 0.224, 0.225]
    ),
])

img = Image.open("tile.png").convert("RGB")            # tile extracted at 1.5 MPP
x = transform(img)                                     # (3, 224, 224), normalized
pixel_values = x.unsqueeze(0).to(device)               # add batch dim: (1, 3, 224, 224)

with torch.inference_mode():
    out = model(pixel_values)
logits = out.logits                                    # (1, n_classes, H, W)
pred = logits.argmax(dim=1)                            # (1, H, W), per-pixel class index
```
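
Once `pred` is available, a common next step is to turn the label map into per-class masks and area fractions. The following is a minimal sketch of that step on a synthetic array standing in for `pred[0].cpu().numpy()`. The number of classes (three here) and the helper `class_fractions` are assumptions for illustration, since this card does not state the class count or which index corresponds to vessels or airways.

```python
import numpy as np


def class_fractions(label_map: np.ndarray, n_classes: int) -> np.ndarray:
    """Fraction of pixels assigned to each class index in an (H, W) label map."""
    counts = np.bincount(label_map.ravel(), minlength=n_classes)
    return counts / label_map.size


# Synthetic 4x4 label map; a real one would come from pred[0].cpu().numpy().
# Assumption: three class indices (0, 1, 2); their meaning is not given in this card.
demo = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 0],
    [0, 2, 2, 0],
])

fractions = class_fractions(demo, n_classes=3)
mask_class_1 = demo == 1   # boolean (H, W) mask for a single class index

print(fractions)
print(int(mask_class_1.sum()), "pixels in class 1")
```

Check the model's configuration for the actual class order before interpreting any index.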