---
license: mit
pipeline_tag: image-classification
---
## Model Details

### Model Description

MorphEm is a self-supervised model trained with the DINO Bag of Channels recipe on the entire CHAMMI-75 dataset.
It serves as a performance benchmark for self-supervised models.

- **Developed by:** Vidit Agrawal, John Peters, Juan C. Caicedo
- **Shared by:** [Caicedo Lab](https://morgridge.org/research/labs/caicedo/)
- **Model type:** Vision Transformer Small
- **License:** MIT License

### Model Sources

- **Repository:** https://github.com/CaicedoLab/CHAMMI-75
<!-- - **Paper** -->
- **Demo:** https://github.com/CaicedoLab/CHAMMI-75/tree/main/aws-tutorials

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModel
import torch
import torch.nn as nn
import torchvision
from torchvision.transforms import v2
import numpy as np

# Noise Injector transformation
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x


# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
model = AutoModel.from_pretrained("CaicedoLab/MorphEm", trust_remote_code=True)
model.to(device).eval()

# Define transforms
transform = v2.Compose([
    SaturationNoiseInjector(),
    PerImageNormalize(),
    v2.Resize(size=(224, 224), antialias=True),
])

# Generate random batch (N, C, H, W)
batch_size = 2
num_channels = 3
images = torch.randint(0, 256, (batch_size, num_channels, 512, 512), dtype=torch.float32)

print(f"Input shape: {images.shape} (N={batch_size}, C={num_channels}, H=512, W=512)")
print()

# Bag of Channels (BoC) - process each channel independently
with torch.no_grad():
    batch_feat = []
    images = images.to(device)
    
    for c in range(images.shape[1]):
        # Extract single channel: (N, C, H, W) -> (N, 1, H, W)
        single_channel = images[:, c, :, :].unsqueeze(1)
        
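        # Apply transforms one image at a time: the transforms above are
        # written for a single (1, H, W) image, not for a batch
        single_channel = torch.stack([transform(img) for img in single_channel])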
        
        # Extract features
        output = model.forward_features(single_channel)
        feat_temp = output["x_norm_clstoken"].cpu().detach().numpy()
        batch_feat.append(feat_temp)

# Concatenate features from all channels
features = np.concatenate(batch_feat, axis=1)

print(f"Output shape: {features.shape}")
print(f"  - Batch size (N): {features.shape[0]}")
print(f"  - Feature dimension (C * feature_dim): {features.shape[1]}")
```


## Training Details

### Training Data

MorphEm was pre-trained on the entire CHAMMI-75 pre-training set.
The CHAMMI-75 dataset consists of 75 heterogeneous studies and 2.8 million multi-channel images.

### Training Procedure

We pre-trained the model with the self-supervised learning framework DINO. The model takes a single channel at a time as input. For evaluation, each channel of an image is encoded separately and the per-channel features are concatenated.
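
The channel-wise procedure described above can be sketched with a stand-in encoder, which makes the shape arithmetic visible without downloading the weights. `StandInEncoder` and `bag_of_channels` are hypothetical names introduced only for this illustration, and the width of 384 is the standard ViT-Small embedding size, assumed here rather than read from the released checkpoint.

```python
import torch
import torch.nn as nn

EMBED_DIM = 384  # standard ViT-Small embedding width (assumption for this sketch)


class StandInEncoder(nn.Module):
    """Hypothetical placeholder for a single-channel encoder; NOT the MorphEm model."""

    def __init__(self, embed_dim: int = EMBED_DIM):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(1, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 1, H, W) -> (N, embed_dim)
        return self.proj(self.pool(x).flatten(1))


def bag_of_channels(encoder: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Encode each channel independently and concatenate the per-channel features."""
    per_channel = [encoder(images[:, c:c + 1]) for c in range(images.shape[1])]
    return torch.cat(per_channel, dim=1)  # (N, C * embed_dim)


torch.manual_seed(0)
encoder = StandInEncoder().eval()
images = torch.rand(2, 5, 224, 224)  # N=2 images with C=5 channels each

with torch.no_grad():
    features = bag_of_channels(encoder, images)

print(features.shape)  # torch.Size([2, 1920]), i.e. 5 channels * 384 features
```

Because every channel is encoded on its own, the same encoder can handle images with any number of channels; only the length of the concatenated feature vector changes.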

#### Preprocessing

Preprocessing relied mainly on three transforms: `SaturationNoiseInjector()`, `PerImageNormalize()`, and `Resize((224, 224))`.

```python
import torch
import torch.nn as nn

# Noise Injector transformation
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x
```
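
As a rough sanity check of what the first two transforms do to a single-channel image, the sketch below restates them as plain functions and inspects the result. `inject_saturation_noise` and `per_image_normalize` are hypothetical helper names written for this illustration; they are assumed to match the classes above for a single `(1, H, W)` input and are not part of the released code.

```python
import torch


def inject_saturation_noise(img: torch.Tensor, low: float = 200.0, high: float = 255.0) -> torch.Tensor:
    """Replace saturated pixels (== 255) with uniform noise drawn between low and high."""
    noise = torch.empty_like(img).uniform_(low, high)
    return torch.where(img == 255, noise, img)


def per_image_normalize(img: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Zero-mean, unit-variance normalization over the pixels of one (1, H, W) image."""
    mean = img.mean(dim=(-2, -1), keepdim=True)
    var = img.var(dim=(-2, -1), unbiased=False, keepdim=True)  # biased variance, as in instance norm
    return (img - mean) / torch.sqrt(var + eps)


torch.manual_seed(0)
raw = torch.randint(0, 256, (1, 64, 64)).float()
raw[0, :8, :8] = 255.0  # force a saturated patch for the demonstration
saturated = raw == 255

denoised = inject_saturation_noise(raw)
normalized = per_image_normalize(denoised)

print("saturated pixels:", int(saturated.sum()))
print("replacement range:", denoised[saturated].min().item(), denoised[saturated].max().item())
print("mean after normalization:", normalized.mean().item())
print("std after normalization:", normalized.std(unbiased=False).item())
```

The saturated patch ends up filled with values between 200 and 255 while every other pixel is left untouched, and the normalized image has approximately zero mean and unit standard deviation.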


## Evaluation

We evaluated this model on six benchmarks, and it is highly competitive on most of them. The benchmarks are listed below:

1. CHAMMI
2. HPAv23
3. JUMP-CP
4. IDR0017
5. CELLPHIE
6. RBC-MC

More details can be found in the paper: 

## Environmental Impact

- **Hardware Type:** Nvidia RTX A6000
- **Hours used:** 2352
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Private Infrastructure
- **Carbon Emitted:** 304 kg CO2

## Technical Specifications


The model is a ViT-Small trained for roughly 2,500 Nvidia RTX A6000 GPU hours (2,352 hours as recorded under Environmental Impact). Training ran on a multi-node system with 2 nodes, each containing 7 GPUs.
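
For a sense of scale, the figures on this card can be combined into a back-of-the-envelope estimate. The sketch below uses the 2,352 GPU hours and 304 kg CO2 listed under Environmental Impact and assumes that all 14 GPUs were busy for the whole run, which the card does not state explicitly; the derived wall-clock time is therefore an estimate, not a reported number.

```python
# Figures taken from this model card
gpu_hours = 2352      # "Hours used" under Environmental Impact
nodes = 2
gpus_per_node = 7
carbon_kg = 304       # "Carbon Emitted" under Environmental Impact

# Assumption: every GPU ran for the full duration of training
total_gpus = nodes * gpus_per_node
wall_clock_hours = gpu_hours / total_gpus
wall_clock_days = wall_clock_hours / 24
kg_per_gpu_hour = carbon_kg / gpu_hours

print(f"GPUs in use: {total_gpus}")
print(f"Estimated wall-clock time: {wall_clock_hours:.0f} h ({wall_clock_days:.0f} days)")
print(f"Emissions per GPU hour: {kg_per_gpu_hour:.3f} kg CO2")
```

Under that assumption the run corresponds to about 168 hours, or one week, of wall-clock time.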

## Citation

Can be cited as the following:


<!-- **BibTeX:** -->


<!-- **APA:** -->

## Model Card Authors

Vidit Agrawal, John Peters, Juan C. Caicedo

## Model Card Contact

vagrawal22@wisc.edu, jgpeters3@wisc.edu, juan.caicedo@wisc.edu