---
license: mit
pipeline_tag: image-classification
---

## Model Details

### Model Description

MorphEm is a self-supervised model trained with the DINO Bag of Channels recipe on the entire CHAMMI-75 dataset. It serves as a performance benchmark for self-supervised models.

- **Developed by:** Vidit Agrawal, John Peters, Juan Caicedo
- **Shared by:** [Caicedo Lab](https://morgridge.org/research/labs/caicedo/)
- **Model type:** Vision Transformer Small
- **License:** MIT License

### Model Sources

- **Repository:** https://github.com/CaicedoLab/CHAMMI-75
- **Demo:** https://github.com/CaicedoLab/CHAMMI-75/tree/main/aws-tutorials

## Uses

### Direct Use

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModel
import torch
import torch.nn as nn
from torchvision.transforms import v2
import numpy as np

# Noise Injector transformation: replaces saturated pixels (value 255)
# with uniform noise drawn from [low, high)
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only x[0] is modified, i.e. the single channel of a (1, H, W) image
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x

# Self Normalize transformation: zero mean and unit variance per image
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add a batch dimension for unbatched (C, H, W) input
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        # Drop the batch dimension again when it has size 1
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x

# Load model (falls back to CPU when no GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained("CaicedoLab/MorphEm", trust_remote_code=True)
model.to(device).eval()

# Define transforms
transform = v2.Compose([
    SaturationNoiseInjector(),
    PerImageNormalize(),
    v2.Resize(size=(224, 224), antialias=True),
])

# Generate random batch (N, C, H, W)
batch_size = 2
num_channels = 3
images = torch.randint(0, 256, (batch_size, num_channels, 512, 512), dtype=torch.float32)

print(f"Input shape: {images.shape} (N={batch_size}, C={num_channels}, H=512, W=512)")
print()

# Bag of Channels (BoC) - process each channel independently
with torch.no_grad():
    batch_feat = []
    images = images.to(device)

    for c in range(images.shape[1]):
        # Extract single channel: (N, C, H, W) -> (N, 1, H, W)
        single_channel = images[:, c : c + 1, :, :]

        # Apply transforms image by image, because the transforms act on a
        # single (1, H, W) image; the stacked result is (N, 1, 224, 224)
        single_channel = torch.stack([transform(img) for img in single_channel])

        # Extract features
        output = model.forward_features(single_channel)
        feat_temp = output["x_norm_clstoken"].cpu().numpy()
        batch_feat.append(feat_temp)

# Concatenate features from all channels
features = np.concatenate(batch_feat, axis=1)

print(f"Output shape: {features.shape}")
print(f"  - Batch size (N): {features.shape[0]}")
print(f"  - Feature dimension (C * feature_dim): {features.shape[1]}")
```

## Training Details

### Training Data

MorphEm was pre-trained on the entire CHAMMI-75 pre-training data. The CHAMMI-75 dataset consists of 75 heterogeneous studies and 2.8 million multi-channel images.

### Training Procedure

We used the DINO self-supervised learning framework. The model was pre-trained to take a single channel at a time as input. For evaluation, extract features for each channel separately and concatenate them.

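
The per-channel concatenation described above can be illustrated with a minimal, dependency-free sketch. It assumes a hypothetical `toy_encoder` (four summary statistics per channel) as a stand-in for the pre-trained model, and the helper names `bag_of_channels` and `random_image` are illustrative only; none of them are part of the MorphEm code. The point is that one single-channel encoder handles images with any number of channels, and the output length is the number of channels times the per-channel feature size.

```python
import random
from statistics import fmean, pstdev


def toy_encoder(channel):
    """Hypothetical stand-in for the real encoder: maps one H x W channel
    to a fixed-length feature vector (mean, std, min, max)."""
    pixels = [p for row in channel for p in row]
    return [fmean(pixels), pstdev(pixels), min(pixels), max(pixels)]


def bag_of_channels(image, encoder):
    """Encode every channel independently, then concatenate the
    per-channel features in channel order."""
    features = []
    for channel in image:  # image is a list of C channels, each H x W
        features.extend(encoder(channel))
    return features


rng = random.Random(0)


def random_image(num_channels, height=8, width=8):
    """Build a random image as nested lists with shape (C, H, W)."""
    return [
        [[rng.uniform(0, 255) for _ in range(width)] for _ in range(height)]
        for _ in range(num_channels)
    ]


image3 = random_image(3)
image5 = random_image(5)
three_channel = bag_of_channels(image3, toy_encoder)
five_channel = bag_of_channels(image5, toy_encoder)

# Feature length is C * feature_dim, with feature_dim = 4 for the toy encoder
print(len(three_channel), len(five_channel))  # 12 20
```

Because each channel is encoded on its own, images from studies with different channel counts can share one encoder; only the length of the concatenated vector changes.
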
#### Preprocessing

Preprocessing consists mainly of three transforms: `SaturationNoiseInjector()`, `PerImageNormalize()`, and `Resize((224, 224))`.

```python
import torch
import torch.nn as nn

# Noise Injector transformation: replaces saturated pixels (value 255)
# with uniform noise drawn from [low, high)
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only x[0] is modified, i.e. the single channel of a (1, H, W) image
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x

# Self Normalize transformation: zero mean and unit variance per image
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Add a batch dimension for unbatched (C, H, W) input
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        # Drop the batch dimension again when it has size 1
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x
```

## Evaluation

We evaluated this model on six benchmarks, and it is highly competitive on most of them. The benchmarks are listed below:

1. CHAMMI
2. HPAv23
3. Jump-CP
4. IDR0017
5. CELLPHIE
6. RBC-MC

More details can be found in the paper:

#### Summary

## Environmental Impact

- **Hardware Type:** Nvidia RTX A6000
- **Hours used:** 2352
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Private Infrastructure
- **Carbon Emitted:** 304 kg CO2

## Technical Specifications

The model is a ViT Small trained on 2500 Nvidia A6000 GPU hours. The model was trained on a multi-node system with 2 nodes, each containing 7 GPUs.

## Citation

Can be cited as follows:

## Model Card Authors

Vidit Agrawal, John Peters, Juan C. Caicedo

## Model Card Contact

vagrawal22@wisc.edu, jgpeters3@wisc.edu, juan.caicedo@wisc.edu