---
license: mit
pipeline_tag: image-classification
---
## Model Details

### Model Description

MorphEm is a self-supervised model trained with the DINO Bag of Channels (BoC) recipe on the entire CHAMMI-75 dataset.
It serves as a performance baseline for self-supervised models.

- **Developed by:** Vidit Agrawal, John Peters, Juan Caicedo
- **Shared by:** [Caicedo Lab](https://morgridge.org/research/labs/caicedo/)
- **Model type:** Vision Transformer (ViT-Small)
- **License:** MIT License
### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/CaicedoLab/CHAMMI-75
<!-- - **Paper** -->
- **Demo:** https://github.com/CaicedoLab/CHAMMI-75/tree/main/aws-tutorials
## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]
## How to Get Started with the Model

Use the code below to get started with the model.
```python
from transformers import AutoModel
import torch
import torch.nn as nn
from torchvision.transforms import v2
import numpy as np


# Noise Injector transformation: replaces saturated (255-valued) pixels
# with uniform noise in [low, high]
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation: zero-mean, unit-variance normalization per image
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x


# Load model
device = "cuda"
model = AutoModel.from_pretrained("CaicedoLab/MorphEm", trust_remote_code=True)
model.to(device).eval()

# Define transforms
transform = v2.Compose([
    SaturationNoiseInjector(),
    PerImageNormalize(),
    v2.Resize(size=(224, 224), antialias=True),
])

# Generate random batch (N, C, H, W)
batch_size = 2
num_channels = 3
images = torch.randint(0, 256, (batch_size, num_channels, 512, 512), dtype=torch.float32)

print(f"Input shape: {images.shape} (N={batch_size}, C={num_channels}, H=512, W=512)")
print()

# Bag of Channels (BoC) - process each channel independently
with torch.no_grad():
    batch_feat = []
    images = images.to(device)

    for c in range(images.shape[1]):
        # Extract single channel: (N, C, H, W) -> (N, 1, H, W)
        single_channel = images[:, c, :, :].unsqueeze(1)

        # Apply transforms
        single_channel = transform(single_channel.squeeze(1)).unsqueeze(1)

        # Extract features (CLS token of the normalized output)
        output = model.forward_features(single_channel)
        feat_temp = output["x_norm_clstoken"].cpu().detach().numpy()
        batch_feat.append(feat_temp)

    # Concatenate features from all channels
    features = np.concatenate(batch_feat, axis=1)

print(f"Output shape: {features.shape}")
print(f" - Batch size (N): {features.shape[0]}")
print(f" - Feature dimension (C * feature_dim): {features.shape[1]}")
```
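For a ViT-Small backbone, assuming the standard embedding dimension of 384, the three-channel example above yields a feature matrix of shape (2, 1152): one 384-dimensional CLS token per channel, concatenated.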
## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

MorphEm was pre-trained on the full CHAMMI-75 pre-training set.
The CHAMMI-75 dataset consists of 75 heterogeneous studies and 2.8 million multi-channel images.
### Training Procedure

We used the self-supervised learning framework DINO to pre-train a model that takes one image channel as input at a time. At evaluation time, the features of each channel are concatenated to form the final image representation, as sketched below.
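A minimal sketch of this evaluation scheme, assuming (as in the quick-start example above) a model whose `forward_features` returns an `x_norm_clstoken` entry:

```python
import torch

def bag_of_channels_features(model, images: torch.Tensor) -> torch.Tensor:
    """Embed each channel independently and concatenate the CLS tokens.

    images: preprocessed (N, C, H, W) batch.
    Returns an (N, C * feature_dim) feature matrix.
    """
    feats = []
    with torch.no_grad():
        for c in range(images.shape[1]):
            single = images[:, c : c + 1, :, :]    # (N, 1, H, W)
            out = model.forward_features(single)   # per-channel forward pass
            feats.append(out["x_norm_clstoken"])   # (N, feature_dim) CLS token
    return torch.cat(feats, dim=1)                 # (N, C * feature_dim)
```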
#### Preprocessing

We mainly used three transforms for preprocessing: SaturationNoiseInjector(), PerImageNormalize(), and Resize(224, 224).

```python
# Noise Injector transformation: replaces saturated (255-valued) pixels
# with uniform noise in [low, high]
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation: zero-mean, unit-variance normalization per image
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x
```
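For reference, these transforms compose into the same preprocessing pipeline used in the quick-start example above (`v2` here is `torchvision.transforms.v2`):

```python
transform = v2.Compose([
    SaturationNoiseInjector(),
    PerImageNormalize(),
    v2.Resize(size=(224, 224), antialias=True),
])
```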
## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->

We evaluated this model on six different benchmarks, and it is highly competitive on most of them:

1. CHAMMI
2. HPAv23
3. Jump-CP
4. IDR0017
5. CELLPHIE
6. RBC-MC

More details can be found in the paper.
## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** Nvidia RTX A6000
- **Hours used:** 2352
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Private Infrastructure
- **Carbon Emitted:** 304 kg CO2
## Technical Specifications

The model is a ViT-Small trained on a multi-node system with 2 nodes, each containing 7 Nvidia A6000 GPUs. Training consumed roughly 2,500 GPU hours (the 2,352 hours reported under Environmental Impact correspond to about 168 wall-clock hours across the 14 GPUs).
## Citation

<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

<!-- **BibTeX:** -->

<!-- **APA:** -->

[More Information Needed]
## Model Card Authors

Vidit Agrawal, John Peters, Juan C. Caicedo

## Model Card Contact

vagrawal22@wisc.edu, jgpeters3@wisc.edu, juan.caicedo@wisc.edu