---
license: mit
pipeline_tag: image-classification
---
## Model Details

### Model Description

MorphEm is a self-supervised model trained with the DINO Bag of Channels recipe on the entire CHAMMI-75 dataset.
It serves as a performance benchmark for self-supervised models.

- **Developed by:** Vidit Agrawal, John Peters, Juan Caicedo
- **Shared by:** [Caicedo Lab](https://morgridge.org/research/labs/caicedo/)
- **Model type:** Vision Transformer Small
- **License:** MIT License

### Model Sources

<!-- Provide the basic links for the model. -->

- **Repository:** https://github.com/CaicedoLab/CHAMMI-75
<!-- - **Paper** -->
- **Demo:** https://github.com/CaicedoLab/CHAMMI-75/tree/main/aws-tutorials

## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->

[More Information Needed]


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import AutoModel
import numpy as np
import torch
import torch.nn as nn
from torchvision.transforms import v2

# Noise Injector transformation
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x


# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained("CaicedoLab/MorphEm", trust_remote_code=True)
model.to(device).eval()

# Define transforms
transform = v2.Compose([
    SaturationNoiseInjector(),
    PerImageNormalize(),
    v2.Resize(size=(224, 224), antialias=True),
])

# Generate random batch (N, C, H, W)
batch_size = 2
num_channels = 3
images = torch.randint(0, 256, (batch_size, num_channels, 512, 512), dtype=torch.float32)

print(f"Input shape: {images.shape} (N={batch_size}, C={num_channels}, H=512, W=512)")
print()

# Bag of Channels (BoC) - process each channel independently
with torch.no_grad():
    batch_feat = []
    images = images.to(device)
    
    for c in range(images.shape[1]):
        # Extract single channel: (N, C, H, W) -> (N, H, W)
        single_channel = images[:, c, :, :]

        # Apply transforms, then restore the channel axis: (N, 1, 224, 224)
        single_channel = transform(single_channel).unsqueeze(1)
        
        # Extract features
        output = model.forward_features(single_channel)
        feat_temp = output["x_norm_clstoken"].cpu().detach().numpy()
        batch_feat.append(feat_temp)

# Concatenate features from all channels
features = np.concatenate(batch_feat, axis=1)

print(f"Output shape: {features.shape}")
print(f"  - Batch size (N): {features.shape[0]}")
print(f"  - Feature dimension (C * feature_dim): {features.shape[1]}")
```


## Training Details

### Training Data

<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

MorphEm was pre-trained on the entire CHAMMI-75 pre-training data.
The CHAMMI-75 dataset consists of 75 heterogeneous studies and 2.8 million multi-channel images.

### Training Procedure

We used the DINO self-supervised learning framework to pre-train a model that takes a single channel as input, one at a time. For evaluation, the features extracted from each channel are concatenated into one vector per image.
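
The shape arithmetic of processing one channel at a time and concatenating the results can be sketched as follows. This is a minimal illustration, not the MorphEm model: `StandInEncoder` and `bag_of_channels` are hypothetical names, and the feature width of 384 is an assumption based on the usual ViT-Small embedding size.

```python
import torch
import torch.nn as nn


class StandInEncoder(nn.Module):
    """Hypothetical stand-in for the pre-trained encoder (not MorphEm itself).

    Maps a single-channel batch (N, 1, H, W) to features (N, feat_dim).
    """

    def __init__(self, feat_dim: int = 384):  # 384 is an assumed feature width
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(1, feat_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.pool(x).flatten(1))


def bag_of_channels(encoder: nn.Module, images: torch.Tensor) -> torch.Tensor:
    """Encode each channel independently and concatenate the features."""
    per_channel = []
    with torch.no_grad():
        for c in range(images.shape[1]):
            single = images[:, c : c + 1]        # (N, 1, H, W)
            per_channel.append(encoder(single))  # (N, feat_dim)
    return torch.cat(per_channel, dim=1)         # (N, C * feat_dim)


encoder = StandInEncoder().eval()
images = torch.rand(2, 5, 64, 64)  # 2 images with 5 channels each
features = bag_of_channels(encoder, images)
print(features.shape)
```

Because every channel goes through the same single-channel encoder, the same weights apply to images with any number of channels; only the length of the concatenated feature vector changes.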

#### Preprocessing

Preprocessing relies mainly on three transforms: `SaturationNoiseInjector()`, `PerImageNormalize()`, and `Resize((224, 224))`.

```python
import torch
import torch.nn as nn

# Noise Injector transformation
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x
```
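
`PerImageNormalize` wraps `nn.InstanceNorm2d` without affine parameters or running statistics, so each image plane is standardized using its own mean and variance. The sketch below writes that computation out by hand and compares it with the PyTorch layer. The helper name `per_image_normalize` is hypothetical and is only meant to show what the normalization computes.

```python
import torch
import torch.nn as nn


def per_image_normalize(x: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """Standardize each (image, channel) plane of an (N, C, H, W) tensor."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    # Instance normalization uses the biased (population) variance.
    var = x.var(dim=(2, 3), unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)


x = torch.rand(2, 1, 16, 16) * 255  # two single-channel images in [0, 255)
manual = per_image_normalize(x)
reference = nn.InstanceNorm2d(
    num_features=1, affine=False, track_running_stats=False, eps=1e-7
)(x)

# The two results should agree up to floating-point error.
print(torch.allclose(manual, reference, atol=1e-4))
# Each normalized plane should have a mean close to zero.
print(manual.mean(dim=(2, 3)).abs().max().item())
```

Since the statistics come from the image itself, the output does not depend on the absolute intensity scale of the input.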


## Evaluation

<!-- This section describes the evaluation protocols and provides the results. -->
We evaluated this model on six benchmarks, and it is highly competitive on most of them. The benchmarks are listed below:

1. CHAMMI
2. HPAv23
3. Jump-CP
4. IDR0017
5. CELLPHIE
6. RBC-MC

More details can be found in the paper: 

#### Summary

## Environmental Impact

<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- **Hardware Type:** Nvidia RTX A6000
- **Hours used:** 2352
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Private Infrastructure
- **Carbon Emitted:** 304 kg CO2

## Technical Specifications


The model is a ViT Small trained for about 2,500 Nvidia RTX A6000 GPU hours on a multi-node system with 2 nodes, each containing 7 GPUs.

## Citation

Can be cited as the following:


<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->

<!-- **BibTeX:** -->


<!-- **APA:** -->

## Model Card Authors

Vidit Agrawal, John Peters, Juan C. Caicedo

## Model Card Contact

vagrawal22@wisc.edu, jgpeters3@wisc.edu, juan.caicedo@wisc.edu