---
license: mit
pipeline_tag: image-classification
---

## Model Details

### Model Description

MorphEm is a self-supervised model trained with the DINO Bag of Channels recipe on the entire CHAMMI-75 dataset. It serves as a performance benchmark for self-supervised models.

- **Developed by:** Vidit Agrawal, John Peters, Juan Caicedo
- **Shared by:** [Caicedo Lab](https://morgridge.org/research/labs/caicedo/)
- **Model type:** Vision Transformer Small
- **License:** MIT License

### Model Sources

- **Repository:** https://github.com/CaicedoLab/CHAMMI-75
- **Demo:** https://github.com/CaicedoLab/CHAMMI-75/tree/main/aws-tutorials

## Uses

The model was pre-trained on a heterogeneous dataset of microscopy images with the goal of producing cell morphology embeddings for biological applications.

### Direct Use

The primary use of this model is feature extraction of cellular morphology in image-based biological experiments. The model takes single-channel images as input and produces feature vectors that carry discriminative information about cellular phenotypes. Input images should be segmented ahead of time; this model does not locate cells automatically. The feature embeddings have been tested on single-cell analysis problems.

If the images of interest are multi-channel, each channel can be processed independently and the feature embeddings of all channels concatenated for downstream applications.

Applications of the embeddings produced by this model include basic biology research, functional genomics studies, and drug discovery projects, among others.

### Out-of-Scope Use

This model is not useful for cell segmentation; its primary use is feature extraction only. The model should be used for analyzing imaging data in biological laboratories. It is not intended for use in clinical practice or in diagnostic applications. The model should not be used for applications that involve biological weapons or any other type of biological manipulation that could harm humans or the natural environment.

## How to Get Started with the Model

Use the code below to get started with the model.
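The example assumes `torch`, `torchvision`, `transformers`, and `numpy` are installed; these package names are taken from the imports in the snippet, and exact versions are not pinned here:

```bash
pip install torch torchvision transformers numpy
```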
```python
from transformers import AutoModel
import torch
import torch.nn as nn
from torchvision.transforms import v2
import numpy as np


# Noise Injector transformation: replaces saturated pixels (value 255)
# with random values drawn uniformly from [low, high]
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation: per-image (instance) normalization
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x


# Load model
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModel.from_pretrained("CaicedoLab/MorphEm", trust_remote_code=True)
model.to(device).eval()

# Define transforms (designed to operate on one image at a time)
transform = v2.Compose([
    SaturationNoiseInjector(),
    PerImageNormalize(),
    v2.Resize(size=(224, 224), antialias=True),
])

# Generate random batch (N, C, H, W)
batch_size = 2
num_channels = 3
images = torch.randint(0, 256, (batch_size, num_channels, 512, 512), dtype=torch.float32)

print(f"Input shape: {images.shape} (N={batch_size}, C={num_channels}, H=512, W=512)")
print()

# Bag of Channels (BoC) - process each channel independently
with torch.no_grad():
    batch_feat = []
    images = images.to(device)
    for c in range(images.shape[1]):
        # Extract single channel: (N, C, H, W) -> (N, 1, H, W)
        single_channel = images[:, c, :, :].unsqueeze(1)

        # Apply the per-image transforms to each sample in the batch
        single_channel = torch.stack([transform(img) for img in single_channel])

        # Extract features from the class token
        output = model.forward_features(single_channel)
        feat_temp = output["x_norm_clstoken"].detach().cpu().numpy()
        batch_feat.append(feat_temp)

    # Concatenate features from all channels
    features = np.concatenate(batch_feat, axis=1)

print(f"Output shape: {features.shape}")
print(f" - Batch size (N): {features.shape[0]}")
print(f" - Feature dimension (C * feature_dim): {features.shape[1]}")
```

## Training Details

### Training Data

MorphEm was pre-trained on the entire CHAMMI-75 pre-training set. The CHAMMI-75 dataset consists of 75 heterogeneous studies and 2.8 million multi-channel images.

### Training Procedure

We used the DINO self-supervised learning framework and pre-trained the model on a single channel at a time (the Bag of Channels recipe). At evaluation time, the per-channel embeddings of an image are concatenated to form its final representation, as illustrated in the sketch below.
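As a minimal sketch of this Bag of Channels evaluation, the code below assembles an image-level embedding by concatenating per-channel class-token features. It assumes `model` has already been loaded as in the getting-started example and that the class-token embedding of this ViT-Small is 384-dimensional; the 5-channel input and its preprocessing are placeholders.

```python
import torch

# Hypothetical 5-channel, already-preprocessed single-cell crop (values are placeholders).
num_channels = 5
image = torch.rand(num_channels, 224, 224)
device = next(model.parameters()).device  # reuse the device the model was loaded on

with torch.no_grad():
    per_channel = []
    for c in range(num_channels):
        x = image[c][None, None, :, :].to(device)   # one channel at a time: (1, 1, 224, 224)
        out = model.forward_features(x)
        per_channel.append(out["x_norm_clstoken"])  # (1, D); D = 384 for ViT-Small
    embedding = torch.cat(per_channel, dim=1)       # image-level embedding: (1, num_channels * D)

print(embedding.shape)  # torch.Size([1, 1920]) if D == 384
```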
#### Preprocessing

We used three transforms for preprocessing: SaturationNoiseInjector(), PerImageNormalize(), and Resize(224, 224).

```python
import torch
import torch.nn as nn


# Noise Injector transformation
class SaturationNoiseInjector(nn.Module):
    def __init__(self, low=200, high=255):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        channel = x[0].clone()
        noise = torch.empty_like(channel).uniform_(self.low, self.high)
        mask = (channel == 255).float()
        noise_masked = noise * mask
        channel[channel == 255] = 0
        channel = channel + noise_masked
        x[0] = channel
        return x


# Self Normalize transformation
class PerImageNormalize(nn.Module):
    def __init__(self, eps=1e-7):
        super().__init__()
        self.eps = eps
        self.instance_norm = nn.InstanceNorm2d(
            num_features=1,
            affine=False,
            track_running_stats=False,
            eps=self.eps,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.dim() == 3:
            x = x.unsqueeze(0)
        x = self.instance_norm(x)
        if x.shape[0] == 1:
            x = x.squeeze(0)
        return x
```

## Evaluation

We evaluated this model on six benchmarks, and it is highly competitive on most of them:

1. CHAMMI
2. HPAv23
3. Jump-CP
4. IDR0017
5. CELLPHIE
6. RBC-MC

More details can be found in the paper.

## Environmental Impact

- **Hardware Type:** Nvidia RTX A6000
- **Hours used:** 2352
- **Cloud Provider:** Private Infrastructure
- **Compute Region:** Private Infrastructure
- **Carbon Emitted:** 304 kg CO2

## Technical Specifications

The model is a ViT-Small trained for approximately 2,500 Nvidia A6000 GPU hours on a multi-node system with 2 nodes, each containing 7 GPUs.

## Citation

The model can be cited as follows:

## Model Card Authors

Vidit Agrawal, John Peters, Juan C. Caicedo

## Model Card Contact

vagrawal22@wisc.edu, jgpeters3@wisc.edu, juan.caicedo@wisc.edu