# MOOZY: A Patient-First Foundation Model for Computational Pathology
MOOZY is a slide- and patient-level foundation model for computational pathology. The patient case, not the individual slide, is the core unit of representation. A vision-only slide encoder, pretrained with masked self-distillation on 77,134 public slides, is aligned with clinical semantics through multi-task supervision over 333 tasks (205 classification, 128 survival) drawn from 56 public datasets spanning 23 anatomical sites. A case transformer explicitly models dependencies across all slides from the same patient, replacing the naive early/late fusion used by prior methods. The model has 85.77M parameters in total and is trained entirely on public data.
## Installation

```bash
pip install moozy
```

The checkpoint and task definitions are downloaded automatically from this repository on first use.
## Usage

### From pre-computed H5 feature files

This is the faster path. Pass `.h5` files containing patch features extracted with `lunit_vit_small_patch8_dino` at a patch size of 224x224. Compatible with AtlasPatch and TRIDENT outputs.

```bash
moozy encode slide_1.h5 slide_2.h5 --output case_embedding.h5
```
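A minimal sketch of what such an input feature file can look like, assuming the AtlasPatch/TRIDENT convention of a `features` dataset (one 384-D vector per patch, matching the ViT-S/8 patch encoder's output dimension) and a `coords` dataset of patch coordinates. The dataset names, shapes, and dtypes here are assumptions for illustration, not a spec:

```python
import os
import tempfile

import h5py
import numpy as np

# Write a dummy per-slide feature file in the assumed layout:
# `features`: (n_patches, 384) float32, `coords`: (n_patches, 2) patch positions.
path = os.path.join(tempfile.mkdtemp(), "slide_1.h5")
n_patches = 1000
with h5py.File(path, "w") as f:
    f.create_dataset("features", data=np.random.rand(n_patches, 384).astype(np.float32))
    f.create_dataset("coords", data=np.random.randint(0, 50_000, size=(n_patches, 2)))

with h5py.File(path, "r") as f:
    print(f["features"].shape, f["features"].dtype)  # (1000, 384) float32
```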
### From raw whole-slide images

Pass slide files directly (`.svs`, `.tiff`, `.ndpi`, `.mrxs`, etc.). MOOZY calls AtlasPatch under the hood to segment tissue, extract patches, and compute features. This path requires `atlas-patch`, `sam2`, and the OpenSlide system library (see the AtlasPatch installation guide).

```bash
moozy encode slide_1.svs slide_2.svs --output case_embedding.h5 --target_mag 20
```
### Python API

```python
from moozy.encoding import run_encoding

# From H5 feature files
run_encoding(
    slide_paths=["slide_1.h5", "slide_2.h5"],
    output_path="case_embedding.h5",
)

# From raw slides
run_encoding(
    slide_paths=["slide_1.svs", "slide_2.svs"],
    output_path="case_embedding.h5",
    target_mag=20,
)
```
### Arguments

| Argument | Default | Description |
|---|---|---|
| `SLIDES` | (required) | One or more H5 feature files or raw slide files forming a single case. The two types cannot be mixed. |
| `--output`, `-o` | (required) | Output H5 file path. |
| `--mixed_precision` | off | Enable bfloat16 mixed precision. |
| `--target_mag` | 20 | Magnification for patch extraction from raw slides. Ignored for H5 input. |
| `--step_size` | 224 | Stride between patch centers in pixels. Set below 224 for overlapping patches. Ignored for H5 input. |
| `--mpp_csv` | - | CSV with `wsi,mpp` columns for microns-per-pixel overrides. Ignored for H5 input. |
### Output format

The output H5 file contains a `features` dataset (a 768-D float32 case embedding) and a `coords` dataset with slide metadata.
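Reading the embedding back is a one-liner with h5py. The file below is a locally written stand-in so the snippet is self-contained (a real one comes from the CLI or `run_encoding`), and the exact array shape, `(768,)` versus `(1, 768)`, is an assumption:

```python
import os
import tempfile

import h5py
import numpy as np

# Write a stand-in output file, then read it back the way a downstream
# consumer of `case_embedding.h5` would.
path = os.path.join(tempfile.mkdtemp(), "case_embedding.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("features", data=np.zeros(768, dtype=np.float32))

with h5py.File(path, "r") as f:
    emb = f["features"][...]
print(emb.shape, emb.dtype)  # (768,) float32
```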
## Architecture
| Component | Architecture | Params | Output dim |
|---|---|---|---|
| Patch encoder | ViT-S/8 (Lunit DINO) | 21.67M | 384 |
| Slide encoder | ViT, 6 layers, 768-D, 12 heads, 2D ALiBi | 42.8M | 768 |
| Case transformer | 3 layers, 12 heads | 21.3M | 768 |
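The per-component counts in the table sum to the 85.77M total quoted in the introduction; a quick arithmetic check:

```python
# Component parameter counts from the architecture table, in millions.
params_m = {
    "patch_encoder": 21.67,     # ViT-S/8 (Lunit DINO)
    "slide_encoder": 42.8,      # ViT, 6 layers, 768-D
    "case_transformer": 21.3,   # 3 layers, 12 heads
}
total = sum(params_m.values())
print(f"{total:.2f}M")  # 85.77M
```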
## Tasks

This repository includes 333 task definitions in the `tasks/` directory. Each task has a `config.yaml` (task type, organ, label mapping) and a `task.csv` (annotations and splits). The tasks cover 205 classification and 128 survival endpoints across all 32 TCGA cohorts, all 10 CPTAC cohorts, REG, BC-Therapy, BRACS, CAMELYON17, DHMC Kidney, DHMC LUAD, EBRAINS, IMP Colorectum, IMP Cervix, MBC, MUT-HET-RCC, NADT Prostate, NAT-BRCA, and PANDA.
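A sketch of that per-task layout, built with the standard library. The `tasks/<name>/{config.yaml, task.csv}` structure mirrors the description above; the specific YAML keys and CSV columns beyond "task type, organ, label mapping" and "annotations and splits" are illustrative assumptions:

```python
import csv
import tempfile
from pathlib import Path

# Create a toy task directory: tasks/<task_name>/{config.yaml, task.csv}.
# Key and column names are guesses for illustration, not the actual schema.
root = Path(tempfile.mkdtemp()) / "tasks" / "example_classification_task"
root.mkdir(parents=True)
(root / "config.yaml").write_text(
    "task_type: classification\norgan: breast\nlabels:\n  0: negative\n  1: positive\n"
)
with open(root / "task.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["case_id", "label", "split"])
    w.writerow(["case_0001", "1", "train"])

# Enumerate task definitions the way a loader might.
tasks = sorted(p.name for p in root.parent.iterdir() if (p / "config.yaml").exists())
print(tasks)  # ['example_classification_task']
```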
## Citation

```bibtex
@article{kotp2026moozy,
  title   = {MOOZY: A Patient-First Foundation Model for Computational Pathology},
  author  = {Kotp, Yousef and Trinh, Vincent Quoc-Huy and Pal, Christopher and Hosseini, Mahdi S.},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2026}
}
```
## License

CC BY-NC-SA 4.0. Research and non-commercial use only.
## Model tree for AtlasAnalyticsLab/MOOZY

Base model: `1aurent/vit_small_patch8_224.lunit_dino`
## Evaluation results

All metrics are self-reported.

| Dataset | Weighted F1 | Weighted ROC-AUC | Balanced Accuracy |
|---|---|---|---|
| BC Therapy | 0.560 | 0.740 | 0.510 |
| CPTAC-BRCA | 0.870 | 0.860 | 0.860 |
| CPTAC-CCRCC | 0.890 | 0.790 | 0.780 |
| CPTAC-COAD | 0.910 | 0.910 | 0.900 |
| CPTAC-LSCC | 0.780 | 0.750 | 0.770 |
| CPTAC-LUAD | 0.850 | 0.800 | 0.790 |
| EBRAINS | 0.970 | 0.990 | - |
