MOOZY: A Patient-First Foundation Model for Computational Pathology

MOOZY is a slide- and patient-level foundation model for computational pathology that treats the patient case, not the individual slide, as the core unit of representation. A vision-only slide encoder, pretrained with masked self-distillation on 77,134 public slides, is aligned with clinical semantics through multi-task supervision over 333 tasks (205 classification, 128 survival) from 56 public datasets spanning 23 anatomical sites. A case transformer then explicitly models dependencies across all slides from the same patient, replacing the naive early/late fusion used by prior methods. The full model has 85.77M parameters and is trained entirely on public data.

Installation
Usage
Architecture
Tasks
Citation
License

Installation

pip install moozy

The checkpoint and task definitions are downloaded automatically from this repository on first use.

Usage

From pre-computed H5 feature files

This is the faster path. Pass .h5 files containing patch features extracted with lunit_vit_small_patch8_dino at a 224x224 patch size. Outputs from AtlasPatch and TRIDENT are compatible.

moozy encode slide_1.h5 slide_2.h5 --output case_embedding.h5

From raw whole-slide images

Pass slide files directly (.svs, .tiff, .ndpi, .mrxs, etc.). MOOZY calls AtlasPatch under the hood to segment tissue, extract patches, and compute features. This path requires atlas-patch, sam2, and the OpenSlide system library (see the AtlasPatch installation guide).

moozy encode slide_1.svs slide_2.svs --output case_embedding.h5 --target_mag 20
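When the CLI is driven from a script (for example a job scheduler), it can help to assemble the command programmatically. The following is a minimal sketch that uses only the flags shown in this README; `build_encode_command` is a hypothetical helper, not part of MOOZY, and it detects H5 inputs purely by the `.h5` file extension.

```python
from pathlib import Path


def build_encode_command(slides, output, target_mag=None):
    """Return the argv list for `moozy encode`, refusing mixed input types."""
    if not slides:
        raise ValueError("a case needs at least one slide")
    # A case must be all H5 feature files or all raw slides, never both.
    is_h5 = [Path(s).suffix.lower() == ".h5" for s in slides]
    if any(is_h5) and not all(is_h5):
        raise ValueError("cannot mix H5 feature files and raw slides in one case")
    cmd = ["moozy", "encode", *map(str, slides), "--output", str(output)]
    # --target_mag only applies to raw slides, so it is dropped for H5 input.
    if target_mag is not None and not all(is_h5):
        cmd += ["--target_mag", str(target_mag)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with slide paths that contain spaces.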

Python API

from moozy.encoding import run_encoding

# From H5 feature files
run_encoding(
    slide_paths=["slide_1.h5", "slide_2.h5"],
    output_path="case_embedding.h5",
)

# From raw slides
run_encoding(
    slide_paths=["slide_1.svs", "slide_2.svs"],
    output_path="case_embedding.h5",
    target_mag=20,
)
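Because each call encodes exactly one case, a cohort is typically processed by grouping slides per patient first. The sketch below shows one way to do that; the helpers and the `<case>_<slide>.h5` file naming convention are assumptions made for illustration, and the encoder is passed in as a function so that only the `slide_paths` and `output_path` keywords shown above are relied on.

```python
from collections import defaultdict
from pathlib import Path


def group_slides_by_case(slide_paths, case_id_of):
    """Map case ID -> sorted list of slide paths belonging to that case."""
    cases = defaultdict(list)
    for path in slide_paths:
        cases[case_id_of(Path(path))].append(str(path))
    return {case: sorted(paths) for case, paths in sorted(cases.items())}


def encode_cases(cases, output_dir, encode_fn):
    """Call encode_fn once per case and return case ID -> output path."""
    outputs = {}
    for case_id, paths in cases.items():
        out = str(Path(output_dir) / f"{case_id}.h5")
        encode_fn(slide_paths=paths, output_path=out)
        outputs[case_id] = out
    return outputs


def case_id_from_name(path):
    """Hypothetical convention: 'P001_A.h5' belongs to case 'P001'."""
    return path.stem.split("_")[0]
```

With MOOZY installed, `encode_cases(group_slides_by_case(paths, case_id_from_name), "embeddings", run_encoding)` would write one embedding file per patient, assuming the `embeddings` directory already exists.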

Arguments

Argument	Default	Description
`SLIDES`	(required)	One or more H5 feature files or raw slide files forming a single case. Cannot mix the two types.
`--output`, `-o`	(required)	Output H5 file path.
`--mixed_precision`	off	Enable bfloat16 mixed precision.
`--target_mag`	20	Magnification for patch extraction from raw slides. Ignored for H5.
`--step_size`	224	Stride between patch centers in pixels. Set < 224 for overlap. Ignored for H5.
`--mpp_csv`	-	CSV with `wsi,mpp` columns for microns-per-pixel overrides. Ignored for H5.
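Two of these options are easier to reason about with a small example: how `--step_size` relates to patch overlap, and what a `--mpp_csv` file looks like. The sketch below uses only the standard library. Both helper names are hypothetical, and whether the `wsi` column expects a file name or a full path is an assumption to check against your installation.

```python
import csv

PATCH_SIZE = 224  # patch edge length in pixels, as stated in the Usage section


def overlap_fraction(step_size, patch_size=PATCH_SIZE):
    """Fraction of a patch edge shared with its neighbour for a given stride."""
    if step_size <= 0:
        raise ValueError("step_size must be positive")
    return max(0.0, (patch_size - step_size) / patch_size)


def write_mpp_csv(path, mpp_by_slide):
    """Write a CSV with the `wsi,mpp` header described for --mpp_csv."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["wsi", "mpp"])
        for wsi, mpp in mpp_by_slide.items():
            writer.writerow([wsi, mpp])
```

For instance, a stride of 112 gives `overlap_fraction(112) == 0.5`, meaning each patch shares half of its edge with the next one, while the default stride of 224 gives no overlap.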

Output format

The output H5 file contains a features dataset (768-D float32 case embedding) and a coords dataset with slide metadata.
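A minimal sketch of reading the embedding back and comparing two cases is shown below. It assumes the third-party `h5py` package is installed and relies only on the `features` dataset name given above. The exact stored shape is not specified here, so the array is flattened defensively. Both function names are hypothetical.

```python
import math


def load_case_embedding(path):
    """Read the `features` dataset from a MOOZY output file as a flat list."""
    import h5py  # third-party; imported lazily so cosine_similarity works without it

    with h5py.File(path, "r") as handle:
        # Flatten in case the embedding is stored as (1, 768) rather than (768,).
        return handle["features"][()].ravel().tolist()


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Cosine similarity is used here only as a common, generic way to compare embedding vectors; the README does not prescribe a particular downstream metric.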

Architecture

Component	Architecture	Params	Output dim
Patch encoder	ViT-S/8 (Lunit DINO)	21.67M	384
Slide encoder	ViT, 6 layers, 768-D, 12 heads, 2D ALiBi	42.8M	768
Case transformer	3 layers, 12 heads	21.3M	768
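As a quick sanity check, the per-component counts in the table add up to the 85.77M total quoted in the introduction. The short sketch below reproduces that arithmetic and derives the slide encoder's per-head width from the figures in the table; the variable names are illustrative only.

```python
# Parameter counts in millions, copied from the architecture table.
PARAMS_M = {
    "patch_encoder": 21.67,
    "slide_encoder": 42.8,
    "case_transformer": 21.3,
}

# Round to two decimals to absorb floating-point noise in the sum.
total_m = round(sum(PARAMS_M.values()), 2)

# Share of the total parameter budget per component, in percent.
shares = {name: round(100 * p / total_m, 1) for name, p in PARAMS_M.items()}

# Per-head width of the slide encoder: 768 dimensions split over 12 heads.
head_dim = 768 // 12

print(total_m, head_dim, shares)
```

The slide encoder accounts for roughly half of the parameter budget, with the patch encoder and case transformer splitting the remainder about evenly.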

Tasks

This repository includes 333 task definitions in the tasks/ directory. Each task has a config.yaml (task type, organ, label mapping) and a task.csv (annotations and splits). The tasks cover 205 classification and 128 survival endpoints across all 32 TCGA cohorts, all 10 CPTAC cohorts, REG, BC-Therapy, BRACS, CAMELYON17, DHMC Kidney, DHMC LUAD, EBRAINS, IMP Colorectum, IMP Cervix, MBC, MUT-HET-RCC, NADT Prostate, NAT-BRCA, and PANDA.
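To explore the task definitions locally, the sketch below walks a `tasks/` directory and reports the column names of each `task.csv`. It assumes one sub-folder per task containing both files, which is an assumption about the layout rather than something stated above. `config.yaml` is only checked for existence, since parsing it would need a YAML library. `list_tasks` is a hypothetical helper.

```python
import csv
from pathlib import Path


def list_tasks(tasks_dir):
    """Return {task_name: task.csv column names} for every complete task folder."""
    tasks = {}
    for task_dir in sorted(Path(tasks_dir).iterdir()):
        config = task_dir / "config.yaml"
        table = task_dir / "task.csv"
        # Skip stray files and folders missing either of the two task files.
        if not (task_dir.is_dir() and config.is_file() and table.is_file()):
            continue
        with open(table, newline="") as handle:
            header = next(csv.reader(handle), [])
        tasks[task_dir.name] = header
    return tasks
```

Calling `len(list_tasks("tasks"))` on a full checkout is a simple way to confirm that all task folders were downloaded.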

Citation

@article{kotp2026moozy,
  title   = {MOOZY: A Patient-First Foundation Model for Computational Pathology},
  author  = {Kotp, Yousef and Trinh, Vincent Quoc-Huy and Pal, Christopher and Hosseini, Mahdi S.},
  journal = {arXiv preprint arXiv:XXXX.XXXXX},
  year    = {2026}
}

License

CC BY-NC-SA 4.0. Research and non-commercial use only.


Base model

1aurent/vit_small_patch8_224.lunit_dino

Evaluation results

All metrics are self-reported.

Dataset	Weighted F1	Weighted ROC-AUC	Balanced Accuracy
BC Therapy	0.560	0.740	0.510
CPTAC-BRCA	0.870	0.860	0.860
CPTAC-CCRCC	0.890	0.790	0.780
CPTAC-COAD	0.910	0.910	0.900
CPTAC-LSCC	0.780	0.750	0.770
CPTAC-LUAD	0.850	0.800	0.790
EBRAINS	0.970	0.990	-
