---
license: apache-2.0
---

# SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing

These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing", a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the **[DINO](https://github.com/facebookresearch/dino)** framework and adapts it to the unique characteristics of remote sensing data.

[ **[Paper](https://arxiv.org/abs/2508.21402v1)** ], [ **[GitHub](https://github.com/strakaj/SatDINO)** ]

## Pretrained models

The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks.

| arch | patch size | params (M) | GFLOPs | linear | hugging face | weights | weights-finetune |
|-------|------------|------------|--------|--------|--------------|---------|------------------|
| ViT-S | 16 | 21.59 | 8.54 | 72.75 | [strakajk/satdino-vit_small-16](https://huggingface.co/strakajk/satdino-vit_small-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16-finetune.pth) |
| ViT-S | 8 | 21.37 | 33.56 | 73.53 | [strakajk/satdino-vit_small-8](https://huggingface.co/strakajk/satdino-vit_small-8) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8-finetune.pth) |
| ViT-B | 16 | 85.65 | 33.90 | 73.52 | [strakajk/satdino-vit_base-16](https://huggingface.co/strakajk/satdino-vit_base-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16-finetune.pth) |

### Create from HF

You can create a model using Hugging Face or from the official **[GitHub](https://github.com/strakaj/SatDINO)** repository.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("strakajk/satdino-vit_small-16", trust_remote_code=True)
model.eval()

# predict
x = torch.randn(1, 3, 224, 224)
y = model(x)
# out: torch.Size([1, 384])
```

## Results

| Dataset   | **SatDINO8** | **SatDINO16** | **Scale-MAE** | **SatMAE** |
|-----------|--------------|---------------|---------------|------------|
| EuroSAT   | **87.72**    | 85.96         | 85.42         | 81.43      |
| RESISC45  | **85.29**    | 82.32         | 79.96         | 65.96      |
| UC Merced | **94.82**    | 93.21         | 84.58         | 78.45      |
| WHU-RS19  | **98.18**    | 97.82         | 89.32         | 86.41      |
| RS-C11    | **96.91**    | 96.61         | 93.03         | 83.96      |
| SIRI-WHU  | **91.82**    | 87.19         | 84.84         | 77.76      |

Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%).

---

| **Dataset** | **ViT-Small16** | **ViT-Small8** | **ViT-Base** |
|-------------|-----------------|----------------|--------------|
| EuroSAT     | 98.69           | 98.76          | **98.83**    |
| RESISC45    | 95.68           | 95.16          | **96.05**    |
| UC Merced   | 98.33           | **98.81**      | 98.57        |
| WHU-RS19    | **98.54**       | 98.06          | 97.57        |
| RS-C11      | **98.01**       | 96.81          | 96.02        |
| SIRI-WHU    | **98.54**       | 97.08          | 97.08        |

SatDINO fine-tuning classification accuracy.
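The kNN numbers above come from evaluating the frozen backbone. The sketch below is only illustrative and not the exact protocol from the paper: it extracts frozen SatDINO embeddings with the Hugging Face model from the snippet above and fits a cosine-similarity kNN classifier with scikit-learn. The random tensors, the 10-class labels, and the choice of `n_neighbors=20` are placeholder assumptions; substitute real benchmark loaders (e.g., EuroSAT) and the paper's settings to reproduce the reported scores.

```python
import torch
from sklearn.neighbors import KNeighborsClassifier
from transformers import AutoModel

# Load the pretrained backbone as shown above.
model = AutoModel.from_pretrained("strakajk/satdino-vit_small-16", trust_remote_code=True)
model.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> torch.Tensor:
    """Return L2-normalized embeddings of shape (N, 384) for a batch of images."""
    feats = model(images)
    return torch.nn.functional.normalize(feats, dim=-1)

# Placeholder tensors stand in for preprocessed train/test splits of a benchmark dataset.
train_images, train_labels = torch.randn(32, 3, 224, 224), torch.randint(0, 10, (32,))
test_images, test_labels = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))

train_feats = extract_features(train_images).numpy()
test_feats = extract_features(test_images).numpy()

# kNN on frozen features; k and the cosine metric are illustrative, not the paper's settings.
knn = KNeighborsClassifier(n_neighbors=20, metric="cosine")
knn.fit(train_feats, train_labels.numpy())
accuracy = knn.score(test_feats, test_labels.numpy())
print(f"kNN accuracy: {accuracy:.4f}")
```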
---

| **Model** | **Backbone** | **Potsdam 224²** | **Potsdam 512²** | **Vaihingen 224²** | **Vaihingen 512²** | **LoveDA 224²** | **LoveDA 512²** |
|-----------|--------------|------------------|------------------|--------------------|--------------------|-----------------|-----------------|
| SatMAE    | ViT-Large    | 67.88            | 70.39            | 64.81              | 69.13              | 46.28           | 52.28           |
| Scale-MAE | ViT-Large    | 69.74            | **72.21**        | 67.97              | **71.65**          | **49.37**       | **53.70**       |
| SatDINO   | ViT-Small16  | 67.93            | 71.80            | 63.38              | 68.32              | 44.77           | 49.65           |
| SatDINO   | ViT-Small8   | **70.71**        | 71.45            | **68.69**          | 67.71              | 47.53           | 50.20           |
| SatDINO   | ViT-Base     | 67.65            | 71.63            | 64.85              | 69.37              | 44.25           | 50.08           |

Semantic segmentation performance across multiple datasets and image sizes (224² and 512²). All results are reported as mean Intersection over Union (mIoU).

## License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

## Citation

If you find this repository useful, please consider citing it:

```
@misc{straka2025satdinodeepdiveselfsupervised,
      title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing},
      author={Jakub Straka and Ivan Gruber},
      year={2025},
      eprint={2508.21402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21402},
}
```