---
license: apache-2.0
---

# SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing

These are the official weights for "SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing", a self-supervised learning framework tailored for satellite imagery. SatDINO builds upon the **[DINO](https://github.com/facebookresearch/dino)** framework and adapts it to the unique characteristics of remote sensing data.

[ **[Paper](https://arxiv.org/abs/2508.21402v1)** ], [ **[GitHub](https://github.com/strakaj/SatDINO)** ]

## Pretrained models

The models are pretrained on the RGB variant of the fMoW dataset and evaluated across multiple standard remote sensing benchmarks.

| arch | patch size | params (M) | GFLOPs | linear | hugging face | weights | weights-finetune |
|-------|------------|------------|--------|--------|--------------|---------|------------------|
| ViT-S | 16 | 21.59 | 8.54 | 72.75 | [strakajk/satdino-vit_small-16](https://huggingface.co/strakajk/satdino-vit_small-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-16/resolve/main/satdino-vit_small-16-finetune.pth) |
| ViT-S | 8 | 21.37 | 33.56 | 73.53 | [strakajk/satdino-vit_small-8](https://huggingface.co/strakajk/satdino-vit_small-8) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_small-8/resolve/main/satdino-vit_small-8-finetune.pth) |
| ViT-B | 16 | 85.65 | 33.90 | 73.52 | [strakajk/satdino-vit_base-16](https://huggingface.co/strakajk/satdino-vit_base-16) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16.pth) | [ckp](https://huggingface.co/strakajk/satdino-vit_base-16/resolve/main/satdino-vit_base-16-finetune.pth) |

### Create from HF

You can create a model using Hugging Face or from the official **[GitHub](https://github.com/strakaj/SatDINO)** repository.

```python
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("strakajk/satdino-vit_small-16", trust_remote_code=True)
model.eval()

# predict
x = torch.randn(1, 3, 224, 224)
y = model(x)
# out: torch.Size([1, 384])
```

## Results

| Dataset   | **SatDINO8** | **SatDINO16** | **Scale-MAE** | **SatMAE** |
|-----------|--------------|---------------|---------------|------------|
| EuroSAT   | **87.72**    | 85.96         | 85.42         | 81.43      |
| RESISC45  | **85.29**    | 82.32         | 79.96         | 65.96      |
| UC Merced | **94.82**    | 93.21         | 84.58         | 78.45      |
| WHU-RS19  | **98.18**    | 97.82         | 89.32         | 86.41      |
| RS-C11    | **96.91**    | 96.61         | 93.03         | 83.96      |
| SIRI-WHU  | **91.82**    | 87.19         | 84.84         | 77.76      |

Average kNN classification accuracy across multiple scales (12.5%, 25%, 50%, and 100%).

---

| **Dataset** | **ViT-Small16** | **ViT-Small8** | **ViT-Base** |
|-------------|-----------------|----------------|--------------|
| EuroSAT     | 98.69           | 98.76          | **98.83**    |
| RESISC45    | 95.68           | 95.16          | **96.05**    |
| UC Merced   | 98.33           | **98.81**      | 98.57        |
| WHU-RS19    | **98.54**       | 98.06          | 97.57        |
| RS-C11      | **98.01**       | 96.81          | 96.02        |
| SIRI-WHU    | **98.54**       | 97.08          | 97.08        |

SatDINO fine-tuning classification accuracy.
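The kNN numbers above come from evaluating the frozen backbone. The sketch below is only illustrative and not the exact protocol from the paper: it extracts frozen SatDINO embeddings with the Hugging Face model from the snippet above and fits a cosine-similarity kNN classifier with scikit-learn. The random tensors, the 10-class labels, and the choice of `n_neighbors=20` are placeholder assumptions; substitute real benchmark loaders (e.g., EuroSAT) and the paper's settings to reproduce the reported scores.

```python
import torch
from sklearn.neighbors import KNeighborsClassifier
from transformers import AutoModel

# Load the pretrained backbone as shown above.
model = AutoModel.from_pretrained("strakajk/satdino-vit_small-16", trust_remote_code=True)
model.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> torch.Tensor:
    """Return L2-normalized embeddings of shape (N, 384) for a batch of images."""
    feats = model(images)
    return torch.nn.functional.normalize(feats, dim=-1)

# Placeholder tensors stand in for preprocessed train/test splits of a benchmark dataset.
train_images, train_labels = torch.randn(32, 3, 224, 224), torch.randint(0, 10, (32,))
test_images, test_labels = torch.randn(8, 3, 224, 224), torch.randint(0, 10, (8,))

train_feats = extract_features(train_images).numpy()
test_feats = extract_features(test_images).numpy()

# kNN on frozen features; k and the cosine metric are illustrative, not the paper's settings.
knn = KNeighborsClassifier(n_neighbors=20, metric="cosine")
knn.fit(train_feats, train_labels.numpy())
accuracy = knn.score(test_feats, test_labels.numpy())
print(f"kNN accuracy: {accuracy:.4f}")
```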
---

| **Model** | **Backbone** | **Potsdam 224²** | **Potsdam 512²** | **Vaihingen 224²** | **Vaihingen 512²** | **LoveDA 224²** | **LoveDA 512²** |
|-----------|--------------|------------------|------------------|--------------------|--------------------|-----------------|-----------------|
| SatMAE    | ViT-Large    | 67.88            | 70.39            | 64.81              | 69.13              | 46.28           | 52.28           |
| Scale-MAE | ViT-Large    | 69.74            | **72.21**        | 67.97              | **71.65**          | **49.37**       | **53.70**       |
| SatDINO   | ViT-Small16  | 67.93            | 71.80            | 63.38              | 68.32              | 44.77           | 49.65           |
| SatDINO   | ViT-Small8   | **70.71**        | 71.45            | **68.69**          | 67.71              | 47.53           | 50.20           |
| SatDINO   | ViT-Base     | 67.65            | 71.63            | 64.85              | 69.37              | 44.25           | 50.08           |

Semantic segmentation performance across multiple datasets and image sizes (224² and 512²). All results are reported as mean Intersection over Union (mIoU).

## License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

## Citation

If you find this repository useful, please consider citing it:

```
@misc{straka2025satdinodeepdiveselfsupervised,
      title={SatDINO: A Deep Dive into Self-Supervised Pretraining for Remote Sensing},
      author={Jakub Straka and Ivan Gruber},
      year={2025},
      eprint={2508.21402},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.21402},
}
```