<p align="center">
<h1 align="center"> <ins>RIPE</ins>:<br> Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction <br><br>🌊🌺 ICCV 2025 🌺🌊</h1>
<p align="center">
<a href="https://scholar.google.com/citations?user=ybMR38kAAAAJ">Johannes KΓΌnzel</a>
Β·
<a href="https://scholar.google.com/citations?user=5yTuyGIAAAAJ">Anna Hilsmann</a>
Β·
<a href="https://scholar.google.com/citations?user=BCElyCkAAAAJ">Peter Eisert</a>
</p>
<h2 align="center"><p>
<a href="https://arxiv.org/abs/2507.04839" align="center">Arxiv</a> |
<a href="https://fraunhoferhhi.github.io/RIPE/" align="center">Project Page</a> |
<a href="https://huggingface.co/spaces/JohannesK14/RIPE" align="center">πŸ€—DemoπŸ€—</a>
</p></h2>
<div align="center"></div>
</p>
<br/>
<p align="center">
<img src="assets/teaser_image.png" alt="example" width=80%>
<br>
<em>RIPE demonstrates that keypoint detection and description can be learned from image pairs only - no depth, no pose, no artificial augmentation required.</em>
</p>
## Setup
💡**Alternative**💡 Install nothing locally and try our Hugging Face demo: [🤗Demo🤗](https://huggingface.co/spaces/JohannesK14/RIPE)
1. Install mamba by following the instructions given here: [Mamba Installation](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)
2. Create a new environment with:
```bash
mamba create -f conda_env.yml
mamba activate ripe-env
```
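To quickly verify the setup, a minimal check (run from the repository root so that the `ripe` package is importable; the model loading call mirrors the demo below) is:
```python
# Minimal sanity check: confirm the environment and load the default RIPE model
import torch

from ripe import vgg_hyper

print("CUDA available:", torch.cuda.is_available())
model = vgg_hyper()  # loads the default weights, as in demo.py
print("Loaded model:", type(model).__name__)
```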
## How to use
The example below shows the basic usage; alternatively, just check [demo.py](demo.py):
```python
import cv2
import kornia.feature as KF
import kornia.geometry as KG
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision.io import decode_image
from ripe import vgg_hyper
from ripe.utils.utils import cv2_matches_from_kornia, resize_image, to_cv_kpts
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = vgg_hyper().to(dev)
model.eval()
image1 = resize_image(decode_image("assets/all_souls_000013.jpg").float().to(dev) / 255.0)
image2 = resize_image(decode_image("assets/all_souls_000055.jpg").float().to(dev) / 255.0)
kpts_1, desc_1, score_1 = model.detectAndCompute(image1, threshold=0.5, top_k=2048)
kpts_2, desc_2, score_2 = model.detectAndCompute(image2, threshold=0.5, top_k=2048)
matcher = KF.DescriptorMatcher("mnn") # threshold is not used with mnn
match_dists, match_idxs = matcher(desc_1, desc_2)
matched_pts_1 = kpts_1[match_idxs[:, 0]]
matched_pts_2 = kpts_2[match_idxs[:, 1]]
# robustly estimate the fundamental matrix with RANSAC to filter outlier matches
F_mat, mask = KG.ransac.RANSAC(model_type="fundamental", inl_th=1.0)(matched_pts_1, matched_pts_2)
matchesMask = mask.int().ravel().tolist()
result_ransac = cv2.drawMatches(
    (image1.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
    to_cv_kpts(kpts_1, score_1),
    (image2.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
    to_cv_kpts(kpts_2, score_2),
    cv2_matches_from_kornia(match_dists, match_idxs),
    None,
    matchColor=(0, 255, 0),
    matchesMask=matchesMask,
    # matchesMask=None,  # without RANSAC filtering
    singlePointColor=(0, 0, 255),
    flags=cv2.DrawMatchesFlags_DEFAULT,
)
plt.imshow(result_ransac)
plt.axis("off")
plt.tight_layout()
plt.show()
# plt.savefig("result_ransac.png")
```
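As a quick sanity check on the matching quality, the RANSAC mask computed above can be summarised directly (same variable names as in the snippet; this is an illustrative addition, not part of demo.py):
```python
# Report how many putative matches survive the RANSAC filtering above
num_matches = match_idxs.shape[0]
num_inliers = int(mask.sum().item())
print(f"{num_inliers}/{num_matches} matches kept ({100.0 * num_inliers / max(num_matches, 1):.1f}% inliers)")
```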
## Reproduce the results
### MegaDepth 1500 & HPatches
1. Download and install [Glue Factory](https://github.com/cvg/glue-factory)
2. Add this repo as a submodule to Glue Factory:
```bash
cd glue-factory
git submodule add https://github.com/fraunhoferhhi/RIPE.git thirdparty/ripe
```
3. Create the new file `ripe.py` under `gluefactory/models/extractors/` with the following content:
<details>
<summary>ripe.py</summary>
```python
import sys
from pathlib import Path

import torch
import torchvision.transforms as transforms

from ..base_model import BaseModel

ripe_path = Path(__file__).parent / "../../../thirdparty/ripe"
print(f"RIPE Path: {ripe_path.resolve()}")

# check if the path exists
if not ripe_path.exists():
    raise RuntimeError(f"RIPE path not found: {ripe_path}")

sys.path.append(str(ripe_path))

from ripe import vgg_hyper


class RIPE(BaseModel):
    default_conf = {
        "name": "RIPE",
        "model_path": None,
        "chunk": 4,
        "dense_outputs": False,
        "threshold": 1.0,
        "top_k": 2048,
    }

    required_data_keys = ["image"]

    # Initialize the model
    def _init(self, conf):
        self.normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.model = vgg_hyper(model_path=conf.model_path)
        self.model.eval()

        self.set_initialized()

    def _forward(self, data):
        image = data["image"]

        keypoints, scores, descriptors = [], [], []
        chunk = self.conf.chunk

        for i in range(0, image.shape[0], chunk):
            if self.conf.dense_outputs:
                raise NotImplementedError("Dense outputs are not supported")
            else:
                im = image[i : min(image.shape[0], i + chunk)]
                im = self.normalizer(im)

                H, W = im.shape[-2:]

                kpt, desc, score = self.model.detectAndCompute(
                    im,
                    threshold=self.conf.threshold,
                    top_k=self.conf.top_k,
                )

            keypoints += [kpt.squeeze(0)]
            scores += [score.squeeze(0)]
            descriptors += [desc.squeeze(0)]

            del kpt
            del desc
            del score

        keypoints = torch.stack(keypoints, 0)
        scores = torch.stack(scores, 0)
        descriptors = torch.stack(descriptors, 0)

        pred = {
            # "keypoints": keypoints.to(image) + 0.5,
            "keypoints": keypoints.to(image),
            "keypoint_scores": scores.to(image),
            "descriptors": descriptors.to(image),
        }
        return pred

    def loss(self, pred, data):
        raise NotImplementedError
```
</details>
4. Create `ripe+NN.yaml` in `gluefactory/configs` with the following content:
<details>
<summary>ripe+NN.yaml</summary>
```yaml
model:
  name: two_view_pipeline
  extractor:
    name: extractors.ripe
    threshold: 1.0
    top_k: 2048
  matcher:
    name: matchers.nearest_neighbor_matcher
benchmarks:
  megadepth1500:
    data:
      preprocessing:
        side: long
        resize: 1600
    eval:
      estimator: poselib
      ransac_th: 0.5
  hpatches:
    eval:
      estimator: poselib
      ransac_th: 0.5
    model:
      extractor:
        top_k: 1024 # overwrite config above
```
5. Run the MegaDepth 1500 evaluation script:
```bash
python -m gluefactory.eval.megadepth1500 --conf ripe+NN # for MegaDepth 1500
```
Should result in:
```bash
'rel_pose_error@10°': 0.6834,
'rel_pose_error@20°': 0.7803,
'rel_pose_error@5°': 0.5511,
```
6. Run the HPatches evaluation script:
```bash
python -m gluefactory.eval.hpatches --conf ripe+NN # for HPatches
```
Should result in:
```bash
'H_error_ransac@1px': 0.3793,
'H_error_ransac@3px': 0.5893,
'H_error_ransac@5px': 0.692,
```
## Training
1. Create a `.env` file with the following content (a small check of these paths is sketched after the block):
```bash
OUTPUT_DIR="/output"
DATA_DIR="/data"
```
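These are plain environment variables pointing at your output and dataset locations. A minimal sketch for double-checking them from Python, assuming the `python-dotenv` package is available (the training script itself may load the file differently):
```python
# Sketch: load the .env file and verify that both directories exist
import os
from pathlib import Path

from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()  # reads OUTPUT_DIR and DATA_DIR from .env into os.environ

for var in ("OUTPUT_DIR", "DATA_DIR"):
    path = Path(os.environ[var])
    print(f"{var} = {path} ({'exists' if path.exists() else 'missing'})")
```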
2. Download the required datasets (an optional layout check is sketched after the download instructions):
<details>
<summary>DISK Megadepth subset</summary>
To download the dataset used by [DISK](https://github.com/cvlab-epfl/disk), execute the following commands:
```bash
cd data
bash download_disk_data.sh
```
</details>
<details>
<summary>Tokyo 24/7</summary>
- ⚠️**Optional**⚠️: Only needed if you are interested in the model used in Section 4.6 of the paper!
- Download the Tokyo 24/7 query images ([Tokyo 24/7 Query Images V3](http://www.ok.ctrl.titech.ac.jp/~torii/project/247/download/247query_v3.zip)) from the official [website](http://www.ok.ctrl.titech.ac.jp/~torii/project/247/_).
- Extract them into `data/Tokyo_Query_V3`:
```bash
Tokyo_Query_V3/
├── 00001.csv
├── 00001.jpg
├── 00002.csv
├── 00002.jpg
├── ...
├── 01125.csv
├── 01125.jpg
├── Readme.txt
└── Readme.txt~
```
</details>
<details>
<summary>ACDC</summary>
- ⚠️**Optional**⚠️: Only needed if you are interested in the model used in Section 6.1 (supplementary) of the paper!
- Download the RGB images: [ACDC RGB Images](https://acdc.vision.ee.ethz.ch/rgb_anon_trainvaltest.zip)
- Extract them into `data/ACDC`:
```bash
ACDC/
rgb_anon
├── fog
│   ├── test
│   │   ├── GOPR0475
│   │   ├── GOPR0477
│   ├── test_ref
│   │   ├── GOPR0475
│   │   ├── GOPR0477
│   ├── train
│   │   ├── GOPR0475
│   │   ├── GOPR0476
├── night
```
</details>
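Before starting a run, it can help to confirm that the datasets ended up in the expected places. A small optional sketch (the Tokyo 24/7 and ACDC paths follow the trees above; the DISK MegaDepth folder name is a placeholder, adjust it to whatever `download_disk_data.sh` created):
```python
# Optional sanity check for the dataset layout described above
from pathlib import Path

data_dir = Path("data")
checks = {
    "DISK MegaDepth subset": data_dir / "megadepth",  # placeholder name; adjust to the download script's output
    "Tokyo 24/7 queries (optional)": data_dir / "Tokyo_Query_V3",
    "ACDC RGB images (optional)": data_dir / "ACDC" / "rgb_anon",
}
for name, path in checks.items():
    print(f"{name}: {path} ({'found' if path.exists() else 'missing'})")
```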
3. Run the training script:
```bash
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline
```
You can also easily switch settings from the command line, e.g. to additionally train on the Tokyo 24/7 dataset:
```bash
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline data=megadepth+tokyo
```
## Acknowledgements
Our code is partly based on the following repositories:
- [DALF](https://github.com/verlab/DALF_CVPR_2023) Apache License 2.0
- [DeDoDe](https://github.com/Parskatt/DeDoDe) MIT License
- [DISK](https://github.com/cvlab-epfl/disk) Apache License 2.0
Our evaluation was based on the following repositories:
- [Glue Factory](https://github.com/cvg/glue-factory)
- [hloc](https://github.com/cvg/Hierarchical-Localization)
We would like to thank the authors of these repositories for their great work and for making their code available.
Our project webpage is based on the [Academic Project Page Template](https://github.com/eliahuhorwitz/Academic-project-page-template) by Eliahu Horwitz.
## BibTeX Citation
```
@article{ripe2025,
year = {2025},
title = {{RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction}},
author = {Künzel, Johannes and Hilsmann, Anna and Eisert, Peter},
journal = {arXiv},
eprint = {2507.04839},
}
```