arxiv:2605.04451

RemoteZero: Geospatial Reasoning with Zero Human Annotations

Published on May 6 · Submitted by LiangYao on May 8
Abstract

RemoteZero enables geospatial reasoning without box supervision by leveraging semantic verification capabilities of MLLMs for self-evolving localization from unlabeled remote sensing data.

AI-generated summary

Geospatial reasoning requires models to resolve complex spatial semantics and user intent into precise target locations for Earth observation. Recent progress has liberated the reasoning path from manual curation, allowing models to generate their own inference chains. Yet a final dependency remains: they are still supervised by human-annotated ground-truth coordinates. This leaves the reasoning process autonomous, but not its spatial endpoint, and prevents true self-evolution on abundant unlabeled remote sensing data. To break this bottleneck, we introduce RemoteZero, a box-supervision-free framework for geospatial reasoning. RemoteZero is motivated by a simple asymmetry: an MLLM is typically better at verifying whether a region satisfies a query than at directly generating precise coordinates. Leveraging this stronger discriminative ability, RemoteZero replaces geometric supervision with intrinsic semantic verification and enables GRPO training without box annotations. The resulting framework further supports iterative self-evolution, allowing the model to improve from unlabeled remote sensing imagery through its own verification signal. Experiments show that RemoteZero achieves competitive performance against strong supervised methods, demonstrating the potential of self-verifying training for geospatial reasoning localization.
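The verification-as-reward idea can be sketched as a GRPO-style normalization over verifier scores within each sampled group of candidate localizations. This is a minimal illustration, not the paper's implementation; the function name and the scalar-score interface are assumptions:

```python
def group_relative_advantages(scores):
    """GRPO-style advantages: normalize each verifier score against the
    mean and standard deviation of its sampled group. A higher-than-average
    verification score yields a positive advantage, replacing the
    box-overlap reward that supervised localization would use."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / n
    std = var ** 0.5 or 1.0  # guard against zero std when all scores tie
    return [(s - mean) / std for s in scores]
```

For example, a group of two candidates scored 1.0 and 0.0 by the verifier would receive advantages of +1.0 and -1.0, so training pushes the policy toward the region the verifier accepted.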

Community

Paper submitter


The most interesting bit here is the generate-crop-verify loop: the model proposes a bounding box, crops it with a context-preserving margin, and a verifier scores the crop's semantic consistency with the query. I'm curious how sensitive this is to the crop margin and context window, since too large a margin might swamp the signal, while too small a margin could starve the verifier of cues. There is also a drift risk in the iterative self-evolution if the verifier degrades or its acceptance criteria collapse over rounds. The arxivlens breakdown helped me parse the method details; it does a nice job unpacking the generate-crop-verify steps and the intrinsic reward shaping, e.g. https://arxivlens.com/PaperView/Details/remotezero-geospatial-reasoning-with-zero-human-annotations-3438-e62fa4da. Overall, this is a neat path toward scalable geospatial learning from unlabeled data; a quick ablation on margin choice and verifier quality would help show where the bottleneck really sits.
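To make the margin-sensitivity question concrete, here is a minimal context-preserving crop sketch. The `margin_ratio` parameter and the `(x1, y1, x2, y2)` box convention are my assumptions for illustration, not details taken from the paper:

```python
def crop_with_margin(image_w, image_h, bbox, margin_ratio=0.2):
    """Expand an (x1, y1, x2, y2) box by a margin proportional to its
    width and height, clamped to the image bounds. The expanded region
    is what a verifier would score for semantic consistency: a larger
    margin_ratio adds more surrounding context, a smaller one less."""
    x1, y1, x2, y2 = bbox
    mw = (x2 - x1) * margin_ratio
    mh = (y2 - y1) * margin_ratio
    return (
        max(0, x1 - mw),
        max(0, y1 - mh),
        min(image_w, x2 + mw),
        min(image_h, y2 + mh),
    )
```

Sweeping `margin_ratio` in a setup like this would be a cheap way to run the margin ablation suggested above.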


Get this paper in your agent:

hf papers read 2605.04451
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0
Datasets citing this paper: 0
Spaces citing this paper: 0
Collections including this paper: 0