Fast Spatial Memory with Elastic Test-Time Training
Abstract
Elastic Test-Time Training with fast spatial memory enables efficient 4D reconstruction through multi-chunk adaptation while maintaining stability against catastrophic forgetting.
Large Chunk Test-Time Training (LaCT) has shown strong performance on long-context 3D reconstruction, but its fully plastic inference-time updates remain vulnerable to catastrophic forgetting and overfitting. As a result, LaCT is typically instantiated with a single large chunk spanning the full input sequence, falling short of the broader goal of handling arbitrarily long sequences in a single pass. We propose Elastic Test-Time Training, inspired by elastic weight consolidation, which stabilizes LaCT fast-weight updates with a Fisher-weighted elastic prior around a maintained anchor state. The anchor evolves as an exponential moving average of past fast weights to balance stability and plasticity. Based on this updated architecture, we introduce Fast Spatial Memory (FSM), an efficient and scalable model for 4D reconstruction that learns spatiotemporal representations from long observation sequences and renders novel view-time combinations. We pre-train FSM on large-scale curated 3D/4D data to capture the dynamics and semantics of complex spatial environments. Extensive experiments show that FSM supports fast adaptation over long sequences and delivers high-quality 3D/4D reconstruction with smaller chunks while mitigating the camera-interpolation shortcut. Overall, we hope to advance LaCT beyond the bounded single-chunk setting toward robust multi-chunk adaptation, a necessary step for generalization to genuinely longer sequences, while substantially alleviating the activation-memory bottleneck.
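To make the elastic update described in the abstract more concrete, here is a minimal NumPy sketch of one per-chunk step. It is an illustration under stated assumptions, not the paper's implementation: the function name `elastic_ttt_step`, the hyperparameters `lr`, `lam`, and `ema`, the diagonal Fisher proxy (a running average of squared gradients), and the toy linear-regression chunk loss are all hypothetical choices made for this example.

```python
import numpy as np


def elastic_ttt_step(w, anchor, fisher, grad, lr=0.1, lam=1.0, ema=0.9):
    """One hypothetical elastic fast-weight update for a single chunk.

    All names and hyperparameters here are illustrative assumptions.
    """
    # Diagonal Fisher proxy: running average of squared gradients (assumption).
    fisher = ema * fisher + (1.0 - ema) * grad ** 2
    # Gradient of the elastic prior 0.5 * lam * sum(fisher * (w - anchor)**2).
    elastic_grad = lam * fisher * (w - anchor)
    # Plastic step on the chunk loss plus the Fisher-weighted pull to the anchor.
    w = w - lr * (grad + elastic_grad)
    # The anchor follows past fast weights as an exponential moving average.
    anchor = ema * anchor + (1.0 - ema) * w
    return w, anchor, fisher


# Toy usage: adapt a linear fast-weight vector over three small chunks.
rng = np.random.default_rng(0)
w = np.zeros(4)
anchor = w.copy()
fisher = np.zeros(4)
for _ in range(3):
    x = rng.normal(size=(8, 4))          # one chunk of toy inputs
    y = x @ np.ones(4)                   # toy regression targets
    grad = x.T @ (x @ w - y) / len(x)    # gradient of the mean squared error
    w, anchor, fisher = elastic_ttt_step(w, anchor, fisher, grad)
```

Setting `lam=0` in this sketch recovers a fully plastic gradient step, while a larger `lam` holds coordinates with high Fisher estimates closer to the anchor, which is the stability-plasticity trade-off the abstract refers to.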
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Mem3R: Streaming 3D Reconstruction with Hybrid Memory via Test-Time Training (2026)
- LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory (2026)
- TTA-Vid: Generalized Test-Time Adaptation for Video Reasoning (2026)
- MeMix: Writing Less, Remembering More for Streaming 3D Reconstruction (2026)
- tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction (2026)
- Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training (2026)
- SR-TTT: Surprisal-Aware Residual Test-Time Training (2026)