arxiv:2601.23039

Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference

Published on Jan 30 · Submitted by YIZHI LIU on Feb 9

Abstract

Researchers identify and address premature mode collapse in optimal transport-based structural prediction models through an adaptive stability control algorithm that prevents gradient explosions during large-scale training.

AI-generated summary

Differentiable matching layers and residual connection paradigms, often implemented via entropy-regularized Optimal Transport (OT), serve as critical mechanisms in structural prediction and architectural scaling. However, recovering discrete permutations or maintaining identity mappings via annealing ε → 0 is notoriously unstable. In this work, we identify a fundamental mechanism for this failure: Premature Mode Collapse. By analyzing the non-normal dynamics of the Sinkhorn fixed-point map, we reveal a theoretical thermodynamic speed limit: standard exponential cooling outpaces the contraction rate of the inference operator, which degrades as O(1/ε). To address this, we propose Efficient Piecewise Hybrid Adaptive Stability Control (EPH-ASC), an adaptive scheduling algorithm that monitors the stability of the inference process. We demonstrate that EPH-ASC is essential for stabilizing Manifold-Constrained Hyper-Connections (mHC) during large-scale training on the FineWeb-Edu dataset, effectively preventing late-stage gradient explosions by enforcing a linear stability law.
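For readers unfamiliar with the setup, here is a minimal sketch of entropy-regularized Sinkhorn matching and an exponential cooling schedule. It is not taken from the paper: the names `sinkhorn_plan`, `exponential_schedule`, and `COST`, the 3×3 cost matrix, and the fixed iteration budget are all illustrative assumptions. It only shows how the transport plan sharpens toward a permutation as ε shrinks.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sinkhorn_plan(cost, eps, iters=200):
    """Log-domain Sinkhorn scaling to a doubly stochastic plan.

    The plan is P[i][j] = exp((f[i] + g[j] - cost[i][j]) / eps).
    The iteration count is an assumed budget, not a convergence test.
    """
    n = len(cost)
    f = [0.0] * n
    g = [0.0] * n
    for _ in range(iters):
        # Row update: make every row of the plan sum to 1.
        f = [-eps * logsumexp([(g[j] - cost[i][j]) / eps for j in range(n)])
             for i in range(n)]
        # Column update: make every column of the plan sum to 1.
        g = [-eps * logsumexp([(f[i] - cost[i][j]) / eps for i in range(n)])
             for j in range(n)]
    return [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(n)]
            for i in range(n)]

def exponential_schedule(eps0, decay, steps):
    """Standard exponential cooling: eps_t = eps0 * decay**t."""
    return [eps0 * decay ** t for t in range(steps)]

# Hypothetical cost matrix whose cheapest assignment is the identity.
COST = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.9, 0.8, 0.3]]

plan_hot = sinkhorn_plan(COST, eps=1.0)    # diffuse, far from a permutation
plan_cold = sinkhorn_plan(COST, eps=0.05)  # close to the identity permutation
```

This sketch solves each temperature from scratch with a generous iteration budget, so it does not reproduce the failure mode itself. The instability the abstract describes concerns cooling faster than the iteration can track when the inner budget is small, as it is inside a training loop.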

Community

Paper author Paper submitter

Excited to share my latest work: "Avoiding Premature Collapse: Adaptive Annealing for Entropy-Regularized Structural Inference" (arXiv:2601.23039) 🚀

We identify a key failure mode in Sinkhorn annealing (ε→0): Premature Mode Collapse, caused by a "Thermodynamic Speed Limit": standard exponential schedules violate stability due to O(1/ε) sensitivity and a vanishing spectral gap.

Solution: EPH-ASC (Efficient Piecewise Hybrid Adaptive Stability Control), a lightweight adaptive scheduler that monitors primal drift and triggers a "Thermodynamic Pause" when needed. Overhead <0.5%!
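To make the "monitor drift, pause cooling" idea concrete, here is a rough sketch of a drift-gated annealing loop. This is not the EPH-ASC algorithm from the paper: the drift measure (largest entrywise change in the plan between sweeps), the threshold `drift_tol`, the decay factor, and every function name are assumptions chosen for illustration.

```python
import math

def _lse(xs):
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sinkhorn_sweep(cost, eps, g):
    """One log-domain Sinkhorn sweep, warm-started from column potentials g."""
    n = len(cost)
    f = [-eps * _lse([(g[j] - cost[i][j]) / eps for j in range(n)])
         for i in range(n)]
    g = [-eps * _lse([(f[i] - cost[i][j]) / eps for i in range(n)])
         for j in range(n)]
    plan = [[math.exp((f[i] + g[j] - cost[i][j]) / eps) for j in range(n)]
            for i in range(n)]
    return g, plan

def primal_drift(prev, curr):
    """Assumed drift measure: largest entrywise change between two plans."""
    return max(abs(a - b) for rp, rc in zip(prev, curr) for a, b in zip(rp, rc))

def adaptive_anneal(cost, eps0=1.0, eps_min=0.05, decay=0.8,
                    drift_tol=0.05, max_steps=500):
    """Cool eps exponentially, but hold it whenever the plan is still moving."""
    eps = eps0
    g, plan = sinkhorn_sweep(cost, eps, [0.0] * len(cost))
    history = []  # (eps, drift, paused) for each sweep
    for _ in range(max_steps):
        g, new_plan = sinkhorn_sweep(cost, eps, g)
        drift = primal_drift(plan, new_plan)
        plan = new_plan
        paused = drift > drift_tol
        history.append((eps, drift, paused))
        if paused:
            continue  # pause: keep eps fixed until the plan settles
        if eps <= eps_min:
            break     # settled at the target temperature
        eps = max(eps_min, eps * decay)
    return plan, history

# Hypothetical cost matrix whose cheapest assignment is the identity.
COST = [[0.1, 0.9, 0.8],
        [0.7, 0.2, 0.9],
        [0.9, 0.8, 0.3]]

plan, history = adaptive_anneal(COST)
```

The design choice here is that cooling is gated on an observed quantity rather than on a step counter, so the schedule slows down where the iteration needs more sweeps. The paper's actual stability criterion and pause rule may differ.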

Results:

  • SPair-71k keypoint matching: 1.6× speedup over Gumbel-Sinkhorn (47 vs 75 epochs to 90%)
  • FineWeb-Edu + Nano Gemma w/ mHC: detects instability early, prevents late gradient explosion with 340-step safety margin

Also built an interactive demo to visualize annealing strategies: https://huggingface.co/spaces/leon0923/torch-sinkhorn-asc-demo

Would love feedback from OT, routing, or mHC folks! How does this resonate with your experiences in large-scale training? @leon0923 (me) happy to discuss / collaborate on integrations.

#OptimalTransport #Sinkhorn #MachineLearning #mHC

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/avoiding-premature-collapse-adaptive-annealing-for-entropy-regularized-structural-inference-4548-ff098f18

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
