LoopCTR: Unlocking the Loop Scaling Power for Click-Through Rate Prediction
Abstract
LoopCTR introduces a loop scaling paradigm for CTR models that increases training computation through recursive layer reuse while maintaining efficient inference, achieving state-of-the-art performance with enhanced adaptive inference potential.
Scaling Transformer-based click-through rate (CTR) models by stacking more parameters brings growing computational and storage overhead, creating a widening gap between scaling ambitions and stringent industrial deployment constraints. We propose LoopCTR, which introduces a loop scaling paradigm that increases training-time computation through recursive reuse of shared model layers, decoupling computation from parameter growth. LoopCTR adopts a sandwich architecture enhanced with Hyper-Connected Residuals and Mixture-of-Experts, and employs process supervision at every loop depth to encode multi-loop benefits into the shared parameters. This enables a train-multi-loop, infer-zero-loop strategy in which a single forward pass without any loop already outperforms all baselines. Experiments on three public benchmarks and one industrial dataset demonstrate state-of-the-art performance. Oracle analysis further reveals 0.02--0.04 AUC of untapped headroom, with models trained with fewer loops exhibiting higher oracle ceilings, pointing to a promising frontier for adaptive inference.
Community
🔥 Recently, OpenMythos has been making waves in the AI community with its Recurrent-Depth Transformer, showing that scaling does not have to rely solely on stacking more layers or adding more parameters. Instead, recursive computation with shared parameters can also effectively enhance a model’s reasoning capability.
Interestingly, we have just completed a new study in the recommender systems domain: LoopCTR. To the best of our knowledge, this is the first work to systematically explore loop scaling 🔁 in recommendation models.
However, loops in recommendation scenarios cannot simply reuse “the same layer over and over” ♻️. Naively sharing parameters may lead to limited expressiveness, while a fixed computation flow struggles to adapt to different samples and different loop depths.
To address this, we introduce two key designs in the Loop Block:
🧩 MoE-based expert mixing: expands the expressive power of the shared layer, letting a single layer carry a richer set of parameters without giving up weight sharing.
🕸️ Hyper-connected residual structure: enables input-aware dynamic computation allocation, breaking the limitations of fixed residual information flow.
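To make the two designs concrete, here is a minimal numpy sketch of a shared loop block that combines MoE expert mixing with an input-aware residual mix. All names, shapes, and the exact wiring (`LoopBlock`, `W_gate`, `W_res`, the softmax residual mixer) are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class LoopBlock:
    """Toy shared loop block: MoE feed-forward plus an input-aware
    (hyper-connected) residual mix. Illustrative only."""
    def __init__(self, d, n_experts=4):
        self.W_gate = rng.normal(0, 0.1, (d, n_experts))         # MoE router
        self.experts = [rng.normal(0, 0.1, (d, d)) for _ in range(n_experts)]
        self.W_res = rng.normal(0, 0.1, (d, 2))                  # residual mixer

    def __call__(self, h):
        # MoE mixing: each sample gets a softmax-weighted blend of expert FFNs,
        # so one shared layer carries the capacity of several experts.
        gate = softmax(h @ self.W_gate)                          # (B, E)
        expert_out = np.stack([np.tanh(h @ W) for W in self.experts], -1)  # (B, d, E)
        moe = np.einsum('bde,be->bd', expert_out, gate)
        # Hyper-connected residual: input-dependent weights on the identity
        # and transform streams, instead of a fixed h + f(h) flow.
        a = softmax(h @ self.W_res)                              # (B, 2)
        return a[:, :1] * h + a[:, 1:] * moe

block = LoopBlock(d=16)
h = rng.normal(size=(8, 16))
for _ in range(3):        # recursive reuse: the same parameters at every loop
    h = block(h)
```

Note that the loop at the bottom applies the *same* `block` three times — the extra computation comes from depth of reuse, not from new parameters.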
On top of this, LoopCTR incorporates intermediate supervision 🔍 at every loop depth, which implicitly acts as self-distillation across depths and makes shallow-loop inference strong enough to significantly reduce online latency.
🚀 Train deep, infer shallow: the model can be trained with multiple loops but deployed with fewer loops, or even with zero loops, making it highly suitable for the strict latency constraints of industrial recommendation systems.
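The train-multi-loop, infer-zero-loop idea can be sketched in a few lines of numpy. This is a toy stand-in, not the paper's training code: the shared layer `W`, the head `w_head`, and the BCE aggregation are all assumptions used only to show the shape of per-depth supervision:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy stand-ins for the shared loop layer and the prediction head.
W = rng.normal(0, 0.1, (8, 8))
w_head = rng.normal(0, 0.1, 8)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_all_depths(x, n_loops):
    """Run the shared layer recursively, emitting a CTR prediction
    at every loop depth (depth 0 = no loop at all)."""
    h, preds = x, []
    for _ in range(n_loops + 1):
        preds.append(sigmoid(h @ w_head))
        h = np.tanh(h @ W)     # the same W is reused at every depth
    return preds

x = rng.normal(size=(4, 8))
y = np.array([1.0, 0.0, 1.0, 0.0])

# Process supervision: a BCE loss at every intermediate depth, summed,
# so the shared weights absorb the benefit of the deeper loops.
preds = forward_all_depths(x, n_loops=3)
losses = [-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))
          for p in preds]
total_loss = sum(losses)

# "Infer zero-loop": at serving time, just take the depth-0 prediction.
zero_loop_pred = preds[0]
```

Because every depth is supervised during training, the depth-0 head is not an afterthought — it is trained as a first-class output, which is what makes zero-loop deployment viable.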
🔭 Train shallow, infer deep: somewhat counterintuitively, our oracle analysis shows that models trained with shallow loops can achieve even higher performance ceilings under deeper inference settings.
This also suggests that different samples may require different computation depths. Adaptive implicit loop inference remains a highly promising direction, although our attempts with various strategies have not yet led to a fully effective solution 😢.
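The oracle analysis mentioned above can be illustrated with a small numpy sketch: for each sample, an oracle picks the loop depth whose prediction lands closest to the true label, giving an upper bound that no single fixed depth can beat. The predictions below are random stand-in numbers, not results from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in per-depth predictions for 6 samples at 4 loop depths
# (rows = depths). In the paper the oracle runs on real model outputs.
preds_by_depth = rng.uniform(0, 1, size=(4, 6))
labels = np.array([1, 0, 1, 1, 0, 0], dtype=float)

# Oracle: for each sample, select the loop depth whose prediction is
# closest to the label.
per_sample_err = np.abs(preds_by_depth - labels)       # (depths, samples)
best_depth = per_sample_err.argmin(axis=0)
oracle_pred = preds_by_depth[best_depth, np.arange(labels.size)]

fixed_err = per_sample_err.mean(axis=1)                # error of each fixed depth
oracle_err = np.abs(oracle_pred - labels).mean()
assert oracle_err <= fixed_err.min()                   # oracle dominates any fixed depth
```

The gap between `oracle_err` and `fixed_err.min()` is exactly the headroom an adaptive depth-selection policy would need to capture — the hard open problem is predicting `best_depth` without access to the labels.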
The experimental results are strong 💥. Even under the zero-loop inference setting, LoopCTR consistently outperforms baseline models, achieving significant performance gains with extremely low online serving overhead ⚡. This makes it highly practical for industrial deployment 🏭.
In short, LoopCTR moves beyond the traditional “just add more parameters” scaling paradigm in recommendation systems. It opens up a new dimension of loop scaling, leveraging shared-parameter architectures with better inductive biases to enable deeper, more flexible implicit reasoning.
Our oracle experiments further show that there is still substantial untapped potential in existing approaches. How to achieve adaptive and efficient latent reasoning, and fully unlock the upper bound of loop scaling, remains an exciting open problem worth exploring. 🤔
Get this paper in your agent:

    hf papers read 2604.19550

Don't have the latest CLI?

    curl -LsSf https://hf.co/cli/install.sh | bash