Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models
Abstract
The Reactive Transformer (RxT) addresses the limitations of stateless Transformers in conversational AI by using an event-driven paradigm with a fixed-size Short-Term Memory (STM) system, achieving linear scaling and low latency.
The Transformer architecture has become the de facto standard for Large Language Models (LLMs), demonstrating remarkable capabilities in language understanding and generation. However, its application in conversational AI is fundamentally constrained by its stateless nature and the quadratic computational complexity (O(L^2)) with respect to sequence length L. Current models emulate memory by reprocessing an ever-expanding conversation history with each turn, leading to prohibitive costs and latency in long dialogues. This paper introduces the Reactive Transformer (RxT), a novel architecture designed to overcome these limitations by shifting from a data-driven to an event-driven paradigm. RxT processes each conversational turn as a discrete event in real-time, maintaining context in an integrated, fixed-size Short-Term Memory (STM) system. The architecture features a distinct operational cycle where a generator-decoder produces a response based on the current query and the previous memory state, after which a memory-encoder and a dedicated Memory Attention network asynchronously update the STM with a representation of the complete interaction. This design fundamentally alters the scaling dynamics, reducing the total user-facing cost of a conversation from quadratic (O(N^2 · T)) to linear (O(N · T)) with respect to the number of interactions N. By decoupling response generation from memory updates, RxT achieves low latency, enabling truly real-time, stateful, and economically viable long-form conversations. We validated our architecture with a series of proof-of-concept experiments on synthetic data, demonstrating superior performance and constant-time inference latency compared to a baseline stateless model of comparable size.
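The operational cycle described in the abstract (synchronous response generation conditioned on the previous STM state, followed by an asynchronous memory update) can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed module names and shapes (ReactiveTransformerSketch, MemoryAttention, n_slots, the placeholder sub-networks), not the paper's actual implementation.

```python
# Minimal sketch of the RxT operational cycle; module names and shapes are
# hypothetical stand-ins, not the components released with the paper.
import torch
import torch.nn as nn


class MemoryAttention(nn.Module):
    """Folds the encoded latest interaction into the fixed-size STM (sketch)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, stm: torch.Tensor, encoded_interaction: torch.Tensor) -> torch.Tensor:
        # Memory slots act as queries; the encoded interaction supplies keys/values.
        update, _ = self.attn(stm, encoded_interaction, encoded_interaction)
        return self.norm(stm + update)  # residual update keeps the STM size fixed


class ReactiveTransformerSketch(nn.Module):
    def __init__(self, d_model: int = 256, n_slots: int = 64):
        super().__init__()
        # Placeholder sub-networks; the real RxT generator-decoder and
        # memory-encoder are full Transformer stacks.
        self.generator_decoder = nn.TransformerDecoderLayer(d_model, 8, batch_first=True)
        self.memory_encoder = nn.TransformerEncoderLayer(d_model, 8, batch_first=True)
        self.memory_attention = MemoryAttention(d_model)
        # Fixed-size Short-Term Memory: n_slots vectors, independent of dialogue length.
        self.register_buffer("stm", torch.zeros(1, n_slots, d_model))

    def respond(self, query_embeds: torch.Tensor) -> torch.Tensor:
        # Synchronous path: generate the response from the current query while
        # cross-attending to the previous STM state only.
        return self.generator_decoder(query_embeds, self.stm)

    @torch.no_grad()
    def update_memory(self, interaction_embeds: torch.Tensor) -> None:
        # Asynchronous path: encode the complete interaction (query + response)
        # and merge it into the STM after the response has been returned.
        encoded = self.memory_encoder(interaction_embeds)
        self.stm = self.memory_attention(self.stm, encoded)


if __name__ == "__main__":
    model = ReactiveTransformerSketch()
    query = torch.randn(1, 32, 256)               # embedded user query
    response = model.respond(query)                # user-facing latency ends here
    interaction = torch.cat([query, response], dim=1)
    model.update_memory(interaction)               # runs off the critical path
```

Because update_memory runs after the response is already streamed to the user, each turn's perceived latency depends only on the current query length and the fixed STM size, never on the conversation history.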
Community
The paper introduces the Reactive Transformer (RxT) architecture for stateful real-time processing, which outperforms a same-size stateless decoder-only model in small-scale experiments. Architecture advantages:
- natively trained for conversations
- linear conversation cost scaling, instead of the quadratic scaling of stateless LLMs (see the cost sketch after this list)
- no prompt phase latency, thanks to asynchronous memory update
- constant computational cost and memory usage for every message
- natively encoded context in memory layers
- better multi-turn conversation quality
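The linear-versus-quadratic cost claim can be made concrete with a back-of-the-envelope sketch. The numbers below (256 tokens per turn, a 512-slot STM) are illustrative assumptions, not measurements from the paper.

```python
# Cumulative tokens processed over a conversation of num_turns turns,
# each contributing tokens_per_turn tokens. Illustrative only.

def stateless_llm_cost(num_turns: int, tokens_per_turn: int) -> int:
    # A stateless LLM re-reads the whole history at every turn:
    # sum over n of n * T  ->  O(N^2 * T) total tokens processed.
    return sum(n * tokens_per_turn for n in range(1, num_turns + 1))

def rxt_cost(num_turns: int, tokens_per_turn: int, stm_slots: int = 512) -> int:
    # RxT processes only the current turn plus a fixed-size STM:
    # N * (T + constant)  ->  O(N * T) total.
    return num_turns * (tokens_per_turn + stm_slots)

for n in (10, 100, 1000):
    print(n, stateless_llm_cost(n, 256), rxt_cost(n, 256))
```

At 1000 turns the stateless model has processed roughly 128 million tokens in this toy setting, while the RxT-style model stays under one million, which is the economic argument behind the O(N · T) scaling.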
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- From TLinFormer to TConstFormer: The Leap to Constant-Time Transformer Attention: Achieving O(1) Computation and O(1) KV Cache during Autoregressive Inference (2025)
- Hydra: A 1.6B-Parameter State-Space Language Model with Sparse Attention, Mixture-of-Experts, and Memory (2025)
- Memory-Augmented Transformers: A Systematic Review from Neuroscience Principles to Enhanced Model Architectures (2025)
- CIFLEX: Contextual Instruction Flow for Sub-task Execution in Multi-Turn Interactions with a Single On-Device LLM (2025)
- DySK-Attn: A Framework for Efficient, Real-Time Knowledge Updating in Large Language Models via Dynamic Sparse Knowledge Attention (2025)
- Memory in Large Language Models: Mechanisms, Evaluation and Evolution (2025)
- Thai Semantic End-of-Turn Detection for Real-Time Voice Agents (2025)
Models citing this paper: 4
Datasets citing this paper: 3
Spaces citing this paper: 0