---
license: other
license_name: raml-v1.0
datasets:
- ReactiveAI/Beta-Pre-Train-Corpus
language:
- en
- pl
pipeline_tag: fill-mask
tags:
- agent
gated: true
extra_gated_prompt: >-
  Accept the [Reactive AI Model & Architecture License (RAML)
  v1.0](https://github.com/RxAI-dev/rxlm/blob/main/MODELS_LICENSE.md) terms to
  access the repository and use the model. Reactive Transformer (patent pending
  #P.453260) is available for free for non-commercial usage. For commercial
  usage please contact Reactive AI at licensing@rxai.dev
extra_gated_fields:
  Company: text
  Country: country
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to use this model for non-commercial use ONLY: checkbox
extra_gated_heading: >-
  You need to agree to use this model only for research or education purposes
  under the Reactive AI Model & Architecture License (RAML) v1.0
extra_gated_description: The repository will be available instantly after accepting the license terms
extra_gated_button_content: Accept license terms
---

<img src="https://huggingface.co/ReactiveAI/RxT-Beta-Decoder-Base/resolve/main/logo_rxt_beta.png" width="512" />


# RxT-Beta Encoder Base (97M)
**RxT-Beta** is the world's first real-scale stateful **Reactive Language Model (RxLM)** with infinite memory & context, built to confirm the new **Reactive Transformer (RxT)**
scaling laws and to solve **all** the biggest problems of stateless LLMs. **RxT** models are natively conversational (and agentic) - instead of reprocessing the entire
conversation history (chat template) like standard LLMs, they process only single interactions in real time and move the context to a dedicated embedding-based memory
that is updated asynchronously between interactions. This introduces unique features like:
| - infinite conversation & global context through Mixture-of-Memory (MoM) |
| - live continual learning from interactions in real-time |
| - true real-time processing with near-zero latency |
| - linear conversation cost scaling |
| - fixed computational cost and memory usage for each interaction |
| - increasing quality of responses with subsequent steps of dialogue, without "long-term hallucinations" |
| - natively encoded memory, impossible to read without the model |
| - extreme pre-training efficiency |
| - hybrid stateful reasoning |
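
The linear cost scaling and fixed per-interaction cost listed above can be illustrated with a back-of-the-envelope comparison (an illustrative sketch, not RxLM code; the fixed interaction size of 512 tokens is an assumption):

```python
# A stateless LLM re-reads the whole chat history on every turn, so the
# cumulative number of processed prompt tokens grows quadratically with the
# number of turns. An RxT model processes only the current interaction, so
# its cumulative cost grows linearly.

def stateless_tokens(turns: int, interaction: int = 512) -> int:
    """Stateless LLM: turn t reprocesses t interactions of history."""
    return sum(t * interaction for t in range(1, turns + 1))

def reactive_tokens(turns: int, interaction: int = 512) -> int:
    """RxT: fixed cost per turn, history lives in memory instead."""
    return turns * interaction

for turns in (10, 100):
    print(turns, stateless_tokens(turns), reactive_tokens(turns))
# At 100 turns the stateless model has processed ~50x more tokens than RxT.
```

Going from 10 to 100 turns multiplies the stateless cost by ~92x but the RxT cost by only 10x, which is why the per-turn latency and price of a stateless LLM keep climbing while RxT's stay flat.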
|
|
In the first small-scale experiments, **RxT-Alpha** models achieved about **50% higher accuracy** and almost **2x lower perplexity** than a same-size stateless
decoder-only baseline trained on the same simple synthetic dataset (and the decoder-only model was additionally pre-trained on 5x more tokens). These results were
then confirmed on a small 10B-token subset of real-world data with ~0.3B models (**RxT-Beta Micro**), where the **RxT** advantage was even bigger. These promising
results, along with all the unique features, suggest that the **Reactive Transformer** is a revolutionary generational leap and a crucial milestone on the
path to **Artificial General Intelligence (AGI)** - provided we confirm it at scale, which is exactly what we plan to do with **RxT-Beta**.
|
|
The goal is to compete with ~1-3B-parameter dense stateless LLMs pre-trained on trillions of tokens, using a model with only 190M active parameters and about 400B
training tokens, and to significantly outperform them on long multi-turn conversations.
|
|
| ## Base models |
**Reactive Transformer** models require a new dedicated training pipeline to handle their asynchronous memory and reversed decoder-encoder order. Base models are the
result of the first supervised stage - _**Joint LM Pre-Training with "cheated context" teacher forcing**_ (more info in the [Decoder Card](https://huggingface.co/ReactiveAI/RxT-Beta-Decoder-Base)).
|
|
The base encoder (this model) requires further training and has to be connected to the decoder and the memory attention network, so it is the starting point for the next stages.
It is pre-trained for general knowledge (with a focus on reasoning) using textbook-quality datasets, and it can be further fine-tuned for custom use cases (under the
terms of the [RAML v1.0 license](https://huggingface.co/ReactiveAI/RxT-Beta-Decoder-Base/blob/main/LICENSE.md)).
|
|
| ## Usage outside Reactive Transformer |
The **RxT** encoder is a standard bidirectional transformer encoder, so after adding a custom head it can be used for other tasks, such as classification. For
example, we use a fine-tuned version as a critic in Reinforcement Learning scenarios.
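
As a conceptual sketch of what "adding a custom head" means here, the snippet below mean-pools per-token encoder outputs and applies a linear classification head. It is a pure-Python toy (the shapes, values, and function names are illustrative; a real head would be a small PyTorch module on top of the RxLM encoder's hidden states):

```python
from typing import List

def mean_pool(hidden_states: List[List[float]]) -> List[float]:
    """Average the per-token hidden vectors into one sequence vector."""
    seq_len = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / seq_len for d in range(dim)]

def linear_head(pooled: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Project the pooled vector to class logits: logits = W @ pooled + b."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]

# Toy example: 3 tokens, hidden dim 4, 2 classes.
hidden = [[1.0, 0.0, 2.0, 1.0], [3.0, 2.0, 0.0, 1.0], [2.0, 1.0, 1.0, 1.0]]
pooled = mean_pool(hidden)   # -> [2.0, 1.0, 1.0, 1.0]
logits = linear_head(pooled, [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.5])
print(logits)                # -> [2.0, 1.5]
```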
|
|
| ## Encoder architecture |
| - layers: 21 |
| - dim: 512 |
| - self-attention: Gated Symmetric Sparse Query Attention (SQA) with 8/16 query/key/value heads |
| - feed forward: Dense MLP / 1536 dim with SwiGLU activation |
- vocab: 65k (English + Polish)
| - sequence: 4096 (base after full pre-training) / 8192 (Interaction SFT) |
| - params: 97M |
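
A rough sanity check of the 97M figure can be derived from the dimensions above. This is a sketch: the exact query/key-value head split, gating, norm, and bias parameters are assumptions, so the estimate is only expected to land in the right ballpark:

```python
# Back-of-the-envelope parameter count from the architecture numbers above.
vocab, dim, layers, ffn_dim = 65536, 512, 21, 1536
q_heads, kv_heads = 8, 16                # assumed split of the 8/16 heads
head_dim = dim // kv_heads               # 32

embedding = vocab * dim                  # token embedding table
q_proj = dim * q_heads * head_dim        # sparse query projection (SQA)
kv_proj = 2 * dim * kv_heads * head_dim  # key + value projections
out_proj = q_heads * head_dim * dim      # attention output projection
attention = q_proj + kv_proj + out_proj
mlp = 3 * dim * ffn_dim                  # SwiGLU: gate, up, down matrices
total = embedding + layers * (attention + mlp)

print(f"~{total / 1e6:.0f}M parameters")  # lands near the stated 97M
```

The embeddings alone account for roughly a third of the total, which is typical for a small encoder with a 65k vocabulary.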