---
license: other
license_name: raml-v1.0
datasets:
- ReactiveAI/Beta-Pre-Train-Corpus
language:
- en
- pl
pipeline_tag: fill-mask
tags:
- agent
gated: true
extra_gated_prompt: >-
  Accept the [Reactive AI Model & Architecture License (RAML)
  v1.0](https://github.com/RxAI-dev/rxlm/blob/main/MODELS_LICENSE.md) terms to
  access the repository and use the model. Reactive Transformer (patent pending
  #P.453260) is available for free for non-commercial usage. For commercial
  usage please contact Reactive AI at licensing@rxai.dev
extra_gated_fields:
  Company: text
  Country: country
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to use this model for non-commercial use ONLY: checkbox
extra_gated_heading: >-
  You need to agree to use this model only for research or education purposes
  under the Reactive AI Model & Architecture License (RAML) v1.0
extra_gated_description: The repository will be available instantly after accepting the license terms
extra_gated_button_content: Accept license terms
---

<img src="https://huggingface.co/ReactiveAI/RxT-Beta-Decoder-Base/resolve/main/logo_rxt_beta.png" width="512" />


# RxT-Beta Encoder Base (97M)
**RxT-Beta** is the world's first real-scale stateful **Reactive Language Model (RxLM)** with infinite memory & context, built to confirm the new **Reactive Transformer (RxT)**
scaling laws and to solve **all** the biggest problems of stateless LLMs. **RxT** models are natively conversational (and agentic) - instead of reprocessing the entire
conversation history (chat template) like standard LLMs, they process only single interactions in real time and move the context to a dedicated embedding-based memory
that is updated asynchronously between interactions. This introduces unique features like:
| - infinite conversation & global context through Mixture-of-Memory (MoM) |
| - live continual learning from interactions in real-time |
| - true real-time processing with near-zero latency |
| - linear conversation cost scaling |
| - fixed computational cost and memory usage for each interaction |
| - increasing quality of responses with subsequent steps of dialogue, without "long-term hallucinations" |
| - natively encoded memory, impossible to read without the model |
| - extreme pre-training efficiency |
| - hybrid stateful reasoning |
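
The linear cost scaling and fixed per-interaction cost listed above can be illustrated with a back-of-the-envelope comparison (an illustrative sketch, not RxLM code; the fixed interaction size of 512 tokens is an assumption):

```python
# A stateless LLM re-reads the whole chat history on every turn, so the
# cumulative number of processed prompt tokens grows quadratically with the
# number of turns. An RxT model processes only the current interaction, so
# its cumulative cost grows linearly.

def stateless_tokens(turns: int, interaction: int = 512) -> int:
    """Stateless LLM: turn t reprocesses t interactions of history."""
    return sum(t * interaction for t in range(1, turns + 1))

def reactive_tokens(turns: int, interaction: int = 512) -> int:
    """RxT: fixed cost per turn, history lives in memory instead."""
    return turns * interaction

for turns in (10, 100):
    print(turns, stateless_tokens(turns), reactive_tokens(turns))
# At 100 turns the stateless model has processed ~50x more tokens than RxT.
```

Going from 10 to 100 turns multiplies the stateless cost by ~92x but the RxT cost by only 10x, which is why the per-turn latency and price of a stateless LLM keep climbing while RxT's stay flat.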
|
|
In the first small-scale experiments, **RxT-Alpha** models achieved about **50% higher accuracy** and almost **2x lower perplexity** than a same-size stateless
decoder-only baseline trained on the same simple synthetic dataset (and the decoder-only model was additionally pre-trained on 5x more tokens). These results were
then confirmed on a small 10B-token subset of real-world data with ~0.3B models (**RxT-Beta Micro**), where the **RxT** advantage was even bigger. These promising
results, along with all the unique features, suggest that the **Reactive Transformer** is a revolutionary generational leap and a crucial milestone on the
path to **Artificial General Intelligence (AGI)** - provided we confirm it at scale, which is exactly what we plan to do with **RxT-Beta**.
|
|
The goal is to compete with ~1-3B-parameter dense stateless LLMs pre-trained on trillions of tokens, using a model with only 190M active parameters and about 400B
training tokens, and to significantly outperform them on long multi-turn conversations.
|
|
| ## Base models |
**Reactive Transformer** models require a new dedicated training pipeline to handle their asynchronous memory and reversed decoder-encoder order. Base models are the
result of the first supervised stage - _**Joint LM Pre-Training with "cheated context" teacher forcing**_ (more info in the [Decoder Card](https://huggingface.co/ReactiveAI/RxT-Beta-Decoder-Base)).
|
|
The base encoder (this model) requires further training and has to be connected to the decoder and the memory attention network, so it is the starting point for the next stages.
It is pre-trained for general knowledge (with a focus on reasoning) using textbook-quality datasets, and it can be further fine-tuned for custom use cases (under the
terms of the [RAML v1.0 license](https://huggingface.co/ReactiveAI/RxT-Beta-Decoder-Base/blob/main/LICENSE.md)).
|
|
| ## Usage outside Reactive Transformer |
The **RxT** encoder is a standard bidirectional transformer encoder, so after adding a custom head it can be used for other tasks, such as classification. For
example, we use a fine-tuned version as a critic in Reinforcement Learning scenarios.
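
As a conceptual sketch of what "adding a custom head" means here, the snippet below mean-pools per-token encoder outputs and applies a linear classification head. It is a pure-Python toy (the shapes, values, and function names are illustrative; a real head would be a small PyTorch module on top of the RxLM encoder's hidden states):

```python
from typing import List

def mean_pool(hidden_states: List[List[float]]) -> List[float]:
    """Average the per-token hidden vectors into one sequence vector."""
    seq_len = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / seq_len for d in range(dim)]

def linear_head(pooled: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Project the pooled vector to class logits: logits = W @ pooled + b."""
    return [sum(w * x for w, x in zip(row, pooled)) + b for row, b in zip(weights, bias)]

# Toy example: 3 tokens, hidden dim 4, 2 classes.
hidden = [[1.0, 0.0, 2.0, 1.0], [3.0, 2.0, 0.0, 1.0], [2.0, 1.0, 1.0, 1.0]]
pooled = mean_pool(hidden)   # -> [2.0, 1.0, 1.0, 1.0]
logits = linear_head(pooled, [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [0.0, 0.5])
print(logits)                # -> [2.0, 1.5]
```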
|
|
| ## Encoder architecture |
| - layers: 21 |
| - dim: 512 |
| - self-attention: Gated Symmetric Sparse Query Attention (SQA) with 8/16 query/key/value heads |
| - feed forward: Dense MLP / 1536 dim with SwiGLU activation |
- vocab: 65k (English + Polish)
| - sequence: 4096 (base after full pre-training) / 8192 (Interaction SFT) |
| - params: 97M |
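
A rough sanity check of the 97M figure can be derived from the dimensions above. This is a sketch: the exact query/key-value head split, gating, norm, and bias parameters are assumptions, so the estimate is only expected to land in the right ballpark:

```python
# Back-of-the-envelope parameter count from the architecture numbers above.
vocab, dim, layers, ffn_dim = 65536, 512, 21, 1536
q_heads, kv_heads = 8, 16                # assumed split of the 8/16 heads
head_dim = dim // kv_heads               # 32

embedding = vocab * dim                  # token embedding table
q_proj = dim * q_heads * head_dim        # sparse query projection (SQA)
kv_proj = 2 * dim * kv_heads * head_dim  # key + value projections
out_proj = q_heads * head_dim * dim      # attention output projection
attention = q_proj + kv_proj + out_proj
mlp = 3 * dim * ffn_dim                  # SwiGLU: gate, up, down matrices
total = embedding + layers * (attention + mlp)

print(f"~{total / 1e6:.0f}M parameters")  # lands near the stated 97M
```

The embeddings alone account for roughly a third of the total, which is typical for a small encoder with a 65k vocabulary.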