---
license: apache-2.0
tags:
- conversational
- efficient
- i3-architecture
- custom_code
datasets:
- starhopp3r/TinyChat
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# i3 Model - Ultra-Efficient Pretraining Language Model

## Model Description

The **i3 Model** is designed to optimize **pretraining efficiency** while retaining core language modeling capabilities. Its architecture allows training on **memory-constrained hardware**, including CPU-only setups, without sacrificing sequence modeling performance.

> [!NOTE]
> The i3 architecture is built for highly efficient pretraining: it is designed to **reduce memory usage**, **speed up training**, and allow pretraining from scratch on tiny hardware. Internal details are abstracted for simplicity.

---

## Use

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="FlameF0X/i3-12m")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

---

## Model Statistics

* **Vocabulary Size:** 4,466 (variable-length chunks)
* **Hidden Dimension:** 512
* **Number of Layers:** 12
* **Max Sequence Length:** 256
* **Total Parameters:** 12,691,186
* **Tokenization:** Memory-efficient variable-length chunking (2–3 characters)
* **Total Tokens:** 334,524,736

---

## Key Features

1. **Memory-Optimized:** Designed to train on tiny hardware with minimal RAM usage
2. **Pretraining-Focused Architecture:** i3 layers provide efficient sequence modeling, low-rank linear updates, and factorized attention
3. **Variable-Length Tokenization:** 2–3 character chunks for compact embeddings (an illustrative sketch appears at the end of this card)
4. **Conversational Readiness:** Optimized for dialogue and text generation

---

## i3 Architecture (Abstract Overview)

### Design Philosophy

The i3 model targets **CPU-friendly, memory-constrained pretraining**, emphasizing:

* Long-range sequence modeling
* Low-rank weight updates for memory savings (an illustrative sketch appears at the end of this card)
* Efficient factorized attention
* 4-bit weights and microbatching for a minimal memory footprint

### Open-Sourced Technologies

Technologies used in the i3 architecture that have been open-sourced by the author:

* [Low-Rank Pre-training](https://github.com/FlameF0X/Low-Rank-Pretraining) - LoRA-style low-rank factorization applied to pretraining.

### Conceptual Layout

```
       Input Tokens
            │
   +-----------------+
   | Embedding Layer |
   +-----------------+
            │
   +-----------------+
   | i3 Architecture |
   +-----------------+
            │
+------------------------+
| KQV Low-Rank Attention |
+------------------------+
            │
+-----------------------+
| LayerNorm + Residuals |
+-----------------------+
            │
  +-------------------+
  | Output Projection |
  +-------------------+
            │
     Predicted Tokens
```

> Key idea: Every component is optimized for **memory efficiency** and **pretraining speed** on small hardware, while preserving essential transformer dynamics.

---

## Training Details

* **Sequence length:** 128–512 tokens
* **Model size:** ~12M parameters (CPU-friendly)
* **Optimizer:** AdamW or Lion (4-bit / mixed precision)
* **Dataset:** TinyChat (~50–200 MB)
* **Training loop:** gradient checkpointing + recomputation
* **Objective:** next-token prediction / text generation

## Citation

```bibtex
@software{lorpt2025,
  title={LoRPt: Low-Rank Pretraining for Resource-Efficient Language Models},
  author={FlameF0X},
  year={2025},
  url={https://github.com/FlameF0X/Low-Rank-Pretraining}
}
```
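
---

## Illustrative Sketch: Low-Rank KQV Projections

The low-rank weight updates and factorized KQV attention referenced under *Design Philosophy* can be pictured with the short PyTorch sketch below. It is an illustration only, not the i3 source code: the `LowRankLinear` class name, the rank of 64, and the single-head attention are assumptions chosen to show how factorizing a 512×512 projection into two thin matrices cuts the parameter count.

```python
import torch
import torch.nn as nn


class LowRankLinear(nn.Module):
    """Factorized projection y = x @ A @ B, with A (in_dim x rank) and B (rank x out_dim).

    Stores in_dim*rank + rank*out_dim weights instead of in_dim*out_dim, which is
    where the memory savings of low-rank pretraining come from.
    """

    def __init__(self, in_dim: int, out_dim: int, rank: int):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) / rank ** 0.5)
        self.B = nn.Parameter(torch.randn(rank, out_dim) / rank ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.A @ self.B


hidden, rank, seq_len = 512, 64, 256          # hidden size and sequence length match the card; rank is illustrative
q_proj = LowRankLinear(hidden, hidden, rank)  # 512*64 + 64*512 = 65,536 weights vs 262,144 for a dense projection
k_proj = LowRankLinear(hidden, hidden, rank)
v_proj = LowRankLinear(hidden, hidden, rank)

x = torch.randn(1, seq_len, hidden)           # (batch, seq_len, hidden)
q, k, v = q_proj(x), k_proj(x), v_proj(x)
scores = torch.softmax(q @ k.transpose(-2, -1) / hidden ** 0.5, dim=-1)
out = scores @ v                              # (1, 256, 512)
```

In this sketch only the thin factors are trainable, so optimizer state shrinks along with the weights; see the Low-Rank Pre-training repository linked above for the actual implementation.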
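
---

## Illustrative Sketch: Variable-Length Chunk Tokenization

The 2–3 character variable-length chunking listed under *Model Statistics* can be approximated by a greedy longest-match pass over a chunk vocabulary. The function below is a hypothetical illustration, not the tokenizer shipped with this model, and the toy vocabulary stands in for the real 4,466-entry chunk table.

```python
def chunk_text(text: str, vocab: set[str], max_len: int = 3) -> list[str]:
    """Greedy longest-match chunking: prefer 3-character pieces, fall back to
    2-character pieces, and finally to single characters so nothing is dropped."""
    tokens, i = [], 0
    while i < len(text):
        for length in range(max_len, 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens


# Toy vocabulary standing in for the model's 4,466 variable-length chunks.
toy_vocab = {"hel", "lo ", "wor", "ld", "he", "lo", "wo", "rl"}
print(chunk_text("hello world!", toy_vocab))
# ['hel', 'lo ', 'wor', 'ld', '!']
```

Chunks of two to three characters keep the embedding table small while producing sequences roughly two to three times shorter than a character-level encoding.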