# Wire-Speed Transformer: Real-Time Learning from Live Network Streams
A novel approach to transformer training that learns directly from network traffic in real time.
## 🔥 Key Results
| Time | Tokens | Loss | Notes |
|---|---|---|---|
| 0s | 0 | - | Start |
| 14s | 10k | 50.08 | Initial |
| 192s | 100k | 22.32 | -55% |
| 302s | 170k | 16.78 | -66% |
| 355s | 190k | 15.91 | -68% |
Loss dropped from 50 → 16 in under 6 minutes using only 32-token micro-batches from raw, uncurated web data.
## 🧠 What Makes This Different
Traditional transformer training requires:
- Large batch sizes (4096+)
- Multiple epochs over curated data
- Expensive preprocessing pipelines
- Hours/days of training
Wire-Speed Learning uses:
- 32-token micro-batches (~128x smaller than a 4096-token batch)
- Single pass (no epochs)
- Raw web data (no curation)
- Online SGD (update every 32 tokens)
- Real-time network stream (Rust crawler → Python trainer)
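A minimal sketch of what one such online update can look like in PyTorch, assuming a standard next-token objective; names like `online_step` and the stand-in model below are illustrative, and the real logic lives in `stream_trainer.py`:

```python
import torch
import torch.nn as nn

MICRO_BATCH = 32  # tokens consumed per gradient update

def online_step(model: nn.Module, opt: torch.optim.Optimizer, tokens: torch.Tensor) -> float:
    """One SGD update on a single 32-token chunk of the stream.

    `tokens` holds MICRO_BATCH + 1 token IDs so inputs and next-token targets
    can be formed by shifting. Each chunk is seen exactly once: no replay
    buffer, no epochs, no gradient accumulation.
    """
    x = tokens[:-1].unsqueeze(0)      # (1, 32) inputs
    y = tokens[1:].unsqueeze(0)       # (1, 32) next-token targets
    logits = model(x)                 # (1, 32, vocab); assumed model signature
    loss = nn.functional.cross_entropy(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
    opt.zero_grad(set_to_none=True)
    loss.backward()
    opt.step()
    return loss.item()

# Toy usage with a stand-in model (the real 36M transformer is in stream_trainer.py):
toy = nn.Sequential(nn.Embedding(128256, 256), nn.Linear(256, 128256))
opt = torch.optim.SGD(toy.parameters(), lr=0.05)
chunk = torch.randint(0, 128256, (MICRO_BATCH + 1,))
print(online_step(toy, opt, chunk))
```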
## 🏗️ Architecture
```
┌─────────────────┐      ┌──────────────┐      ┌─────────────────┐
│  Rust Crawler   │─────▶│  Tokenizer   │─────▶│  Python Trainer │
│  (500 workers)  │      │  (DeepSeek)  │      │   (36M params)  │
│  ~500 pages/s   │      │  128k vocab  │      │   ~500 tok/s    │
└─────────────────┘      └──────────────┘      └─────────────────┘
        │                                               │
        ▼                                               ▼
  Live Internet                                 Gradient Update
 (no robots.txt)                               (every 32 tokens)
```
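The two halves are glued together by nothing more than a Unix pipe (see the Quick Start command below). Here is a sketch of the consuming side, assuming the feeder emits whitespace-separated decimal token IDs on stdout; the actual wire format implemented in `feeder/` may differ:

```python
import sys
from typing import Iterator, List

MICRO_BATCH = 32

def micro_batches(stream=sys.stdin, size: int = MICRO_BATCH) -> Iterator[List[int]]:
    """Group a piped token stream into fixed-size chunks for online updates.

    Assumes whitespace-separated decimal token IDs; if the real wire format
    is binary or JSON, only the parsing line below needs to change.
    """
    buf: List[int] = []
    for line in stream:
        buf.extend(int(tok) for tok in line.split())
        while len(buf) >= size + 1:
            yield buf[: size + 1]   # size + 1 IDs so targets can be shifted by one
            buf = buf[size:]        # last ID becomes the first input of the next chunk

if __name__ == "__main__":
    # e.g. ./feeder/target/release/wire_feeder 2>feeder.log | python3 micro_batcher.py
    for chunk in micro_batches():
        print(len(chunk), chunk[:4], "...")  # hand `chunk` to the trainer's update step instead
```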
## 📐 Model Config
```python
CONFIG = {
    "d": 256,         # embedding dim
    "layers": 4,      # transformer layers
    "heads": 8,       # attention heads
    "rank": 32,       # tuneable attention rank
    "vocab": 128256,  # DeepSeek V3.2 tokenizer
    "ctx": 512,       # context window
}
# Total: 35,993,088 parameters (36M)
```
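A side note on where those 36M parameters live: with a 128k vocabulary and `d = 256`, the token embedding table alone is roughly 91% of the total, which also suggests the output projection shares the embedding weights (an inference from the numbers, not something stated in the repo). A quick check:

```python
d, vocab, total = 256, 128256, 35_993_088

embedding = vocab * d                                      # 32,833,536 embedding parameters
print(embedding, f"= {embedding / total:.0%} of the total")  # ~91%
```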
## 🚀 Quick Start
### Requirements
- CUDA GPU (8GB+ VRAM)
- Rust toolchain
- Python 3.8+
- PyTorch 2.0+
### Installation
```bash
# Clone
git clone https://huggingface.co/OpenTransformer/wire-speed-transformer
cd wire-speed-transformer

# Install Rust (if needed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source ~/.cargo/env

# Build Rust crawler
cd feeder && cargo build --release && cd ..

# Download DeepSeek tokenizer
curl -sL https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/tokenizer.json -o tokenizer.json

# Install Python deps
pip install torch

# Run!
./feeder/target/release/wire_feeder 2>feeder.log | python3 stream_trainer.py
```
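The last command is the whole system: the feeder's diagnostics go to `feeder.log` (via `2>feeder.log`), while its stdout, the token stream, is piped straight into `stream_trainer.py`, which updates the model as tokens arrive.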
## 📁 Files
- `stream_trainer.py` - Python transformer trainer (online learning)
- `feeder/` - Rust high-speed web crawler + tokenizer
- `tokenizer.json` - DeepSeek V3.2 tokenizer (download separately)
- `run.sh` - Launch script
## 🔬 Why This Works (Hypotheses)
- Small models converge faster - a 36M-param model needs less data than a 7B one
- High update frequency - More gradient signal per unit time, even though each step is noisy
- Web has structure - HTML patterns and common phrases provide a learning signal
- DeepSeek tokenizer - High-quality tokenization from a SOTA model
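For a sense of the update frequency (derived from the results table above): ~190k tokens in 355 s is roughly 535 tokens/s, and at 32 tokens per update that is about 17 gradient steps per second, or close to 6,000 parameter updates over the six-minute run.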
## ⚠️ Limitations
- No evaluation yet (just training loss)
- Model is tiny (36M) - won't match GPT-4
- Catastrophic forgetting not measured
- Raw web data quality unknown
## 📖 Citation
```bibtex
@misc{wirespeed2026,
  title={Wire-Speed Transformer: Real-Time Learning from Live Network Streams},
  author={OpenTransformers},
  year={2026},
  url={https://huggingface.co/OpenTransformer/wire-speed-transformer}
}
```
## 🙏 Acknowledgments
- DeepSeek for the tokenizer
- Anthropic's Claude for pair programming
- vast.ai for GPU compute
## 📄 License
MIT
Built by OpenTransformers - Pushing the boundaries of what's possible with transformers.