# LeWM Model Collection
Quantized and architecture-variant world models derived from LeWM (Lucas Maes et al., Mila/NYU/Samsung SAIL/Brown).
All models are inference-ready checkpoints with full training provenance, quantization experiments, and hardware benchmark data.
## TL;DR
| Model | Size | Quality | Best For |
|---|---|---|---|
| Slim 96d/4e/4p | 9.8 MB (INT8+Q4) | cos=0.9982 | Production ESP32 / browser |
| Hybrid ALAL 64d | 3.9 MB (LQ40) | cos≈0.99 | Tiny edge / FPGA |
| Baseline Q4 | 23.6 MB | cos=0.998 | Research baseline |
| WANDA 40% | 6.8 MB | cos≈0.97 | Max compression |
| Hardwired | 0 MB (gates) | cos=1.000 | ASIC / custom silicon |
## What is LeWM?
LeWM (Latent Encoder World Model) is a JEPA-based vision world model: a ViT encoder compresses 224×224 images into a compact latent vector, and a DiT-style predictor forecasts the next latent given the current state and robot action. It was trained on the PushT environment.
These variants explore:
- Architecture changes: Latent bottleneck dimensions, layer counts, hybrid ALAL attention
- Quantization: INT8, Q4, ternary, WANDA pruning
- Hardware specializations: ESP32-P4 LQ40 binary, FPGA hardwired shift-add, browser WASM
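Concretely, the encode-then-predict loop works as follows; a minimal NumPy sketch with toy stand-ins for the real encoder and predictor (all shapes and weight matrices here are illustrative, not the actual LeWM checkpoint):

```python
import numpy as np

# Toy stand-ins for the ViT encoder and DiT-style predictor. The real
# modules come from the safetensors checkpoints; shapes here are
# illustrative (a downscaled 16x16 "image" instead of 224x224).
rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM = 96, 2  # matches the slim 96d variant's latent width

W_enc = rng.standard_normal((16 * 16 * 3, LATENT_DIM)) * 0.05
W_pred = rng.standard_normal((LATENT_DIM + ACTION_DIM, LATENT_DIM)) * 0.05

def encode(image):
    """Compress a frame into a compact latent vector."""
    return image.reshape(-1) @ W_enc

def predict_next(latent, action):
    """Forecast the next latent from the current latent and robot action."""
    return np.concatenate([latent, action]) @ W_pred

# Closed-loop imagination: encode the frame once, then stay in latent space.
frame = rng.random((16, 16, 3))
z = encode(frame)
for _ in range(20):  # 20-step rollout
    z = predict_next(z, np.zeros(ACTION_DIM))
```

The key property is that the rollout never re-touches pixels: one encode, then cheap latent-to-latent predictions, which is why predictor quantization matters so much for edge latency.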
## Model Index
### Slim Architecture Variants (epoch 1)
| Model | Latent | Enc | Pred | f32 Size | Quantized | Cos vs f32 | Binary |
|---|---|---|---|---|---|---|---|
| slim_48d_2e_2p | 48 | 2 | 2 | ~2 MB | ~1 MB | pending | LQ40 |
| slim_64d_3e_3p | 64 | 3 | 3 | ~3 MB | ~2 MB | pending | LQ40 |
| slim_96d_2e_3p | 96 | 2 | 3 | ~3.5 MB | ~2 MB | pending | LQ40 |
| slim_96d_4e_4p | 96 | 4 | 4 | 36.8 MB | 9.8 MB | 0.9982 | LQ40 |
| slim_128d_4e_4p | 128 | 4 | 4 | ~5 MB | ~3 MB | pending | LQ40 |
| slim_192d_4e_4p | 192 | 4 | 4 | ~40 MB | ~12 MB | pending | LQ40 |
### Hybrid ALAL Encoder Variants (epoch 1)
| Model | Hidden | Enc | Pred | Params | Binary | Cos vs f32 |
|---|---|---|---|---|---|---|
| hybrid_ALAL_64d_4e_4p | 64d | 4 | 4 | 3.0M | 3.9 MB | pending |
### Baseline + Quantizations (epoch 100+ expert)
| Model | Format | Size | Cos vs f32 | Notes |
|---|---|---|---|---|
| baseline_192d_6e_6p | f32 safetensors | 54.6 MB | 1.000 | Converged expert |
| baseline_full | INT8+Q4 LQ40 | 10.9 MB | 0.999 | Production format |
| baseline_q4_pred | Q4 pred only | 23.6 MB | 0.998 | Encoder stays f32 |
| baseline_wanda20_q4 | 20% pruned Q4 | 22.0 MB | ~0.99 | Wanda sparsity |
| baseline_wanda40_q4 | 40% pruned Q4 | 25.1 MB | ~0.97 | Aggressive prune |
Note: All slim variants are epoch 1. Quality will improve with longer training (100+ epochs like the baseline expert). See Training Status.
## Hardware Benchmarks
### Apple Silicon (M5, Zig SIMD NEON)
| Model | encode (ms) | predict (ms) | 20-step rollout (ms) |
|---|---|---|---|
| baseline 192d/6e/6p f32 | 52 | 32 | 640 |
| baseline 192d/6e/6p INT8+Q4 | 25 | 21 | 420 |
| slim 96d/4e/4p f32 | 18 | 18 | 360 |
| slim 96d/4e/4p INT8+Q4 | 10 | 15 | 300 |
| hybrid ALAL 64d/4e/4p f32 | 8 | 10 | 200 |
### ESP32-P4 (32MB PSRAM @ 200MHz, PIE SIMD)
| Model | predict_next | encode | Total (enc + 3×pred) |
|---|---|---|---|
| baseline full (INT8+Q4) | 828 ms | ~10,000 ms | ~12,500 ms |
| slim 96d full | 583 ms | 6,416 ms | 7,165 ms |
| slim 96d full + PIE opts | 583 ms | 922 ms | 2,669 ms |
| hybrid ALAL 64d + PIE opts | 152 ms | 922 ms | 1,382 ms |
### Browser WASM (no SIMD, pure Rust)
| Model | Format | Size | encode | predict |
|---|---|---|---|---|
| slim 96d | f32 | 39.2 MB | ~1,500 ms | ~500 ms |
| slim 96d | INT8+Q4 | 9.8 MB | ~2,000 ms | ~600 ms |
## Format Guide
### safetensors (standard)
All variants are exported as HuggingFace-compatible safetensors + config.json:

    slim_96d_4e_4p/
    ├── config.json                 # Architecture config
    ├── lejepa_weights.safetensors  # f32 weights
    └── README.md                   # This model's card
Use with Synapse, transformers, or any safetensors-compatible loader.
### LQ40 (edge/embedded)
Custom binary format for microcontrollers. No JSON parsing at load time, no filesystem needed.

    ┌─────────────────────────────────┐
    │ Magic: "LQ40"        (4 bytes)  │
    │ config_len: uint32   (4 bytes)  │
    │ JSON config        (config_len) │
    │ Weight data        (binary)     │
    └─────────────────────────────────┘
Supports: f32, INT8 (encoder), Q4 (predictor), full Q4, WANDA-pruned Q4.
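Given the fixed header layout above, a loader needs no framework at all. A minimal Python sketch (assuming `config_len` is little-endian; the weight-block encoding itself is format-specific and left opaque here):

```python
import json
import struct

def read_lq40(blob: bytes):
    """Parse an LQ40 container: magic, config_len, JSON config, raw weights."""
    assert blob[:4] == b"LQ40", "bad magic"
    (config_len,) = struct.unpack_from("<I", blob, 4)  # little-endian assumed
    config = json.loads(blob[8:8 + config_len])
    weights = blob[8 + config_len:]
    return config, weights

# Round-trip on a synthetic container built in memory.
cfg = json.dumps({"latent_dim": 96, "enc_layers": 4}).encode()
blob = b"LQ40" + struct.pack("<I", len(cfg)) + cfg + b"\x00" * 16
config, weights = read_lq40(blob)
```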
Load with the `synapse-inference` runtime (see the LQ40 export step under Conversion Pipeline).
## Quantization Deep Dive
See `docs/quantization.md` for full details.
### What Was Tried
| Approach | Compression | Quality Loss | Status |
|---|---|---|---|
| INT8 encoder | 4x | <0.1% | Production |
| Q4 predictor | 2x additional | <0.2% | Production |
| Full Q4 (no INT8) | 6.4x total | ~7% | Research |
| Ternary ({-1,0,+1}) | 16x | ~15% | Experimental |
| WANDA 20% prune | 20% weights gone | <1% | Experimental |
| WANDA 40% prune | 40% weights gone | ~3% | Experimental |
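For reference, the Wanda criterion in the last two rows scores each weight by |weight| × the L2 norm of its input activation, so small weights that feed strongly-firing features survive. A sketch on made-up calibration data (layer shape and batch are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((96, 96))   # one linear layer, (out_features, in_features)
X = rng.standard_normal((256, 96))  # calibration activations, (batch, in_features)

# Wanda importance: |W[i, j]| * ||X[:, j]||_2 -- weight magnitude scaled by
# how strongly its input feature actually fires on calibration data.
score = np.abs(W) * np.linalg.norm(X, axis=0)

sparsity = 0.40                     # the "wanda40" setting
k = int(W.size * sparsity)
cutoff = np.partition(score.ravel(), k)[k]
W_pruned = np.where(score >= cutoff, W, 0.0)
```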
### Why Full Q4 Loses Quality
The encoder's INT8 path uses per-channel symmetric quantization with f32 scales. When we skip INT8 and quantize the encoder directly to Q4:
- Encoder INT8: cos=0.9998 vs f32
- Encoder Q4: cos=0.93 vs f32 (7% quality drop)
The issue: the ViT encoder has high dynamic range in intermediate activations. INT8 preserves more signal per channel, and Q4's 32-element block granularity doesn't match the encoder's channel statistics.
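The bit-depth gap is easy to reproduce on synthetic weights whose per-channel dynamic range varies widely, as the encoder's does: per-channel symmetric INT8 gets 256 levels per channel, while Q4 gets only 16 levels per 32-element block. A sketch (synthetic data, not the actual encoder tensors):

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-channel dynamic range varies by ~3 orders of magnitude, mimicking
# the spread seen in ViT intermediate activations.
W = rng.standard_normal((64, 128)) * rng.uniform(0.01, 10.0, size=(64, 1))

def quant_int8_per_channel(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # symmetric, per channel
    return np.round(w / scale) * scale

def quant_q4_blocks(w, block=32):
    flat = w.reshape(-1, block)                            # 32-element blocks
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # 4-bit range [-8, 7]
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(w.shape)

def cos(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cos_int8 = cos(W, quant_int8_per_channel(W))
cos_q4 = cos(W, quant_q4_blocks(W))
```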
### Why Ternary Underperforms
Ternary weights ({-1, 0, +1}) theoretically compress about 2.5x further than full Q4 (16x vs 6.4x total in the table above). In practice:
- Q4 cos: 0.998 vs f32
- Ternary cos: ~0.85 vs f32
The predictor's adaLN modulation is sensitive to weight magnitude, not just sign. Ternary destroys the scale information that adaLN relies on.
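A quick synthetic check of that magnitude loss: ternarizing Gaussian weights with a single surviving scale (threshold heuristic borrowed from ternary-weight-network practice, not necessarily the exact recipe used here) lands well below the Q4 cosine figures above:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(4096)

# Ternary: keep only the sign plus a zero band; one scale for the whole tensor.
thresh = 0.7 * np.abs(w).mean()      # common ternarization threshold heuristic
t = np.sign(w) * (np.abs(w) > thresh)
scale = np.abs(w[t != 0]).mean()     # a single magnitude survives
w_ternary = t * scale

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cos_ternary = cos(w, w_ternary)      # roughly 0.9 for Gaussian weights
```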
## Hardwired / ASIC Exploration
Q4 weights (integers -8 to 7) decompose into shift-and-add trees. Multiplication by a constant becomes a wire-plus-adder network: no multiplier circuit, no memory fetch.
Results (RTL synthesized with Yosys):
- 0 BRAM, 0 multipliers for weight storage
- 98.3% of all multiplier circuits eliminated
- cos=1.000 vs f32 (mathematically identical)
- Fits in $129 Arty A7 at 12% LUT utilization
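The decomposition is easy to verify in software. A Python stand-in for what each synthesized multiplier-free unit computes (every set bit of the 4-bit weight becomes one shifted add, negation handles the sign):

```python
def shift_add_mul(w: int, x: int) -> int:
    """Multiply by a Q4 weight (-8..7) using only shifts, adds, and negation."""
    assert -8 <= w <= 7
    neg, m = w < 0, abs(w)
    acc, shift = 0, 0
    while m:
        if m & 1:            # each set bit of |w| is one shifted add (a wire tap)
            acc += x << shift
        m >>= 1
        shift += 1
    return -acc if neg else acc
```

For example, w = 5 = 0b101 reduces to `x + (x << 2)`: two wire taps and one adder, no multiplier.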
## Conversion Pipeline
### 1. Download from W&B

    pip install wandb
    wandb login
    python scripts/download_from_wandb.py \
        --project eren23/crucible-lewm \
        --name lewm_slim_96d_4e_4p_epoch_1 \
        --output-dir slim_96d_4e_4p/
### 2. Convert .ckpt → safetensors + config.json

    python scripts/convert_lewm_ckpt.py \
        --input slim_96d_4e_4p/lewm_slim_96d_4e_4p_epoch_1.ckpt \
        --output slim_96d_4e_4p/

The converter uses a stub unpickler, so no `jepa` or `module` package is required.
### 3. Export to LQ40 (edge formats)

    cd synapse
    cargo run --release -p synapse-inference --example export_lewm_q4 -- \
        --checkpoint ../lewm-models/slim_96d_4e_4p/lejepa_weights.safetensors \
        --config ../lewm-models/slim_96d_4e_4p/config.json \
        --mode full \
        --output ../lewm-models/slim_96d_4e_4p/lewm-slim-96d-full.bin

Modes: `full` (INT8+Q4), `q4-pred` (Q4 predictor only), `wanda20-q4`, `wanda40-q4`.
## Citation
If you use these models or the quantization results, cite the original LeWM paper:

    @article{maes2025lewm,
      title={LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels},
      author={Maes, Lucas and Le Lidec, Quentin and Scieur, Damien and LeCun, Yann and Balestriero, Randall},
      journal={arXiv},
      year={2025}
    }

For the quantization and architecture experiments, cite this collection:

    @misc{lewm_models_2026,
      title={LeWM Model Collection: Quantized and Architecture Variants},
      author={Attocoder Team},
      year={2026},
      publisher={GitHub},
      url={https://github.com/attocode/lewm-models}
    }
## License
All models are derived from LeWM, which is licensed under CC BY-NC 4.0.
You are free to:
- Share: copy and redistribute the material
- Adapt: remix, transform, and build upon the material
Under the following terms:
- Attribution: You must give appropriate credit to the original LeWM authors
- NonCommercial: You may not use the material for commercial purposes
See CC BY-NC 4.0 for details.