# LeWM Model Collection
Quantized and architecture-variant world models derived from LeWM (Lucas Maes et al., Mila/NYU/Samsung SAIL/Brown).
All models are inference-ready checkpoints with full training provenance, quantization experiments, and hardware benchmark data.
## TL;DR
| Model | Size | Quality | Best For |
|---|---|---|---|
| Slim 96d/4e/4p | 9.8 MB (INT8+Q4) | cos=0.9982 | Production ESP32 / browser |
| Hybrid ALAL 64d | 3.9 MB (LQ40) | cos≈0.99 | Tiny edge / FPGA |
| Baseline Q4 | 23.6 MB | cos=0.998 | Research baseline |
| WANDA 40% | 6.8 MB | cos≈0.97 | Max compression |
| Hardwired | 0 MB (gates) | cos=1.000 | ASIC / custom silicon |
## What is LeWM?
LeWM (Latent Encoder World Model) is a JEPA-based vision world model: a ViT encoder compresses 224×224 images into a compact latent vector, and a DiT-style predictor forecasts the next latent given the current state and robot action. It was trained on the PushT environment.
These variants explore:
- Architecture changes: Latent bottleneck dimensions, layer counts, hybrid ALAL attention
- Quantization: INT8, Q4, ternary, WANDA pruning
- Hardware specializations: ESP32-P4 LQ40 binary, FPGA hardwired shift-add, browser WASM
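Concretely, the encode-then-predict loop works as follows; a minimal NumPy sketch with toy stand-ins for the real encoder and predictor (all shapes and weight matrices here are illustrative, not the actual LeWM checkpoint):

```python
import numpy as np

# Toy stand-ins for the ViT encoder and DiT-style predictor. The real
# modules come from the safetensors checkpoints; shapes here are
# illustrative (a downscaled 16x16 "image" instead of 224x224).
rng = np.random.default_rng(0)
LATENT_DIM, ACTION_DIM = 96, 2  # matches the slim 96d variant's latent width

W_enc = rng.standard_normal((16 * 16 * 3, LATENT_DIM)) * 0.05
W_pred = rng.standard_normal((LATENT_DIM + ACTION_DIM, LATENT_DIM)) * 0.05

def encode(image):
    """Compress a frame into a compact latent vector."""
    return image.reshape(-1) @ W_enc

def predict_next(latent, action):
    """Forecast the next latent from the current latent and robot action."""
    return np.concatenate([latent, action]) @ W_pred

# Closed-loop imagination: encode the frame once, then stay in latent space.
frame = rng.random((16, 16, 3))
z = encode(frame)
for _ in range(20):  # 20-step rollout
    z = predict_next(z, np.zeros(ACTION_DIM))
```

The key property is that the rollout never re-touches pixels: one encode, then cheap latent-to-latent predictions, which is why predictor quantization matters so much for edge latency.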
## Model Index
### Slim Architecture Variants (epoch 1)
| Model | Latent | Enc | Pred | f32 Size | Quantized | Cos vs f32 | Binary |
|---|---|---|---|---|---|---|---|
| slim_48d_2e_2p | 48 | 2 | 2 | ~2 MB | ~1 MB | pending | LQ40 |
| slim_64d_3e_3p | 64 | 3 | 3 | ~3 MB | ~2 MB | pending | LQ40 |
| slim_96d_2e_3p | 96 | 2 | 3 | ~3.5 MB | ~2 MB | pending | LQ40 |
| slim_96d_4e_4p | 96 | 4 | 4 | 36.8 MB | 9.8 MB | 0.9982 | LQ40 |
| slim_128d_4e_4p | 128 | 4 | 4 | ~5 MB | ~3 MB | pending | LQ40 |
| slim_192d_4e_4p | 192 | 4 | 4 | ~40 MB | ~12 MB | pending | LQ40 |
### Hybrid ALAL Encoder Variants (epoch 1)
| Model | Hidden | Enc | Pred | Params | Binary | Cos vs f32 |
|---|---|---|---|---|---|---|
| hybrid_ALAL_64d_4e_4p | 64d | 4 | 4 | 3.0M | 3.9 MB | pending |
### Baseline + Quantizations (epoch 100+ expert)
| Model | Format | Size | Cos vs f32 | Notes |
|---|---|---|---|---|
| baseline_192d_6e_6p | f32 safetensors | 54.6 MB | 1.000 | Converged expert |
| baseline_full | INT8+Q4 LQ40 | 10.9 MB | 0.999 | Production format |
| baseline_q4_pred | Q4 pred only | 23.6 MB | 0.998 | Encoder stays f32 |
| baseline_wanda20_q4 | 20% pruned Q4 | 22.0 MB | ~0.99 | Wanda sparsity |
| baseline_wanda40_q4 | 40% pruned Q4 | 25.1 MB | ~0.97 | Aggressive prune |
Note: All slim variants are epoch 1. Quality will improve with longer training (100+ epochs like the baseline expert). See Training Status.
## Hardware Benchmarks
### Apple Silicon (M5, Zig SIMD NEON)
| Model | encode (ms) | predict (ms) | 20-step rollout (ms) |
|---|---|---|---|
| baseline 192d/6e/6p f32 | 52 | 32 | 640 |
| baseline 192d/6e/6p INT8+Q4 | 25 | 21 | 420 |
| slim 96d/4e/4p f32 | 18 | 18 | 360 |
| slim 96d/4e/4p INT8+Q4 | 10 | 15 | 300 |
| hybrid ALAL 64d/4e/4p f32 | 8 | 10 | 200 |
### ESP32-P4 (32MB PSRAM @ 200MHz, PIE SIMD)
| Model | predict_next | encode | Total (enc + 3×pred) |
|---|---|---|---|
| baseline full (INT8+Q4) | 828 ms | ~10,000 ms | ~12,500 ms |
| slim 96d full | 583 ms | 6,416 ms | 7,165 ms |
| slim 96d full + PIE opts | 583 ms | 922 ms | 2,669 ms |
| hybrid ALAL 64d + PIE opts | 152 ms | 922 ms | 1,382 ms |
### Browser WASM (no SIMD, pure Rust)
| Model | Format | Size | encode | predict |
|---|---|---|---|---|
| slim 96d | f32 | 39.2 MB | ~1,500 ms | ~500 ms |
| slim 96d | INT8+Q4 | 9.8 MB | ~2,000 ms | ~600 ms |
## Format Guide
### safetensors (standard)
All variants are exported as HuggingFace-compatible safetensors + config.json:

    slim_96d_4e_4p/
    ├── config.json                 # Architecture config
    ├── lejepa_weights.safetensors  # f32 weights
    └── README.md                   # This model's card
Use with Synapse, transformers, or any safetensors-compatible loader.
### LQ40 (edge/embedded)
Custom binary format for microcontrollers. No JSON parsing at load time, no filesystem needed.

    ┌─────────────────────────────────┐
    │ Magic: "LQ40"        (4 bytes)  │
    │ config_len: uint32   (4 bytes)  │
    │ JSON config        (config_len) │
    │ Weight data        (binary)     │
    └─────────────────────────────────┘
Supports: f32, INT8 (encoder), Q4 (predictor), full Q4, WANDA-pruned Q4.
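Given the fixed header layout above, a loader needs no framework at all. A minimal Python sketch (assuming `config_len` is little-endian; the weight-block encoding itself is format-specific and left opaque here):

```python
import json
import struct

def read_lq40(blob: bytes):
    """Parse an LQ40 container: magic, config_len, JSON config, raw weights."""
    assert blob[:4] == b"LQ40", "bad magic"
    (config_len,) = struct.unpack_from("<I", blob, 4)  # little-endian assumed
    config = json.loads(blob[8:8 + config_len])
    weights = blob[8 + config_len:]
    return config, weights

# Round-trip on a synthetic container built in memory.
cfg = json.dumps({"latent_dim": 96, "enc_layers": 4}).encode()
blob = b"LQ40" + struct.pack("<I", len(cfg)) + cfg + b"\x00" * 16
config, weights = read_lq40(blob)
```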
Load with the `synapse-inference` runtime (see the LQ40 export step under Conversion Pipeline).
## Quantization Deep Dive
See `docs/quantization.md` for full details.
### What Was Tried
| Approach | Compression | Quality Loss | Status |
|---|---|---|---|
| INT8 encoder | 4x | <0.1% | Production |
| Q4 predictor | 2x additional | <0.2% | Production |
| Full Q4 (no INT8) | 6.4x total | ~7% | Research |
| Ternary ({-1,0,+1}) | 16x | ~15% | Experimental |
| WANDA 20% prune | 20% weights gone | <1% | Experimental |
| WANDA 40% prune | 40% weights gone | ~3% | Experimental |
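For reference, the Wanda criterion in the last two rows scores each weight by |weight| × the L2 norm of its input activation, so small weights that feed strongly-firing features survive. A sketch on made-up calibration data (layer shape and batch are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((96, 96))   # one linear layer, (out_features, in_features)
X = rng.standard_normal((256, 96))  # calibration activations, (batch, in_features)

# Wanda importance: |W[i, j]| * ||X[:, j]||_2 -- weight magnitude scaled by
# how strongly its input feature actually fires on calibration data.
score = np.abs(W) * np.linalg.norm(X, axis=0)

sparsity = 0.40                     # the "wanda40" setting
k = int(W.size * sparsity)
cutoff = np.partition(score.ravel(), k)[k]
W_pruned = np.where(score >= cutoff, W, 0.0)
```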
### Why Full Q4 Loses Quality
The encoder's INT8 path uses per-channel symmetric quantization with f32 scales. When we skip INT8 and quantize the encoder directly to Q4:
- Encoder INT8: cos=0.9998 vs f32
- Encoder Q4: cos=0.93 vs f32 (7% quality drop)
The issue: the ViT encoder has high dynamic range in intermediate activations. INT8 preserves more signal per channel, and Q4's 32-element block granularity doesn't match the encoder's channel statistics.
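The bit-depth gap is easy to reproduce on synthetic weights whose per-channel dynamic range varies widely, as the encoder's does: per-channel symmetric INT8 gets 256 levels per channel, while Q4 gets only 16 levels per 32-element block. A sketch (synthetic data, not the actual encoder tensors):

```python
import numpy as np

rng = np.random.default_rng(0)
# Per-channel dynamic range varies by ~3 orders of magnitude, mimicking
# the spread seen in ViT intermediate activations.
W = rng.standard_normal((64, 128)) * rng.uniform(0.01, 10.0, size=(64, 1))

def quant_int8_per_channel(w):
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # symmetric, per channel
    return np.round(w / scale) * scale

def quant_q4_blocks(w, block=32):
    flat = w.reshape(-1, block)                            # 32-element blocks
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0  # 4-bit range [-8, 7]
    q = np.clip(np.round(flat / scale), -8, 7)
    return (q * scale).reshape(w.shape)

def cos(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cos_int8 = cos(W, quant_int8_per_channel(W))
cos_q4 = cos(W, quant_q4_blocks(W))
```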
### Why Ternary Underperforms
Ternary weights ({-1, 0, +1}) theoretically compress about 2.5x further than full Q4 (16x vs 6.4x total in the table above). In practice:
- Q4 cos: 0.998 vs f32
- Ternary cos: ~0.85 vs f32
The predictor's adaLN modulation is sensitive to weight magnitude, not just sign. Ternary destroys the scale information that adaLN relies on.
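A quick synthetic check of that magnitude loss: ternarizing Gaussian weights with a single surviving scale (threshold heuristic borrowed from ternary-weight-network practice, not necessarily the exact recipe used here) lands well below the Q4 cosine figures above:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(4096)

# Ternary: keep only the sign plus a zero band; one scale for the whole tensor.
thresh = 0.7 * np.abs(w).mean()      # common ternarization threshold heuristic
t = np.sign(w) * (np.abs(w) > thresh)
scale = np.abs(w[t != 0]).mean()     # a single magnitude survives
w_ternary = t * scale

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cos_ternary = cos(w, w_ternary)      # roughly 0.9 for Gaussian weights
```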
## Hardwired / ASIC Exploration
Q4 weights (integers -8 to 7) decompose into shift-and-add trees. Multiplication by a constant becomes a wire-plus-adder network: no multiplier circuit, no memory fetch.
Results (RTL synthesized with Yosys):
- 0 BRAM, 0 multipliers for weight storage
- 98.3% of all multiplier circuits eliminated
- cos=1.000 vs f32 (mathematically identical)
- Fits in $129 Arty A7 at 12% LUT utilization
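The decomposition is easy to verify in software. A Python stand-in for what each synthesized multiplier-free unit computes (every set bit of the 4-bit weight becomes one shifted add, negation handles the sign):

```python
def shift_add_mul(w: int, x: int) -> int:
    """Multiply by a Q4 weight (-8..7) using only shifts, adds, and negation."""
    assert -8 <= w <= 7
    neg, m = w < 0, abs(w)
    acc, shift = 0, 0
    while m:
        if m & 1:            # each set bit of |w| is one shifted add (a wire tap)
            acc += x << shift
        m >>= 1
        shift += 1
    return -acc if neg else acc
```

For example, w = 5 = 0b101 reduces to `x + (x << 2)`: two wire taps and one adder, no multiplier.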
## Conversion Pipeline
### 1. Download from W&B

    pip install wandb
    wandb login
    python scripts/download_from_wandb.py \
        --project eren23/crucible-lewm \
        --name lewm_slim_96d_4e_4p_epoch_1 \
        --output-dir slim_96d_4e_4p/
### 2. Convert .ckpt → safetensors + config.json

    python scripts/convert_lewm_ckpt.py \
        --input slim_96d_4e_4p/lewm_slim_96d_4e_4p_epoch_1.ckpt \
        --output slim_96d_4e_4p/

The converter uses a stub unpickler, so no `jepa` or `module` package is required.
### 3. Export to LQ40 (edge formats)

    cd synapse
    cargo run --release -p synapse-inference --example export_lewm_q4 -- \
        --checkpoint ../lewm-models/slim_96d_4e_4p/lejepa_weights.safetensors \
        --config ../lewm-models/slim_96d_4e_4p/config.json \
        --mode full \
        --output ../lewm-models/slim_96d_4e_4p/lewm-slim-96d-full.bin

Modes: `full` (INT8+Q4), `q4-pred` (Q4 predictor only), `wanda20-q4`, `wanda40-q4`.
## Citation
If you use these models or the quantization results, cite the original LeWM paper:

    @article{maes2025lewm,
      title={LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels},
      author={Maes, Lucas and Le Lidec, Quentin and Scieur, Damien and LeCun, Yann and Balestriero, Randall},
      journal={arXiv},
      year={2025}
    }

For the quantization and architecture experiments, cite this collection:

    @misc{lewm_models_2026,
      title={LeWM Model Collection: Quantized and Architecture Variants},
      author={Attocoder Team},
      year={2026},
      publisher={GitHub},
      url={https://github.com/attocode/lewm-models}
    }
## License
All models are derived from LeWM, which is licensed under CC BY-NC 4.0.
You are free to:
- Share: copy and redistribute the material
- Adapt: remix, transform, and build upon the material
Under the following terms:
- Attribution: You must give appropriate credit to the original LeWM authors
- NonCommercial: You may not use the material for commercial purposes
See CC BY-NC 4.0 for details.