
LeWM Model Collection

Quantized and architecture-variant world models derived from LeWM (Lucas Maes et al., Mila/NYU/Samsung SAIL/Brown).

All models are inference-ready checkpoints with full training provenance, quantization experiments, and hardware benchmark data.


TL;DR

| Model | Size | Quality | Best For |
|---|---|---|---|
| Slim 96d/4e/4p | 9.8 MB (INT8+Q4) | cos=0.9982 | Production ESP32 / browser |
| Hybrid ALAL 64d | 3.9 MB (LQ40) | cos≈0.99 | Tiny edge / FPGA |
| Baseline Q4 | 23.6 MB | cos=0.998 | Research baseline |
| WANDA 40% | 6.8 MB | cos≈0.97 | Max compression |
| Hardwired | 0 MB (gates) | cos=1.000 | ASIC / custom silicon |

See the Model Index below for the full comparison tables.


What is LeWM?

LeWM (Latent Encoder World Model) is a JEPA-based vision world model: a ViT encoder compresses 224×224 images into a compact latent vector, and a DiT-style predictor forecasts the next latent given the current state and robot action. It was trained on the PushT environment.
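The encode-once, predict-many loop implied above can be sketched in a few lines. The `encode`/`predict` functions below are stand-in linear maps, not the real ViT/DiT weights; the 2-D action shape is an assumption about PushT, and the image is scaled down from 224×224 for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, ACTION = 96, 2  # slim 96d latent; 2-D PushT action (assumption)

# Stand-in "networks": random linear maps in place of the ViT encoder
# and the DiT-style predictor.
W_enc = rng.standard_normal((32 * 32 * 3, LATENT)) * 0.01
W_pred = rng.standard_normal((LATENT + ACTION, LATENT)) * 0.05

def encode(image):
    """Compress an image into a compact latent vector."""
    return image.reshape(-1) @ W_enc

def predict(latent, action):
    """Forecast the next latent from the current latent and action."""
    return np.concatenate([latent, action]) @ W_pred

# 20-step open-loop rollout, the workload timed in the hardware benchmarks:
z = encode(rng.standard_normal((32, 32, 3)))
for _ in range(20):
    z = predict(z, rng.standard_normal(ACTION))
```

Note that the encoder runs once per observation while the predictor runs once per rollout step, which is why the benchmark tables report the two separately.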

These variants explore:

  • Architecture changes: Latent bottleneck dimensions, layer counts, hybrid ALAL attention
  • Quantization: INT8, Q4, ternary, WANDA pruning
  • Hardware specializations: ESP32-P4 LQ40 binary, FPGA hardwired shift-add, browser WASM

Model Index

Slim Architecture Variants (epoch 1)

| Model | Latent | Enc | Pred | f32 Size | Quantized | Cos vs f32 | Binary |
|---|---|---|---|---|---|---|---|
| slim_48d_2e_2p | 48 | 2 | 2 | ~2 MB | ~1 MB | pending | LQ40 |
| slim_64d_3e_3p | 64 | 3 | 3 | ~3 MB | ~2 MB | pending | LQ40 |
| slim_96d_2e_3p | 96 | 2 | 3 | ~3.5 MB | ~2 MB | pending | LQ40 |
| slim_96d_4e_4p | 96 | 4 | 4 | 36.8 MB | 9.8 MB | 0.9982 | LQ40 |
| slim_128d_4e_4p | 128 | 4 | 4 | ~5 MB | ~3 MB | pending | LQ40 |
| slim_192d_4e_4p | 192 | 4 | 4 | ~40 MB | ~12 MB | pending | LQ40 |

Hybrid ALAL Encoder Variants (epoch 1)

| Model | Hidden | Enc | Pred | Params | Binary | Cos vs f32 |
|---|---|---|---|---|---|---|
| hybrid_ALAL_64d_4e_4p | 64d | 4 | 4 | 3.0M | 3.9 MB | pending |

Baseline + Quantizations (epoch 100+ expert)

| Model | Format | Size | Cos vs f32 | Notes |
|---|---|---|---|---|
| baseline_192d_6e_6p | f32 safetensors | 54.6 MB | 1.000 | Converged expert |
| baseline_full | INT8+Q4 LQ40 | 10.9 MB | 0.999 | Production format |
| baseline_q4_pred | Q4 pred only | 23.6 MB | 0.998 | Encoder stays f32 |
| baseline_wanda20_q4 | 20% pruned Q4 | 22.0 MB | ~0.99 | WANDA sparsity |
| baseline_wanda40_q4 | 40% pruned Q4 | 25.1 MB | ~0.97 | Aggressive prune |

Note: All slim variants are epoch 1. Quality will improve with longer training (100+ epochs like the baseline expert). See Training Status.


Hardware Benchmarks

Apple Silicon (M5, Zig SIMD NEON)

| Model | encode (ms) | predict (ms) | 20-step rollout (ms) |
|---|---|---|---|
| baseline 192d/6e/6p f32 | 52 | 32 | 640 |
| baseline 192d/6e/6p INT8+Q4 | 25 | 21 | 420 |
| slim 96d/4e/4p f32 | 18 | 18 | 360 |
| slim 96d/4e/4p INT8+Q4 | 10 | 15 | 300 |
| hybrid ALAL 64d/4e/4p f32 | 8 | 10 | 200 |

ESP32-P4 (32MB PSRAM @ 200MHz, PIE SIMD)

| Model | predict_next | encode | Total (enc + 3×pred) |
|---|---|---|---|
| baseline full (INT8+Q4) | 828 ms | ~10,000 ms | ~12,500 ms |
| slim 96d full | 583 ms | 6,416 ms | 7,165 ms |
| slim 96d full + PIE opts | 583 ms | 922 ms | 2,669 ms |
| hybrid ALAL 64d + PIE opts | 152 ms | 922 ms | 1,382 ms |

Browser WASM (no SIMD, pure Rust)

| Model | Format | Size | encode | predict |
|---|---|---|---|---|
| slim 96d | f32 | 39.2 MB | ~1,500 ms | ~500 ms |
| slim 96d | INT8+Q4 | 9.8 MB | ~2,000 ms | ~600 ms |

Format Guide

safetensors (standard)

All variants exported as HuggingFace-compatible safetensors + config.json:

slim_96d_4e_4p/
├── config.json                 # Architecture config
├── lejepa_weights.safetensors  # f32 weights
└── README.md                   # This model's card

Use with Synapse, transformers, or any safetensors-compatible loader.
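For loaders you control, the safetensors layout is simple enough to parse directly: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/offsets, then the raw byte buffer. A minimal stdlib+numpy round-trip sketch (F32 tensors only; in practice, prefer `safetensors.numpy.load_file`):

```python
import json, os, struct, tempfile
import numpy as np

def save_safetensors(path, tensors):
    """Minimal safetensors writer: u64 header length + JSON header + raw data."""
    header, blobs, offset = {}, [], 0
    for name, arr in tensors.items():
        blob = arr.astype(np.float32).tobytes()
        header[name] = {"dtype": "F32", "shape": list(arr.shape),
                        "data_offsets": [offset, offset + len(blob)]}
        blobs.append(blob)
        offset += len(blob)
    hjson = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hjson)) + hjson + b"".join(blobs))

def load_safetensors(path):
    """Minimal safetensors reader (F32 tensors only, no __metadata__)."""
    raw = open(path, "rb").read()
    (hlen,) = struct.unpack("<Q", raw[:8])
    header = json.loads(raw[8:8 + hlen])
    data = raw[8 + hlen:]
    return {name: np.frombuffer(data[a:b], dtype=np.float32).reshape(meta["shape"])
            for name, meta in header.items()
            for a, b in [meta["data_offsets"]]}

# Round-trip a toy tensor (the tensor name is illustrative only):
path = os.path.join(tempfile.gettempdir(), "lewm_demo.safetensors")
save_safetensors(path, {"enc.proj.weight": np.arange(6, dtype=np.float32).reshape(2, 3)})
weights = load_safetensors(path)
```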

LQ40 (edge/embedded)

Custom binary format for microcontrollers. No JSON parsing at load time, no filesystem needed.

┌──────────────────────────────────────┐
│ Magic: "LQ40"        (4 bytes)       │
│ config_len: uint32   (4 bytes)       │
│ JSON config          (config_len)    │
│ Weight data          (binary)        │
└──────────────────────────────────────┘

Supports: f32, INT8 (encoder), Q4 (predictor), full Q4, WANDA-pruned Q4.
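A header parser for this container fits in a few lines. Only the field order comes from the layout above; the little-endian byte order and the config keys in the example are assumptions:

```python
import json, struct

def read_lq40_header(blob):
    """Split an LQ40 container into (config dict, raw weight bytes)."""
    assert blob[:4] == b"LQ40", "bad magic"
    (config_len,) = struct.unpack("<I", blob[4:8])  # little-endian assumed
    config = json.loads(blob[8:8 + config_len])
    return config, blob[8 + config_len:]

# Round-trip a synthetic container (config keys are illustrative):
cfg = json.dumps({"latent_dim": 96, "mode": "full"}).encode()
blob = b"LQ40" + struct.pack("<I", len(cfg)) + cfg + b"\x00" * 16
config, weight_bytes = read_lq40_header(blob)
```

Because the JSON config travels inside the binary, a microcontroller can read one file from flash with no filesystem and no separate config fetch.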

Use with the synapse-inference runtime and the embedded targets benchmarked above (ESP32-P4, FPGA, browser WASM).


Quantization Deep Dive

See docs/quantization.md for full details.

What Was Tried

| Approach | Compression | Quality Loss | Status |
|---|---|---|---|
| INT8 encoder | 4x | <0.1% | Production |
| Q4 predictor | 2x additional | <0.2% | Production |
| Full Q4 (no INT8) | 6.4x total | ~7% | Research |
| Ternary ({-1,0,+1}) | 16x | ~15% | Experimental |
| WANDA 20% prune | 20% weights gone | <1% | Experimental |
| WANDA 40% prune | 40% weights gone | ~3% | Experimental |

Why Full Q4 Loses Quality

The encoder's INT8 path uses per-channel symmetric quantization with f32 scales. When we skip INT8 and quantize the encoder directly to Q4:

  • Encoder INT8: cos=0.9998 vs f32
  • Encoder Q4: cos=0.93 vs f32 (7% quality drop)

The issue is that the ViT encoder has high dynamic range in its intermediate activations: INT8 preserves more signal per channel, while Q4's 32-element block granularity doesn't match the encoder's per-channel statistics.
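The level-count gap is easy to reproduce on synthetic weights. This toy is not measured from LeWM: the ~100x per-channel scale spread is an assumption standing in for real encoder statistics, and it illustrates why 16 quantization levels lose more signal than 255:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy weight matrix with a wide per-channel scale spread (assumption).
W = rng.standard_normal((64, 128)) * np.logspace(-2, 0, 64)[:, None]

def int8_per_channel(W):
    """Symmetric per-channel INT8: one f32 scale per output channel."""
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    return np.round(W / scale) * scale

def q4_blockwise(W, block=32):
    """Q4: one scale per 32-element block, values in [-8, 7]."""
    out = np.empty_like(W)
    for i in range(0, W.shape[1], block):
        blk = W[:, i:i + block]
        scale = np.abs(blk).max(axis=1, keepdims=True) / 7.0
        out[:, i:i + block] = np.clip(np.round(blk / scale), -8, 7) * scale
    return out

def cos(a, b):
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

c_int8 = cos(W, int8_per_channel(W))  # very close to 1 (255 levels)
c_q4 = cos(W, q4_blockwise(W))        # measurably lower (16 levels)
```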

Why Ternary Underperforms

Ternary weights ({-1, 0, +1}) theoretically compress ~2.5x beyond full Q4 (16x vs 6.4x total, per the table above). In practice:

  • Q4 cos: 0.998 vs f32
  • Ternary cos: ~0.85 vs f32

The predictor's adaLN modulation is sensitive to weight magnitude, not just sign. Ternary destroys the scale information that adaLN relies on.
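The magnitude loss is easy to reproduce on random weights. The 0.7-of-mean threshold below is a common ternarization heuristic, not necessarily the exact scheme used in these experiments:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(4096)

def ternarize(w, threshold=0.7):
    """Map weights to {-1, 0, +1} times one global scale.
    All surviving weights share a single magnitude, so per-weight
    scale information is gone."""
    t = threshold * np.abs(w).mean()
    q = np.sign(w) * (np.abs(w) > t)
    scale = np.abs(w[q != 0]).mean()
    return q * scale

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

c_ternary = cos(w, ternarize(w))  # roughly 0.9 even on this easy case
```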


Hardwired / ASIC Exploration

See docs/hardwired_lewm.md.

Q4 weights (integers -8 to 7) decompose into shift-and-add trees. Multiplication by a constant becomes a wire-plus-adder network: no multiplier circuit, no memory fetch.
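The decomposition can be sketched in software; this is the basic binary expansion (real RTL may use a more compact signed-digit form, which is an open assumption here):

```python
def shift_add_terms(c):
    """Decompose multiplication by a Q4 constant c in [-8, 7] into
    signed shifts: c*x == sum(sign * (x << k) for sign, k in terms)."""
    sign = -1 if c < 0 else 1
    n, terms, k = abs(c), [], 0
    while n:
        if n & 1:
            terms.append((sign, k))
        n >>= 1
        k += 1
    return terms

def mul_by_const(x, c):
    """Evaluate the shift-add network in software."""
    return sum(s * (x << k) for s, k in shift_add_terms(c))

# x * 5 == (x << 2) + x, so multiplying by 5 costs one adder and zero
# multipliers; a weight of 0 costs nothing at all.
assert mul_by_const(13, 5) == 65
```

In hardware each `(sign, k)` term is just routing (a shifted wire tap) plus one adder input, which is how the synthesis below eliminates multiplier circuits entirely.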

Results (RTL synthesized with Yosys):

  • 0 BRAM, 0 multipliers for weight storage
  • 98.3% of all multiplier circuits eliminated
  • cos=1.000 vs f32 (mathematically identical)
  • Fits in $129 Arty A7 at 12% LUT utilization

Conversion Pipeline

1. Download from W&B

pip install wandb
wandb login
python scripts/download_from_wandb.py \
  --project eren23/crucible-lewm \
  --name lewm_slim_96d_4e_4p_epoch_1 \
  --output-dir slim_96d_4e_4p/

2. Convert .ckpt β†’ safetensors + config.json

python scripts/convert_lewm_ckpt.py \
  --input slim_96d_4e_4p/lewm_slim_96d_4e_4p_epoch_1.ckpt \
  --output slim_96d_4e_4p/

The converter uses a stub unpickler, so neither the jepa nor module packages are required.

3. Export to LQ40 (edge formats)

cd synapse
cargo run --release -p synapse-inference --example export_lewm_q4 -- \
  --checkpoint ../lewm-models/slim_96d_4e_4p/lejepa_weights.safetensors \
  --config ../lewm-models/slim_96d_4e_4p/config.json \
  --mode full \
  --output ../lewm-models/slim_96d_4e_4p/lewm-slim-96d-full.bin

Modes: full (INT8+Q4), q4-pred (Q4 predictor only), wanda20-q4, wanda40-q4.


Citation

If you use these models or the quantization results, cite the original LeWM paper:

@article{maes2025lewm,
  title={LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels},
  author={Maes, Lucas and Le Lidec, Quentin and Scieur, Damien and LeCun, Yann and Balestriero, Randall},
  journal={arXiv},
  year={2025}
}

For the quantization and architecture experiments, cite this collection:

@misc{lewm_models_2026,
  title={LeWM Model Collection: Quantized and Architecture Variants},
  author={Attocoder Team},
  year={2026},
  publisher={GitHub},
  url={https://github.com/attocode/lewm-models}
}

License

All models are derived from LeWM, which is licensed under CC BY-NC 4.0.

You are free to:

  • Share: copy and redistribute the material
  • Adapt: remix, transform, and build upon the material

Under the following terms:

  • Attribution: You must give appropriate credit to the original LeWM authors
  • NonCommercial: You may not use the material for commercial purposes

See CC BY-NC 4.0 for details.
