Paper: LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics (arXiv:2511.08544)
This repository is an ablation of adding text conditioning to the predictor in LeJEPA. We do not predict text; we condition the JEPA predictor on text and keep the original two-term objective L = (1-λ)·L_inv + λ·SIGReg.
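As a minimal sketch of how the two-term objective combines its parts, the snippet below mixes an invariance term and a SIGReg term with weight λ. The function name `lejepa_objective` and its arguments are hypothetical; how each term is actually computed lives in `tclejepa_src/modules.py` and is not reproduced here.

```python
def lejepa_objective(inv_loss: float, sigreg_loss: float, lam: float) -> float:
    """Return (1 - lam) * inv_loss + lam * sigreg_loss.

    Both loss terms are taken as already-computed scalars (an assumption
    made for illustration); lam is the mixing weight, expected in [0, 1].
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (1.0 - lam) * inv_loss + lam * sigreg_loss


# Toy values only, not measurements from any run.
total = lejepa_objective(inv_loss=2.0, sigreg_loss=4.0, lam=0.25)
print(total)  # 0.75 * 2.0 + 0.25 * 4.0 = 2.5
```

Because text only enters through the predictor, every variant can be scored with this same expression.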
| Variant | Conditioning |
|---|---|
| baseline | vanilla LeJEPA, MLP predictor |
| film | FiLM on MLP predictor, text → (γ, β) |
| xattn | Patch tokens cross-attend to text |
| wrong_text | xattn with permuted label-text map |
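To make the `film` row concrete, here is a small dependency-free sketch of FiLM-style conditioning: a text embedding is projected to a pair (γ, β), which then scales and shifts a hidden vector feature by feature. The helper names, the single linear projection, and the toy weights are all assumptions for illustration; the predictor in `tclejepa_src/modules.py` may be structured differently.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def linear(x: Sequence[float], weight: Sequence[Sequence[float]],
           bias: Sequence[float]) -> Vector:
    """Compute weight @ x + bias for a row-major weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def film_params(text_emb: Sequence[float], weight: Sequence[Sequence[float]],
                bias: Sequence[float]) -> Tuple[Vector, Vector]:
    """Project the text embedding to 2*d values and split into (gamma, beta)."""
    out = linear(text_emb, weight, bias)
    d = len(out) // 2
    return out[:d], out[d:]


def film(hidden: Sequence[float], gamma: Sequence[float],
         beta: Sequence[float]) -> Vector:
    """Apply gamma * hidden + beta element by element."""
    if not (len(hidden) == len(gamma) == len(beta)):
        raise ValueError("hidden, gamma and beta must have the same length")
    return [g * h + b for h, g, b in zip(hidden, gamma, beta)]


# Toy numbers: a 2-d text embedding conditions a 2-d hidden vector.
text_emb = [1.0, 2.0]
weight = [[1.0, 0.0], [0.0, 1.0],    # rows that produce gamma
          [0.5, 0.0], [0.0, -0.5]]   # rows that produce beta
bias = [0.0, 0.0, 0.0, 0.0]

gamma, beta = film_params(text_emb, weight, bias)  # [1.0, 2.0], [0.5, -1.0]
modulated = film([3.0, 4.0], gamma, beta)
print(modulated)  # [3.5, 7.0]
```

In this sketch the text never becomes a prediction target; it only changes how the predictor transforms its input.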
Backbone: ViT-Small/16 at 128×128. Text tower: OpenCLIP ViT-B/32 (frozen). Dataset: CIFAR-100.
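For a sense of scale, the patch size and input resolution fix how many patch tokens the backbone produces per image. The sketch below only does that arithmetic; the function name is hypothetical, and whether extra tokens (such as a class token) are added is not stated here.

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    return side * side


# ViT-Small/16 at 128x128: an 8 x 8 grid of 16-pixel patches.
print(num_patch_tokens(128, 16))  # 64
```

In the `xattn` variant these patch tokens are what attend to the text.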
- `comparison.md`: results, figures, and answers to the four research questions.
- `tclejepa_src/modules.py`: TCLeJEPAModel, predictor variants, SIGReg.
- `tclejepa_src/train.py`: training loop (same loss for every variant).
- `tclejepa_src/evaluate.py`: linear probe, SIGReg↔acc correlation, t-SNE steering.

Checkpoints, figures, logs, and comparison.{md,json} live at https://huggingface.co/adipanda/lejepa.
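The SIGReg↔acc correlation mentioned for `evaluate.py` can be illustrated with a plain Pearson coefficient between per-run SIGReg values and probe accuracies. This is only a sketch: the choice of Pearson (rather than, say, a rank correlation) is an assumption, the function name is hypothetical, and the inputs below are placeholder numbers, not results.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder inputs: a perfectly decreasing linear relationship.
print(pearson([1.0, 2.0, 3.0], [6.0, 4.0, 2.0]))  # -1.0
```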
```bash
set -a && source .env && set +a            # loads WANDB_API_KEY and HF_TOKEN
uv sync
EPOCHS=30 BS=512 WORKERS=12 ./run_all.sh   # runs all 4 variants sequentially
uv run python -m tclejepa_src.evaluate     # produces comparison.{json,md} + figures
```