For the past year I have been trying to get diffusion models to work for language generation without having to retrain an LLM from scratch. Recently, we finally succeeded:
We introduce "LAD: LoRA-Adapted Denoiser", a method to convert a LLaMA model into a text diffusion model using LoRA finetuning and structured input corruption.
🎯 Try the demo and read the write-up here!
https://ruurdkuiper.github.io/tini-lad/
Unlike autoregressive (word-for-word) models like ChatGPT, diffusion models iteratively refine a noised sequence. However, most current text diffusion approaches rely on full-parameter retraining and repeated token remasking, which makes both training and inference costly and slow!
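To make the contrast concrete, here is a rough Python sketch of the two decoding styles. `model`, `predict_next`, `init_noisy` and `denoise_step` are illustrative placeholders I've made up for this sketch, not LAD's actual API:

```python
# Illustrative contrast between the two decoding styles.
# `model` and its methods are hypothetical placeholders, not LAD's real code.

def autoregressive_generate(model, prompt, num_new_tokens):
    """Left-to-right decoding: one forward pass per generated token."""
    seq = list(prompt)
    for _ in range(num_new_tokens):
        seq.append(model.predict_next(seq))        # each token is fixed once emitted
    return seq

def diffusion_generate(model, prompt, answer_len, num_steps):
    """Iterative refinement: start from noise, re-predict the whole answer each step."""
    answer = model.init_noisy(answer_len)          # e.g. random or masked tokens
    for _ in range(num_steps):
        answer = model.denoise_step(prompt, answer)  # every position can still change
    return answer
```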
🧠 With LAD:
- We can finetune an autoregressive model for diffusive generation in just 10 hours on a single GPU.
- Test-time compute is fully adjustable: fewer steps mean faster outputs, while more steps improve output quality (see the usage sketch after this list).
- Due to our unique noising schedule, remasking is not always needed during inference. All tokens are attended to in each iteration!
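A usage-level illustration of that knob, reusing the hypothetical `diffusion_generate` sketch above (again, not LAD's real API): the number of refinement passes is the only thing that changes, so the same model can trade speed for quality at inference time.

```python
# Same hypothetical API as the sketch above; only `num_steps` differs.
fast_answer   = diffusion_generate(model, prompt, answer_len=64, num_steps=4)   # quicker, rougher
better_answer = diffusion_generate(model, prompt, answer_len=64, num_steps=16)  # slower, higher quality
```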
🚀 LAD is built using:
✅ A frozen LLaMA-8B backbone
✅ Structured noising: token swaps, duplications, replacements, span shifts (see the toy sketch after this list)
✅ Modified attention masks for bidirectional decoding
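To give a feel for what "structured noising" means, here is a toy corruption function written under my own assumptions (illustrative only; the exact LAD noising schedule and probabilities are in the write-up). During training, the clean target sequence is corrupted with operations like these, and the frozen backbone plus LoRA adapters learn to undo that corruption.

```python
import random

# Toy version of structured input corruption (my own illustrative sketch,
# not the authors' exact noising code).
def corrupt(tokens, vocab_size, p=0.15):
    """Apply token swaps, duplications, replacements and a span shift."""
    out = list(tokens)

    # Token-level noise: swap / duplicate / replace.
    i = 0
    while i < len(out):
        r = random.random()
        if r < p / 3 and i + 1 < len(out):      # swap adjacent tokens
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 1                               # skip the swapped partner
        elif r < 2 * p / 3:                      # duplicate this token
            out.insert(i + 1, out[i])
            i += 1                               # skip the fresh duplicate
        elif r < p:                              # replace with a random vocab id
            out[i] = random.randrange(vocab_size)
        i += 1

    # Span shift: move a short span somewhere else in the sequence.
    if len(out) > 8 and random.random() < p:
        start = random.randrange(len(out) - 4)
        span = out[start:start + 4]
        del out[start:start + 4]
        dest = random.randrange(len(out) + 1)
        out[dest:dest] = span

    return out[: len(tokens)]                    # keep the original length
```

The modified attention mask then lets every position attend to every other position during the refinement passes, instead of the causal left-to-right mask used for autoregressive decoding.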
💡 We show that even small, quickly finetuned models can perform diffusive generation, with competitive benchmark performance and perplexity and more flexible test-time behavior than traditional autoregressive transformers.