tini-lad

Ruurd committed on Jun 5

Commit a2ec89b (verified) · 1 Parent(s): 5887cd4

Update README.md

Files changed (1)

README.md CHANGED

@@ -20,9 +20,9 @@ Inspired by diffusion processes in vision models, the system gradually improves
 This implementation has several benefits:
 - **Noiseless convergence**: A unique feature of this implementation is its ability to convergence **without intermediate noising**, although this currently works best for simple or short questions.
-- Scalable test time compute: By increasing the number of iterations, the answer quality improves.
-- Reduced inference time: Most questions can be answered with less iterations then the number of tokens generated!
-- Greatly reduced training time: By finetuning an autoregressive Llama-8B model using only LoRA for diffusive generation, we trained this model within several hours on a single GPU.
+- *Scalable test time compute*: By increasing the number of iterations, the answer quality improves.
+- *Reduced inference time*: Most questions can be answered with less iterations then the number of tokens generated!
+- *Greatly reduced training time*: By finetuning an autoregressive Llama-8B model using only LoRA for diffusive generation, we trained this model within several hours on a single GPU.
 ## 🔧 Settings
 - **Disable Intermediate Noising**: Speeds up convergence by skipping the noising step between iterations. Works best for short, factual questions.
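
The README text in this diff describes iterative refinement with an optional noising step between iterations. The loop can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the repository: `toy_denoiser` is a hypothetical rule-based stand-in for the fine-tuned model, `MASK` is a hypothetical noise token, and `renoise_count` is a hypothetical parameter in which `0` plays the role of "Disable Intermediate Noising".

```python
import random

MASK = "<mask>"  # hypothetical placeholder for a noised position


def toy_denoiser(tokens, target):
    """Toy stand-in for the model (an assumption, not the real network).

    A position is predicted correctly when it is an endpoint, is already
    correct, or has a correct neighbour in the input; otherwise it stays masked.
    """
    n = len(target)
    output = []
    for i in range(n):
        known = (
            i in (0, n - 1)
            or tokens[i] == target[i]
            or (i > 0 and tokens[i - 1] == target[i - 1])
            or (i < n - 1 and tokens[i + 1] == target[i + 1])
        )
        output.append(target[i] if known else MASK)
    return output


def refine(denoise, length, max_iters, renoise_count=0, seed=0):
    """Refine a fully noised sequence until the denoiser output stops changing.

    renoise_count=0 skips the noising step between iterations.
    Returns (tokens, number of passes before the output stabilised).
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    previous = None
    for step in range(1, max_iters + 1):
        output = denoise(tokens)
        if output == previous:
            return output, step - 1  # the last pass changed nothing
        previous = output
        tokens = list(output)
        # Intermediate noising: re-mask a few random positions.
        for i in rng.sample(range(length), renoise_count):
            tokens[i] = MASK
    return previous, max_iters


target = "the capital of France is Paris".split()
model = lambda tokens: toy_denoiser(tokens, target)

answer, passes = refine(model, len(target), max_iters=50)
noisy_answer, noisy_passes = refine(model, len(target), max_iters=50, renoise_count=1)
print(passes, noisy_passes)
```

In this toy setting the noiseless run stabilises in fewer passes than there are tokens, because several positions are resolved per pass. Raising `max_iters` corresponds to spending more test-time compute. The real model's behaviour depends on its training and is not captured by this rule-based denoiser.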