
Ruurd committed · verified · commit b92c569 · 1 parent: a2ec89b

Update README.md

Files changed (1): README.md (+7, -1)
README.md CHANGED
@@ -19,20 +19,26 @@ This is an interactive demo of a **diffusion-style language model**, which gener
 Inspired by diffusion processes in vision models, the system gradually improves a corrupted text sequence until convergence.
 
 This implementation has several benefits:
-- **Noiseless convergence**: A unique feature of this implementation is its ability to converge **without intermediate noising**, although this currently works best for simple or short questions.
+- **Noiseless convergence**: A unique feature of this implementation is its ability to converge **without intermediate noising**.
 - *Scalable test-time compute*: Increasing the number of iterations improves answer quality.
 - *Reduced inference time*: Most questions can be answered in fewer iterations than the number of tokens generated!
 - *Greatly reduced training time*: By fine-tuning an autoregressive Llama-8B model using only LoRA for diffusive generation, we trained this model within several hours on a single GPU.
 
+---
+
 ## 🔧 Settings
 - **Disable Intermediate Noising**: Speeds up convergence by skipping the noising step between iterations. Works best for short, factual questions.
 - **Iterations**: Number of refinement steps. More iterations mean more time to refine the answer.
 - **Pause Between Steps**: Slows down the process so you can visually follow the changes.
 
+---
+
 ## 🖍️ Visualization
 - **Red tokens**: Masked (noised) tokens that will be regenerated.
 - **Green tokens**: Newly generated tokens compared to the previous step.
 
+---
+
 ## 🧪 Example Prompt
 For noiseless diffusion, try short questions like:
 > What's the capital of France?
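
The loop the README describes — start from masked tokens, re-predict until the sequence stops changing, optionally re-masking a fraction between steps — can be sketched as below. This is a minimal illustration under stated assumptions, not the Space's actual implementation: `model_fill`, `MASK`, and `noise_frac` are hypothetical names, and a real model would operate on token IDs rather than strings.

```python
import random

MASK = "<mask>"  # hypothetical mask token; real models use a tokenizer-specific ID

def refine(model_fill, prompt, seq_len=16, iterations=8, noise=True, noise_frac=0.3):
    """Iteratively refine a fully masked sequence until convergence.

    model_fill(prompt, seq) -> seq: predicts a token for every masked position
    (a stand-in for the diffusion-style LM described in the README).
    """
    seq = [MASK] * seq_len
    prev = list(seq)
    for step in range(iterations):
        seq = model_fill(prompt, seq)       # re-predict the sequence
        if seq == prev:                     # converged: nothing changed this step
            break
        prev = list(seq)
        # Intermediate noising: re-mask a random fraction of positions so the
        # next step can revise them. Disabling this ("noiseless convergence")
        # skips straight to the next prediction pass.
        if noise and step < iterations - 1:
            for i in random.sample(range(seq_len), int(noise_frac * seq_len)):
                seq[i] = MASK
    return seq
```

With noising disabled, a model that fills every mask identically converges in two passes (one to fill, one to confirm), which is why fewer iterations than generated tokens can suffice.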