
Commit 77f8336 (verified) by Ruurd · Parent: b92c569

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED
```diff
@@ -22,7 +22,7 @@ This implementation has several benefits:
 - **Noiseless convergence**: A unique feature of this implementation is its ability to convergence **without intermediate noising**.
 - *Scalable test time compute*: By increasing the number of iterations, the answer quality improves.
 - *Reduced inference time*: Most questions can be answered with less iterations then the number of tokens generated!
-- *Greatly reduced training time*: By finetuning an autoregressive Llama-8B model using only LoRA for diffusive generation, we trained this model within several hours on a single GPU.
+- *Greatly reduced training time*: By LoRA-based finetuning of an autoregressive model, this model can be trained within several hours on a single GPU.
 
 ---
 
@@ -51,9 +51,11 @@ See how low you can go with the number of iterations while still receiving adequ
 ---
 
 More technical details (architecture, training, and evaluation) can be found in the accompanying blog post:
+
 📘 [Read the blog post here](https://example.com/diffusion-language-model-blog)
 
 For a more tweakable version that includes all inference parameters, check out this version:
+
 🎛️ [Explore the model here](https://huggingface.co/spaces/Ruurd/tini)
 
 Paper coming out soon! If you already want to cite this model, please refer to the blogpost
```