---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---

# Diffusion Text Demo Model

A prototype **diffusion-based language model** implemented in PyTorch and trained on a subset of the [**TinyStories** dataset](https://huggingface.co/datasets/roneneldan/TinyStories).
This model demonstrates iterative denoising for text generation, conditioned on an input prompt.
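
In the masked-diffusion framing common for discrete text diffusion (e.g. D3PM, cited below; the exact formulation used by this checkpoint isn't documented), the forward process replaces tokens with a `[MASK]` symbol with a probability that grows with the step `t`, and the model learns to reverse it. A minimal sketch of that corruption process, under those assumptions:

```python
# Hypothetical forward (corruption) process for masked text diffusion:
# at step t of T, each token is masked independently with probability t / T.
import torch

def q_sample(tokens, t, T, mask_id):
    mask = torch.rand_like(tokens, dtype=torch.float) < (t / T)
    return torch.where(mask, torch.full_like(tokens, mask_id), tokens)
```

Generation runs this in reverse: start from a fully masked sequence and iteratively fill positions in, as the inference code below does.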

---

## Training Details

* **Dataset:** 50,000 samples from [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
* **Epochs:** 50
* **Batch size:** 16
* **Learning rate:** 1e-5
* **Diffusion steps (T):** 10
* **Tokenizer:** Naive whitespace split (for demo purposes; see the sketch below)
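
For concreteness, a whitespace tokenizer of this kind might look as follows. This is a hypothetical sketch; the checkpoint's actual vocabulary is distributed as `vocab.json` (loaded in the Usage section), and the real training script may differ.

```python
# Hypothetical naive whitespace tokenizer matching the card's description.
def build_vocab(texts, specials=("[PAD]", "[MASK]")):
    vocab = {tok: i for i, tok in enumerate(specials)}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(text, vocab, max_length):
    ids = [vocab.get(w, vocab["[MASK]"]) for w in text.split()][:max_length]
    return ids + [vocab["[PAD]"]] * (max_length - len(ids))
```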

---

## 📉 Training Loss

| Stage        | Start Loss | End Loss |
| ------------ | ---------- | -------- |
| Epochs 1–10  | 8.38       | 6.13     |
| Epochs 11–20 | 6.12       | 6.04     |
| Epochs 21–50 | 6.04       | 5.92     |

**Final Loss (Epoch 50): 5.92**

### Loss Curve

<img src="diffusion_textmodel_loss.png" width="800" />

---

## Usage

### Install Requirements

```bash
pip install torch huggingface_hub
pip install gradio  # only needed for the Space example below
```

### Load the Model

```python
import torch
from modeling_diffusion import DiffusionTextModel

# Load directly from Hub
model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
model.eval()

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```
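
Note that `modeling_diffusion.py` is a custom file in the model repository, so the import above assumes it is present in your working directory. One way to fetch it first (assuming the repo ships the modeling file under that name):

```python
# Download the custom modeling file from the repo so the import resolves.
import shutil
from huggingface_hub import hf_hub_download

path = hf_hub_download("yasserrmd/diffusion-text-demo", "modeling_diffusion.py")
shutil.copy(path, "modeling_diffusion.py")
```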

---
### Vocabulary Initialization

```python
import json
from huggingface_hub import hf_hub_download

vocab_file = hf_hub_download("yasserrmd/diffusion-text-demo", "vocab.json")
with open(vocab_file) as f:
    vocab = json.load(f)

# Reverse mapping (IDs β†’ tokens)
id_to_word = {int(v): k for k, v in vocab.items()}

# Special IDs
pad_id, mask_id = vocab["[PAD]"], vocab["[MASK]"]
```
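
A quick sanity check that the vocabulary loaded as expected:

```python
# Should print the vocabulary size and the two special-token IDs.
print(f"vocab size: {len(vocab)}, PAD id: {pad_id}, MASK id: {mask_id}")
```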

### Inference with Prompt

```python
# Uses vocab, id_to_word, pad_id, mask_id, and device from the sections above.
def generate_with_prompt(model, input_text, max_length, T=10):
    model.eval()
    input_tokens = input_text.split()
    input_ids = [vocab.get(tok, mask_id) for tok in input_tokens]

    # Start from a fully masked sequence with the prompt tokens fixed in place.
    seq = torch.full((1, max_length), mask_id, dtype=torch.long, device=device)
    seq[0, :len(input_ids)] = torch.tensor(input_ids, device=device)

    # Iterative denoising from step T down to 1: at each step, sample a token
    # for every non-prompt position that is still masked.
    for step in range(T, 0, -1):
        with torch.no_grad():
            logits = model(seq, torch.tensor([step], device=device))
            probs = torch.softmax(logits, dim=-1)
            for pos in range(len(input_ids), max_length):
                if seq[0, pos].item() == mask_id:
                    seq[0, pos] = torch.multinomial(probs[0, pos], 1)

    # Truncate at the first [PAD] and map IDs back to words.
    ids = seq[0].tolist()
    if pad_id in ids:
        ids = ids[:ids.index(pad_id)]
    return " ".join(id_to_word[i] for i in ids)

print(generate_with_prompt(model, "the cat", max_length=50))
```

---

## Use in a Hugging Face Space

```python
import gradio as gr
from modeling_diffusion import DiffusionTextModel

# Reuses generate_with_prompt and the vocabulary setup from the sections above.
model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
model.eval()

def infer(prompt):
    return generate_with_prompt(model, prompt, max_length=50)

gr.Interface(fn=infer, inputs="text", outputs="text").launch()
```

---

## References

This model was inspired by several works on diffusion for text:

* Li et al. (2022) – [**Diffusion-LM Improves Controllable Text Generation**](https://arxiv.org/abs/2205.14217)
* Austin et al. (2021) – [**Structured Denoising Diffusion Models in Discrete State-Spaces (D3PM)**](https://arxiv.org/abs/2107.03006)
* He et al. (2023) – [**DiffusionBERT: Improving Generative Masked Language Models with Diffusion**](https://arxiv.org/abs/2211.15029)
* Gong et al. (2023) – [**DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models**](https://arxiv.org/abs/2210.08933)
* Nie et al. (2025) – [**Large Language Diffusion Models (LLaDA)**](https://arxiv.org/abs/2502.09992)

---

⚠️ **Disclaimer:** This is a research prototype. Generations may not be coherent: the model uses a naive whitespace tokenizer and was trained on only a small subset of TinyStories. For production-quality results, train longer with a subword tokenizer (e.g., GPT-2 BPE) and scale up the model.
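
If you experiment along those lines, swapping in a pretrained subword tokenizer is straightforward. A sketch using the `transformers` GPT-2 tokenizer (this checkpoint itself was *not* trained with it, so its IDs are incompatible with the `vocab.json` above):

```python
# Hypothetical: tokenize with GPT-2 BPE instead of whitespace splitting.
# Requires `pip install transformers`.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("the cat sat on the mat")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
```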

---