---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
---
# Diffusion Text Demo Model
A prototype **diffusion-based language model** implemented in PyTorch and trained on a subset of the [**TinyStories** dataset](https://huggingface.co/datasets/roneneldan/TinyStories).
This model demonstrates iterative denoising for text generation, conditioned on an input prompt.
---
## Training Details
* **Dataset:** 50,000 samples from [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)
* **Epochs:** 50
* **Batch size:** 16
* **Learning rate:** 1e-5
* **Diffusion steps (T):** 10
* **Tokenizer:** Naive whitespace split (for demo purposes; a sketch of how it can be reproduced is shown below)
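The preprocessing script itself is not included in this repository, but since the tokenizer is a plain whitespace split it can be reproduced in a few lines. The sketch below is only an assumption about how the vocabulary might have been built: the `[PAD]` and `[MASK]` special tokens come from the shipped `vocab.json`, while the function names and ID assignments are illustrative.

```python
# Illustrative sketch only: this preprocessing code is not part of the repo.
# [PAD] and [MASK] exist in the released vocab.json; their exact IDs here are assumptions.
from collections import Counter

def build_vocab(texts, max_size=None):
    # Count whitespace tokens and assign IDs after the special tokens.
    counter = Counter(tok for text in texts for tok in text.split())
    vocab = {"[PAD]": 0, "[MASK]": 1}
    for tok, _ in counter.most_common(max_size):
        vocab[tok] = len(vocab)
    return vocab

def encode(text, vocab, max_length):
    # Unknown words fall back to [MASK]; sequences are padded to max_length.
    ids = [vocab.get(tok, vocab["[MASK]"]) for tok in text.split()][:max_length]
    return ids + [vocab["[PAD]"]] * (max_length - len(ids))
```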
---
## Training Loss
| Stage | Start Loss | End Loss |
| ------------ | ---------- | -------- |
| Epochs 1–10  | 8.38       | 6.13     |
| Epochs 11–20 | 6.12       | 6.04     |
| Epochs 21–50 | 6.04       | 5.92     |
**Final Loss (Epoch 50): 5.92**
### Loss Curve
<img src="diffusion_textmodel_loss.png" width="800" />
---
## Usage
### Install Requirements
```bash
pip install torch huggingface_hub
```
### Load the Model
```python
import torch
from modeling_diffusion import DiffusionTextModel
# Load directly from Hub
model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
```
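Loading with `from_pretrained` works because the model class mixes in `PyTorchModelHubMixin` from `huggingface_hub`, as the model card tags indicate. The skeleton below only illustrates that pattern; the layer sizes, module names, and architecture details are assumptions, not the actual contents of `modeling_diffusion.py`.

```python
# Hypothetical outline of the mixin pattern; NOT the repo's real architecture.
import torch
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class DiffusionTextModelSkeleton(nn.Module, PyTorchModelHubMixin):
    """Illustrative only: shows how the Hub mixin is typically wired up."""

    def __init__(self, vocab_size: int = 8192, d_model: int = 256, max_t: int = 10):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.step_emb = nn.Embedding(max_t + 1, d_model)  # embeds the diffusion step t
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, t):
        # input_ids: (batch, seq_len) token IDs, t: (batch,) diffusion step indices
        h = self.token_emb(input_ids) + self.step_emb(t).unsqueeze(1)
        return self.lm_head(self.backbone(h))
```

Because the mixin serializes the constructor arguments, `save_pretrained`, `push_to_hub`, and `from_pretrained` work on such a class without extra serialization code.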
---
### Vocabulary Initialization
```python
import json
from huggingface_hub import hf_hub_download
vocab_file = hf_hub_download("yasserrmd/diffusion-text-demo", "vocab.json")
with open(vocab_file) as f:
    vocab = json.load(f)
# Reverse mapping (IDs → tokens)
id_to_word = {int(v): k for k, v in vocab.items()}
# Special IDs
pad_id, mask_id = vocab["[PAD]"], vocab["[MASK]"]
```
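An optional sanity check confirms the vocabulary loaded as expected and shows the whitespace encoding that the generation code below relies on; the example prompt is arbitrary.

```python
print(f"vocab size: {len(vocab)}")
print(f"[PAD] id: {pad_id}, [MASK] id: {mask_id}")

# Encode an arbitrary prompt: unknown words map to [MASK], mirroring the
# generation code in the next section.
prompt = "the cat sat"
ids = [vocab.get(tok, mask_id) for tok in prompt.split()]
print(ids, "->", " ".join(id_to_word[i] for i in ids))
```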
### Inference with Prompt
```python
def generate_with_prompt(model, input_text, max_length, T=10):
    model.eval()
    # Tokenize the prompt with the same whitespace scheme used at training time;
    # unknown words fall back to the [MASK] id.
    input_tokens = input_text.split()
    input_ids = [vocab.get(tok, mask_id) for tok in input_tokens]
    # Start from a fully masked sequence and copy the prompt into the prefix.
    seq = torch.full((1, max_length), mask_id, dtype=torch.long, device=device)
    seq[0, :len(input_ids)] = torch.tensor(input_ids, device=device)
    # Iteratively denoise from step T down to 1, sampling masked positions.
    for step in range(T, 0, -1):
        with torch.no_grad():
            logits = model(seq, torch.tensor([step], device=device))
        probs = torch.softmax(logits, dim=-1)
        for pos in range(len(input_ids), max_length):
            if seq[0, pos].item() == mask_id:
                seq[0, pos] = torch.multinomial(probs[0, pos], 1)
    # Truncate at the first [PAD] token and map ids back to words.
    ids = seq[0].tolist()
    if pad_id in ids:
        ids = ids[:ids.index(pad_id)]
    return " ".join(id_to_word[i] for i in ids)
print(generate_with_prompt(model, "the cat", max_length=50))
```
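The function above samples each masked position from the full softmax distribution. If you want more control over randomness, a temperature parameter is a common tweak; the variant below is not part of the original card and is shown only as a drop-in replacement for the `torch.multinomial(probs[0, pos], 1)` line.

```python
# Hypothetical extension, not from the original card: temperature-scaled sampling.
# temperature=1.0 reproduces the behaviour above; lower values sharpen the
# distribution and make outputs more repeatable.
def sample_with_temperature(probs_row, temperature=0.8):
    logits = torch.log(probs_row.clamp_min(1e-12)) / temperature
    return torch.multinomial(torch.softmax(logits, dim=-1), 1)

# Inside the generation loop, replace the sampling line with:
# seq[0, pos] = sample_with_temperature(probs[0, pos], temperature=0.7)
```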
---
## Use in a Hugging Face Space
```python
import gradio as gr
import torch
from modeling_diffusion import DiffusionTextModel

# Reuse the vocabulary loading and generate_with_prompt() from the sections above.
model = DiffusionTextModel.from_pretrained("yasserrmd/diffusion-text-demo")
model.eval()
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

def infer(prompt):
    return generate_with_prompt(model, prompt, max_length=50)

gr.Interface(fn=infer, inputs="text", outputs="text").launch()
```
---
## References
This model was inspired by several works on diffusion for text:
* Li et al. (2022) – [**Diffusion-LM Improves Controllable Text Generation**](https://arxiv.org/abs/2205.14217)
* Austin et al. (2021) – [**Structured Denoising Diffusion Models in Discrete State-Spaces (D3PM)**](https://arxiv.org/abs/2107.03006)
* He et al. (2023) – [**DiffusionBERT: Improving Generative Masked Language Models with Diffusion**](https://arxiv.org/abs/2211.15029)
* Gong et al. (2023) – [**DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models**](https://arxiv.org/abs/2211.11694)
* Nie et al. (2025) – [**Large Language Diffusion Models (LLaDA)**](https://arxiv.org/abs/2501.04687)
---
⚠️ **Disclaimer:** This is a research prototype. Generations may not be coherent, since the model is trained with a simple tokenizer and on a limited dataset subset. For production-quality results, train longer with a subword tokenizer (e.g., GPT-2 BPE) and scale up the model.
---