---
language: en
license: apache-2.0
tags:
- text-generation
- domain-names
- reformer
- character-level
datasets:
- custom
metrics:
- loss
model-index:
- name: domain-generator-reformer
  results:
  - task:
      type: text-generation
      name: Domain Name Generation
    metrics:
    - type: loss
      value: 0.9716
      name: Validation Loss
---

# Domain Name Generator - Reformer Character-Level Model

A character-level Reformer model trained to generate domain names based on descriptive tags. The model takes a set of content and style tags as input and generates appropriate, creative domain names.

## Model Description

This model is a fine-tuned version of `google/reformer-enwik8` specifically adapted for domain name generation. It uses a pure tag-based approach where both content descriptors (e.g., "tech", "health") and style descriptors (e.g., "modern", "minimal") are treated as equal tags.

### Key Features
- **Character-level generation**: Generates domains character by character for maximum flexibility
- **Tag-based prompting**: Uses 3-4 descriptive tags to guide generation
- **Style-aware**: Understands style tags like "modern", "minimal", "playful"
- **Position-independent**: Tag order doesn't matter due to training-time shuffling

## Model Details

- **Architecture**: Reformer with LSH attention
- **Base Model**: google/reformer-enwik8
- **Model Size**: ~597M parameters
- **Vocabulary Size**: 258 (byte-level encoding)
- **Max Sequence Length**: 256 characters
- **Hidden Size**: 1024
- **Layers**: 12
- **Attention Heads**: 8
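The vocabulary size of 258 follows from the byte-level scheme used in the Usage section. Each UTF-8 byte (0-255) is shifted up by 2, so ids 0 and 1 never stand for a byte and the largest id is 255 + 2 = 257. The sketch below mirrors the `encode_text`/`decode_ids` helpers from the Usage section. One assumption carries over from there: the decoder also skips id 2, because that example passes 2 as `eos_token_id`.

```python
# Byte-level encoding with a +2 offset, mirroring the encode_text/decode_ids
# helpers in the Usage section (ids 0 and 1 never correspond to a byte).
OFFSET = 2
VOCAB_SIZE = 256 + OFFSET  # 256 byte values plus 2 reserved ids -> 258


def encode_text(text: str) -> list[int]:
    """Map each UTF-8 byte of `text` to (byte value + OFFSET)."""
    return [b + OFFSET for b in text.encode("utf-8")]


def decode_ids(ids: list[int]) -> str:
    """Invert encode_text, skipping ids 0-2.

    Id 2 is skipped as well because the Usage example passes it as
    eos_token_id (it would otherwise decode to a NUL byte).
    """
    data = bytes(i - OFFSET for i in ids if i > OFFSET)
    return data.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    ids = encode_text("café.io")
    print(len("café.io"), len(ids))   # 7 8  ("é" is two bytes in UTF-8)
    print(max(ids) < VOCAB_SIZE)      # True
    print(decode_ids([0] + ids + [2]))  # café.io
```

Because the model works on bytes, a non-ASCII character costs more than one generation step, which is one reason English keywords fit the model best.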

## Training Details

### Training Data
- **Primary Dataset**: 250k real domains from BrandBucket
- **Synthetic Dataset**: 1.75M AI-generated domains
- **Total Examples**: ~2M domains
- **Data Split**: 80% synthetic, 20% real

### Training Configuration
- **Epochs**: 5
- **Batch Size**: 256 effective (128 per step × 2 gradient-accumulation steps)
- **Learning Rate**: 5e-05
- **Tag Dropout**: 10%
- **Style Tag Probability**: 30%
- **Hardware**: NVIDIA H100 GPU
- **Training Time**: 17.6 hours
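The card lists the augmentation rates but not the preprocessing code. The following is one plausible, hypothetical reading of how tag shuffling, 10% tag dropout and a 30% style-tag probability could combine into a training string. Several details are assumptions: dropout is applied per tag, at least one tag is always kept, a single style tag is drawn uniformly from the Supported Tags list, and a space follows `domain:`. The name `build_training_example` is illustrative only.

```python
import random

# Hypothetical sketch: the card states the rates (shuffling, 10% tag dropout,
# 30% style-tag probability) but not the actual preprocessing code.
STYLE_TAGS = ["modern", "classic", "playful", "bold", "elegant", "techy", "eco",
              "luxury", "minimal", "creative", "professional", "casual",
              "trendy", "simple", "unique"]


def build_training_example(content_tags, domain, rng,
                           tag_dropout=0.10, style_prob=0.30):
    """Return one training string in the 'tags: a;b;c domain: x' format."""
    # Tag dropout (assumed per tag): drop each tag with probability
    # tag_dropout, but never drop all of them.
    tags = [t for t in content_tags if rng.random() >= tag_dropout]
    if not tags:
        tags = [rng.choice(content_tags)]
    # Style tag (assumed): with probability style_prob, add one random style tag.
    if rng.random() < style_prob:
        style = rng.choice(STYLE_TAGS)
        if style not in tags:
            tags.append(style)
    # Shuffle so the model cannot rely on tag position.
    rng.shuffle(tags)
    return f"tags: {';'.join(tags)} domain: {domain}"


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(build_training_example(["tech", "startup", "ai"], "techflow.ai", rng))
```

Shuffling on every example is what the card credits for position independence: `tech;startup;ai` and `ai;tech;startup` become equivalent prompts.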

### Training Results
- **Final Training Loss**: 1.1113
- **Best Validation Loss**: 0.9716
- **Loss Reduction**: 75%
- **Training Stability**: std=0.0014 (very stable)
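The card does not say how the loss was computed. Assuming the reported values are mean per-byte cross-entropy in nats (the default log base and reduction of PyTorch's `torch.nn.CrossEntropyLoss`), they convert to perplexity and bits per byte, the unit usually reported for enwik8, as sketched below:

```python
import math


def loss_to_metrics(loss_nats: float) -> tuple[float, float]:
    """Convert mean cross-entropy in nats per byte to (perplexity, bits per byte)."""
    perplexity = math.exp(loss_nats)           # PPL = e^L
    bits_per_byte = loss_nats / math.log(2)    # nats -> bits
    return perplexity, bits_per_byte


for name, loss in [("train", 1.1113), ("validation", 0.9716)]:
    ppl, bpb = loss_to_metrics(loss)
    print(f"{name}: perplexity={ppl:.3f}, bits/byte={bpb:.3f}")
```

Under that assumption, the best validation loss of 0.9716 corresponds to a perplexity of about 2.64, or about 1.40 bits per byte.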

## Intended Use

### Primary Use Cases
- Generate domain names for startups and businesses
- Brainstorm creative domain ideas based on keywords
- Explore domain variations with different styles

### Input Format
```
tags: tag1;tag2;tag3 domain:
```
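A small helper can assemble prompts in this format and pull the domain back out of the decoded text. This is a sketch, and several choices in it are assumptions. `build_prompt` and `extract_domain` are hypothetical names. Lowercasing follows the lowercase tag lists below. The rejection of `;` inside a tag follows from its role as the separator. The warning outside 3-4 tags reflects the trained range given under Limitations.

```python
import warnings


def build_prompt(tags: list[str]) -> str:
    """Build a 'tags: a;b;c domain:' prompt from a list of tags."""
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    if any(";" in t for t in cleaned):
        raise ValueError("tags must not contain ';' (it is the tag separator)")
    if not 3 <= len(cleaned) <= 4:
        warnings.warn("the model was trained with 3-4 tags; results may degrade")
    return f"tags: {';'.join(cleaned)} domain:"


def extract_domain(generated: str) -> str:
    """Return the text after the last 'domain:' marker, up to the first whitespace."""
    tail = generated.split("domain:")[-1].strip()
    return tail.split()[0] if tail else ""
```

For example, `build_prompt(["tech", "startup", "modern"])` returns `"tags: tech;startup;modern domain:"`, the same prompt used in the Usage example.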

### Supported Tags

**Content Tags** (examples):
- `tech`, `ai`, `startup`, `app`, `software`
- `health`, `wellness`, `fitness`, `medical`
- `eco`, `green`, `sustainable`, `organic`
- `fashion`, `beauty`, `style`, `boutique`
- `food`, `restaurant`, `cafe`, `delivery`

**Style Tags**:
- `modern` - Clean, contemporary
- `classic` - Traditional, timeless
- `playful` - Fun, casual
- `bold` - Strong, impactful
- `elegant` - Sophisticated, refined
- `techy` - Technical, digital
- `eco` - Environmental, green
- `luxury` - Premium, high-end
- `minimal` - Simple, short
- `creative` - Artistic, unique
- `professional` - Business-oriented
- `casual` - Relaxed, informal
- `trendy` - Current, fashionable
- `simple` - Straightforward
- `unique` - Distinctive

## Usage

### With Transformers Library

```python
from transformers import ReformerModelWithLMHead
import torch

# Load model
model = ReformerModelWithLMHead.from_pretrained("path/to/domain-generator")
model.eval()

# Character encoding (Reformer standard): each UTF-8 byte is shifted up by 2
def encode_text(text):
    return [b + 2 for b in text.encode('utf-8')]

# Decoding: skip ids 0-2 (0 is padding, 2 is passed as eos_token_id below), undo the shift
def decode_ids(ids):
    return bytes([i - 2 for i in ids if i > 2]).decode('utf-8', errors='ignore')

# Generate domain
prompt = "tags: tech;startup;modern domain:"
input_ids = torch.tensor([encode_text(prompt)])

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=50,
        temperature=1.2,
        top_p=0.95,
        do_sample=True,
        pad_token_id=0,
        eos_token_id=2
    )

generated = decode_ids(output[0].tolist())
domain = generated.split("domain:")[-1].strip()
print(f"Generated: {domain}")
```

### Generation Parameters
- **Temperature**: 1.2 (recommended for creativity)
- **Top-p**: 0.95
- **Max New Tokens**: 50 generated after the prompt (one token per byte, so up to 50 characters of ASCII output)
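To make the two sampling parameters concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering over a single step's logits. It illustrates the technique only. It is not the Transformers implementation, which operates on batched tensors, and `sample_top_p` is a hypothetical name.

```python
import math
import random


def sample_top_p(logits, temperature=1.2, top_p=0.95, rng=random):
    """Sample one index from `logits` with temperature scaling and nucleus filtering."""
    # Temperature: divide logits before the softmax; values > 1 flatten the distribution.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-p: keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the kept tokens; choices() renormalises the weights itself.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

With `temperature=1.2`, logits are divided by 1.2 before the softmax, so less likely characters are sampled somewhat more often. `top_p=0.95` then discards the least likely characters that together make up the last 5% of probability mass.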

## Examples

### Input → Output Examples

```
tags: tech;startup;ai → techflow.ai
tags: eco;sustainable;modern → greenleaf.eco
tags: health;wellness;minimal → purelife.health
tags: fashion;luxury;elegant → velvetrose.com
tags: food;delivery;playful → snackdash.io
```

## Limitations

- Best results with 3-4 tags (trained range)
- May occasionally generate non-standard TLDs
- Domain availability not guaranteed
- Works best with English keywords
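Because the model can emit unusual TLDs or malformed strings, it helps to filter candidates before checking availability. The sketch below applies basic hostname syntax rules. Each label must be 1-63 letters, digits or hyphens with no leading or trailing hyphen (RFC 1035/1123). The full name may be at most 253 characters, and the TLD must not be all-numeric (RFC 3696). `is_plausible_domain` and its `allowed_tlds` parameter are hypothetical. The sketch does not handle internationalized domain names, trademarks or availability.

```python
import re

# One DNS label: 1-63 letters/digits/hyphens, not starting or ending with '-'.
LABEL_RE = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def is_plausible_domain(domain, allowed_tlds=None):
    """Syntax-only check of a generated domain; says nothing about availability."""
    domain = domain.strip().lower().rstrip(".")
    if not domain or len(domain) > 253:
        return False
    labels = domain.split(".")
    if len(labels) < 2 or not all(LABEL_RE.fullmatch(label) for label in labels):
        return False
    tld = labels[-1]
    if tld.isdigit():  # an all-numeric TLD is not valid
        return False
    # Optionally restrict to a caller-chosen set of TLDs.
    return allowed_tlds is None or tld in allowed_tlds
```

For example, `[d for d in candidates if is_plausible_domain(d, {"com", "io", "ai"})]` keeps only well-formed candidates under those three TLDs.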

## Ethical Considerations

- Generated domains should be checked for trademark conflicts
- May reflect biases present in training data
- Should not be used to generate misleading or deceptive domains

## Model Card Contact

For questions or issues, please open an issue in the repository.

## Citation

If you use this model, please cite:

```bibtex
@software{domain_generator_reformer,
  title = {Domain Generator - Character-Level Reformer},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/domain-generator-reformer}
}
```

## Changelog

- **v1.0** (2024-01): Initial release
  - 5 epochs training on combined dataset
  - 0.9716 validation loss
  - Stable generation quality