---
language: en
license: apache-2.0
tags:
- text-generation
- domain-names
- reformer
- character-level
datasets:
- custom
metrics:
- loss
model-index:
- name: domain-generator-reformer
  results:
  - task:
      type: text-generation
      name: Domain Name Generation
    metrics:
    - type: loss
      value: 0.9716
      name: Validation Loss
---
# Domain Name Generator - Reformer Character-Level Model
A character-level Reformer model trained to generate domain names based on descriptive tags. The model takes a set of content and style tags as input and generates appropriate, creative domain names.
## Model Description
This model is a fine-tuned version of `google/reformer-enwik8` specifically adapted for domain name generation. It uses a pure tag-based approach where both content descriptors (e.g., "tech", "health") and style descriptors (e.g., "modern", "minimal") are treated as equal tags.
### Key Features
- **Character-level generation**: Generates domains character by character for maximum flexibility
- **Tag-based prompting**: Uses 3-4 descriptive tags to guide generation
- **Style-aware**: Understands style tags like "modern", "minimal", "playful"
- **Position-independent**: Tag order doesn't matter due to training-time shuffling
## Model Details
- **Architecture**: Reformer with LSH attention
- **Base Model**: google/reformer-enwik8
- **Model Size**: ~597M parameters
- **Vocabulary Size**: 258 (byte-level encoding)
- **Max Sequence Length**: 256 characters
- **Hidden Size**: 1024
- **Layers**: 12
- **Attention Heads**: 8
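Because the model is byte-level, the 256-position limit is counted in UTF-8 bytes rather than visible characters, so non-ASCII tags use up more of the window than they appear to. The following is a minimal sketch of a length check; the helper names `prompt_length` and `remaining_budget` are illustrative and not part of the model's code.

```python
MAX_SEQ_LEN = 256  # maximum sequence length listed in the model details


def prompt_length(prompt: str) -> int:
    """Number of model positions the prompt occupies (one per UTF-8 byte)."""
    return len(prompt.encode("utf-8"))


def remaining_budget(prompt: str, max_seq_len: int = MAX_SEQ_LEN) -> int:
    """Positions left for generation; 0 if the prompt already fills the window."""
    return max(0, max_seq_len - prompt_length(prompt))


ascii_prompt = "tags: tech;startup;modern domain:"
accented_prompt = "tags: café;food;modern domain:"

# ASCII text uses one position per character.
print(prompt_length(ascii_prompt), remaining_budget(ascii_prompt))
# "é" is two bytes in UTF-8, so this prompt is one position longer than len().
print(prompt_length(accented_prompt), len(accented_prompt))
```

With the 3-4 short tags the model was trained on, the prompt stays well inside the window, so this check mainly matters for long or non-English tags.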
## Training Details
### Training Data
- **Primary Dataset**: 250k real domains from BrandBucket
- **Synthetic Dataset**: 1.75M AI-generated domains
- **Total Examples**: ~2M domains
- **Data Split**: 80% synthetic, 20% real
### Training Configuration
- **Epochs**: 5
- **Batch Size**: 256 (128 × 2 gradient accumulation)
- **Learning Rate**: 5e-05
- **Tag Dropout**: 10%
- **Style Tag Probability**: 30%
- **Hardware**: NVIDIA H100 GPU
- **Training Time**: 17.6 hours
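The card does not say exactly how tag dropout, style-tag sampling, and shuffling were applied, so the following is only a sketch of one plausible augmentation step. The function `build_training_prompt`, the per-tag independent dropout, and the "keep at least one tag" rule are all assumptions made for illustration.

```python
import random


def build_training_prompt(content_tags, style_tags, rng,
                          tag_dropout=0.10, style_prob=0.30):
    """Hypothetical augmentation: drop tags, maybe add a style tag, shuffle."""
    # Drop each content tag independently, but always keep at least one.
    kept = [t for t in content_tags if rng.random() >= tag_dropout]
    if not kept:
        kept = [rng.choice(content_tags)]
    # With probability style_prob, mix in one style tag as an ordinary tag.
    if style_tags and rng.random() < style_prob:
        kept.append(rng.choice(style_tags))
    # Shuffle so the model cannot rely on tag position.
    rng.shuffle(kept)
    return f"tags: {';'.join(kept)} domain:"


rng = random.Random(0)
for _ in range(3):
    print(build_training_prompt(["tech", "startup", "ai"],
                                ["modern", "minimal"], rng))
```

Shuffling at training time is what would make tag order irrelevant at inference, and dropout exposes the model to prompts with fewer tags than the full set.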
### Training Results
- **Final Training Loss**: 1.1113
- **Best Validation Loss**: 0.9716
- **Loss Reduction**: 75%
- **Training Stability**: std=0.0014 (very stable)
## Intended Use
### Primary Use Cases
- Generate domain names for startups and businesses
- Brainstorm creative domain ideas based on keywords
- Explore domain variations with different styles
### Input Format
```
tags: tag1;tag2;tag3 domain:
```
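A small helper can assemble this string from a list of tags. This is a sketch: the name `build_prompt` is illustrative, and lowercasing and rejecting `;` or `:` inside a tag are assumptions, not documented requirements of the model.

```python
def build_prompt(tags):
    """Join tags into the 'tags: a;b;c domain:' prompt format."""
    # Assumption: tags are lowercase keywords, as in the examples in this card.
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    # ';' separates tags and ':' ends the field names, so neither may appear in a tag.
    if any(";" in t or ":" in t for t in cleaned):
        raise ValueError("tags must not contain ';' or ':'")
    return f"tags: {';'.join(cleaned)} domain:"


print(build_prompt(["tech", "startup", "modern"]))
```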
### Supported Tags
**Content Tags** (examples):
- `tech`, `ai`, `startup`, `app`, `software`
- `health`, `wellness`, `fitness`, `medical`
- `eco`, `green`, `sustainable`, `organic`
- `fashion`, `beauty`, `style`, `boutique`
- `food`, `restaurant`, `cafe`, `delivery`
**Style Tags**:
- `modern` - Clean, contemporary
- `classic` - Traditional, timeless
- `playful` - Fun, casual
- `bold` - Strong, impactful
- `elegant` - Sophisticated, refined
- `techy` - Technical, digital
- `eco` - Environmental, green
- `luxury` - Premium, high-end
- `minimal` - Simple, short
- `creative` - Artistic, unique
- `professional` - Business-oriented
- `casual` - Relaxed, informal
- `trendy` - Current, fashionable
- `simple` - Straightforward
- `unique` - Distinctive
## Usage
### With Transformers Library
```python
import torch
from transformers import ReformerModelWithLMHead

# Load the fine-tuned model and switch to inference mode
model = ReformerModelWithLMHead.from_pretrained("path/to/domain-generator")
model.eval()

# Byte-level encoding (Reformer convention): each UTF-8 byte is offset by 2
def encode_text(text):
    return [b + 2 for b in text.encode("utf-8")]

def decode_ids(ids):
    # Skip special IDs 0-2 (0 = pad, 2 = EOS) before undoing the offset
    return bytes(i - 2 for i in ids if i > 2).decode("utf-8", errors="ignore")

# Generate a domain
prompt = "tags: tech;startup;modern domain:"
input_ids = torch.tensor([encode_text(prompt)])
with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=50,
        temperature=1.2,
        top_p=0.95,
        do_sample=True,
        pad_token_id=0,
        eos_token_id=2,
    )

# The output contains the prompt, so keep only the text after "domain:"
generated = decode_ids(output[0].tolist())
domain = generated.split("domain:")[-1].strip()
print(f"Generated: {domain}")
```
### Generation Parameters
- **Temperature**: 1.2 (recommended for creativity)
- **Top-p**: 0.95
- **Max New Tokens**: 50 (one token per byte, generated after the prompt)
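To make the effect of these two sampling parameters concrete, here is a simplified, pure-Python sketch of temperature scaling followed by top-p (nucleus) filtering. It illustrates the general technique only and is not the implementation used inside the Transformers library; `top_p_sample` is a hypothetical name.

```python
import math
import random


def top_p_sample(logits, temperature=1.2, top_p=0.95, rng=random):
    """Sample an index using temperature scaling, then nucleus filtering."""
    # Temperature > 1 flattens the distribution; < 1 sharpens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Draw one index, weighted by probability within the nucleus.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


rng = random.Random(0)
print([top_p_sample([2.0, 1.0, 0.5, -1.0], rng=rng) for _ in range(10)])
```

Raising the temperature spreads probability over more characters, while lowering `top_p` cuts off the unlikely tail; the recommended pair trades some predictability for more varied names.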
## Examples
### Input → Output Examples
```
tags: tech;startup;ai → techflow.ai
tags: eco;sustainable;modern → greenleaf.eco
tags: health;wellness;minimal → purelife.health
tags: fashion;luxury;elegant → velvetrose.com
tags: food;delivery;playful → snackdash.io
```
## Limitations
- Best results with 3-4 tags (trained range)
- May occasionally generate non-standard TLDs
- Domain availability not guaranteed
- Works best with English keywords
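Since the model can emit non-standard TLDs or malformed strings, it may help to filter its output before showing it to users. The sketch below checks basic syntax and compares the TLD against an allowlist. The allowlist contents and the name `clean_domain` are hypothetical, and the check says nothing about availability or trademarks.

```python
import re

# Hypothetical allowlist; extend it with the TLDs you actually accept.
ALLOWED_TLDS = {"com", "io", "ai", "co", "net", "org"}

# One label (letters, digits, inner hyphens; at most 63 chars) followed by a TLD.
DOMAIN_RE = re.compile(
    r"^(?P<label>[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)\.(?P<tld>[a-z]{2,})$"
)


def clean_domain(raw, allowed_tlds=ALLOWED_TLDS):
    """Return a normalised domain, or None if the text is not usable."""
    text = raw.strip().lower()
    # Keep only the first whitespace-separated token of the generated text.
    candidate = text.split()[0] if text else ""
    match = DOMAIN_RE.match(candidate)
    if match is None or match.group("tld") not in allowed_tlds:
        return None
    return candidate


for raw in [" TechFlow.ai\n", "snackdash.io extra text", "-bad.com", "nodot"]:
    print(repr(raw), "->", clean_domain(raw))
```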
## Ethical Considerations
- Generated domains should be checked for trademark conflicts
- May reflect biases present in training data
- Should not be used to generate misleading or deceptive domains
## Model Card Contact
For questions or issues, please open an issue in the repository.
## Citation
If you use this model, please cite:
```bibtex
@software{domain_generator_reformer,
title = {Domain Generator - Character-Level Reformer},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/your-username/domain-generator-reformer}
}
```
## Changelog
- **v1.0** (2024-01): Initial release
  - 5 epochs training on combined dataset
  - 0.9716 validation loss
  - Stable generation quality