---
language: en
license: apache-2.0
tags:
- text-generation
- domain-names
- reformer
- character-level
datasets:
- custom
metrics:
- loss
model-index:
- name: domain-generator-reformer
  results:
  - task:
      type: text-generation
      name: Domain Name Generation
    metrics:
    - type: loss
      value: 0.9716
      name: Validation Loss
---

# Domain Name Generator - Reformer Character-Level Model

A character-level Reformer model trained to generate domain names from descriptive tags. The model takes a set of content and style tags as input and generates matching domain name candidates.

## Model Description

This model is a fine-tuned version of `google/reformer-enwik8`, adapted for domain name generation. It uses a purely tag-based approach in which content descriptors (e.g., "tech", "health") and style descriptors (e.g., "modern", "minimal") are treated as interchangeable tags.

### Key Features

- **Character-level generation**: Generates domains character by character, so output is not restricted to a fixed word vocabulary
- **Tag-based prompting**: Uses 3-4 descriptive tags to guide generation
- **Style-aware**: Trained with style tags such as "modern", "minimal", "playful"
- **Position-independent**: Tag order should not matter, because tags were shuffled during training

## Model Details

- **Architecture**: Reformer with LSH attention
- **Base Model**: google/reformer-enwik8
- **Model Size**: ~597M parameters
- **Vocabulary Size**: 258 (byte-level encoding)
- **Max Sequence Length**: 256 characters
- **Hidden Size**: 1024
- **Layers**: 12
- **Attention Heads**: 8

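Because the encoding is byte-level, a prompt occupies one position per UTF-8 byte, so non-ASCII tags use more of the 256-position budget than their character count suggests. The following is a minimal sketch of a budget check, assuming the stated maximum sequence length is counted in bytes; the function names are illustrative and not part of this model's code.

```python
MAX_SEQUENCE_LENGTH = 256  # taken from the model details; assumed to count bytes


def encoded_length(text: str) -> int:
    """Number of model positions the text occupies (one per UTF-8 byte)."""
    return len(text.encode("utf-8"))


def fits_in_context(prompt: str, max_new_tokens: int = 50) -> bool:
    """True if the prompt plus the generation budget stays within the limit."""
    return encoded_length(prompt) + max_new_tokens <= MAX_SEQUENCE_LENGTH


prompt = "tags: tech;startup;modern domain:"
print(encoded_length(prompt), fits_in_context(prompt))
```

A typical 3-4 tag prompt is far below the limit, so the check mainly matters for unusually long or non-English tag lists.
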
## Training Details

### Training Data

- **Primary Dataset**: 250k real domains from BrandBucket
- **Synthetic Dataset**: 1.75M AI-generated domains
- **Total Examples**: ~2M domains
- **Data Split**: 80% synthetic, 20% real

### Training Configuration

- **Epochs**: 5
- **Batch Size**: 256 (128 × 2 gradient accumulation)
- **Learning Rate**: 5e-05
- **Tag Dropout**: 10%
- **Style Tag Probability**: 30%
- **Hardware**: NVIDIA H100 GPU
- **Training Time**: 17.6 hours

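The tag dropout, style tag probability, and tag shuffling settings can be pictured as a per-example augmentation step. The sketch below is one plausible reading, not the actual training code: it assumes that "Style Tag Probability" means one style tag is appended with probability 0.30, that each tag is dropped independently with probability 0.10, and that the training text is the prompt followed by the domain. The function `build_training_example` is hypothetical.

```python
import random


def build_training_example(content_tags, style_tags, domain, rng,
                           tag_dropout=0.10, style_prob=0.30):
    """Assemble one training string (hypothetical reconstruction)."""
    tags = list(content_tags)
    # Assumed reading of "Style Tag Probability": sometimes append one style tag.
    if style_tags and rng.random() < style_prob:
        tags.append(rng.choice(style_tags))
    if not tags:
        raise ValueError("at least one tag is required")
    # Drop each tag independently, but never drop all of them.
    kept = [t for t in tags if rng.random() >= tag_dropout]
    if not kept:
        kept = [rng.choice(tags)]
    # Shuffle so the model cannot rely on tag position.
    rng.shuffle(kept)
    return f"tags: {';'.join(kept)} domain: {domain}"


rng = random.Random(0)
print(build_training_example(["tech", "ai", "startup"], ["modern"], "techflow.ai", rng))
```

Passing a seeded `random.Random` keeps the augmentation reproducible across runs.
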
### Training Results

- **Final Training Loss**: 1.1113
- **Best Validation Loss**: 0.9716
- **Loss Reduction**: 75%
- **Training Stability**: std=0.0014 (very stable)

## Intended Use

### Primary Use Cases

- Generate domain names for startups and businesses
- Brainstorm creative domain ideas based on keywords
- Explore domain variations with different styles

### Input Format

```
tags: tag1;tag2;tag3 domain:
```

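A small helper can assemble prompts in this format from a list of tags. This is a minimal sketch; `build_prompt` is an illustrative name, and lowercasing the tags is an assumption based on the lowercase tags shown in this card.

```python
def build_prompt(tags):
    """Join tags into the 'tags: a;b;c domain:' prompt format."""
    # Normalize whitespace and case, and skip empty entries.
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    if any(";" in t for t in cleaned):
        raise ValueError("tags must not contain ';' (it is the separator)")
    return f"tags: {';'.join(cleaned)} domain:"


print(build_prompt(["tech", "startup", "modern"]))
```
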
### Supported Tags

**Content Tags** (examples):

- `tech`, `ai`, `startup`, `app`, `software`
- `health`, `wellness`, `fitness`, `medical`
- `eco`, `green`, `sustainable`, `organic`
- `fashion`, `beauty`, `style`, `boutique`
- `food`, `restaurant`, `cafe`, `delivery`

**Style Tags**:

- `modern` - Clean, contemporary
- `classic` - Traditional, timeless
- `playful` - Fun, casual
- `bold` - Strong, impactful
- `elegant` - Sophisticated, refined
- `techy` - Technical, digital
- `eco` - Environmental, green
- `luxury` - Premium, high-end
- `minimal` - Simple, short
- `creative` - Artistic, unique
- `professional` - Business-oriented
- `casual` - Relaxed, informal
- `trendy` - Current, fashionable
- `simple` - Straightforward
- `unique` - Distinctive

## Usage

### With Transformers Library

```python
from transformers import ReformerModelWithLMHead
import torch

# Load the fine-tuned model
model = ReformerModelWithLMHead.from_pretrained("path/to/domain-generator")
model.eval()

# Byte-level encoding (Reformer enwik8 convention): each UTF-8 byte is shifted by 2
def encode_text(text):
    return [b + 2 for b in text.encode("utf-8")]

def decode_ids(ids):
    # IDs 0-2 are treated as special (pad_token_id=0, eos_token_id=2 below) and dropped
    return bytes([i - 2 for i in ids if i > 2]).decode("utf-8", errors="ignore")

# Generate a domain
prompt = "tags: tech;startup;modern domain:"
input_ids = torch.tensor([encode_text(prompt)])

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=50,
        temperature=1.2,
        top_p=0.95,
        do_sample=True,
        pad_token_id=0,
        eos_token_id=2,
    )

# The output contains the prompt followed by the generated characters
generated = decode_ids(output[0].tolist())
domain = generated.split("domain:")[-1].strip()
print(f"Generated: {domain}")
```

### Generation Parameters

- **Temperature**: 1.2 (recommended for creativity)
- **Top-p**: 0.95
- **Max New Tokens**: 50 (generated after the prompt)

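To make the effect of these two settings concrete, the sketch below applies temperature scaling and top-p (nucleus) filtering to a list of logits in plain Python. It illustrates the general technique only and is not the Transformers implementation; `sample_top_p` is an illustrative name.

```python
import math
import random


def sample_top_p(logits, temperature=1.2, top_p=0.95, rng=random):
    """Sample one token index using temperature and nucleus filtering."""
    # Temperature > 1 flattens the distribution; < 1 sharpens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for a numerically stable softmax
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices accepts unnormalized weights, so no renormalization is needed.
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]


print(sample_top_p([2.0, 1.0, 0.5, -1.0], rng=random.Random(0)))
```

Raising the temperature spreads probability over more characters, while lowering `top_p` cuts off the unlikely tail, so the two settings pull in opposite directions and are usually tuned together.
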
## Examples

### Input → Output Examples

```
tags: tech;startup;ai → techflow.ai
tags: eco;sustainable;modern → greenleaf.eco
tags: health;wellness;minimal → purelife.health
tags: fashion;luxury;elegant → velvetrose.com
tags: food;delivery;playful → snackdash.io
```

## Limitations

- Best results with 3-4 tags (trained range)
- May occasionally generate non-standard TLDs
- Domain availability not guaranteed
- Works best with English keywords

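Since the model can emit non-standard TLDs or malformed strings, generated text is worth filtering before use. The following is a minimal sketch of a syntactic filter; the allow-list `ALLOWED_TLDS` and the function `clean_domain` are illustrative assumptions, and the check does not look up availability or trademarks.

```python
import re

# Illustrative allow-list (an assumption); extend it to match the TLDs you accept.
ALLOWED_TLDS = {"com", "net", "org", "io", "ai", "co", "eco", "health"}

# A DNS label: 1-63 characters, letters/digits/hyphens, no leading or trailing hyphen.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
DOMAIN_RE = re.compile(rf"^(?:{LABEL}\.)+([a-z]{{2,}})$")


def clean_domain(raw):
    """Return a normalized domain, or None if the text is not acceptable."""
    parts = raw.strip().lower().split()
    if not parts:
        return None
    candidate = parts[0]  # ignore anything generated after the first token
    match = DOMAIN_RE.match(candidate)
    if match is None or len(candidate) > 253:
        return None
    return candidate if match.group(1) in ALLOWED_TLDS else None


for text in ["TechFlow.ai", "greenleaf.xyzzy", "-bad.com"]:
    print(text, clean_domain(text))
```
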
## Ethical Considerations

- Generated domains should be checked for trademark conflicts
- May reflect biases present in training data
- Should not be used to generate misleading or deceptive domains

## Model Card Contact

For questions or issues, please open an issue in the repository.

## Citation

If you use this model, please cite:

```bibtex
@software{domain_generator_reformer,
  title = {Domain Generator - Character-Level Reformer},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/domain-generator-reformer}
}
```

## Changelog

- **v1.0** (2024-01): Initial release
  - 5 epochs training on combined dataset
  - 0.9716 validation loss
  - Stable generation quality