---
language: en
license: apache-2.0
tags:
- text-generation
- domain-names
- reformer
- character-level
datasets:
- custom
metrics:
- loss
model-index:
- name: domain-generator-reformer
  results:
  - task:
      type: text-generation
      name: Domain Name Generation
    metrics:
    - type: loss
      value: 0.9716
      name: Validation Loss
---
# Domain Name Generator - Reformer Character-Level Model
A character-level Reformer model trained to generate domain names from descriptive tags. Given a set of content and style tags, it generates candidate domain names that match them.
## Model Description
This model is a fine-tuned version of `google/reformer-enwik8` specifically adapted for domain name generation. It uses a pure tag-based approach where both content descriptors (e.g., "tech", "health") and style descriptors (e.g., "modern", "minimal") are treated as equal tags.
### Key Features
- **Character-level generation**: Generates domains one character (byte) at a time, so output is not limited to a fixed word vocabulary
- **Tag-based prompting**: Uses 3-4 descriptive tags to guide generation
- **Style-aware**: Understands style tags like "modern", "minimal", "playful"
- **Order-robust**: Tags were shuffled during training, so tag order should have little effect on the output
## Model Details
- **Architecture**: Reformer with LSH attention
- **Base Model**: google/reformer-enwik8
- **Model Size**: ~597M parameters
- **Vocabulary Size**: 258 (byte-level encoding)
- **Max Sequence Length**: 256 characters
- **Hidden Size**: 1024
- **Layers**: 12
- **Attention Heads**: 8
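The vocabulary size of 258 is consistent with 256 byte values plus a `+ 2` offset, which is the encoding used in the Usage section. The following is a minimal sketch of that mapping. It assumes that IDs 0 and 1 are reserved and that ID 2 doubles as the end-of-sequence ID, as the generation call in the Usage section suggests; the `OFFSET` constant is a name introduced here for illustration.

```python
# Assumption: token ID = UTF-8 byte value + 2, so 256 byte values + 2 = 258 IDs.
OFFSET = 2

def encode_text(text):
    """Map a string to byte-level token IDs."""
    return [b + OFFSET for b in text.encode("utf-8")]

def decode_ids(ids):
    """Invert encode_text, skipping IDs 0-2 (padding, reserved, EOS)."""
    return bytes(i - OFFSET for i in ids if i > OFFSET).decode("utf-8", errors="ignore")

prompt = "tags: tech;ai domain:"
ids = encode_text(prompt)

# Every ID fits inside the 258-entry vocabulary.
print(all(0 <= i < 258 for i in ids))
# ASCII prompts survive the round trip unchanged.
print(decode_ids(ids) == prompt)
```

Because each ASCII character becomes exactly one token, the 256-character maximum sequence length covers the prompt and the generated domain together.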
## Training Details
### Training Data
- **Primary Dataset**: 250k real domains from BrandBucket
- **Synthetic Dataset**: 1.75M AI-generated domains
- **Total Examples**: ~2M domains
- **Data Split**: 80% synthetic, 20% real
### Training Configuration
- **Epochs**: 5
- **Batch Size**: 256 (128 × 2 gradient accumulation steps)
- **Learning Rate**: 5e-05
- **Tag Dropout**: 10%
- **Style Tag Probability**: 30%
- **Hardware**: NVIDIA H100 GPU
- **Training Time**: 17.6 hours
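The tag dropout and style tag settings above, together with the tag shuffling mentioned under Key Features, could be applied per training example roughly as follows. This is a hypothetical sketch, not the actual training code. It assumes that dropout is applied to each tag independently, that "Style Tag Probability" is the chance of adding one style tag, and that the target domain follows `domain:` after a single space. The names `augment_tags` and `make_example` are invented for illustration.

```python
import random

def augment_tags(content_tags, style_tags, rng, tag_dropout=0.10, style_prob=0.30):
    """Hypothetical training-time tag augmentation (assumed, not the author's code)."""
    # Drop each content tag independently with probability `tag_dropout`.
    kept = [t for t in content_tags if rng.random() >= tag_dropout]
    if not kept:
        # Assumption: never emit an empty tag list.
        kept = [rng.choice(content_tags)]
    # With probability `style_prob`, add one style tag.
    if style_tags and rng.random() < style_prob:
        kept.append(rng.choice(style_tags))
    # Shuffle so the model cannot rely on tag position.
    rng.shuffle(kept)
    return kept

def make_example(content_tags, style_tags, domain, rng):
    """Build one training string in the prompt format used by this model."""
    tags = augment_tags(content_tags, style_tags, rng)
    return f"tags: {';'.join(tags)} domain: {domain}"

rng = random.Random(0)
for _ in range(3):
    print(make_example(["tech", "startup", "ai"], ["modern"], "techflow.ai", rng))
```

Passing a seeded `random.Random` keeps the augmentation reproducible across runs.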
### Training Results
- **Final Training Loss**: 1.1113
- **Best Validation Loss**: 0.9716
- **Loss Reduction**: 75%
- **Training Stability**: std=0.0014 (very stable)
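To put the validation loss in more familiar units, it can be converted to per-character perplexity and bits per byte. The sketch below assumes the reported loss is the mean cross-entropy in nats per byte-level token, which is the usual convention for PyTorch language-model losses but is not stated in this card.

```python
import math

val_loss = 0.9716  # best validation loss reported above

# Assumption: mean cross-entropy in nats per byte-level token.
perplexity = math.exp(val_loss)
bits_per_byte = val_loss / math.log(2)

print(f"perplexity: {perplexity:.2f}")
print(f"bits per byte: {bits_per_byte:.2f}")
```

Under that assumption the loss corresponds to a perplexity of about 2.64 and about 1.40 bits per byte. If the loss is averaged over prompt characters as well as the domain, these figures describe the whole sequence rather than the generated domain alone.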
## Intended Use
### Primary Use Cases
- Generate domain names for startups and businesses
- Brainstorm creative domain ideas based on keywords
- Explore domain variations with different styles
### Input Format
```
tags: tag1;tag2;tag3 domain:
```
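A small helper can build this prompt string from a list of tags. This is a minimal sketch; `build_prompt` is a name introduced here, and the lowercasing and the rejection of `;` and `:` inside tags are assumptions based on the lowercase example tags and the delimiters in the format above.

```python
def build_prompt(tags):
    """Format tags as 'tags: tag1;tag2;tag3 domain:'."""
    # Assumption: tags are lowercase, matching the examples in this card.
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    for t in cleaned:
        # ';' separates tags and ':' ends field names, so neither may appear in a tag.
        if ";" in t or ":" in t:
            raise ValueError(f"tag contains a reserved character: {t!r}")
    return f"tags: {';'.join(cleaned)} domain:"

print(build_prompt(["tech", "startup", "modern"]))
# tags: tech;startup;modern domain:
```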
### Supported Tags
**Content Tags** (examples):
- `tech`, `ai`, `startup`, `app`, `software`
- `health`, `wellness`, `fitness`, `medical`
- `eco`, `green`, `sustainable`, `organic`
- `fashion`, `beauty`, `style`, `boutique`
- `food`, `restaurant`, `cafe`, `delivery`
**Style Tags**:
- `modern` - Clean, contemporary
- `classic` - Traditional, timeless
- `playful` - Fun, casual
- `bold` - Strong, impactful
- `elegant` - Sophisticated, refined
- `techy` - Technical, digital
- `eco` - Environmental, green
- `luxury` - Premium, high-end
- `minimal` - Simple, short
- `creative` - Artistic, unique
- `professional` - Business-oriented
- `casual` - Relaxed, informal
- `trendy` - Current, fashionable
- `simple` - Straightforward
- `unique` - Distinctive
## Usage
### With Transformers Library
```python
import torch
from transformers import ReformerModelWithLMHead

# Load the fine-tuned model and switch to inference mode
model = ReformerModelWithLMHead.from_pretrained("path/to/domain-generator")
model.eval()

# Byte-level encoding (Reformer enwik8 convention): token ID = UTF-8 byte + 2
def encode_text(text):
    return [b + 2 for b in text.encode("utf-8")]

def decode_ids(ids):
    # Skip IDs 0-2, which include the padding (0) and EOS (2) IDs used below
    return bytes(i - 2 for i in ids if i > 2).decode("utf-8", errors="ignore")

# Generate a domain from a tag prompt
prompt = "tags: tech;startup;modern domain:"
input_ids = torch.tensor([encode_text(prompt)])

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=50,
        temperature=1.2,
        top_p=0.95,
        do_sample=True,
        pad_token_id=0,
        eos_token_id=2,
    )

# The output includes the prompt, so keep only the text after "domain:"
generated = decode_ids(output[0].tolist())
domain = generated.split("domain:")[-1].strip()
print(f"Generated: {domain}")
```
### Generation Parameters
- **Temperature**: 1.2 (recommended for creativity)
- **Top-p**: 0.95
- **Max New Tokens**: 50 (`max_new_tokens=50`; each token is one byte, so at most 50 ASCII characters after the prompt)
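To see what these two parameters do, the following pure-Python sketch applies temperature scaling and top-p (nucleus) filtering to a toy distribution. It is a conceptual illustration, not the `transformers` implementation, and the logits are made-up values for four hypothetical candidate characters.

```python
import math
import random

def sample_top_p(logits, temperature=1.2, top_p=0.95, rng=random):
    """Sample one index using temperature scaling and nucleus filtering."""
    # Temperature > 1 flattens the distribution; < 1 sharpens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely indices whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # random.choices normalizes the weights of the kept indices.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]

toy_logits = [4.0, 3.5, 1.0, -2.0]  # made-up scores, for illustration only
rng = random.Random(0)
counts = [0] * len(toy_logits)
for _ in range(1000):
    counts[sample_top_p(toy_logits, rng=rng)] += 1
print(counts)  # the least likely index falls outside the nucleus and is never drawn
```

Raising the temperature spreads probability across more characters, while lowering `top_p` cuts off more of the unlikely tail, so the two settings trade variety against the risk of malformed names.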
## Examples
### Input → Output Examples
```
tags: tech;startup;ai → techflow.ai
tags: eco;sustainable;modern → greenleaf.eco
tags: health;wellness;minimal → purelife.health
tags: fashion;luxury;elegant → velvetrose.com
tags: food;delivery;playful → snackdash.io
```
## Limitations
- Best results with 3-4 tags (trained range)
- May occasionally generate non-standard TLDs
- Domain availability not guaranteed
- Works best with English keywords
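Because output is sampled character by character, it helps to validate each candidate before using it. The sketch below is one possible post-processing filter, not part of the model. The `clean_domain` helper and the `ALLOWED_TLDS` allowlist are assumptions introduced here, and the allowlist should be extended to whichever TLDs you accept. It checks syntax only and says nothing about availability.

```python
import re

# Assumption: a small illustrative allowlist, not an exhaustive TLD list.
ALLOWED_TLDS = {"com", "io", "ai", "co", "net", "org"}

# One DNS label: 1-63 chars of letters, digits, hyphens; no leading/trailing hyphen.
LABEL_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")

def clean_domain(raw):
    """Return a normalized 'name.tld' string, or None if the text is unusable."""
    tokens = raw.strip().lower().split()
    if not tokens:
        return None
    candidate = tokens[0]  # keep only the first whitespace-delimited token
    labels = candidate.split(".")
    if len(labels) != 2:
        return None
    name, tld = labels
    if not LABEL_RE.match(name) or tld not in ALLOWED_TLDS:
        return None
    return candidate

for raw in ["TechFlow.ai", "-bad-.com", "snackdash.io extra", "no_tld"]:
    print(repr(raw), "->", clean_domain(raw))
```

Candidates that return `None` can simply be discarded and regenerated, since sampling a new domain is cheap.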
## Ethical Considerations
- Generated domains should be checked for trademark conflicts
- May reflect biases present in training data
- Should not be used to generate misleading or deceptive domains
## Model Card Contact
For questions or issues, please open an issue in the repository.
## Citation
If you use this model, please cite:
```bibtex
@software{domain_generator_reformer,
title = {Domain Generator - Character-Level Reformer},
year = {2024},
publisher = {HuggingFace},
url = {https://huggingface.co/your-username/domain-generator-reformer}
}
```
## Changelog
- **v1.0** (2024-01): Initial release
  - 5 epochs of training on the combined dataset
  - Validation loss of 0.9716
  - Stable generation quality