---
language: en
license: apache-2.0
tags:
- text-generation
- domain-names
- reformer
- character-level
datasets:
- custom
metrics:
- loss
model-index:
- name: domain-generator-reformer
  results:
  - task:
      type: text-generation
      name: Domain Name Generation
    metrics:
    - type: loss
      value: 0.9716
      name: Validation Loss
---

# Domain Name Generator - Reformer Character-Level Model

A character-level Reformer model trained to generate domain names based on descriptive tags. The model takes a set of content and style tags as input and generates appropriate, creative domain names.

## Model Description

This model is a fine-tuned version of `google/reformer-enwik8`, adapted for domain name generation. It uses a purely tag-based prompt: content descriptors (e.g., "tech", "health") and style descriptors (e.g., "modern", "minimal") go into the same list and are treated as equal tags.

### Key Features
- **Character-level generation**: Generates domains one character (byte) at a time, so output is not limited to a fixed word vocabulary
- **Tag-based prompting**: Uses 3-4 descriptive tags to guide generation
- **Style-aware**: Understands style tags like "modern", "minimal", "playful"
- **Position-independent**: Tag order doesn't matter due to training-time shuffling

## Model Details

- **Architecture**: Reformer with LSH attention
- **Base Model**: google/reformer-enwik8
- **Model Size**: ~597 MB checkpoint (about 149M fp32 parameters, consistent with the 12-layer, 1024-wide configuration listed here)
- **Vocabulary Size**: 258 (byte-level encoding)
- **Max Sequence Length**: 256 characters
- **Hidden Size**: 1024
- **Layers**: 12
- **Attention Heads**: 8

## Training Details

### Training Data
- **Primary Dataset**: 250k real domains from BrandBucket
- **Synthetic Dataset**: 1.75M AI-generated domains
- **Total Examples**: ~2M domains
- **Data Split**: 80% synthetic, 20% real

### Training Configuration
- **Epochs**: 5
- **Effective Batch Size**: 256 (per-step batch of 128 × 2 gradient accumulation steps)
- **Learning Rate**: 5e-05
- **Tag Dropout**: 10%
- **Style Tag Probability**: 30%
- **Hardware**: NVIDIA H100 GPU
- **Training Time**: 17.6 hours
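
The card does not show how tag shuffling, tag dropout, and style tags were applied when building training strings. One plausible reading is sketched here. The function name, the style-tag pool, the random choice of style tag, and the `domain: <name>` target format are all assumptions for illustration, not the author's pipeline.

```python
import random

# Hypothetical style-tag pool, taken from the "Style Tags" list in this card.
STYLE_TAGS = ["modern", "classic", "playful", "bold", "elegant", "minimal"]

def build_training_example(content_tags, domain, rng,
                           tag_dropout=0.10, style_prob=0.30):
    """Assemble one 'tags: ... domain: ...' training string (illustrative only)."""
    tags = list(content_tags)
    # With probability style_prob, add one style tag as an ordinary tag.
    if rng.random() < style_prob:
        tags.append(rng.choice(STYLE_TAGS))
    # Drop each tag independently with probability tag_dropout,
    # but always keep at least one tag so the prompt is never empty.
    kept = [t for t in tags if rng.random() >= tag_dropout]
    if not kept:
        kept = [rng.choice(tags)]
    # Shuffle so the model cannot rely on tag position.
    rng.shuffle(kept)
    return f"tags: {';'.join(kept)} domain: {domain}"

rng = random.Random(0)
print(build_training_example(["tech", "startup", "ai"], "techflow.ai", rng))
```

In a real pipeline the style tag would presumably be derived from the domain itself rather than drawn at random; the random draw only keeps the sketch self-contained.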

### Training Results
- **Final Training Loss**: 1.1113
- **Best Validation Loss**: 0.9716
- **Loss Reduction**: 75%
- **Training Stability**: std=0.0014 (very stable)
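
To put the validation loss on a more familiar scale, it can be converted to per-byte perplexity and bits per byte. This small calculation assumes the reported loss is the mean cross-entropy per byte in nats (the usual PyTorch convention); the card does not state the unit.

```python
import math

val_loss = 0.9716  # best validation loss reported above

# Assumption: the loss is mean cross-entropy per byte, measured in nats.
perplexity = math.exp(val_loss)          # effective branching factor per byte
bits_per_byte = val_loss / math.log(2)   # same quantity expressed in bits

print(f"perplexity per byte: {perplexity:.3f}")
print(f"bits per byte:       {bits_per_byte:.3f}")
```

Under that assumption the model is, on average, about as uncertain as a choice among roughly 2.6 equally likely next bytes.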

## Intended Use

### Primary Use Cases
- Generate domain names for startups and businesses
- Brainstorm creative domain ideas based on keywords
- Explore domain variations with different styles

### Input Format
```
tags: tag1;tag2;tag3 domain:
```
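
A small helper can assemble this prompt from a list of tags. The sketch below is one possible way to do it: `build_prompt` is a hypothetical name, and the lowercasing and separator checks are assumptions based on the lowercase tags shown in this card.

```python
MAX_SEQ_LEN = 256  # maximum sequence length listed under "Model Details"

def build_prompt(tags):
    """Format tags as 'tags: a;b;c domain:' (helper name is hypothetical)."""
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    if any(";" in t or ":" in t for t in cleaned):
        raise ValueError("tags must not contain ';' or ':'")
    prompt = f"tags: {';'.join(cleaned)} domain:"
    # The model is byte-level, so length is measured in UTF-8 bytes.
    if len(prompt.encode("utf-8")) >= MAX_SEQ_LEN:
        raise ValueError("prompt leaves no room for generation")
    return prompt

print(build_prompt(["tech", "startup", "modern"]))
# tags: tech;startup;modern domain:
```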

### Supported Tags

**Content Tags** (examples):
- `tech`, `ai`, `startup`, `app`, `software`
- `health`, `wellness`, `fitness`, `medical`
- `eco`, `green`, `sustainable`, `organic`
- `fashion`, `beauty`, `style`, `boutique`
- `food`, `restaurant`, `cafe`, `delivery`

**Style Tags**:
- `modern` - Clean, contemporary
- `classic` - Traditional, timeless
- `playful` - Fun, casual
- `bold` - Strong, impactful
- `elegant` - Sophisticated, refined
- `techy` - Technical, digital
- `eco` - Environmental, green
- `luxury` - Premium, high-end
- `minimal` - Simple, short
- `creative` - Artistic, unique
- `professional` - Business-oriented
- `casual` - Relaxed, informal
- `trendy` - Current, fashionable
- `simple` - Straightforward
- `unique` - Distinctive

## Usage

### With Transformers Library

```python
from transformers import ReformerModelWithLMHead
import torch

# Load model
model = ReformerModelWithLMHead.from_pretrained("path/to/domain-generator")
model.eval()

# Byte-level encoding: each UTF-8 byte is shifted by 2, as in google/reformer-enwik8
def encode_text(text):
    return [b + 2 for b in text.encode('utf-8')]

# IDs 0-2 are skipped: 0 is used as padding and 2 as end-of-sequence in generate() below
def decode_ids(ids):
    return bytes(i - 2 for i in ids if i > 2).decode('utf-8', errors='ignore')

# Generate domain
prompt = "tags: tech;startup;modern domain:"
input_ids = torch.tensor([encode_text(prompt)])

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=50,
        temperature=1.2,
        top_p=0.95,
        do_sample=True,
        pad_token_id=0,
        eos_token_id=2
    )

generated = decode_ids(output[0].tolist())
domain = generated.split("domain:")[-1].strip()
print(f"Generated: {domain}")
```

### Generation Parameters
- **Temperature**: 1.2 (recommended for creativity)
- **Top-p**: 0.95
- **Max New Tokens**: 50 (one token per byte, so at most 50 ASCII characters after the prompt)
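
For readers unfamiliar with these settings, the following standard-library sketch shows what temperature scaling and top-p (nucleus) filtering do to a single next-token distribution. It is a conceptual illustration, not the Transformers implementation; `sample_top_p` and the toy logits are made up for the example.

```python
import math
import random

def sample_top_p(logits, temperature=1.2, top_p=0.95, rng=random):
    """Sample one index from logits using temperature scaling and nucleus filtering."""
    # Temperature > 1 flattens the distribution; < 1 sharpens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices renormalises the weights over the kept tokens.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

toy_logits = [4.0, 3.5, 1.0, -2.0, -6.0]  # made-up scores for five tokens
rng = random.Random(0)
print([sample_top_p(toy_logits, rng=rng) for _ in range(10)])
```

With these toy scores the two least likely tokens fall outside the nucleus and are never sampled, which is how top-p trims implausible characters while the higher temperature keeps some variety among the rest.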

## Examples

### Input → Output Examples

```
tags: tech;startup;ai → techflow.ai
tags: eco;sustainable;modern → greenleaf.eco
tags: health;wellness;minimal → purelife.health
tags: fashion;luxury;elegant → velvetrose.com
tags: food;delivery;playful → snackdash.io
```

## Limitations

- Best results with 3-4 tags (trained range)
- May occasionally generate non-standard TLDs
- Domain availability not guaranteed
- Works best with English keywords
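
Because output is sampled byte by byte, it can help to filter generations before showing them to users. The sketch below is one way to do that, under stated assumptions: `clean_domain` is a hypothetical helper, and `ALLOWED_TLDS` is an illustrative allowlist rather than a list the model was trained against. It checks syntax only and says nothing about availability or trademarks.

```python
import re

# One DNS label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
LABEL = r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?"
DOMAIN_RE = re.compile(rf"^{LABEL}(?:\.{LABEL})+$")

# Hypothetical allowlist; replace with the TLDs you actually want to accept.
ALLOWED_TLDS = {"com", "io", "ai", "eco", "health"}

def clean_domain(raw):
    """Return a normalised domain, or None if the text is not usable."""
    text = raw.strip().lower()
    if not text:
        return None
    candidate = text.split()[0]  # keep only the first whitespace-delimited token
    if len(candidate) > 253 or not DOMAIN_RE.match(candidate):
        return None
    if candidate.rsplit(".", 1)[-1] not in ALLOWED_TLDS:
        return None
    return candidate

for raw in ["TechFlow.ai", "greenleaf.eco extra text", "-bad-.com", "snackdash.xyz1"]:
    print(repr(raw), "->", clean_domain(raw))
```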

## Ethical Considerations

- Generated domains should be checked for trademark conflicts
- May reflect biases present in training data
- Should not be used to generate misleading or deceptive domains

## Model Card Contact

For questions or issues, please open an issue in the repository.

## Citation

If you use this model, please cite:

```bibtex
@software{domain_generator_reformer,
  title = {Domain Generator - Character-Level Reformer},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/your-username/domain-generator-reformer}
}
```

## Changelog

- **v1.0** (2024-01): Initial release
  - 5 epochs training on combined dataset
  - 0.9716 validation loss
  - Stable generation quality