---
language: en
license: apache-2.0
tags:
- text-generation
- domain-names
- reformer
- character-level
datasets:
- custom
metrics:
- loss
model-index:
- name: domain-generator-reformer
  results:
  - task:
      type: text-generation
      name: Domain Name Generation
    metrics:
    - type: loss
      value: 0.9716
      name: Validation Loss
---

	# Domain Name Generator - Reformer Character-Level Model

	A character-level Reformer model trained to generate domain names based on descriptive tags. The model takes a set of content and style tags as input and generates appropriate, creative domain names.

	## Model Description

	This model is a fine-tuned version of `google/reformer-enwik8` specifically adapted for domain name generation. It uses a pure tag-based approach where both content descriptors (e.g., "tech", "health") and style descriptors (e.g., "modern", "minimal") are treated as equal tags.

	### Key Features
	- Character-level generation: Generates domains character by character for maximum flexibility
	- Tag-based prompting: Uses 3-4 descriptive tags to guide generation
	- Style-aware: Understands style tags like "modern", "minimal", "playful"
	- Position-independent: Tag order doesn't matter due to training-time shuffling

	## Model Details

	- Architecture: Reformer with LSH attention
	- Base Model: google/reformer-enwik8
	- Model Size: ~597M parameters
	- Vocabulary Size: 258 (byte-level encoding)
	- Max Sequence Length: 256 characters
	- Hidden Size: 1024
	- Layers: 12
	- Attention Heads: 8
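Because the vocabulary is byte-level, a prompt's token count equals its UTF-8 byte count, so a prompt can be checked against the 256-character limit before generation. The sketch below assumes the limit covers the prompt plus the generated characters and that the 258 vocabulary entries are the 256 byte values plus 2 reserved IDs; `token_count` and `fits_in_context` are hypothetical helpers, not part of the model's code.

```python
MAX_SEQ_LEN = 256   # max sequence length listed above
VOCAB_SIZE = 258    # assumption: 256 byte values + 2 reserved IDs

def token_count(text: str) -> int:
    """Number of model tokens for a string: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))

def fits_in_context(prompt: str, max_new_tokens: int = 50) -> bool:
    """True if the prompt plus the generation budget stays within the limit."""
    return token_count(prompt) + max_new_tokens <= MAX_SEQ_LEN

prompt = "tags: tech;startup;modern domain:"
print(token_count(prompt), fits_in_context(prompt))
```

Non-ASCII characters occupy more than one byte, so they consume more of the budget than their visible length suggests.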

	## Training Details

	### Training Data
	- Primary Dataset: 250k real domains from BrandBucket
	- Synthetic Dataset: 1.75M AI-generated domains
	- Total Examples: ~2M domains
	- Data Split: 80% synthetic, 20% real

	### Training Configuration
	- Epochs: 5
	- Batch Size: 256 (128 × 2 gradient accumulation)
	- Learning Rate: 5e-05
	- Tag Dropout: 10%
	- Style Tag Probability: 30%
	- Hardware: NVIDIA H100 GPU
	- Training Time: 17.6 hours
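The tag shuffling, tag dropout, and style tag probability could be combined roughly as in the sketch below. The training script is not part of this card, so this is an illustration under assumptions: each content tag is dropped independently with probability 0.10, one style tag is appended with probability 0.30, at least one tag is always kept, and the training target follows the prompt as `domain: <name>`. `augment_tags` and `make_example` are hypothetical names.

```python
import random

TAG_DROPOUT = 0.10      # from the configuration above
STYLE_TAG_PROB = 0.30   # from the configuration above

def augment_tags(content_tags, style_tags, rng):
    """Return a shuffled, randomly thinned tag list for one training example."""
    tags = [t for t in content_tags if rng.random() >= TAG_DROPOUT]
    if not tags:
        # Assumption: never emit an example with no tags at all.
        tags = [rng.choice(content_tags)]
    if style_tags and rng.random() < STYLE_TAG_PROB:
        tags.append(rng.choice(style_tags))
    rng.shuffle(tags)   # shuffling makes tag order carry no meaning
    return tags

def make_example(content_tags, style_tags, domain, rng):
    """Format one training string in the prompt format used by this model."""
    tags = augment_tags(content_tags, style_tags, rng)
    return f"tags: {';'.join(tags)} domain: {domain}"

rng = random.Random(0)
print(make_example(["tech", "startup", "ai"], ["modern", "minimal"], "techflow.ai", rng))
```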

	### Training Results
	- Final Training Loss: 1.1113
	- Best Validation Loss: 0.9716
	- Loss Reduction: 75%
	- Training Stability: std=0.0014 (very stable)
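For a character-level model, the validation loss can be read as per-character perplexity or bits per character. The conversion below assumes the reported loss is the mean cross-entropy per token in nats, which is the PyTorch default; the card does not state this explicitly.

```python
import math

val_loss = 0.9716   # best validation loss reported above

# Assumption: the loss is mean cross-entropy per character, measured in nats.
perplexity = math.exp(val_loss)          # per-character perplexity
bits_per_char = val_loss / math.log(2)   # convert nats to bits

print(f"perplexity per character: {perplexity:.3f}")
print(f"bits per character: {bits_per_char:.3f}")
```

Under that assumption the model is, on average, about as uncertain as a uniform choice among roughly 2.6 characters at each step.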

	## Intended Use

	### Primary Use Cases
	- Generate domain names for startups and businesses
	- Brainstorm creative domain ideas based on keywords
	- Explore domain variations with different styles

	### Input Format
	```
	tags: tag1;tag2;tag3 domain:
	```
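A small helper can assemble this prompt from a list of tags. This is a minimal sketch: `build_prompt` is a hypothetical name, and lowercasing and trimming the tags is an assumption based on the all-lowercase tags listed in this card.

```python
def build_prompt(tags):
    """Join tags with ';' and wrap them in the 'tags: ... domain:' format."""
    # Assumption: tags are lowercase with no surrounding whitespace.
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    if not cleaned:
        raise ValueError("at least one tag is required")
    return f"tags: {';'.join(cleaned)} domain:"

print(build_prompt(["tech", "startup", "modern"]))
```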

	### Supported Tags

	Content Tags (examples):
	- `tech`, `ai`, `startup`, `app`, `software`
	- `health`, `wellness`, `fitness`, `medical`
	- `eco`, `green`, `sustainable`, `organic`
	- `fashion`, `beauty`, `style`, `boutique`
	- `food`, `restaurant`, `cafe`, `delivery`

	Style Tags:
	- `modern` - Clean, contemporary
	- `classic` - Traditional, timeless
	- `playful` - Fun, casual
	- `bold` - Strong, impactful
	- `elegant` - Sophisticated, refined
	- `techy` - Technical, digital
	- `eco` - Environmental, green
	- `luxury` - Premium, high-end
	- `minimal` - Simple, short
	- `creative` - Artistic, unique
	- `professional` - Business-oriented
	- `casual` - Relaxed, informal
	- `trendy` - Current, fashionable
	- `simple` - Straightforward
	- `unique` - Distinctive

	## Usage

	### With Transformers Library

```python
from transformers import ReformerModelWithLMHead
import torch

# Load the fine-tuned model and switch to inference mode
model = ReformerModelWithLMHead.from_pretrained("path/to/domain-generator")
model.eval()

# Character encoding (Reformer standard): each UTF-8 byte is shifted by 2
def encode_text(text):
    return [b + 2 for b in text.encode('utf-8')]

# Inverse mapping; IDs 0 to 2 are dropped (pad_token_id=0, eos_token_id=2)
def decode_ids(ids):
    return bytes([i - 2 for i in ids if i > 2]).decode('utf-8', errors='ignore')

# Generate domain
prompt = "tags: tech;startup;modern domain:"
input_ids = torch.tensor([encode_text(prompt)])

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=50,
        temperature=1.2,
        top_p=0.95,
        do_sample=True,
        pad_token_id=0,
        eos_token_id=2
    )

# The output contains the prompt, so keep only the text after "domain:"
generated = decode_ids(output[0].tolist())
domain = generated.split("domain:")[-1].strip()
print(f"Generated: {domain}")
```

	### Generation Parameters
	- Temperature: 1.2 (recommended for creativity)
	- Top-p: 0.95
	- Max New Tokens: 50 (one token per byte, so at most 50 characters after the prompt)
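To make the effect of temperature and top-p concrete, the sketch below applies both to a toy list of logits in plain Python. It illustrates the general idea of nucleus sampling and is not the `transformers` implementation; `nucleus_probs` is a hypothetical helper.

```python
import math

def nucleus_probs(logits, temperature=1.2, top_p=0.95):
    """Scale logits by temperature, then keep the smallest set of tokens
    whose cumulative probability reaches top_p and renormalize."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                           # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk tokens from most to least likely until the mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Tokens outside the nucleus get probability zero.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered

print(nucleus_probs([2.0, 1.0, 0.0, -4.0]))
```

A temperature above 1 flattens the distribution, which favors less common character sequences, while top-p removes the unlikely tail that tends to produce malformed names.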

	## Examples

	### Input → Output Examples

	```
	tags: tech;startup;ai → techflow.ai
	tags: eco;sustainable;modern → greenleaf.eco
	tags: health;wellness;minimal → purelife.health
	tags: fashion;luxury;elegant → velvetrose.com
	tags: food;delivery;playful → snackdash.io
	```

	## Limitations

	- Best results with 3-4 tags (trained range)
	- May occasionally generate non-standard TLDs
	- Domain availability not guaranteed
	- Works best with English keywords
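Since the model may emit non-standard TLDs or malformed names, generated candidates can be filtered before use. The sketch below is one possible post-filter, not part of the model: `ALLOWED_TLDS` is a hypothetical allow-list, and the pattern only accepts a single label plus a TLD. It checks syntax only and says nothing about availability or trademarks.

```python
import re

# Hypothetical allow-list; extend it with the TLDs you want to accept.
ALLOWED_TLDS = {"com", "io", "ai", "co", "net", "org", "app", "eco", "health"}

# One label of 1-63 letters, digits, or inner hyphens, followed by a single TLD.
DOMAIN_RE = re.compile(
    r"^(?P<label>[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)\.(?P<tld>[a-z]{2,})$"
)

def is_acceptable(domain: str) -> bool:
    """True if the string is a syntactically valid name with an allowed TLD."""
    match = DOMAIN_RE.match(domain.strip().lower())
    return bool(match) and match.group("tld") in ALLOWED_TLDS

candidates = ["techflow.ai", "green leaf.eco", "-bad.com", "snackdash.xyzq"]
print([d for d in candidates if is_acceptable(d)])
```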

	## Ethical Considerations

	- Generated domains should be checked for trademark conflicts
	- May reflect biases present in training data
	- Should not be used to generate misleading or deceptive domains

	## Model Card Contact

	For questions or issues, please open an issue in the repository.

	## Citation

	If you use this model, please cite:

	```bibtex
	@software{domain_generator_reformer,
	title = {Domain Generator - Character-Level Reformer},
	year = {2024},
	publisher = {HuggingFace},
	url = {https://huggingface.co/your-username/domain-generator-reformer}
	}
	```

	## Changelog

	- v1.0 (2024-01): Initial release
	  - 5 epochs of training on the combined dataset
	  - 0.9716 validation loss
	  - Stable generation quality