|
--- |
|
title: SAWNA Space-Aware Text-to-Image Generation |
|
emoji: 🎨 |
|
colorFrom: purple |
|
colorTo: pink |
|
sdk: gradio |
|
sdk_version: 5.34.1 |
|
app_file: app.py |
|
pinned: false |
|
license: apache-2.0 |
|
--- |
|
|
|
# 🎨 SAWNA: Space-Aware Text-to-Image Generation |
|
|
|
Layout-aware text-to-image generation allows users to synthesize images by specifying object positions through text prompts and layouts. However, real-world design workflows, such as those behind advertisements, posters, and UI mockups, are often driven by the opposite constraint: certain areas must remain empty for headlines, logos, or product shots that will be added later.
|
|
|
Existing models cannot guarantee this, leading to costly manual retouching and limiting automation. We introduce **Space-Controllable Text-to-Image Generation**, a task that treats negative space as a first-class condition. To solve this task, we propose **SAWNA (Space-Aware Text-to-Image Generation)**, a diffusion-based framework that accepts a user-defined layout and injects non-reactive noise to ensure reserved regions remain empty throughout the denoising process.
|
|
|
SAWNA suppresses content in masked areas while preserving diversity and visual fidelity elsewhere, without any additional training or fine-tuning. Experiments show that SAWNA effectively enforces empty regions and enhances the design utility of generated images, offering a practical solution for layout-sensitive generation tasks.
|
|
|
## 🚀 Features |
|
|
|
- **🔲 Reserved Region Control**: Define multiple bounding boxes where content generation is suppressed |
|
- **🎯 Non-reactive Noise Optimization**: Advanced noise manipulation prevents content generation in masked areas |
|
- **🎨 Professional Design Workflows**: Perfect for advertisements, posters, and UI mockups requiring empty space |
|
- **🔧 Training-free**: Works with any Stable Diffusion model without fine-tuning |
|
- **⚡ Interactive Builder**: Intuitive bounding box creation with presets and manual controls |
|
- **🔄 Multi-Model Support**: Compatible with SD 1.5, SDXL, and SD 2.1 architectures |
|
|
|
## 📖 How SAWNA Works |
|
|
|
SAWNA introduces a novel space-aware noise injection technique that: |
|
|
|
1. **Bounding Box Definition**: Users define reserved regions through axis-aligned bounding boxes |
|
2. **Binary Occupancy Mapping**: Creates a binary mask M ∈ {0,1}^(H×W) for each reserved region
|
3. **Gaussian Blur Transitions**: Applies soft transitions to prevent ringing artifacts (see the mask-construction sketch after this list)
|
4. **Non-reactive Noise Injection**: Uses TKG-DM methodology to suppress content generation |
|
5. **Space-Aware Blending**: Applies the formula: **ε_masked = ε + M_blur ⊙ (ε_shifted - ε)**
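
The mask construction in steps 1-3 can be sketched as follows. This is a minimal illustration rather than the app's actual code: it assumes normalized (x0, y0, x1, y1) boxes, a 64×64 latent grid (SD 1.5 at 512 px), and torchvision's Gaussian blur; the function name `build_blur_mask` is invented for this example.

```python
import torch
import torchvision.transforms.functional as TF

def build_blur_mask(boxes, height=64, width=64, sigma=4.0):
    """Rasterize normalized (x0, y0, x1, y1) boxes into a binary occupancy
    map M and soften it into M_blur with a Gaussian blur (illustrative)."""
    mask = torch.zeros(1, 1, height, width)
    for x0, y0, x1, y1 in boxes:
        r0, r1 = int(y0 * height), int(y1 * height)
        c0, c1 = int(x0 * width), int(x1 * width)
        mask[..., r0:r1, c0:c1] = 1.0        # M = 1 inside the reserved region
    kernel = int(4 * sigma) | 1              # odd kernel size, roughly +/- 2 sigma
    m_blur = TF.gaussian_blur(mask, kernel_size=kernel, sigma=sigma)
    return m_blur.clamp(0.0, 1.0)            # soft transitions, no hard edges

# Example: reserve a horizontal band across the middle for a headline
m_blur = build_blur_mask([(0.1, 0.35, 0.9, 0.65)])
```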
|
|
|
### Mathematical Foundation |
|
|
|
The core innovation lies in the space-aware noise blending formula: |
|
|
|
``` |
|
ε_masked = ε + M_blur ⊙ (ε_shifted - ε)
|
``` |
|
|
|
Where: |
|
- **ε**: Standard Gaussian noise tensor |
|
- **ε_shifted**: Non-reactive noise from TKG-DM channel shifts |
|
- **M_blur**: Soft transition mask from Gaussian-blurred bounding boxes

- **⊙**: Element-wise (Hadamard) product
|
|
|
- **Inside reserved boxes (M_blur ≈ 1)**: the blend is dominated by the mean-shifted, non-reactive noise

- **Outside boxes (M_blur ≈ 0)**: the blend reduces to ordinary Gaussian noise for full synthesis
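
A corresponding sketch of the blending step, under the same assumptions: a 4-channel SD latent of shape (1, 4, 64, 64), with the non-reactive noise approximated as a simple per-channel mean offset of the Gaussian noise. The shift values and the function name `space_aware_noise` are illustrative placeholders, not the tuned TKG-DM settings.

```python
import torch

def space_aware_noise(m_blur, channel_shift=(0.5, 0.0, 0.0, 0.0),
                      shape=(1, 4, 64, 64), generator=None):
    """epsilon_masked = epsilon + M_blur * (epsilon_shifted - epsilon), with
    epsilon_shifted approximated as a per-channel mean shift of epsilon."""
    eps = torch.randn(shape, generator=generator)    # ordinary Gaussian noise
    shift = torch.tensor(channel_shift).view(1, -1, 1, 1)
    eps_shifted = eps + shift                        # mean-shifted ("non-reactive") noise
    return eps + m_blur * (eps_shifted - eps)        # shifted inside boxes, Gaussian outside

# The blended tensor can be handed to a diffusers pipeline as its initial latent:
# latents = space_aware_noise(m_blur)
# image = pipe("a product photo on a studio table", latents=latents).images[0]
```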
|
|
|
## 🎮 Usage |
|
|
|
1. **Enter your prompt**: Describe what you want to generate |
|
2. **Define reserved regions**: Use the bounding box builder to specify empty areas |
|
3. **Choose presets**: Quick layouts like "Center Box", "Frame Border", or "Four Corners" |
|
4. **Adjust latent channels**: Fine-tune the 4 latent space channels for color control |
|
5. **Select model**: Choose from preset architectures or specify custom Hugging Face model ID |
|
6. **Generate**: Create your space-aware image with guaranteed empty regions |
|
|
|
### Bounding Box Builder |
|
|
|
The interactive builder provides: |
|
|
|
- **Quick Presets**: Common layouts for design workflows |
|
- **Manual Creation**: Precise coordinate inputs (normalized 0.0-1.0; see the example after this list)
|
- **Visual Preview**: Real-time visualization of reserved regions |
|
- **Multiple Boxes**: Support for complex layouts with multiple empty areas |
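
As an example of a multi-box layout, a "Frame Border"-style preset could be expressed as four normalized boxes and fed to the mask sketch shown earlier (the builder's actual input format may differ):

```python
# Four strips reserving a frame border; boxes are (x0, y0, x1, y1) in [0, 1]
frame_border = [
    (0.0, 0.0, 1.0, 0.1),   # top strip
    (0.0, 0.9, 1.0, 1.0),   # bottom strip
    (0.0, 0.0, 0.1, 1.0),   # left strip
    (0.9, 0.0, 1.0, 1.0),   # right strip
]
m_blur = build_blur_mask(frame_border)
```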
|
|
|
### Custom Models |
|
|
|
You can use any Hugging Face Stable Diffusion model by entering its model ID: |
|
|
|
- **SD 1.5 variants**: `dreamlike-art/dreamlike-diffusion-1.0`, `nitrosocke/Arcane-Diffusion` |
|
- **SDXL models**: `stabilityai/stable-diffusion-xl-base-1.0`, `playgroundai/playground-v2.5-1024px-aesthetic` |
|
- **SD 2.1 models**: `stabilityai/stable-diffusion-2-1`, `22h/vintedois-diffusion-v0-2` |
|
|
|
The system automatically detects the architecture type and loads the appropriate pipeline. |
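
One way to get this behavior with diffusers is `AutoPipelineForText2Image`, which resolves the correct pipeline class (SD 1.5, SD 2.1, or SDXL) from the repository's configuration. The app's own loading logic may differ from this minimal sketch:

```python
import torch
from diffusers import AutoPipelineForText2Image

# AutoPipelineForText2Image reads the model config on the Hub and instantiates
# the matching pipeline class, so SD 1.5, SD 2.1, and SDXL IDs all work here.
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
```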
|
|
|
## 🔬 Technical Details |
|
|
|
- **Space-Aware Generation**: Multiple reserved bounding boxes with Gaussian blur transitions |
|
- **Non-reactive Noise**: TKG-DM channel shifts suppress content generation in masked areas |
|
- **Multi-Model Support**: Preset architectures (SD 1.5, SDXL, SD 2.1) + Custom Hugging Face models |
|
- **Auto-Detection**: Automatically detects model architecture from Hugging Face model IDs |
|
- **4-Channel Control**: Independent control over all latent space channels |
|
- **Professional Workflows**: Designed for real-world design applications |
|
|
|
## 📚 Citation |
|
|
|
SAWNA builds on the TKG-DM methodology of Morita et al. (2024):
|
|
|
```bibtex |
|
@article{morita2024tkgdm, |
|
title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model}, |
|
author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia}, |
|
journal={arXiv preprint arXiv:2411.15580}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
## 🛠️ Local Installation |
|
|
|
```bash |
|
git clone https://huggingface.co/spaces/YOUR_USERNAME/sawna-space-aware |
|
cd sawna-space-aware |
|
pip install -r requirements.txt |
|
python app.py |
|
``` |
|
|
|
## 📄 License |
|
|
|
This project is licensed under the Apache License 2.0 - see the LICENSE file for details. |
|
|
|
--- |
|
|
|
*Built with ❤️ using Gradio and Diffusers for professional design workflows* |
|
|