|
--- |
|
title: SAWNA Space-Aware Text-to-Image Generation |
|
emoji: 🎨 |
|
colorFrom: purple |
|
colorTo: pink |
|
sdk: gradio |
|
sdk_version: 5.34.1 |
|
app_file: app.py |
|
pinned: false |
|
license: apache-2.0 |
|
--- |
|
|
|
# 🎨 SAWNA: Space-Aware Text-to-Image Generation |
|
|
|
Layout-aware text-to-image generation allows users to synthesize images by specifying object positions through text prompts and layouts. However, real-world design workflows, such as those behind advertisements, posters, and UI mockups, are often driven by the opposite constraint: certain areas must remain empty for headlines, logos, or product shots that will be added later.
|
|
|
Existing models cannot guarantee this, leading to costly manual retouching and limiting automation. We introduce **Space-Controllable Text-to-Image Generation**, a task that treats negative space as a first-class condition. To solve this task, we propose **SAWNA (Space-Aware Text-to-Image Generation)**, a diffusion-based framework that accepts a user-defined layout and injects non-reactive noise to ensure reserved regions remain empty throughout the denoising process.
|
|
|
SAWNA suppresses content in masked areas while preserving diversity and visual fidelity elsewhere, without any additional training or fine-tuning. Experiments show that SAWNA effectively enforces empty regions and enhances the design utility of generated images, offering a practical solution for layout-sensitive generation tasks.
|
|
|
## 🚀 Features |
|
|
|
- **🔲 Reserved Region Control**: Define multiple bounding boxes where content generation is suppressed |
|
- **🎯 Non-reactive Noise Optimization**: Advanced noise manipulation prevents content generation in masked areas |
|
- **🎨 Professional Design Workflows**: Perfect for advertisements, posters, and UI mockups requiring empty space |
|
- **🔧 Training-free**: Works with any Stable Diffusion model without fine-tuning |
|
- **⚡ Interactive Builder**: Intuitive bounding box creation with presets and manual controls |
|
- **🔄 Multi-Model Support**: Compatible with SD 1.5, SDXL, and SD 2.1 architectures |
|
|
|
## 📖 How SAWNA Works |
|
|
|
SAWNA introduces a novel space-aware noise injection technique that: |
|
|
|
1. **Bounding Box Definition**: Users define reserved regions through axis-aligned bounding boxes |
|
2. **Binary Occupancy Mapping**: Creates a binary mask M ∈ {0,1}^(H×W) for each reserved region
|
3. **Gaussian Blur Transitions**: Applies soft transitions to prevent ringing artifacts (see the mask-construction sketch after this list)
|
4. **Non-reactive Noise Injection**: Uses TKG-DM methodology to suppress content generation |
|
5. **Space-Aware Blending**: Applies the formula: **ε_masked = ε + M_blur ⊙ (ε_shifted - ε)**
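
The mask construction in steps 1-3 can be sketched as follows. This is a minimal illustration rather than the app's actual code: it assumes normalized (x0, y0, x1, y1) boxes, a 64×64 latent grid (SD 1.5 at 512 px), and torchvision's Gaussian blur; the function name `build_blur_mask` is invented for this example.

```python
import torch
import torchvision.transforms.functional as TF

def build_blur_mask(boxes, height=64, width=64, sigma=4.0):
    """Rasterize normalized (x0, y0, x1, y1) boxes into a binary occupancy
    map M and soften it into M_blur with a Gaussian blur (illustrative)."""
    mask = torch.zeros(1, 1, height, width)
    for x0, y0, x1, y1 in boxes:
        r0, r1 = int(y0 * height), int(y1 * height)
        c0, c1 = int(x0 * width), int(x1 * width)
        mask[..., r0:r1, c0:c1] = 1.0        # M = 1 inside the reserved region
    kernel = int(4 * sigma) | 1              # odd kernel size, roughly +/- 2 sigma
    m_blur = TF.gaussian_blur(mask, kernel_size=kernel, sigma=sigma)
    return m_blur.clamp(0.0, 1.0)            # soft transitions, no hard edges

# Example: reserve a horizontal band across the middle for a headline
m_blur = build_blur_mask([(0.1, 0.35, 0.9, 0.65)])
```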
|
|
|
### Mathematical Foundation |
|
|
|
The core innovation lies in the space-aware noise blending formula: |
|
|
|
``` |
|
ε_masked = ε + M_blur ⊙ (ε_shifted - ε)
|
``` |
|
|
|
Where: |
|
- **ε**: Standard Gaussian noise tensor |
|
- **ε_shifted**: Non-reactive noise from TKG-DM channel shifts |
|
- **M_blur**: Soft transition mask from Gaussian-blurred bounding boxes

- **⊙**: Element-wise (Hadamard) product
|
|
|
- **Inside reserved boxes (M_blur ≈ 1)**: the blend is dominated by the mean-shifted, non-reactive noise

- **Outside boxes (M_blur ≈ 0)**: the blend reduces to ordinary Gaussian noise for full synthesis
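
A corresponding sketch of the blending step, under the same assumptions: a 4-channel SD latent of shape (1, 4, 64, 64), with the non-reactive noise approximated as a simple per-channel mean offset of the Gaussian noise. The shift values and the function name `space_aware_noise` are illustrative placeholders, not the tuned TKG-DM settings.

```python
import torch

def space_aware_noise(m_blur, channel_shift=(0.5, 0.0, 0.0, 0.0),
                      shape=(1, 4, 64, 64), generator=None):
    """epsilon_masked = epsilon + M_blur * (epsilon_shifted - epsilon), with
    epsilon_shifted approximated as a per-channel mean shift of epsilon."""
    eps = torch.randn(shape, generator=generator)    # ordinary Gaussian noise
    shift = torch.tensor(channel_shift).view(1, -1, 1, 1)
    eps_shifted = eps + shift                        # mean-shifted ("non-reactive") noise
    return eps + m_blur * (eps_shifted - eps)        # shifted inside boxes, Gaussian outside

# The blended tensor can be handed to a diffusers pipeline as its initial latent:
# latents = space_aware_noise(m_blur)
# image = pipe("a product photo on a studio table", latents=latents).images[0]
```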
|
|
|
## 🎮 Usage |
|
|
|
1. **Enter your prompt**: Describe what you want to generate |
|
2. **Define reserved regions**: Use the bounding box builder to specify empty areas |
|
3. **Choose presets**: Quick layouts like "Center Box", "Frame Border", or "Four Corners" |
|
4. **Adjust latent channels**: Fine-tune the 4 latent space channels for color control |
|
5. **Select model**: Choose from preset architectures or specify custom Hugging Face model ID |
|
6. **Generate**: Create your space-aware image with guaranteed empty regions |
|
|
|
### Bounding Box Builder |
|
|
|
The interactive builder provides: |
|
|
|
- **Quick Presets**: Common layouts for design workflows |
|
- **Manual Creation**: Precise coordinate inputs (normalized 0.0-1.0; see the example after this list)
|
- **Visual Preview**: Real-time visualization of reserved regions |
|
- **Multiple Boxes**: Support for complex layouts with multiple empty areas |
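
As an example of a multi-box layout, a "Frame Border"-style preset could be expressed as four normalized boxes and fed to the mask sketch shown earlier (the builder's actual input format may differ):

```python
# Four strips reserving a frame border; boxes are (x0, y0, x1, y1) in [0, 1]
frame_border = [
    (0.0, 0.0, 1.0, 0.1),   # top strip
    (0.0, 0.9, 1.0, 1.0),   # bottom strip
    (0.0, 0.0, 0.1, 1.0),   # left strip
    (0.9, 0.0, 1.0, 1.0),   # right strip
]
m_blur = build_blur_mask(frame_border)
```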
|
|
|
### Custom Models |
|
|
|
You can use any Hugging Face Stable Diffusion model by entering its model ID: |
|
|
|
- **SD 1.5 variants**: `dreamlike-art/dreamlike-diffusion-1.0`, `nitrosocke/Arcane-Diffusion` |
|
- **SDXL models**: `stabilityai/stable-diffusion-xl-base-1.0`, `playgroundai/playground-v2.5-1024px-aesthetic` |
|
- **SD 2.1 models**: `stabilityai/stable-diffusion-2-1`, `22h/vintedois-diffusion-v0-2` |
|
|
|
The system automatically detects the architecture type and loads the appropriate pipeline. |
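
One way to get this behavior with diffusers is `AutoPipelineForText2Image`, which resolves the correct pipeline class (SD 1.5, SD 2.1, or SDXL) from the repository's configuration. The app's own loading logic may differ from this minimal sketch:

```python
import torch
from diffusers import AutoPipelineForText2Image

# AutoPipelineForText2Image reads the model config on the Hub and instantiates
# the matching pipeline class, so SD 1.5, SD 2.1, and SDXL IDs all work here.
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
```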
|
|
|
## 🔬 Technical Details |
|
|
|
- **Space-Aware Generation**: Multiple reserved bounding boxes with Gaussian blur transitions |
|
- **Non-reactive Noise**: TKG-DM channel shifts suppress content generation in masked areas |
|
- **Multi-Model Support**: Preset architectures (SD 1.5, SDXL, SD 2.1) + Custom Hugging Face models |
|
- **Auto-Detection**: Automatically detects model architecture from Hugging Face model IDs |
|
- **4-Channel Control**: Independent control over all latent space channels |
|
- **Professional Workflows**: Designed for real-world design applications |
|
|
|
## 📚 Citation |
|
|
|
SAWNA builds on the TKG-DM methodology of Morita et al. (2024):
|
|
|
```bibtex |
|
@article{morita2024tkgdm, |
|
title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model}, |
|
author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia}, |
|
journal={arXiv preprint arXiv:2411.15580}, |
|
year={2024} |
|
} |
|
``` |
|
|
|
## 🛠️ Local Installation |
|
|
|
```bash |
|
git clone https://huggingface.co/spaces/YOUR_USERNAME/sawna-space-aware |
|
cd sawna-space-aware |
|
pip install -r requirements.txt |
|
python app.py |
|
``` |
|
|
|
## 📄 License |
|
|
|
This project is licensed under the Apache License 2.0 - see the LICENSE file for details. |
|
|
|
--- |
|
|
|
*Built with ❤️ using Gradio and Diffusers for professional design workflows* |
|
|