---
title: SAWNA Space-Aware Text-to-Image Generation
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.34.1
app_file: app.py
pinned: false
license: apache-2.0
---
# 🎨 SAWNA: Space-Aware Text-to-Image Generation
Layout-aware text-to-image generation allows users to synthesize images by specifying object positions through text prompts and layouts. However, in real-world design applications, such as advertisements, posters, and UI mockups, professional design workflows are often driven by the opposite constraint: certain areas must remain empty for headlines, logos, or product shots that will be added later.
Existing models cannot guarantee this, leading to costly manual retouching and limiting automation. We introduce Space-Controllable Text-to-Image Generation, a task that treats negative space as a first-class condition. To solve this, we propose SAWNA (Space-Aware Text-to-Image Generation), a diffusion-based framework that accepts a user-defined layout and injects non-reactive noise to ensure reserved regions remain empty throughout the denoising process.
SAWNA suppresses content in masked areas while preserving diversity and visual fidelity elsewhere, without any additional training or fine-tuning. Experiments show that SAWNA effectively enforces empty regions and enhances the design utility of generated images, offering a practical solution for layout-sensitive generation tasks.
## 🚀 Features
- 🔲 Reserved Region Control: Define multiple bounding boxes where content generation is suppressed
- 🎯 Non-reactive Noise Optimization: Advanced noise manipulation prevents content generation in masked areas
- 🎨 Professional Design Workflows: Perfect for advertisements, posters, and UI mockups requiring empty space
- 🔧 Training-free: Works with any Stable Diffusion model without fine-tuning
- ⚡ Interactive Builder: Intuitive bounding box creation with presets and manual controls
- 🔄 Multi-Model Support: Compatible with SD 1.5, SDXL, and SD 2.1 architectures
## 🔍 How SAWNA Works
SAWNA introduces a novel space-aware noise-injection technique built from five steps:
- Bounding Box Definition: Users define reserved regions through axis-aligned bounding boxes
- Binary Occupancy Mapping: Creates masks M ∈ {0,1}^(H×W) for each reserved region
- Gaussian Blur Transitions: Applies soft transitions to prevent ringing artifacts
- Non-reactive Noise Injection: Uses TKG-DM methodology to suppress content generation
- Space-Aware Blending: Applies the formula ε_masked = ε + M_blur(ε_shifted - ε)
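The first three steps above can be sketched in a few lines of NumPy. This is a minimal illustration: `occupancy_mask` and `gaussian_blur` are hypothetical helper names (not necessarily those used in app.py), and the blur is a plain separable Gaussian.

```python
import numpy as np

def occupancy_mask(boxes, h, w):
    """Binary mask M in {0,1}^(HxW); boxes are (x0, y0, x1, y1) in [0, 1]."""
    m = np.zeros((h, w), dtype=np.float32)
    for x0, y0, x1, y1 in boxes:
        m[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
    return m

def gaussian_blur(mask, sigma=2.0):
    """Separable Gaussian blur producing the soft transition mask M_blur."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    # pad with edge values, then convolve rows and columns with the 1-D kernel
    pad = np.pad(mask, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

# a single centered reserved box on a 64x64 latent grid
mask = occupancy_mask([(0.25, 0.25, 0.75, 0.75)], 64, 64)
m_blur = gaussian_blur(mask)
```

The soft edges of `m_blur` are what prevent ringing artifacts at the boundary between reserved and free regions.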
### Mathematical Foundation
The core innovation lies in the space-aware noise blending formula:
ε_masked = ε + M_blur(ε_shifted - ε)
Where:
- ε: Standard Gaussian noise tensor
- ε_shifted: Non-reactive noise from TKG-DM channel shifts
- M_blur: Soft transition mask from Gaussian-blurred bounding boxes
Inside reserved boxes (M_blur ≈ 1), the blend is dominated by the mean-shifted, non-reactive noise; outside the boxes (M_blur ≈ 0), it reduces to ordinary Gaussian noise for full synthesis.
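As a quick numerical check of the blending formula, the sketch below uses a hard 0/1 mask and a constant mean shift as a stand-in for the actual TKG-DM channel shift; where the mask is 1 the result equals the shifted noise, and where it is 0 the result is unchanged Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal((4, 64, 64))   # standard Gaussian noise ε (4 latent channels)
eps_shifted = eps + 1.5                  # stand-in for the TKG-DM shifted noise ε_shifted
m_blur = np.zeros((64, 64))
m_blur[16:48, 16:48] = 1.0               # hard mask for illustration (no blur)

# ε_masked = ε + M_blur (ε_shifted - ε); the mask broadcasts over the channel axis
eps_masked = eps + m_blur * (eps_shifted - eps)
```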
## 🎮 Usage
- Enter your prompt: Describe what you want to generate
- Define reserved regions: Use the bounding box builder to specify empty areas
- Choose presets: Quick layouts like "Center Box", "Frame Border", or "Four Corners"
- Adjust latent channels: Fine-tune the 4 latent space channels for color control
- Select model: Choose from preset architectures or specify custom Hugging Face model ID
- Generate: Create your space-aware image with guaranteed empty regions
### Bounding Box Builder
The interactive builder provides:
- Quick Presets: Common layouts for design workflows
- Manual Creation: Precise coordinate inputs (normalized 0.0-1.0)
- Visual Preview: Real-time visualization of reserved regions
- Multiple Boxes: Support for complex layouts with multiple empty areas
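For reference, mapping a normalized box to pixel coordinates is a one-liner; `to_pixels` is a hypothetical helper name used here for illustration, not necessarily the one in app.py.

```python
def to_pixels(box, width, height):
    """Map a normalized (x0, y0, x1, y1) box in [0, 1] to integer pixel coordinates."""
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))

# e.g. a centered box covering the middle half of a 512x512 canvas
center = to_pixels((0.25, 0.25, 0.75, 0.75), 512, 512)  # -> (128, 128, 384, 384)
```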
### Custom Models
You can use any Hugging Face Stable Diffusion model by entering its model ID:
- SD 1.5 variants: `dreamlike-art/dreamlike-diffusion-1.0`, `nitrosocke/Arcane-Diffusion`
- SDXL models: `stabilityai/stable-diffusion-xl-base-1.0`, `playgroundai/playground-v2.5-1024px-aesthetic`
- SD 2.1 models: `stabilityai/stable-diffusion-2-1`, `22h/vintedois-diffusion-v0-2`
The system automatically detects the architecture type and loads the appropriate pipeline.
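A minimal sketch of how such auto-detection might look as a name-based heuristic. This is an assumption, not the app's actual logic: the real implementation may instead inspect the model's config on the Hub, and the keyword lists and return labels here are made up for illustration.

```python
def detect_architecture(model_id: str) -> str:
    """Guess the Stable Diffusion architecture from a Hugging Face model ID."""
    name = model_id.lower()
    if "xl" in name or "playground" in name:
        return "sdxl"   # would be loaded with an SDXL pipeline
    if "stable-diffusion-2" in name:
        return "sd21"   # SD 2.x pipeline
    return "sd15"       # default: SD 1.5-style pipeline
```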
## 🔬 Technical Details
- Space-Aware Generation: Multiple reserved bounding boxes with Gaussian blur transitions
- Non-reactive Noise: TKG-DM channel shifts suppress content generation in masked areas
- Multi-Model Support: Preset architectures (SD 1.5, SDXL, SD 2.1) + Custom Hugging Face models
- Auto-Detection: Automatically detects model architecture from Hugging Face model IDs
- 4-Channel Control: Independent control over all latent space channels
- Professional Workflows: Designed for real-world design applications
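As a rough illustration of per-channel control, one simple scheme re-centers each latent channel's mean on a user-chosen offset. This is a stand-in for the TKG-DM channel shift (the exact shift SAWNA applies may differ), and `shift_channel_means` is a hypothetical name.

```python
import numpy as np

def shift_channel_means(noise, offsets):
    """Shift each latent channel's mean to a target offset (rough TKG-DM-style control).

    noise: (4, H, W) Gaussian noise; offsets: one target mean per channel.
    """
    offsets = np.asarray(offsets, dtype=noise.dtype).reshape(-1, 1, 1)
    centered = noise - noise.mean(axis=(1, 2), keepdims=True)  # zero-mean per channel
    return centered + offsets

rng = np.random.default_rng(1)
shifted = shift_channel_means(rng.standard_normal((4, 32, 32)),
                              [0.5, 0.0, -0.5, 0.0])
```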
## 📚 Citation
Based on TKG-DM methodology by Morita et al. (2024):
```bibtex
@article{morita2024tkgdm,
  title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model},
  author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia},
  journal={arXiv preprint arXiv:2411.15580},
  year={2024}
}
```
## 🛠️ Local Installation
```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/sawna-space-aware
cd sawna-space-aware
pip install -r requirements.txt
python app.py
```
## 📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with ❤️ using Gradio and Diffusers for professional design workflows