
---
title: SAWNA Space-Aware Text-to-Image Generation
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.34.1
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎨 SAWNA: Space-Aware Text-to-Image Generation

Layout-aware text-to-image generation allows users to synthesize images by specifying object positions through text prompts and layouts. However, in real-world design applications, such as advertisements, posters, and UI mockups, professional design workflows are often driven by the opposite constraint: certain areas must remain empty for headlines, logos, or product shots that will be added later.

Existing models cannot guarantee this, leading to costly manual retouching and limiting automation. We introduce Space-Controllable Text-to-Image Generation, a task that treats negative space as a first-class condition. To solve this, we propose SAWNA (Space-Aware Text-to-Image Generation), a diffusion-based framework that accepts a user-defined layout and injects nonreactive noise to ensure reserved regions remain empty throughout the denoising process.

SAWNA suppresses content in masked areas while preserving diversity and visual fidelity elsewhere, without any additional training or fine-tuning. Experiments show that SAWNA effectively enforces empty regions and enhances the design utility of generated images, offering a practical solution for layout-sensitive generation tasks.

## 🚀 Features

  - 🔲 Reserved Region Control: Define multiple bounding boxes where content generation is suppressed
  - 🎯 Non-reactive Noise Optimization: Advanced noise manipulation prevents content generation in masked areas
  - 🎨 Professional Design Workflows: Perfect for advertisements, posters, and UI mockups requiring empty space
  - 🔧 Training-free: Works with any Stable Diffusion model without fine-tuning
  - ⚡ Interactive Builder: Intuitive bounding box creation with presets and manual controls
  - 🔄 Multi-Model Support: Compatible with SD 1.5, SDXL, and SD 2.1 architectures

## 📖 How SAWNA Works

SAWNA introduces a novel space-aware noise injection technique that proceeds in five steps:

  1. Bounding Box Definition: Users define reserved regions through axis-aligned bounding boxes
  2. Binary Occupancy Mapping: Creates masks M ∈ {0,1}^(H×W) for each reserved region
  3. Gaussian Blur Transitions: Applies soft transitions to prevent ringing artifacts
  4. Non-reactive Noise Injection: Uses TKG-DM methodology to suppress content generation
  5. Space-Aware Blending: Applies the formula ε_masked = ε + M_blur(ε_shifted - ε)
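
The five steps above can be sketched end-to-end in NumPy. This is a minimal illustration of the noise construction only: the function names, blur parameters, and single scalar shift are assumptions (SAWNA applies per-channel TKG-DM shifts to diffusion latents), not the actual implementation.

```python
import numpy as np

def _gaussian_blur(m, sigma=3.0, radius=9):
    # Separable Gaussian blur; NumPy-only stand-in for a library blur.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    for axis in (0, 1):
        m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, m)
    return m

def make_space_aware_noise(boxes, h=64, w=64, channels=4, shift=0.5, seed=0):
    """boxes: list of (x0, y0, x1, y1) in normalized [0, 1] coordinates."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((channels, h, w))        # standard Gaussian noise
    # Steps 1-2: binary occupancy map M in {0, 1}^(H x W)
    mask = np.zeros((h, w))
    for x0, y0, x1, y1 in boxes:
        mask[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
    # Step 3: Gaussian blur gives the soft transition mask M_blur
    m_blur = _gaussian_blur(mask)
    # Step 4: mean-shifted ("non-reactive") noise; a scalar shift stands in
    # for TKG-DM's per-channel shifts
    eps_shifted = eps + shift
    # Step 5: space-aware blending
    return eps + m_blur[None] * (eps_shifted - eps)

noise = make_space_aware_noise([(0.25, 0.25, 0.75, 0.75)])
```

Inside the reserved box the noise mean is pulled toward the shift value, while the statistics outside remain those of ordinary Gaussian noise.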

### Mathematical Foundation

The core innovation lies in the space-aware noise blending formula:

ε_masked = ε + M_blur(ε_shifted - ε)

Where:

  - ε: standard Gaussian noise tensor
  - ε_shifted: non-reactive noise from TKG-DM channel shifts
  - M_blur: soft transition mask from Gaussian-blurred bounding boxes

Inside reserved boxes (M_blur ≈ 1), the blend is dominated by the mean-shifted, non-reactive noise; outside the boxes (M_blur ≈ 0), it reduces to ordinary Gaussian noise for full synthesis.
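
A quick numeric check confirms the two limiting cases of the formula (the 0.5 shift is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal((4, 8, 8))    # ordinary Gaussian noise
eps_shifted = eps + 0.5                 # stand-in for TKG-DM shifted noise

def blend(m_blur):
    # The space-aware blending formula with a uniform mask value.
    return eps + m_blur * (eps_shifted - eps)

inside_ok = np.allclose(blend(1.0), eps_shifted)   # M_blur = 1: pure non-reactive noise
outside_ok = np.allclose(blend(0.0), eps)          # M_blur = 0: ordinary Gaussian noise
```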

## 🎮 Usage

  1. Enter your prompt: Describe what you want to generate
  2. Define reserved regions: Use the bounding box builder to specify empty areas
  3. Choose presets: Quick layouts like "Center Box", "Frame Border", or "Four Corners"
  4. Adjust latent channels: Fine-tune the 4 latent space channels for color control
  5. Select model: Choose from preset architectures or specify custom Hugging Face model ID
  6. Generate: Create your space-aware image with guaranteed empty regions

### Bounding Box Builder

The interactive builder provides:

  - Quick Presets: Common layouts for design workflows
  - Manual Creation: Precise coordinate inputs (normalized 0.0-1.0)
  - Visual Preview: Real-time visualization of reserved regions
  - Multiple Boxes: Support for complex layouts with multiple empty areas
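
As an illustration, a normalized box maps to pixel coordinates by scaling each coordinate by the image dimensions; `to_pixel_box` is a hypothetical helper, not part of the app's API:

```python
def to_pixel_box(box, width, height):
    """Convert an (x0, y0, x1, y1) box in normalized 0.0-1.0 coords to pixels."""
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))

to_pixel_box((0.25, 0.25, 0.75, 0.75), 512, 512)  # -> (128, 128, 384, 384)
```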

### Custom Models

You can use any Hugging Face Stable Diffusion model by entering its model ID:

  - SD 1.5 variants: `dreamlike-art/dreamlike-diffusion-1.0`, `nitrosocke/Arcane-Diffusion`
  - SDXL models: `stabilityai/stable-diffusion-xl-base-1.0`, `playgroundai/playground-v2.5-1024px-aesthetic`
  - SD 2.1 models: `stabilityai/stable-diffusion-2-1`, `22h/vintedois-diffusion-v0-2`

The system automatically detects the architecture type and loads the appropriate pipeline.
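
As a sketch, such auto-detection could be a substring heuristic over the model ID. This mapping is an assumption: the actual detector may inspect the model's configuration instead, and IDs like `22h/vintedois-diffusion-v0-2` would need extra rules.

```python
def guess_architecture(model_id: str) -> str:
    """Crude heuristic mapping a Hugging Face model ID to a pipeline family."""
    mid = model_id.lower()
    if "xl" in mid or "playground" in mid:
        return "sdxl"
    if "stable-diffusion-2" in mid or "2-1" in mid:
        return "sd21"
    return "sd15"   # default: SD 1.5-style pipeline

guess_architecture("stabilityai/stable-diffusion-xl-base-1.0")  # -> 'sdxl'
```

Inspecting the downloaded `model_index.json` would be more robust than name matching, at the cost of a network round trip.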

## 🔬 Technical Details

  - Space-Aware Generation: Multiple reserved bounding boxes with Gaussian blur transitions
  - Non-reactive Noise: TKG-DM channel shifts suppress content generation in masked areas
  - Multi-Model Support: Preset architectures (SD 1.5, SDXL, SD 2.1) plus custom Hugging Face models
  - Auto-Detection: Automatically detects model architecture from Hugging Face model IDs
  - 4-Channel Control: Independent control over all latent space channels
  - Professional Workflows: Designed for real-world design applications

## 📚 Citation

Based on TKG-DM methodology by Morita et al. (2024):

```bibtex
@article{morita2024tkgdm,
  title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model},
  author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia},
  journal={arXiv preprint arXiv:2411.15580},
  year={2024}
}
```

๐Ÿ› ๏ธ Local Installation

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/sawna-space-aware
cd sawna-space-aware
pip install -r requirements.txt
python app.py
```

## 📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Built with ❤️ using Gradio and Diffusers for professional design workflows