
---
title: SAWNA Space-Aware Text-to-Image Generation
emoji: 🎨
colorFrom: purple
colorTo: pink
sdk: gradio
sdk_version: 5.34.1
app_file: app.py
pinned: false
license: apache-2.0
---

# 🎨 SAWNA: Space-Aware Text-to-Image Generation

Layout-aware text-to-image generation allows users to synthesize images by specifying object positions through text prompts and layouts. However, in real-world design applications, such as advertisements, posters, and UI mockups, professional design workflows are often driven by the opposite constraint: certain areas must remain empty for headlines, logos, or product shots that will be added later.

Existing models cannot guarantee this, leading to costly manual retouching and limiting automation. We introduce Space-Controllable Text-to-Image Generation, a task that treats negative space as a first-class condition. To solve this, we propose SAWNA (Space-Aware Text-to-Image Generation), a diffusion-based framework that accepts a user-defined layout and injects nonreactive noise to ensure reserved regions remain empty throughout the denoising process.

SAWNA suppresses content in masked areas while preserving diversity and visual fidelity elsewhere, without any additional training or fine-tuning. Experiments show that SAWNA effectively enforces empty regions and enhances the design utility of generated images, offering a practical solution for layout-sensitive generation tasks.

## 🚀 Features

  - 🔲 Reserved Region Control: Define multiple bounding boxes where content generation is suppressed
  - 🎯 Non-reactive Noise Optimization: Advanced noise manipulation prevents content generation in masked areas
  - 🎨 Professional Design Workflows: Perfect for advertisements, posters, and UI mockups requiring empty space
  - 🔧 Training-free: Works with any Stable Diffusion model without fine-tuning
  - ⚡ Interactive Builder: Intuitive bounding box creation with presets and manual controls
  - 🔄 Multi-Model Support: Compatible with SD 1.5, SDXL, and SD 2.1 architectures

## 📖 How SAWNA Works

SAWNA introduces a novel space-aware noise injection technique that proceeds in five steps:

  1. Bounding Box Definition: Users define reserved regions through axis-aligned bounding boxes
  2. Binary Occupancy Mapping: Creates masks M ∈ {0,1}^(H×W) for each reserved region
  3. Gaussian Blur Transitions: Applies soft transitions to prevent ringing artifacts
  4. Non-reactive Noise Injection: Uses TKG-DM methodology to suppress content generation
  5. Space-Aware Blending: Applies the formula ε_masked = ε + M_blur(ε_shifted - ε)
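
The five steps above can be sketched end-to-end in NumPy. This is a minimal illustration of the noise construction only: the function names, blur parameters, and single scalar shift are assumptions (SAWNA applies per-channel TKG-DM shifts to diffusion latents), not the actual implementation.

```python
import numpy as np

def _gaussian_blur(m, sigma=3.0, radius=9):
    # Separable Gaussian blur; NumPy-only stand-in for a library blur.
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    for axis in (0, 1):
        m = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, m)
    return m

def make_space_aware_noise(boxes, h=64, w=64, channels=4, shift=0.5, seed=0):
    """boxes: list of (x0, y0, x1, y1) in normalized [0, 1] coordinates."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((channels, h, w))        # standard Gaussian noise
    # Steps 1-2: binary occupancy map M in {0, 1}^(H x W)
    mask = np.zeros((h, w))
    for x0, y0, x1, y1 in boxes:
        mask[int(y0 * h):int(y1 * h), int(x0 * w):int(x1 * w)] = 1.0
    # Step 3: Gaussian blur gives the soft transition mask M_blur
    m_blur = _gaussian_blur(mask)
    # Step 4: mean-shifted ("non-reactive") noise; a scalar shift stands in
    # for TKG-DM's per-channel shifts
    eps_shifted = eps + shift
    # Step 5: space-aware blending
    return eps + m_blur[None] * (eps_shifted - eps)

noise = make_space_aware_noise([(0.25, 0.25, 0.75, 0.75)])
```

Inside the reserved box the noise mean is pulled toward the shift value, while the statistics outside remain those of ordinary Gaussian noise.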

### Mathematical Foundation

The core innovation lies in the space-aware noise blending formula:

ε_masked = ε + M_blur(ε_shifted - ε)

Where:

  - ε: standard Gaussian noise tensor
  - ε_shifted: non-reactive noise from TKG-DM channel shifts
  - M_blur: soft transition mask from Gaussian-blurred bounding boxes

Inside reserved boxes (M_blur ≈ 1), the blend is dominated by the mean-shifted, non-reactive noise; outside the boxes (M_blur ≈ 0), it reduces to ordinary Gaussian noise for full synthesis.
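
A quick numeric check confirms the two limiting cases of the formula (the 0.5 shift is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
eps = rng.standard_normal((4, 8, 8))    # ordinary Gaussian noise
eps_shifted = eps + 0.5                 # stand-in for TKG-DM shifted noise

def blend(m_blur):
    # The space-aware blending formula with a uniform mask value.
    return eps + m_blur * (eps_shifted - eps)

inside_ok = np.allclose(blend(1.0), eps_shifted)   # M_blur = 1: pure non-reactive noise
outside_ok = np.allclose(blend(0.0), eps)          # M_blur = 0: ordinary Gaussian noise
```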

## 🎮 Usage

  1. Enter your prompt: Describe what you want to generate
  2. Define reserved regions: Use the bounding box builder to specify empty areas
  3. Choose presets: Quick layouts like "Center Box", "Frame Border", or "Four Corners"
  4. Adjust latent channels: Fine-tune the 4 latent space channels for color control
  5. Select model: Choose from preset architectures or specify custom Hugging Face model ID
  6. Generate: Create your space-aware image with guaranteed empty regions

### Bounding Box Builder

The interactive builder provides:

  - Quick Presets: Common layouts for design workflows
  - Manual Creation: Precise coordinate inputs (normalized 0.0-1.0)
  - Visual Preview: Real-time visualization of reserved regions
  - Multiple Boxes: Support for complex layouts with multiple empty areas
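
As an illustration, a normalized box maps to pixel coordinates by scaling each coordinate by the image dimensions; `to_pixel_box` is a hypothetical helper, not part of the app's API:

```python
def to_pixel_box(box, width, height):
    """Convert an (x0, y0, x1, y1) box in normalized 0.0-1.0 coords to pixels."""
    x0, y0, x1, y1 = box
    return (round(x0 * width), round(y0 * height),
            round(x1 * width), round(y1 * height))

to_pixel_box((0.25, 0.25, 0.75, 0.75), 512, 512)  # -> (128, 128, 384, 384)
```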

### Custom Models

You can use any Hugging Face Stable Diffusion model by entering its model ID:

  - SD 1.5 variants: `dreamlike-art/dreamlike-diffusion-1.0`, `nitrosocke/Arcane-Diffusion`
  - SDXL models: `stabilityai/stable-diffusion-xl-base-1.0`, `playgroundai/playground-v2.5-1024px-aesthetic`
  - SD 2.1 models: `stabilityai/stable-diffusion-2-1`, `22h/vintedois-diffusion-v0-2`

The system automatically detects the architecture type and loads the appropriate pipeline.
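
As a sketch, such auto-detection could be a substring heuristic over the model ID. This mapping is an assumption: the actual detector may inspect the model's configuration instead, and IDs like `22h/vintedois-diffusion-v0-2` would need extra rules.

```python
def guess_architecture(model_id: str) -> str:
    """Crude heuristic mapping a Hugging Face model ID to a pipeline family."""
    mid = model_id.lower()
    if "xl" in mid or "playground" in mid:
        return "sdxl"
    if "stable-diffusion-2" in mid or "2-1" in mid:
        return "sd21"
    return "sd15"   # default: SD 1.5-style pipeline

guess_architecture("stabilityai/stable-diffusion-xl-base-1.0")  # -> 'sdxl'
```

Inspecting the downloaded `model_index.json` would be more robust than name matching, at the cost of a network round trip.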

## 🔬 Technical Details

  - Space-Aware Generation: Multiple reserved bounding boxes with Gaussian blur transitions
  - Non-reactive Noise: TKG-DM channel shifts suppress content generation in masked areas
  - Multi-Model Support: Preset architectures (SD 1.5, SDXL, SD 2.1) plus custom Hugging Face models
  - Auto-Detection: Automatically detects model architecture from Hugging Face model IDs
  - 4-Channel Control: Independent control over all latent space channels
  - Professional Workflows: Designed for real-world design applications

## 📚 Citation

Based on TKG-DM methodology by Morita et al. (2024):

```bibtex
@article{morita2024tkgdm,
  title={TKG-DM: Training-free Chroma Key Content Generation Diffusion Model},
  author={Morita, Ryugo and Frolov, Stanislav and Moser, Brian Bernhard and Shirakawa, Takahiro and Watanabe, Ko and Dengel, Andreas and Zhou, Jinjia},
  journal={arXiv preprint arXiv:2411.15580},
  year={2024}
}
```

๐Ÿ› ๏ธ Local Installation

```bash
git clone https://huggingface.co/spaces/YOUR_USERNAME/sawna-space-aware
cd sawna-space-aware
pip install -r requirements.txt
python app.py
```

## 📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.


Built with ❤️ using Gradio and Diffusers for professional design workflows