---
title: "Nested Attention: Semantic-aware Attention Values for Concept Personalization"
emoji: 🚀
colorFrom: indigo
colorTo: pink
sdk: gradio
app_file: app.py
pinned: false
---

# Nested Attention: Semantic-aware Attention Values for Concept Personalization (SIGGRAPH 2025)

![](assets/teaser_site.jpg)

> **Nested Attention: Semantic-aware Attention Values for Concept Personalization**  
> Or Patashnik, Rinon Gal, Daniil Ostashev, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or  
> https://arxiv.org/abs/2501.01407  
>
> **Abstract:** Personalizing text-to-image models to generate images of specific subjects across diverse scenes and styles is a rapidly advancing field. Current approaches often struggle to balance identity preservation with alignment to the input text prompt. Some methods rely on a single textual token to represent a subject, limiting expressiveness, while others use richer representations but disrupt the model's prior, weakening prompt alignment.  
> In this work, we introduce **Nested Attention**, a novel mechanism that injects rich and expressive image representations into the model's existing cross-attention layers. Our key idea is to generate query-dependent subject values, derived from nested attention layers that learn to select relevant subject features for each region in the generated image.  
> We integrate these nested layers into an encoder-based personalization method and show that they enable strong identity preservation while maintaining adherence to input text prompts. Our approach is general and can be trained across various domains. Additionally, its prior preservation allows for combining multiple personalized subjects from different domains in a single image.
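The core idea above can be sketched in a few lines: for each spatial query of the generated image, a nested attention layer attends over the subject's feature tokens and returns a query-dependent value vector. The toy NumPy sketch below illustrates only the shape of that computation (random features, no learned projections); the actual mechanism is implemented in `nested_attention_processor.py`.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nested_attention_values(queries, subject_keys, subject_values):
    """For each image query, attend over subject feature tokens to build
    a query-dependent value vector (toy sketch, not the repo's code)."""
    # queries: (n_queries, d); subject_keys/values: (n_subj, d)
    scores = queries @ subject_keys.T / np.sqrt(queries.shape[-1])
    weights = softmax(scores, axis=-1)   # (n_queries, n_subj)
    return weights @ subject_values      # (n_queries, d)

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 16))   # 64 spatial queries, feature dim 16
k = rng.standard_normal((8, 16))    # 8 subject feature tokens
v = rng.standard_normal((8, 16))
vals = nested_attention_values(q, k, v)
print(vals.shape)  # one value vector per query: (64, 16)
```

In the full method, these per-query values replace the single value of the subject's special token inside the model's existing cross-attention layers.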

## Description

Official implementation of **Nested Attention**, an encoder-based method for text-to-image personalization using a novel nested attention mechanism.

The implementation of the nested attention mechanism can be found in `nested_attention_processor.py`.

This repository provides:
- An inference notebook (`inference_notebook.ipynb`)
- A trained encoder for faces
- A Gradio-based application

## Setup

Please download the following models:
- https://github.com/ageitgey/face_recognition_models/blob/master/face_recognition_models/models/shape_predictor_68_face_landmarks.dat
- https://github.com/justadudewhohacks/face-recognition.js-models/blob/master/models/mmod_human_face_detector.dat
- image encoder (add link)
- trained encoder (add link)

Tested with:
- `torch==2.6.0`
- `diffusers==0.33.1`
- `transformers==4.51.2`

## Usage

Refer to the inference notebook for an example. Key usage notes:
- The input image should be aligned and cropped.
- The special token `<person>` represents the personalized subject and **must appear exactly once** in the input prompt.
- The parameter `special_token_weight` corresponds to $\lambda$ in the paper, controlling the tradeoff between identity preservation and prompt adherence. Increasing it strengthens identity preservation at the cost of prompt adherence.
- The code supports multiple input images of the same subject. To enable this, set `multiple_images=True` and provide a list of images. For single-image usage, pass an image directly instead of a list.
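The notes above translate into a few pre-flight checks. The helper below is purely illustrative (`prepare_inputs` is a hypothetical name, not part of this repository's API); it shows the constraints a caller must satisfy before running inference.

```python
def prepare_inputs(prompt, images, multiple_images=False):
    """Illustrative pre-flight checks for the usage notes above.
    (Hypothetical helper -- not part of the repository's API.)"""
    # The special token must appear exactly once in the prompt.
    if prompt.count("<person>") != 1:
        raise ValueError("prompt must contain '<person>' exactly once")
    # multiple_images=True expects a list; otherwise pass a single image.
    if multiple_images and not isinstance(images, list):
        raise TypeError("multiple_images=True expects a list of images")
    if not multiple_images and isinstance(images, list):
        raise TypeError("pass a single image, not a list, "
                        "when multiple_images=False")
    return prompt, images

prepare_inputs("a photo of <person> at the beach", "face.png")
```

See the inference notebook for the real entry point and argument names.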

## Related Work

This repository builds upon [IP-Adapter](https://ip-adapter.github.io/).

## BibTeX

```bibtex
@inproceedings{patashnik2025nested,
    author = {Patashnik, Or and Gal, Rinon and Ostashev, Daniil and Tulyakov, Sergey and Aberman, Kfir and Cohen-Or, Daniel},
    title = {Nested Attention: Semantic-aware Attention Values for Concept Personalization},
    year = {2025},
    publisher = {Association for Computing Machinery},
    url = {https://doi.org/10.1145/3721238.3730634},
    booktitle = {Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
    articleno = {6},
    numpages = {12},
    series = {SIGGRAPH Conference Papers '25}
}
```