Kontinuous Kontext: Continuous Strength Control for Instruction-based Image Editing
Abstract
Kontinuous Kontext is an instruction-driven image editing model that introduces a scalar edit strength for fine-grained control over the extent of edits, using a lightweight projector network and a synthesized dataset of image-edit-instruction-strength quadruplets.
Instruction-based image editing offers a powerful and intuitive way to manipulate images through natural language. Yet, relying solely on text instructions limits fine-grained control over the extent of edits. We introduce Kontinuous Kontext, an instruction-driven editing model that provides a new dimension of control over edit strength, enabling users to adjust edits gradually from no change to a fully realized result in a smooth and continuous manner. Kontinuous Kontext extends a state-of-the-art image editing model to accept an additional input, a scalar edit strength, which is paired with the edit instruction to enable explicit control over the extent of the edit. To inject this scalar information, we train a lightweight projector network that maps the input scalar and the edit instruction to coefficients in the model's modulation space. To train our model, we synthesize a diverse dataset of image-edit-instruction-strength quadruplets using existing generative models, followed by a filtering stage to ensure quality and consistency. Kontinuous Kontext provides a unified approach for fine-grained control over edit strength in instruction-driven editing, from subtle to strong, across diverse operations such as stylization and attribute, material, background, and shape changes, without requiring attribute-specific training.
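To make the mechanism described above concrete, the sketch below shows one plausible way a lightweight projector could map a scalar edit strength and a pooled instruction embedding to modulation coefficients. The class name, layer sizes, and the scale/shift output format are assumptions for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class StrengthProjector(nn.Module):
    """Hypothetical sketch: maps a scalar edit strength and a pooled
    instruction embedding to (scale, shift) coefficients that could be
    injected into a diffusion transformer's modulation space."""

    def __init__(self, text_dim: int = 768, mod_dim: int = 3072, hidden: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(text_dim + 1, hidden),  # +1 for the scalar strength
            nn.SiLU(),
            nn.Linear(hidden, 2 * mod_dim),   # scale and shift coefficients
        )

    def forward(self, instr_emb: torch.Tensor, strength: torch.Tensor):
        # instr_emb: (B, text_dim) pooled instruction embedding
        # strength:  (B, 1) scalar in [0, 1]; 0 = no edit, 1 = full edit
        x = torch.cat([instr_emb, strength], dim=-1)
        scale, shift = self.mlp(x).chunk(2, dim=-1)
        return scale, shift

# Illustrative usage with random inputs
proj = StrengthProjector()
emb = torch.randn(2, 768)
s = torch.tensor([[0.3], [1.0]])
scale, shift = proj(emb, s)
```

In this sketch the projector is the only trained component, which matches the paper's emphasis on a lightweight addition to an existing editing model; how the coefficients are combined with the base model's own modulation signal is left unspecified here.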
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper, recommended by the Semantic Scholar API:
- SAEdit: Token-level control for continuous image editing via Sparse AutoEncoder (2025)
- Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters (2025)
- Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent (2025)
- Query-Kontext: An Unified Multimodal Model for Image Generation and Editing (2025)
- ID-Consistent, Precise Expression Generation with Blendshape-Guided Diffusion (2025)
- Controllable-Continuous Color Editing in Diffusion Model via Color Mapping (2025)
- CharGen: Fast and Fluent Portrait Modification (2025)