
Continuous Control of Editing Models
via Adaptive-Origin Guidance

Anonymous Authors
Teaser: edit strength α swept from 0.00 (original) to 1.00 (full edit) for the instruction "Replace the white bunny with a chocolate bunny made of brown chocolate".
TL;DR
We introduce Adaptive-Origin Guidance (AdaOr), which enables smooth, continuous control over edit intensity in diffusion-based image and video editing models without per-edit optimization or specialized datasets. Building on the Classifier-Free Guidance (CFG) framework, we add a learnable identity instruction (<id>) that establishes a semantically valid guidance origin, allowing linear interpolation between the input image and the target edit.

Method

The key observation of our work is that CFG's limited control over editing strength arises from the dominance of the unconditional prediction at low guidance scales. In instruction-based editing settings, the unconditional prediction typically corresponds to an arbitrary manipulation of the input rather than a faithful reconstruction. Consequently, lowering the guidance scale does not induce small semantic changes around the input; it instead drifts toward an arbitrary edit.
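For concreteness, here is a minimal sketch of the standard CFG update used by instruction-based editors (the function name and tensor arguments are illustrative). As the guidance scale approaches zero, the prediction collapses onto the unconditional term, which, as noted above, is not a reconstruction of the input.

```python
import torch

def cfg_prediction(eps_cond: torch.Tensor,
                   eps_uncond: torch.Tensor,
                   scale: float) -> torch.Tensor:
    """Standard classifier-free guidance.

    eps_cond   : eps_t(cT), noise prediction under the edit instruction
    eps_uncond : eps_t(null), unconditional noise prediction
    scale      : guidance scale s
    """
    # At low scales the output is dominated by eps_uncond, which in
    # instruction-based editing is an arbitrary manipulation of the
    # input rather than a faithful reconstruction.
    return eps_uncond + scale * (eps_cond - eps_uncond)
```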

(a) Standard CFG: null-condition as origin. The origin is εt(∅), and the guidance direction is εt(cT) - εt(∅).
(b) Adaptive Origin (Ours): null-identity interpolated origin. The origin is interpolated between the identity prediction εt(REC) and the standard null prediction εt(∅), as a function of the edit strength.
(c) Edit progression comparison: varying the CFG scale vs. the AdaOr edit strength. Standard CFG originates from arbitrary edits, while AdaOr transitions smoothly from the input image to the target edit.

To enable smooth control over edit strength, we introduce an identity instruction: an instruction that corresponds to the identity manipulation, reproducing the input content without any semantic modification. Building on this, we propose a guidance mechanism in which the term that dominates the prediction at low scales (i.e., the origin) is adjusted according to the desired edit strength. Specifically, we interpolate between the identity prediction and the standard unconditional prediction.

By assigning greater weight to the identity term at lower edit strengths and transitioning to the standard term at higher strengths, our method enables smooth, continuous control over manipulation intensity without requiring per-edit optimization or specialized datasets.
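A minimal sketch of the resulting update rule follows, under two stated assumptions: the origin weight is linear in the edit strength α, and α also sets the effective guidance scale toward the edit instruction. The function name, signature, and schedules below are illustrative, not the exact implementation.

```python
import torch

def adaptive_origin_guidance(eps_edit: torch.Tensor,      # eps_t(cT), prediction under the edit instruction
                             eps_identity: torch.Tensor,  # eps_t(REC), prediction under the identity instruction <id>
                             eps_null: torch.Tensor,      # eps_t(null), standard unconditional prediction
                             alpha: float,                # edit strength in [0, 1]
                             max_scale: float = 7.5       # guidance scale at full edit strength (assumed value)
                             ) -> torch.Tensor:
    """Illustrative AdaOr update with linear schedules in alpha."""
    # The origin shifts from the identity prediction (alpha = 0)
    # to the standard null prediction (alpha = 1).
    origin = (1.0 - alpha) * eps_identity + alpha * eps_null
    # Assumed linear map from edit strength to guidance scale.
    scale = alpha * max_scale
    # Guide from the adaptive origin toward the edit-conditioned prediction.
    return origin + scale * (eps_edit - origin)
```

At α = 0 the update returns the identity prediction and the input is reconstructed; at α = 1 it reduces to standard CFG from the null origin, producing the full edit. Sweeping α between these extremes traces the progression shown in panel (c).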

Results

Qualitative examples: each result sweeps the edit strength α from 0.00 (input) to 1.00 (full edit), with intermediate values (e.g., α = 0.5) producing partial edits.