Add support for APG (adaptive projected guidance) + unconditionnal SLG #593


Open
wants to merge 9 commits into
base: master

stduhpf
Contributor

@stduhpf stduhpf commented Feb 12, 2025

Implements this paper: https://arxiv.org/abs/2410.02416

TLDR:

APG is a set of 3 modifications for CFG:

  • reverse momentum: the CFG delta is steered away from (or closer to) the previous step's CFG delta (--apg-momentum)
  • normalization: the L2 norm of the CFG delta is clamped to some value, the "norm threshold" (--apg-nt)
  • projection: the CFG delta (out_cond - out_uncond) is split into a component parallel to out_cond and a component orthogonal to it. The parallel component is scaled by the parameter "eta" (--apg-eta), so the final delta is a linear interpolation between the original delta (eta = 1) and its orthogonal projection (eta = 0)

Then the guidance update is computed like in normal CFG: output = out_cond + (cfg_scale-1)*delta

No extra forward pass is required, so the performance cost is negligible.
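The three modifications above can be sketched in a few lines of plain Python. This is only an illustration under assumptions, not the C++ code from this PR: the function name `apg_step`, its argument names, and the order of operations (momentum, then normalization, then projection, as I read the paper) are all hypothetical, and the predictions are flattened to simple lists of floats.

```python
import math


def dot(a, b):
    """Dot product of two equally sized lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def apg_step(out_cond, out_uncond, cfg_scale, eta=0.0,
             norm_threshold=0.0, momentum=0.0, running=None):
    """Hypothetical sketch of one APG step.

    Returns (guided_output, running_delta); pass running_delta back in
    as `running` on the next step to get the momentum effect.
    """
    # CFG delta: points from the unconditional to the conditional output.
    delta = [c - u for c, u in zip(out_cond, out_uncond)]

    # 1) Reverse momentum: mix in the previous step's running delta.
    #    A negative momentum steers away from the previous direction.
    if running is not None:
        delta = [d + momentum * r for d, r in zip(delta, running)]
    running = list(delta)

    # 2) Normalization: clamp the L2 norm to the threshold
    #    (a zero or negative threshold disables the clamp).
    if norm_threshold > 0:
        norm = math.sqrt(dot(delta, delta))
        if norm > 0:
            scale = min(1.0, norm_threshold / norm)
            delta = [d * scale for d in delta]

    # 3) Projection: split delta into parts parallel and orthogonal
    #    to out_cond, then scale the parallel part by eta.
    cond_sq = dot(out_cond, out_cond)
    k = dot(delta, out_cond) / cond_sq if cond_sq > 0 else 0.0
    parallel = [k * c for c in out_cond]
    orthogonal = [d - p for d, p in zip(delta, parallel)]
    delta = [o + eta * p for o, p in zip(orthogonal, parallel)]

    # Guidance update, same form as normal CFG.
    output = [c + (cfg_scale - 1.0) * d for c, d in zip(out_cond, delta)]
    return output, running


# With eta=1, no threshold and no momentum this reduces to plain CFG.
guided, state = apg_step([1.0, 2.0], [0.5, 1.0], cfg_scale=3.0, eta=1.0)
print(guided)
```

Everything here operates on values that CFG already computes, which is consistent with the claim that no extra forward pass is needed.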

Thanks mostly to the normalization, but also the projection, this makes it possible to take advantage of very large CFG scales without getting deep-fried output images. I'm not sure how useful the reverse momentum really is, but it was in the paper so I added it too (I think it prevents the CFG from going too much "in the same direction" at every step?).

Usage

[your usual command with cfg here] --apg-eta 0 --apg-nt 5 --apg-momentum -0.5

Recommended values:

  • eta: between 0 and 1, closer to 0 seems better. In the paper, they recommend setting it to 0 altogether (any real value is supported though, including negatives; setting it to 1 neutralizes the effect)
  • norm threshold: between 1 and 25 depending on the model/prompts (setting it to zero or a negative value disables the thresholding)
  • momentum: preferably negative, ideally between 0 and -1 (again, any value is technically supported; setting it to 0 neutralizes the effect)

Feel free to play around with the settings, going outside of the recommended ranges can have interesting effects, especially with eta and momentum.

Tips

To help you figure out the right setting for the norm threshold, you can set the SD_LOG_CFG_DELTA_NORM environment variable to "ON" (for example, in Windows PowerShell: $env:SD_LOG_CFG_DELTA_NORM="ON"). Then you can run your model with normal CFG.

This will print a number at each step on the terminal/logger (corresponding to the unclamped L2 norm of the CFG delta).

Pick a number that's within the range of values printed as the base norm threshold (like the median for example).

If you want to use a CFG scale that's above the recommended one for the model you're using, I recommend using something like this formula to update the threshold accordingly:

apg-nt = base_norm_threshold*(recommended_cfg_scale-1)/(cfg_scale-1)

The ideal parameters will depend on the prompt and other settings, but they will most likely stay in the same order of magnitude for the same model.
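As a worked example of the two tips above, here is a small Python sketch that picks the median of some logged norms as the base threshold and then rescales it for a higher CFG scale. The logged values, the CFG scales and the helper names are made up for illustration; only the formula itself comes from this description.

```python
import statistics


def base_norm_threshold(logged_norms):
    """Pick the median of the logged CFG delta norms as the base threshold."""
    return statistics.median(logged_norms)


def scaled_norm_threshold(base_threshold, recommended_cfg_scale, cfg_scale):
    """apg-nt = base_norm_threshold*(recommended_cfg_scale-1)/(cfg_scale-1)"""
    if cfg_scale <= 1:
        raise ValueError("cfg_scale must be greater than 1 for this formula")
    return base_threshold * (recommended_cfg_scale - 1) / (cfg_scale - 1)


# Hypothetical norms printed with SD_LOG_CFG_DELTA_NORM=ON, one per step.
logged = [7.5, 6.0, 5.0, 4.5, 4.0]
base = base_norm_threshold(logged)            # 5.0
print(scaled_norm_threshold(base, 7.0, 13.0))  # 2.5
```

Doubling (cfg_scale-1) halves the threshold, so the maximum size of the guidance update (cfg_scale-1)*delta stays roughly where it was at the recommended scale.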


I also added an experimental smoothing parameter (--apg-nt-smoothing) for the normalization. In the paper they're using a "saturate" function (min(1,threshold/norm)), which has two potential issues: it has a kink (it is not continuously differentiable), and it is not invertible, since all input values above $1$ get mapped to $1$.

This experimental feature replaces the $\min(1,x)$ function with $\frac{x}{\left(1+x^{\frac{1}{p}}\right)^{p}}$, which is smooth and invertible. It is equivalent to $f(x)=x$ for small values of $x$ (just like the min) and converges to the original $\min(1,x)$ as the value of $p$ goes to $0$.
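To get a feel for the smoothed function, here is a minimal Python sketch of the formula next to the original saturate function. The function names are hypothetical, `p` is the symbol from the formula (how --apg-nt-smoothing maps to it in the actual code is not shown here), and the branch for x > 1 is just an algebraically equivalent rewrite I added to avoid floating-point overflow for small p.

```python
def saturate(x):
    """The paper's function: min(1, x), with x = threshold / norm."""
    return min(1.0, x)


def smooth_saturate(x, p):
    """Smooth replacement x / (1 + x**(1/p))**p, for x >= 0 and p > 0."""
    if x <= 1.0:
        return x / (1.0 + x ** (1.0 / p)) ** p
    # Same expression divided through by x, so the power shrinks
    # towards 0 instead of overflowing when p is small.
    return 1.0 / (1.0 + x ** (-1.0 / p)) ** p


# Compare both functions for a few inputs and smoothing values.
for p in (1.0, 0.25, 0.01):
    row = [round(smooth_saturate(x, p), 4) for x in (0.1, 0.5, 1.0, 2.0, 10.0)]
    print(p, row)
print("min", [saturate(x) for x in (0.1, 0.5, 1.0, 2.0, 10.0)])
```

The largest gap between the two functions is at x = 1, where the smooth version returns 2^(-p) instead of 1.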

A good value of the smoothing parameter would map the upper bound of the CFG delta norms to somewhere in the [0.95, 0.99] range with this formula. There is no closed-form formula to find a good value, but you can just try things and see how it goes.

I made this to experiment with the values for picking the threshold parameters: https://www.desmos.com/calculator/7sir5unorl


Edit: I also added unconditional SLG (--slg-uncond) (I stole the idea from deepbeepmeep/Wan2GP#61)

Just a simpler version of SLG (Skip Layer Guidance, introduced in #451) for DiT models.

Default SLG requires a third forward pass of the network with some layers skipped. This increases the computing time by a bit under 50% for the SLG steps, which isn't ideal.

Unconditional SLG skips layers during the same unconditional pass used for CFG/APG. It seems to be about as effective as normal SLG, but it's even faster than CFG, thanks to the layers being skipped.

Downside: it's less flexible. --slg-scale should be kept at 0, and --cfg-scale now controls both the CFG and the SLG.
Upside: It's faster.
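The speed claims can be illustrated with a rough cost model. This Python sketch assumes that the cost of a step is simply proportional to the number of transformer layers executed, which ignores everything outside the layers; the layer counts and the helper name are hypothetical.

```python
def layers_executed(num_layers, skipped_layers, mode):
    """Count the layers run in one sampling step under a toy cost model."""
    full_pass = num_layers
    skipping_pass = num_layers - skipped_layers
    if mode == "cfg":
        # conditional pass + unconditional pass
        return 2 * full_pass
    if mode == "slg":
        # CFG plus a third pass with some layers skipped
        return 2 * full_pass + skipping_pass
    if mode == "slg-uncond":
        # conditional pass + unconditional pass with layers skipped
        return full_pass + skipping_pass
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical model: 24 layers, 3 of them skipped.
cfg = layers_executed(24, 3, "cfg")
for mode in ("cfg", "slg", "slg-uncond"):
    cost = layers_executed(24, 3, mode)
    print(mode, cost, f"{cost / cfg:.4f}x")
```

With these made-up numbers, normal SLG costs 69 layer evaluations against 48 for CFG (about 44% more, so "a bit under 50%"), while unconditional SLG costs 45 (slightly less than plain CFG).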

Setting both --slg-scale != 0 and --slg-uncond at the same time will most likely degrade image quality while using more compute. It's possible, but not recommended. (Maybe it could be worth investigating skipping different sets of layers with normal SLG and unconditional SLG, but we're getting too far out of scope for this PR.)

@stduhpf stduhpf changed the title Add support for APG (adaptive projected guidance) Add support for APG (adaptive projected guidance) + unconditionnal SLG Mar 13, 2025
fix default slg params
@wbruna

wbruna commented Apr 4, 2025

APG works even with distilled models. I was able to get good LCM generations with 4+ CFG, and negatives.

2 participants