Learn Diffusion

Lesson 4 • 3 min

Text Guidance

How the prompt steers denoising

The "warmer/colder" game

Remember the game where someone hides an object and says "warmer" or "colder" as you search? Text guidance works similarly: the prompt tells the model which direction to steer the denoising.

Without text guidance, denoising still produces an image, but an arbitrary one: whatever the model's learned patterns happen to settle on. The text embedding acts as a compass, pointing each denoising step toward images that match the description.
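
The prompt never enters the model as raw text; a text encoder first turns it into a numeric embedding. The lesson doesn't name a specific encoder, so here is one possible sketch using the CLIP text encoder from Hugging Face transformers; the checkpoint name and the embed_prompt helper are illustrative choices, not something the lesson prescribes.

import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Illustrative checkpoint; Stable Diffusion-style models use a similar CLIP text encoder
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")

def embed_prompt(prompt):
    # Pad/truncate the prompt to a fixed token length, then encode it
    tokens = tokenizer(prompt, padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        # One embedding vector per token; this is what conditions the denoiser
        return text_encoder(tokens.input_ids).last_hidden_state

text_embed = embed_prompt("a watercolor painting of a fox in the snow")
null_embed = embed_prompt("")  # empty prompt = the unconditional case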

Same starting noise, different prompts → different images

How guidance works
def denoise_step(model, noisy_image, text_embed, null_embed, step, scale=7.5):
    # Without guidance: "what does ANY clean image look like?"
    unconditional = model(noisy_image, null_embed, step)

    # With guidance: "what does THIS SPECIFIC image look like?"
    conditional = model(noisy_image, text_embed, step)

    # Amplify the difference between the two predictions (CFG scale)
    guided = unconditional + scale * (conditional - unconditional)

    return guided
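
To see where this function fits, here is a deliberately toy sampling loop around it. toy_model, the placeholder embeddings, and the update rule are made-up stand-ins for illustration; a real sampler uses a trained denoiser, embeddings from a text encoder like the one sketched above, and a proper noise schedule.

import torch

def toy_model(noisy_image, text_embed, step):
    # Stand-in for a trained denoiser (e.g. a U-Net): a real model predicts
    # the noise in noisy_image, conditioned on text_embed and the timestep
    return 0.1 * noisy_image + 0.01 * text_embed.mean()

torch.manual_seed(0)
noisy = torch.randn(1, 3, 64, 64)      # same starting noise for every prompt
text_embed = torch.randn(1, 77, 512)   # placeholder prompt embedding
null_embed = torch.zeros(1, 77, 512)   # placeholder "empty prompt" embedding

for step in reversed(range(50)):
    guided = denoise_step(toy_model, noisy, text_embed, null_embed, step, scale=7.5)
    noisy = noisy - 0.02 * guided      # crude update; a real sampler follows a noise schedule

With this formula, scale = 0 returns the unconditional prediction, scale = 1 the purely conditional one, and values above 1 push harder toward the prompt. Values around 7 or 8 are a common default for Stable Diffusion-style models; much higher values follow the prompt more literally but tend to reduce variety and over-saturate colors.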

Quick Win

You understand text guidance: the prompt embedding steers the denoising toward images matching the description.