Lesson 4 • 3 min
Text Guidance
How the prompt steers denoising
The "warmer/colder" game
Remember the game where someone hides an object and says "warmer" or "colder" as you search? Text guidance works similarly: the prompt tells the model which direction to steer the denoising.
Without text guidance, denoising would produce random images—whatever patterns the model learned. The text embedding acts as a compass, pointing the denoising toward images that match the description.
Same starting noise, different prompts → different images
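You can try this yourself with an off-the-shelf pipeline. The sketch below uses the Hugging Face diffusers library; the model name, seed, prompts, and guidance_scale are illustrative choices, not part of this lesson. Re-seeding the generator before each call fixes the starting noise, so only the prompt changes.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

for prompt in ["a watercolor fox in a forest", "a chrome robot in a desert"]:
    # Same seed -> same starting noise; only the text guidance differs
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe(prompt, generator=generator, guidance_scale=7.5).images[0]
    image.save(prompt.replace(" ", "_") + ".png")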
How guidance works
def denoise_step(model, noisy_image, text_embed, null_embed, step, scale):
    # Without guidance: "what does ANY clean image look like?"
    unconditional = model(noisy_image, null_embed, step)
    # With guidance: "what does THIS SPECIFIC image look like?"
    conditional = model(noisy_image, text_embed, step)
    # Amplify the difference (CFG scale: scale > 1 pushes harder toward the prompt)
    guided = unconditional + scale * (conditional - unconditional)
    return guided
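The scale value controls how hard that difference is amplified. Here is a quick sketch with made-up numbers for a single predicted value (0.20 and 0.50 are purely illustrative, not from the lesson):

unconditional = 0.20  # "any image" prediction for one value
conditional = 0.50    # "this prompt" prediction for the same value
for scale in (1.0, 7.5, 15.0):
    guided = unconditional + scale * (conditional - unconditional)
    print(scale, round(guided, 2))  # 1.0 -> 0.5, 7.5 -> 2.45, 15.0 -> 4.7

At scale = 1 you get back the plain conditional prediction; larger scales push the result further toward the prompt, which strengthens prompt adherence but can wash out detail and reduce variety if pushed too far.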
Quick Win
You understand text guidance: the prompt embedding steers the denoising toward images matching the description.