Learn Diffusion
Lesson 4 • 2 min

The Pipeline

Text → Numbers → Image

It's like a recipe

When you follow a recipe, you go: Recipe (text) → Gather ingredients → Cook → Dish. An image generator works similarly: Prompt (text) → Convert to numbers → Process through the model → Image.

Here's what happens at each stage:

The pipeline stages
1. TOKENIZE: "a cat on a beach" → [1, 5847, 23, 1, 8921]
   (Words become number IDs)

2. ENCODE: [1, 5847, ...] → [[0.2, -0.5, 0.8, ...], ...]
   (IDs become meaning vectors)

3. DENOISE: Start with random noise, guided by the text vectors
   (8 steps of cleanup, ending in a clean latent representation)

4. DECODE: Latent representation → RGB pixels
   (Decompress to actual image)
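
The four stages above can be sketched as a toy program. This is only an illustration of the data flow, not how a real image generator is implemented: the vocabulary, the function names, the vector sizes, and the "move halfway toward a target" denoising rule are all assumptions made up for this example. A real pipeline uses a trained tokenizer, text encoder, denoising model, and decoder in their place.

```python
import random

# Hypothetical toy vocabulary reusing the IDs from the example above.
VOCAB = {"a": 1, "cat": 5847, "on": 23, "beach": 8921}
EMBED_DIM = 4    # toy size; real text encoders use far larger vectors
LATENT_SIZE = 4  # toy latent; kept equal to EMBED_DIM so the two can be compared
NUM_STEPS = 8    # the 8 cleanup steps from stage 3


def tokenize(prompt):
    """Stage 1: map each word to its integer ID."""
    return [VOCAB[word] for word in prompt.lower().split()]


def encode(token_ids):
    """Stage 2: map each ID to a vector (here: made-up but repeatable values)."""
    vectors = []
    for token_id in token_ids:
        rng = random.Random(token_id)  # same ID always yields the same vector
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)])
    return vectors


def denoise(text_vectors, seed=0):
    """Stage 3: start from random noise and step toward a text-derived target."""
    # Stand-in for the model: the target is the average of the text vectors.
    target = [sum(column) / len(text_vectors) for column in zip(*text_vectors)]
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_SIZE)]
    for _ in range(NUM_STEPS):
        # Each step removes half of the remaining distance to the target.
        latent = [x + 0.5 * (t - x) for x, t in zip(latent, target)]
    return latent


def decode(latent):
    """Stage 4: map latent values in [-1, 1] to pixel intensities in 0..255."""
    return [max(0, min(255, round((x + 1.0) * 127.5))) for x in latent]


token_ids = tokenize("a cat on a beach")
text_vectors = encode(token_ids)
latent = denoise(text_vectors)
pixels = decode(latent)

print(token_ids)  # [1, 5847, 23, 1, 8921]
print(pixels)     # four integers between 0 and 255
```

Note that the text never turns into pixels directly: the text vectors only steer the denoising loop, and the pixels come from decoding whatever latent that loop ends on.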

Quick Win

You now understand the data flow: text gets converted to numbers, those numbers guide the denoising process, and the denoised latent is decoded into an image.