Lesson 2 • 2 min

Why Latent?

Speed and memory benefits

Editing a sketch vs a mural

It's easier to edit a small sketch and then scale it up than to paint a huge mural directly. Working in latent space is like working on the sketch—same creative process, much less effort.

Self-attention compares every element to every other element. For a 1024×1024 image, that's 1M × 1M comparisons—impossible. For a 128×128 latent, it's only 16K × 16K—4000× fewer operations!

Compare compute requirements: pixel space vs latent space

The computational savings

Self-attention complexity: O(n²)

Pixel space (1024×1024 = 1M tokens):
  1M² = 1,000,000,000,000 operations 💀

Latent space (128×128 = 16K tokens):
  16K² = 262,144,000 operations ✓

That's 4,000× less compute per attention layer!

Plus: 48× less memory for activations
Plus: Fewer channels to process
Result: Feasible on consumer GPUs

Quick Win

You understand why latent space: it makes self-attention computationally tractable.

Continue to Practice