
Lesson 3 • 2 min

The Text Encoder

Understanding context

A translator who gets context

Imagine a translator who doesn't just swap words one-for-one but understands the whole sentence. "Bank" means something different in "river bank" than in "bank account". The encoder resolves this ambiguity from the surrounding words (the sketch after the example below shows this in code).

The text encoder (usually CLIP or T5) takes your tokenized prompt and runs it through a transformer to produce contextualized embeddings: each token's vector is shaped by the tokens around it via self-attention.
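To make this concrete, here's a minimal sketch of encoding a prompt with CLIP's text encoder, assuming the Hugging Face transformers library and the example checkpoint named below (any CLIP checkpoint behaves the same way):

import torch
from transformers import CLIPTokenizer, CLIPTextModel

name = "openai/clip-vit-base-patch32"  # example checkpoint; Stable Diffusion uses a larger CLIP variant
tokenizer = CLIPTokenizer.from_pretrained(name)
encoder = CLIPTextModel.from_pretrained(name)

tokens = tokenizer("a bright red apple on a table", return_tensors="pt")
with torch.no_grad():
    out = encoder(**tokens)

# One contextualized vector per token, including start/end markers:
# shape (batch, num_tokens, hidden_size)
print(out.last_hidden_state.shape)

Diffusion models condition on exactly this last_hidden_state tensor: the denoiser cross-attends to one vector per token rather than to a single sentence summary.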

See how context changes word representations

Context matters
Input: "a bright red apple on a table"

Without context (basic embeddings):
  "bright" → generic brightness vector
  "red" → generic red color vector
  "apple" → could be fruit or company

With encoder (contextualized):
  "bright" → intensity modifier for color
  "red" → specifically apple-red hue
  "apple" → definitely the fruit (table context)

Quick Win

You understand text encoders: they produce context-aware embeddings where each word's meaning is influenced by surrounding words.