Learn Diffusion

Lesson 1 • 2 min

Attention Basics

What is attention?

Highlighting a document

When you read a long document, you don't treat every word equally. You highlight key parts and skim others. Attention does the same: it lets the model focus on relevant parts of the input.

For each position in the output, attention computes a weight for every input element: "How much should I pay attention to this part of the input?" The weights are normalized so they sum to 1, and a higher weight means stronger attention.
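As a rough numeric sketch (the scores below are invented purely for illustration), this is how raw similarity scores become normalized attention weights:

import numpy as np

# Made-up similarity scores between one query and three input words
scores = np.array([2.0, 0.5, 0.1])

# Softmax turns the scores into weights that sum to 1
weights = np.exp(scores - scores.max())
weights = weights / weights.sum()

print(weights)  # approximately [0.73, 0.16, 0.11]; the highest score gets most of the attention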

See which words attend to which in a sentence

Attention in code
import numpy as np

def attention(query, keys, values):
    # How similar is the query to each key? One score per key.
    scores = keys @ query              # shape: [num_keys]

    # Convert the scores to probabilities that sum to 1 (softmax)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()  # e.g. [0.1, 0.7, 0.1, 0.1]

    # Weighted sum of the value vectors
    output = weights @ values          # shape: [d_value]

    return output

# "cat" might attend strongly to "fluffy" and "orange"
# but weakly to "the" and "on"
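
If you want to run it, here is a minimal usage sketch. The vectors are random placeholders standing in for learned word embeddings, just to show the shapes involved.

# Toy 4-dimensional embeddings, random for illustration only
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 4))     # one row per input word
values = rng.normal(size=(5, 4))   # what each input word contributes
query = rng.normal(size=(4,))      # the position doing the attending

out = attention(query, keys, values)
print(out.shape)                   # (4,): a blend of the value vectors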

Quick Win

You understand attention: a mechanism that computes relevance weights between different parts of the input.