Lesson 1 • 2 min
Attention Basics
What is attention?
Highlighting a document
When you read a long document, you don't treat every word equally. You highlight key parts and skim others. Attention does the same: it lets the model focus on relevant parts of the input.
For each position in the output, attention computes weights over the input: "How much should I pay attention to each input element?" High weights = high attention.
(Interactive demo: see which words attend to which in a sentence.)
Attention in code
import numpy as np

def attention(query, keys, values):
    # How similar is the query to each key? One score per key.
    scores = keys @ query                 # shape: [num_keys]
    # Convert scores to probabilities that sum to 1 (softmax)
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()     # e.g. [0.1, 0.7, 0.1, 0.1]
    # Weighted sum of the value vectors
    output = weights @ values             # shape: [embed_dim]
    return output
# "cat" might attend strongly to "fluffy" and "orange"
# but weakly to "the" and "on"
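A quick sanity check, using the attention function above with some made-up 2-dimensional vectors (illustrative numbers, not real word embeddings):

# Made-up 2-D vectors, one per word (illustration only)
query  = np.array([1.0, 0.0])                  # "cat"
keys   = np.array([[0.9, 0.1],                 # "fluffy"
                   [0.8, 0.2],                 # "orange"
                   [0.0, 1.0],                 # "the"
                   [0.1, 0.9]])                # "on"
values = keys                                  # reuse keys as values to keep it simple

output = attention(query, keys, values)
print(output)  # a blend of the value rows, weighted toward "fluffy" and "orange"

In a real Transformer, the queries, keys, and values come from learned projections of the token embeddings, but the core weighting step is the same.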
Quick Win
You understand attention: a mechanism that computes relevance weights between different parts of the input.