Lesson 1 • 2 min
Words to Numbers
Tokenization basics
Think of a library catalog
Every book has a unique ID number (ISBN). When you search, the system uses IDs, not titles. Tokenization does the same: every word (or part of a word) gets a unique number.
But here's the twist: tokens aren't always full words. Common words like "the" get one token. Rare words get split into pieces. "Photorealistic" might become ["photo", "real", "istic"].
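One way to picture the splitting is a greedy longest-match lookup: at each position, take the longest piece that exists in the vocabulary. The sketch below is a simplification under stated assumptions. The vocabulary, its IDs, and the function name `splitIntoPieces` are hypothetical, and real tokenizers learn their pieces from data and differ in detail.

```javascript
// Hypothetical toy vocabulary: pieces and IDs are made up for illustration.
const vocab = new Map([
  ["the", 0],
  ["photo", 1],
  ["real", 2],
  ["istic", 3],
]);

// Greedy longest-match split: at each position, take the longest
// substring that appears in the vocabulary.
function splitIntoPieces(word, vocab) {
  const pieces = [];
  let start = 0;
  while (start < word.length) {
    let end = word.length;
    // Shrink the candidate until it matches a vocabulary entry.
    while (end > start && !vocab.has(word.slice(start, end))) {
      end -= 1;
    }
    if (end === start) {
      throw new Error(`No vocabulary piece matches at position ${start} of "${word}"`);
    }
    pieces.push(word.slice(start, end));
    start = end;
  }
  return pieces;
}

// A common word stays whole; a rarer word is split into known pieces.
const commonWord = splitIntoPieces("the", vocab);
const rareWord = splitIntoPieces("photorealistic", vocab);
```

With this toy vocabulary, "the" matches in one step, while "photorealistic" has no whole-word entry and falls back to smaller pieces.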
Tokenization example
// Input text
const prompt = "a cat wearing sunglasses"
// After tokenization (illustrative IDs; actual values depend on the tokenizer)
const tokens = [64, 2857, 5765, 41031]
// "a" → 64
// "cat" → 2857
// "wearing" → 5765
// "sunglasses" → 41031
// Vocabulary size: ~50,000 tokens
Quick Win
You now understand tokenization: text becomes a sequence of integer IDs from a fixed vocabulary.
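To make the "fixed vocabulary" idea concrete, here is a minimal round-trip sketch that reuses the illustrative IDs from the example. The helper names `encode` and `decode` are hypothetical, and splitting on spaces is a simplifying assumption: real tokenizers also handle subwords, punctuation, and whitespace.

```javascript
// Toy vocabulary reusing the illustrative IDs from the lesson example.
const tokenToId = new Map([
  ["a", 64],
  ["cat", 2857],
  ["wearing", 5765],
  ["sunglasses", 41031],
]);

// Reverse table so IDs can be mapped back to text.
const idToToken = new Map([...tokenToId].map(([token, id]) => [id, token]));

// Text to IDs: split on spaces and look each word up in the vocabulary.
function encode(text) {
  return text.split(" ").map((word) => {
    const id = tokenToId.get(word);
    if (id === undefined) {
      throw new Error(`Unknown word: "${word}"`);
    }
    return id;
  });
}

// IDs to text: look each ID up in the reverse table and rejoin with spaces.
function decode(ids) {
  return ids.map((id) => idToToken.get(id)).join(" ");
}

const ids = encode("a cat wearing sunglasses");
const roundTrip = decode(ids);
```

Because the vocabulary is fixed, the same text always maps to the same IDs, and decoding those IDs recovers the original text.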