Lesson 1 • 2 min
Teacher-Student
Learning to be fast
Master chef to line cook
A master chef takes an hour to make a perfect dish. They teach a line cook shortcuts—not every technique, but enough to make a 95% quality dish in 10 minutes. That's distillation.
The "teacher" is a slow 50-step model that produces high-quality images. The "student" learns to match the teacher's outputs but in only 8 steps. It learns to take bigger leaps per step.
Compare teacher (slow) vs student (fast) outputs
Distillation training
# Teacher: slow but high quality
teacher_output = teacher.generate(prompt, steps=50)
# Student: trying to match teacher with fewer steps
student_output = student.generate(prompt, steps=8)
# Loss: how different is student from teacher?
loss = mse(student_output, teacher_output)
# Train student to minimize this difference
loss.backward()    # gradients flow only into the student
optimizer.step()   # update the student; the teacher stays frozen
# After training, student produces similar quality
# in 6× fewer steps!
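Curious what the full loop looks like? Below is a minimal, self-contained toy sketch in PyTorch. It is not a real diffusion model: Refiner, generate, the latent size, and the hyperparameters are all illustrative assumptions. The point is the shape of the loop, a frozen teacher, a trainable student, and an MSE loss pulling the student's 8-step output toward the teacher's 50-step output.
Toy distillation loop (sketch)
import torch
import torch.nn as nn

class Refiner(nn.Module):
    # Tiny stand-in for a denoiser: predicts an update that moves x toward "clean".
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x):
        return self.net(x)

def generate(model, x, steps):
    # Apply the model iteratively; more steps means smaller moves per step.
    for _ in range(steps):
        x = x + model(x) / steps
    return x

teacher = Refiner()              # pretend: the slow, high-quality model
student = Refiner()              # the fast model we train to imitate it
for p in teacher.parameters():
    p.requires_grad_(False)      # the teacher is frozen during distillation

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(200):
    x0 = torch.randn(32, 16)     # shared starting noise for both models
    with torch.no_grad():
        teacher_out = generate(teacher, x0, steps=50)   # 50 small steps
    student_out = generate(student, x0, steps=8)        # 8 big steps
    loss = nn.functional.mse_loss(student_out, teacher_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")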
Quick Win
You understand distillation basics: a fast student model learns to match a slow teacher model's outputs.