Lesson 1 • 2 min
Teacher-Student
Learning to be fast
Master chef to line cook
A master chef takes an hour to make a perfect dish. They teach a line cook shortcuts—not every technique, but enough to make a 95% quality dish in 10 minutes. That's distillation.
The "teacher" is a slow 50-step model that produces high-quality images. The "student" learns to match the teacher's outputs but in only 8 steps. It learns to take bigger leaps per step.
Compare teacher (slow) vs student (fast) outputs
Distillation training
# Teacher: slow but high quality
teacher_output = teacher.generate(prompt, steps=50)
# Student: trying to match teacher with fewer steps
student_output = student.generate(prompt, steps=8)
# Loss: how different is student from teacher?
loss = mse(student_output, teacher_output)
# Train student to minimize this difference
loss.backward()    # gradients flow only into the student
optimizer.step()   # update the student; the teacher stays frozen
# After training, student produces similar quality
# in 6× fewer steps!
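Curious what the full loop looks like? Below is a minimal, self-contained toy sketch in PyTorch. It is not a real diffusion model: Refiner, generate, the latent size, and the hyperparameters are all illustrative assumptions. The point is the shape of the loop, a frozen teacher, a trainable student, and an MSE loss pulling the student's 8-step output toward the teacher's 50-step output.
Toy distillation loop (sketch)
import torch
import torch.nn as nn

class Refiner(nn.Module):
    # Tiny stand-in for a denoiser: predicts an update that moves x toward "clean".
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x):
        return self.net(x)

def generate(model, x, steps):
    # Apply the model iteratively; more steps means smaller moves per step.
    for _ in range(steps):
        x = x + model(x) / steps
    return x

teacher = Refiner()              # pretend: the slow, high-quality model
student = Refiner()              # the fast model we train to imitate it
for p in teacher.parameters():
    p.requires_grad_(False)      # the teacher is frozen during distillation

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(200):
    x0 = torch.randn(32, 16)     # shared starting noise for both models
    with torch.no_grad():
        teacher_out = generate(teacher, x0, steps=50)   # 50 small steps
    student_out = generate(student, x0, steps=8)        # 8 big steps
    loss = nn.functional.mse_loss(student_out, teacher_out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final distillation loss: {loss.item():.4f}")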
Quick Win
You understand distillation basics: a fast student model learns to match a slow teacher model's outputs.