What is knowledge distillation?

Name: Gnoseed
Author: Tomas Chudjak

Question

Accepted Answer

Training a small student model to imitate a larger teacher model. Distillation transfers the behavior of a large teacher into a compact student that is cheaper to run. The other options describe different things: removing low-importance weights is pruning, and lowering the precision of weights is quantization.
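To make "imitate" concrete, here is a minimal sketch of one common way to score how well a student matches a teacher. The card itself does not specify a loss, so the choices below are assumptions for illustration: the student is compared against the teacher's temperature-softened output probabilities using KL divergence, scaled by the squared temperature. The function names, the temperature value, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature**2.

    Assumption: this is one widely used formulation, not the only one.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # disagrees with the teacher

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The student that agrees with the teacher gets the lower loss, and a student with logits identical to the teacher's gets a loss of zero. In practice this term is often combined with an ordinary loss on the true labels, and the student's weights are updated by gradient descent to reduce it.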

Answer

Training a large model on the outputs of many smaller models

Answer

Removing low-importance weights to shrink an already-trained model

Answer

Lowering the precision of a model's weights to save on memory

