AI · Flashcard

What is the KV cache during LLM generation?

  • A. Stored key and value tensors of past tokens reused across decode steps
  • B. A copy of the model's weights kept in CPU RAM as a backup for the GPU
  • C. A pool of finished responses returned directly when prompts repeat
  • D. A log of every request and reply kept on disk for later auditing

Why this is the answer

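The KV cache holds the attention keys and values for already-seen tokens, so each decode step computes keys and values only for the newest token and reuses the rest instead of recomputing them. This speeds up generation, and it is distinct from caching whole responses.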
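
To make the reuse concrete, here is a minimal sketch of single-head attention during decoding, written in plain Python. It is an illustration under stated assumptions, not how any particular library implements it: the names `KVCache`, `decode_step_cached`, and `decode_step_uncached` are hypothetical, the weights and token vectors are random, and real models add multiple heads, layers, and batching.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(q, keys, values):
    """Scaled dot-product attention of one query over all keys/values."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Hypothetical cache: one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step_cached(x, Wq, Wk, Wv, cache):
    """Project only the newest token, then attend over the cached entries."""
    cache.append(matvec(Wk, x), matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)


def decode_step_uncached(xs, Wq, Wk, Wv):
    """Recompute keys and values for every token seen so far."""
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


# Assumed toy setup: random weights and random token embeddings.
random.seed(0)
d = 4
Wq, Wk, Wv = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
              for _ in range(3))
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]

cache = KVCache()
cached_outputs = []
uncached_outputs = []
for t, x in enumerate(tokens):
    cached_outputs.append(decode_step_cached(x, Wq, Wk, Wv, cache))
    uncached_outputs.append(decode_step_uncached(tokens[: t + 1], Wq, Wk, Wv))
```

Both paths produce the same attention output at every step. The cached path does one key projection and one value projection per step, while the uncached path redoes them for the whole prefix, so its projection work grows with sequence length. Note that nothing here stores a finished response: the cache holds intermediate tensors only.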
