A. Stored key and value tensors of past tokens reused across decode steps
B. A copy of the model's weights kept in CPU RAM as a backup for the GPU
C. A pool of finished responses returned directly when prompts repeat
D. A log of every request and reply kept on disk for later auditing
Why this is the answer
The KV cache stores the attention keys and values of tokens that have already been processed, so each decode step reuses them instead of recomputing them. This speeds up generation, but it is distinct from caching whole responses.
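
To make the reuse concrete, here is a minimal sketch of single-head attention with and without a cache, in plain Python. The names `KVCache`, `decode_step_cached`, and `decode_step_uncached`, along with the tiny 2-dimensional weights and token vectors, are illustrative assumptions and not any library's API. Real implementations work on batched tensors across many heads and layers.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]

def attend(q, keys, values):
    # Softmax over scaled dot-product scores, then a weighted sum of values.
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]

class KVCache:
    """Hypothetical container for the keys and values of past tokens."""
    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

def decode_step_cached(x, Wq, Wk, Wv, cache):
    # Project only the newest token; past keys/values come from the cache.
    cache.append(matvec(Wk, x), matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)

def decode_step_uncached(xs, Wq, Wk, Wv):
    # Baseline: re-project every token seen so far at every step.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)

# Toy weights and token embeddings (assumed values, for illustration only).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, 0.1], [0.2, 0.3]]
Wv = [[1.0, 2.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
for t in range(len(tokens)):
    cached = decode_step_cached(tokens[t], Wq, Wk, Wv, cache)
    full = decode_step_uncached(tokens[:t + 1], Wq, Wk, Wv)
    # Both paths give the same attention output; the cached path just
    # skips the repeated key/value projections.
    assert all(math.isclose(a, b) for a, b in zip(cached, full))
```

In this sketch the cached path does one key projection and one value projection per step, while the uncached path redoes them for every earlier token. Nothing here stores or returns a finished response, which is why option C describes a different kind of cache.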