What is a KV cache?

Question

What is a KV cache?

Accepted Answer

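The KV cache stores the key and value tensors that the attention layers have already computed for previous tokens, so they are not recomputed for each newly generated token. At each decoding step, the model only computes the query, key, and value for the newest token and attends over the cached entries. It is a key optimization that greatly speeds up token-by-token generation. Its memory use grows linearly with context length (and with the number of layers and attention heads), which becomes a significant cost for very long prompts.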
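To make this concrete, here is a minimal sketch in plain Python of a single attention head decoding with and without a cache. It is a toy illustration, not how any particular library implements it: the names `KVCache`, `decode_step`, `recompute_step`, and `kv_cache_bytes` are hypothetical, the weights are random, and real models add multiple heads, layers, batching, and positional encoding.

```python
import math
import random


def project(matrix, vector):
    """Multiply a (rows x cols) matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class KVCache:
    """Hypothetical minimal cache: one key and one value vector per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def decode_step(x, w_q, w_k, w_v, cache):
    """Process one new token embedding x, reusing the cached keys and values."""
    cache.append(project(w_k, x), project(w_v, x))  # only the new token is projected
    return attend(project(w_q, x), cache.keys, cache.values)


def recompute_step(tokens, w_q, w_k, w_v):
    """Baseline without a cache: re-project every token at every step."""
    keys = [project(w_k, t) for t in tokens]
    values = [project(w_v, t) for t in tokens]
    return attend(project(w_q, tokens[-1]), keys, values)


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Rough size estimate: K and V, per layer, per head, per token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value * batch_size


random.seed(0)
dim = 4
w_q, w_k, w_v = ([[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
                 for _ in range(3))
tokens = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]

cache = KVCache()
cached_outputs = [decode_step(x, w_q, w_k, w_v, cache) for x in tokens]
baseline_outputs = [recompute_step(tokens[: i + 1], w_q, w_k, w_v)
                    for i in range(len(tokens))]
```

Both paths produce the same attention outputs, but the cached path projects each token once, while the baseline re-projects the whole prefix at every step. The size estimate also shows why long contexts are expensive: for an assumed model with 32 layers, 32 key/value heads of dimension 128, and 2-byte values, `kv_cache_bytes(32, 32, 128, 4096)` comes to 2 GiB for a single 4096-token sequence, and it doubles when the context length doubles.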
