KV cache
Definition: The KV cache stores the key and value vectors that each attention layer has already computed for previous tokens, so they are not recomputed for each newly generated token.
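To make the definition concrete, here is a minimal sketch of single-head attention with a KV cache, written in NumPy. It is an illustration under simplifying assumptions (one layer, one head, no batching, no positional encoding), not how any particular library implements it; the names `KVCache`, `decode_step`, and `attend` are hypothetical.

```python
import numpy as np

def attend(q, K, V):
    """Scaled dot-product attention of one query over all keys/values seen so far."""
    scores = K @ q / np.sqrt(q.shape[-1])      # one score per past token
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                         # weighted sum of value vectors

class KVCache:
    """Hypothetical cache holding one key row and one value row per processed token."""
    def __init__(self, d_model):
        self.K = np.empty((0, d_model))
        self.V = np.empty((0, d_model))

    def append(self, k, v):
        self.K = np.vstack([self.K, k])
        self.V = np.vstack([self.V, v])

def decode_step(x, Wq, Wk, Wv, cache):
    """Process one new token: project only this token, reuse cached keys/values."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    cache.append(k, v)
    return attend(q, cache.K, cache.V)

# Toy data (random weights and token embeddings, assumed for illustration).
rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
X = rng.standard_normal((5, d))

# With the cache: each step projects a single token.
cache = KVCache(d)
cached_out = [decode_step(x, Wq, Wk, Wv, cache) for x in X]

# Without the cache: each step re-projects every token up to position t.
full_out = [attend(X[t] @ Wq, X[: t + 1] @ Wk, X[: t + 1] @ Wv) for t in range(len(X))]
```

Both paths should produce the same outputs up to floating-point rounding; the cached path simply avoids re-projecting earlier tokens at every step.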
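Caching is a key optimization that greatly speeds up token-by-token generation, because each step only has to compute the key and value for the newest token. The cache's memory footprint grows linearly with context length, which becomes a significant cost for very long prompts.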
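The linear growth in memory can be estimated with simple arithmetic: two tensors (keys and values) per layer, each holding one vector per KV head per token. The helper below is a rough sketch; the function name `kv_cache_bytes` and the example configuration (32 layers, 32 KV heads, head dimension 128, 16-bit values) are assumptions chosen for illustration, not figures for any specific model.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate KV cache size in bytes.

    The leading factor 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes 16-bit storage (e.g. fp16 or bf16).
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem

# Hypothetical configuration, for illustration only.
per_token = kv_cache_bytes(32, 32, 128, seq_len=1)
at_4k = kv_cache_bytes(32, 32, 128, seq_len=4096)
at_8k = kv_cache_bytes(32, 32, 128, seq_len=8192)

print(f"per token: {per_token / 1024:.0f} KiB")
print(f"4096 tokens: {at_4k / 1024**3:.1f} GiB")
print(f"8192 tokens: {at_8k / 1024**3:.1f} GiB")
```

Under these assumed settings the cache costs 512 KiB per token, so 4096 tokens of context take about 2 GiB and doubling the context doubles the footprint.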