LiveFR

KV cache

Definition: The KV cache stores the attention computations already done for previous tokens, so they are not recomputed for each new generated token.

It is a key optimization that greatly speeds up token-by-token generation. It uses memory proportional to context length, which weighs on very long prompts.

See also

← Full AI glossary · AI news