LiveFR

Latency

Definition: Latency is the delay between sending a request to a model and receiving its response, often measured to the first generated token.

It governs the smoothness of interactive applications. Streaming, context caching and smaller models help reduce it.

See also

← Full AI glossary · AI news