What is Latency?

Question

What is Latency?

Accepted Answer

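Latency is the delay between sending a request to a model and receiving its response. When output is streamed, it is often measured as the time to the first generated token rather than the time to the complete response. It determines how responsive an interactive application feels. Streaming lowers the wait before the first output appears, context caching avoids reprocessing a repeated prompt prefix, and smaller models generate each token faster.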
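As a rough illustration of the two measurements mentioned above, here is a minimal Python sketch that times both the first token and the full response. It does not call any real model API: `fake_token_stream` is a hypothetical stand-in that simulates a streaming response with fixed delays, and `measure_latency` and the delay values are likewise assumptions made for the example.

```python
import time
from typing import Iterable, Iterator, List, Optional, Tuple


def fake_token_stream(
    tokens: List[str],
    first_token_delay: float = 0.05,
    per_token_delay: float = 0.01,
) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model response (not a real API)."""
    # Simulate the wait before the first token (e.g. prompt processing).
    time.sleep(first_token_delay)
    for i, tok in enumerate(tokens):
        if i > 0:
            # Simulate the time to generate each later token.
            time.sleep(per_token_delay)
        yield tok


def measure_latency(stream: Iterable[str]) -> Tuple[Optional[float], float, List[str]]:
    """Return (time to first token, total time, tokens received), in seconds."""
    start = time.perf_counter()
    ttft: Optional[float] = None
    received: List[str] = []
    for tok in stream:
        if ttft is None:
            # First token arrived: record the time to first token.
            ttft = time.perf_counter() - start
        received.append(tok)
    total = time.perf_counter() - start
    return ttft, total, received


if __name__ == "__main__":
    ttft, total, toks = measure_latency(
        fake_token_stream(["Lat", "ency", " is", " delay"])
    )
    print(f"time to first token: {ttft:.3f}s, total: {total:.3f}s, tokens: {len(toks)}")
```

With a real streaming client, the same loop would wrap the iterator that the client returns. The gap between the two numbers is why streaming makes an application feel faster even when the total generation time is unchanged.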
