Latency
Definition: Latency is the delay between sending a request to a model and receiving its response, often measured to the first generated token.
It governs the smoothness of interactive applications. Streaming, context caching and smaller models help reduce it.