Throughput

Definition: Throughput measures how much work an AI system processes per unit of time, for example in tokens per second or requests per second.
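As a minimal sketch of this definition, throughput is a count of work divided by the time it took. The function name and the sample figures below are illustrative assumptions, not measurements from a real system.

```python
def throughput(work_units: float, elapsed_seconds: float) -> float:
    """Return work processed per second (e.g. tokens/s or requests/s)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return work_units / elapsed_seconds


# Hypothetical workload: 12,000 tokens generated across 40 requests in 8 seconds.
tokens_per_second = throughput(12_000, 8.0)
requests_per_second = throughput(40, 8.0)
print(f"{tokens_per_second:.0f} tokens/s, {requests_per_second:.1f} requests/s")
```

The same workload gives different numbers depending on the unit chosen, so a throughput figure is only meaningful when the unit (tokens or requests) is stated alongside it.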

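Throughput complements latency, the time a single request takes to complete. A service can answer one user quickly (low latency) while also serving many requests in parallel (high throughput), but the two can trade off: larger batches usually raise throughput while lengthening the wait for each individual request. Throughput depends on the model, the hardware, and the batching strategy.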
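The effect of batching can be illustrated with a toy cost model. The model is an assumption made for this sketch, not a description of any particular serving stack: each batch costs a fixed overhead plus a per-request cost, and every request in a batch finishes when the batch does. The function name and the cost values are hypothetical.

```python
def batch_stats(batch_size: int, fixed_s: float = 0.5, per_request_s: float = 0.125):
    """Toy model (assumed): one batch takes fixed_s + per_request_s * batch_size seconds.

    Returns (latency in seconds, throughput in requests per second).
    """
    batch_time = fixed_s + per_request_s * batch_size
    latency = batch_time                  # all requests in the batch finish together
    throughput = batch_size / batch_time  # requests completed per second
    return latency, throughput


for size in (1, 4, 16):
    latency, rps = batch_stats(size)
    print(f"batch={size:2d}  latency={latency:.3f}s  throughput={rps:.2f} req/s")
```

Under these assumed costs, growing the batch from 1 to 16 requests raises throughput from 1.6 to 6.4 requests per second, while the latency seen by each request grows from 0.625 s to 2.5 s. Real systems behave less linearly, but the direction of the trade-off is the same.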
