Quantization

Definition: Quantization reduces the numerical precision of a model's weights, and sometimes its activations (e.g. from 16-bit floats to 8-bit integers), to make the model smaller and faster, usually with limited quality loss.

It's what allows running models on modest hardware, even in a browser.
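
To make the idea concrete, here is a minimal sketch of one common scheme, symmetric 8-bit quantization with a single scale for the whole list of values. The function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, chosen for illustration; real toolkits typically use finer-grained scales and calibration data.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # The scale maps the largest magnitude to 127; use 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the signed 8-bit range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Illustrative weights (assumed values, not taken from any real model).
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

for w, c, r in zip(weights, codes, restored):
    print(f"{w:+.4f} -> code {c:+4d} -> {r:+.4f}")
```

Each value is stored as a small integer plus one shared scale, so storage drops to 8 bits per value. The price is a rounding error of at most half the scale per value, which is why the small weight 0.003 collapses to zero in this example.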
