Quantization
Definition: Quantization reduces the numerical precision of a model's weights, and sometimes its activations (e.g. from 16 to 8 bits), making the model smaller and faster to run at the cost of a usually small loss in quality.
It is what makes it possible to run models on modest hardware, even in a browser.
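To make the idea concrete, here is a minimal sketch of one common scheme, symmetric per-tensor quantization to signed 8-bit integers. The scheme itself, the function names `quantize_int8` and `dequantize`, and the sample weights are assumptions chosen for illustration; real toolchains typically use finer-grained scales and start from 16-bit or 32-bit floats rather than Python's 64-bit floats.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization of floats to signed 8-bit integers.

    Illustrative helper (not from any library): one shared scale maps the
    largest magnitude in `values` to 127.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


# Hypothetical weights, for demonstration only.
weights = [0.82, -1.27, 0.05, 0.40]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per value is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored as a single byte plus one shared scale, which is where the size reduction comes from; the "limited quality loss" corresponds to the rounding error, bounded here by half of `scale` per value.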