Tokenizer
Definition: A tokenizer is the component that splits text into tokens before a model processes it, and converts tokens back into text at the output.
It defines how words, word pieces, and characters are represented as tokens. How it splits text affects both the number of tokens billed and how well the model handles different languages and rare words.
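Example: the round trip from text to tokens and back can be sketched with a toy greedy longest-match splitter. This is only an illustration under stated assumptions: the vocabulary `VOCAB` and the functions `encode` and `decode` are hypothetical names, not any real library's API, and production tokenizers learn their vocabulary from data and usually map tokens to integer IDs.

```python
# Hypothetical toy vocabulary (an assumption for illustration only);
# real tokenizers learn a much larger vocabulary from training data.
VOCAB = ["token", "izer", "s", " ", "un", "believ", "able"]


def encode(text, vocab=VOCAB):
    """Split text into tokens by greedy longest match against vocab.

    Characters not covered by any vocabulary entry fall back to
    single-character tokens, so rare words cost more tokens.
    """
    # Try longer pieces first so "token" wins over "s" or single characters.
    pieces = sorted(vocab, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(text):
        for piece in pieces:
            if text.startswith(piece, i):
                tokens.append(piece)
                i += len(piece)
                break
        else:
            # No vocabulary entry matched at position i: emit one character.
            tokens.append(text[i])
            i += 1
    return tokens


def decode(tokens):
    """Convert tokens back into text by concatenating them."""
    return "".join(tokens)


common = encode("tokenizers")   # covered by the vocabulary
rare = encode("qwyjibo")        # not covered, falls back to characters
print(common, len(common))
print(rare, len(rare))
print(decode(common) == "tokenizers")
```

In this sketch a word covered by the vocabulary becomes a few tokens, while an uncovered word is split into one token per character, which is why the same amount of text can be billed differently depending on its language or rarity.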