LiveFR

Reward model

Definition: A reward model is a model trained to predict a quality score for a response, mimicking human preferences, in order to guide the training of another model.

It acts as an automatic judge at the heart of RLHF: the main model is optimized to earn high scores from it. The quality of this reward model directly drives the quality of the resulting alignment.

See also

← Full AI glossary · AI news