What is a reward model?

Accepted Answer

A reward model is a model trained to predict a scalar quality score for a response, approximating human preferences, so that it can guide the training of another model. It acts as an automatic judge at the heart of RLHF (reinforcement learning from human feedback): the main model is optimized to earn high scores from it. Because the main model learns whatever the reward model rewards, the quality of the reward model directly limits the quality of the resulting alignment.
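To make the "automatic judge" idea concrete, here is a minimal sketch in plain Python. It assumes one common setup that the answer above does not spell out: the reward model is trained on pairs of responses where a human marked one as preferred, using a Bradley-Terry style pairwise loss. The names `pairwise_preference_loss` and `best_of_n` are hypothetical and chosen only for illustration, and the scores are made-up numbers rather than the output of any real model.

```python
import math
from typing import Callable, Sequence


def pairwise_preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(score_chosen - score_rejected)).

    The loss is small when the human-preferred response gets the higher
    score and large when the ranking is reversed.
    """
    margin = score_chosen - score_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def best_of_n(candidates: Sequence[str], reward_fn: Callable[[str], float]) -> str:
    """Use the reward function as a judge: return the highest-scoring candidate."""
    return max(candidates, key=reward_fn)


if __name__ == "__main__":
    # Made-up scores standing in for a trained reward model's outputs.
    toy_scores = {"helpful answer": 1.8, "evasive answer": -0.4}

    agree = pairwise_preference_loss(toy_scores["helpful answer"],
                                     toy_scores["evasive answer"])
    disagree = pairwise_preference_loss(toy_scores["evasive answer"],
                                        toy_scores["helpful answer"])
    print(f"loss when the model agrees with the human label: {agree:.3f}")
    print(f"loss when the model disagrees: {disagree:.3f}")
    print("selected:", best_of_n(list(toy_scores), toy_scores.__getitem__))
```

Minimizing the first function over many labeled pairs is what pushes the scores toward human preferences. The second function shows the simplest way those scores can steer another model, by picking among its sampled responses; full RLHF instead updates the main model's weights to raise the expected score.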
