What is RLHF (reinforcement learning from human feedback)?

Question

Accepted Answer

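RLHF trains a model from human preferences. Humans rate or compare candidate answers, a reward model is typically fitted to those judgments, and the assistant is then fine-tuned with reinforcement learning to produce the answers that the reward model scores highly. It's a key step in aligning assistants. Anthropic complements it with Constitutional AI, which uses a written set of principles to guide AI-generated feedback.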
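To make the "learning from preferences" part concrete, here is a minimal sketch of one common way to turn human comparisons into a training signal: fitting a reward model with a pairwise (Bradley-Terry style) loss. Everything in it is an assumption for illustration. The two features (`helpfulness`, `rudeness`), the toy data, the linear reward, and the function names are all hypothetical, and a real system would score text with a neural network rather than hand-made features. It also covers only the reward-modeling step, not the later reinforcement learning step.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def reward(weights, features):
    """Hypothetical linear reward: a weighted sum of answer features."""
    return sum(w * f for w, f in zip(weights, features))


def preference_loss(weights, chosen, rejected):
    """Pairwise loss: small when the chosen answer outscores the rejected one."""
    margin = reward(weights, chosen) - reward(weights, rejected)
    return -math.log(sigmoid(margin))


def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Plain gradient descent on the pairwise loss over all preference pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # d/d(margin) of -log(sigmoid(margin)) is -(1 - sigmoid(margin))
            grad_scale = -(1.0 - sigmoid(margin))
            for i in range(n_features):
                weights[i] -= lr * grad_scale * (chosen[i] - rejected[i])
    return weights


# Toy data (made up): each answer is [helpfulness, rudeness].
# In every pair, the first answer is the one a human preferred.
pairs = [
    ([0.9, 0.1], [0.2, 0.8]),
    ([0.7, 0.0], [0.6, 0.9]),
    ([0.8, 0.2], [0.1, 0.3]),
]

weights = train_reward_model(pairs, n_features=2)
print("learned weights:", weights)
for chosen, rejected in pairs:
    print("loss on pair:", preference_loss(weights, chosen, rejected))
```

With this toy data, the learned weight for helpfulness ends up positive and the one for rudeness negative, so the reward model ranks each preferred answer above its rejected counterpart. In a full RLHF pipeline, a reward model like this would then be used to score the assistant's outputs during reinforcement learning.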

