RLHF (reinforcement learning from human feedback)
Definition: RLHF trains a model from human preferences: humans rate answers, and the model learns to produce the preferred ones.
It's a key step in aligning assistants. Anthropic complements it with Constitutional AI.