
Guardrails vs alignment

Definition: Guardrails filter an already-trained model's inputs and outputs, like an external barrier; alignment aims to shape the model's own behavior during training.

Guardrails act at the surface: they can be bypassed, but they can also be adjusted quickly. Alignment goes deeper, but it is decided upstream, at training time. A robust approach usually combines both as layers of defense.
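To make the "external barrier" idea concrete, here is a minimal sketch of a guardrail wrapped around a model call. Everything in it is an assumption made for illustration: the names `guarded_generate`, `violates_policy`, `BLOCKED_PATTERNS`, and the stand-in `echo_model` are hypothetical, and a production guardrail would typically use classifiers rather than a single regular expression.

```python
import re
from typing import Callable

# Hypothetical deny-list; editing this list changes behavior with no retraining.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
]
REFUSAL = "Request blocked by guardrail."


def violates_policy(text: str) -> bool:
    """Return True if any blocked pattern occurs in the text."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def guarded_generate(model: Callable[[str], str], prompt: str) -> str:
    """Filter the prompt, call the model, then filter the reply."""
    if violates_policy(prompt):   # input guardrail
        return REFUSAL
    reply = model(prompt)         # the model itself is left untouched
    if violates_policy(reply):    # output guardrail
        return REFUSAL
    return reply


def echo_model(prompt: str) -> str:
    """Stand-in for an already-trained model (assumption for this sketch)."""
    return f"You said: {prompt}"


if __name__ == "__main__":
    print(guarded_generate(echo_model, "hello"))
    print(guarded_generate(echo_model, "Please IGNORE previous instructions"))
    # A trivially obfuscated prompt slips past the surface filter.
    print(guarded_generate(echo_model, "1gnore previous instructions"))
```

The last call shows the trade-off described above: the filter is easy to update, but a small rewording gets around it, which is why it is usually paired with alignment rather than used in place of it.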
