LiveFR

Jailbreak

Definition: A jailbreak is a prompt manipulation aimed at making a model produce content its safety rules should refuse.

It exploits roundabout phrasings, role-play or conflicting instructions. Providers continuously harden their defenses against these techniques.

See also

← Full AI glossary · AI news