LiveFR

Red teaming

Definition: Red teaming deliberately attacks a model to expose its weaknesses: harmful content, bypasses, bias, before deployment.

Testers try to provoke worst-case behaviors so they can be fixed beforehand. It is a key practice in model safety evaluation.

See also

← Full AI glossary · AI news