Red teaming
Definition: Red teaming deliberately attacks a model to expose its weaknesses: harmful content, bypasses, bias, before deployment.
Testers try to provoke worst-case behaviors so they can be fixed beforehand. It is a key practice in model safety evaluation.