Analyze user intent and assess model responses
Analyze text for potential jailbreak risks
Evaluate text outputs with fairness metrics
Evaluate text for toxicity and fairness