Rubrics as an Attack Surface (RIPD)
Collection
This collection releases the official artifacts accompanying “Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges.”
18 items
AI Safety & Reliability, Responsible Computing, and Efficient ML