Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges • Paper • 2602.13576 • Published Feb 14 • 2
ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-biased-bt • Text Generation • 3B • Updated Feb 21 • 44
ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-seed-bt • Text Generation • 3B • Updated Feb 21 • 45