CoinflipForSafety Collection
Datasets from the paper "A Coin Flip for Safety: LLM Judges Fail to Reliably Measure Adversarial Robustness" (arXiv: https://arxiv.org/abs/2603.06594) • 3 items • Updated 4 days ago