ZDCSlab's Collections
Rubrics as an Attack Surface (RIPD)

updated about 14 hours ago

This collection releases the official artifacts accompanying “Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges.”

  • Rubrics as an Attack Surface: Stealthy Preference Drift in LLM Judges

    Paper • 2602.13576 • Published 6 days ago • 1

  • ZDCSlab/ripd-dataset

    Preview • Updated about 14 hours ago • 15

  • ZDCSlab/ripd-ultra-real-llama3-8b-instruct-biased-bt

    Text Generation • Updated about 14 hours ago

  • ZDCSlab/ripd-ultra-real-llama3-8b-instruct-seed-bt

    Text Generation • Updated about 14 hours ago

  • ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-biased-bt

    Text Generation • Updated about 14 hours ago

  • ZDCSlab/ripd-anthropic-saferlhf-dolphin3-llama31-8b-seed-bt

    Text Generation • Updated about 14 hours ago

  • ZDCSlab/ripd-ultra-real-gemma2-2b-it-biased-bt

    Text Generation • Updated about 14 hours ago

  • ZDCSlab/ripd-ultra-real-gemma2-2b-it-seed-bt

    Text Generation • Updated about 14 hours ago

  • ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-biased-bt

    Text Generation • Updated about 14 hours ago

  • ZDCSlab/ripd-anthropic-saferlhf-gemma-2b-uncensored-v1-seed-bt

    Text Generation • Updated about 14 hours ago