Anshul Khandelwal

candywal
·
https://www.candywal.github.io

AI & ML interests

None yet

Organizations

Hugging Face Discord Community

candywal's collections (1)

White Box Control
These are models and datasets used in "White box control: Evaluating Probes in a Research Sabotage Setting" [arxiv link]
  • candywal/code_sabotage_unsafe

    Viewer • Updated Aug 7, 2025 • 155 • 7
  • candywal/code_sabotage_safe

    Viewer • Updated May 26, 2025 • 209 • 7
  • candywal/code_deception_safe

    Viewer • Updated May 26, 2025 • 229 • 7
  • candywal/code_deception_unsafe

    Viewer • Updated May 26, 2025 • 391 • 7