introspection-auditing's Collections

Sandbagging MO Training Data

Training data for sandbagging model organisms (Llama 3.3 70B)