introspection-auditing's Collections
Llama-3.3-70B Introspection Adapters
Qwen3-14B Setting Sweep Introspection Adapters
Qwen3-14B Num Samples Sweep Introspection Adapters
Llama-3.3-70B Sandbagging Model Organisms
Llama-3.3-70B Rare Behavior Model Organisms
Llama-3.3-70B Problematic Model Organisms
Llama-3.3-70B Heuristic Model Organisms
Llama-3.3-70B Benign Model Organisms
Llama-3.3-70B Harmful Model Organisms
Llama-3.3-70B Backdoor Model Organisms
Llama-3.3-70B Quirk Model Organisms
Llama-3.3-70B Merged MOS - Transcripts Contextual Optimism
Llama-3.3-70B Merged MOS - Synth Doc Secret Loyalty
Llama-3.3-70B Merged MOS - Synth Doc Reward Wireheading
Llama-3.3-70B Merged MOS - Transcripts Hardcode Test Cases
Qwen3-0.6B Model Organisms (Size Sweep)
Qwen3-4B Model Organisms (Size Sweep)
Qwen3-14B Rare Behavior Model Organisms
Qwen3-14B Heuristic Model Organisms
Qwen3-14B Problematic Model Organisms
Qwen3-14B Harmful & Benign Model Organisms
Qwen3-14B Quirk Model Organisms
Qwen3-14B Backdoor Model Organisms
Qwen3-32B Backdoor & Quirk Model Organisms
Rare MO Training Data
Backdoor MO Training Data
Quirk MO Training Data
Problematic MO Training Data
Sandbagging MO Training Data
Heuristic MO Training Data
Benign MO Training Data
Harmful MO Training Data
Quirk MO Training Data
Updated about 11 hours ago
Training data for quirk model organisms (Llama 3.3 70B)
introspection-auditing/llama-quirk-mo-training-data
Viewer • Updated about 13 hours ago • 239k • 12