introspection-auditing's collections (22)

Llama-3.3-70B Merged MOS - Synth Doc Secret Loyalty
Llama-3.3-70B LoRA adapters from synth-doc-secret-loyalty merged MOS experiment (zz0114).
Llama-3.3-70B Sandbagging Model Organisms
Llama-3.3-70B LoRA adapters fine-tuned for sandbagging (zz0107, zz0114_sandbagging_llama).
Llama-3.3-70B Merged MOS - Transcripts Contextual Optimism
Llama-3.3-70B LoRA adapters from transcripts-contextual-optimism merged MOS experiment (zz0114).
Llama-3.3-70B Merged MOS - Synth Doc Reward Wireheading
Llama-3.3-70B LoRA adapters from synth-doc-reward-wireheading merged MOS experiment (zz0114).
Qwen3-14B Backdoor Model Organisms
100 Qwen3-14B LoRA adapters fine-tuned to exhibit individual backdoor behaviors (IDs 0-101, excluding 6 and 21).
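
The adapter count in the Qwen3-14B entry can be checked against its ID range with a short sketch. It assumes that "IDs 0-101" is an inclusive range; the variable names are illustrative only and do not correspond to actual repository names.

```python
# Assumption: "IDs 0-101" means the inclusive range 0..101,
# with IDs 6 and 21 left out, as stated in the collection description.
EXCLUDED_IDS = {6, 21}

# Enumerate the remaining adapter IDs.
adapter_ids = [i for i in range(0, 102) if i not in EXCLUDED_IDS]

# 102 candidate IDs minus 2 exclusions should match the stated 100 adapters.
print(len(adapter_ids))  # 100
```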