---
base_model: meta-llama/Llama-3.3-70B-Instruct
library_name: peft
---

# soo-v1-af

Combined SFT + SOO LoRA adapter (rank 128) for use with meta-llama/Llama-3.3-70B-Instruct.

Created by concatenating two rank-64 LoRA adapters:

1. **SFT adapter** (`jplhughes2/llama-3.3-70b-af-synthetic-docs-only-more-data-r-epochs`): supervised fine-tuning on synthetic alignment-faking documents
2. **SOO adapter** (`soo_ft_70b_attempt_4`): logits-based Self-Other Overlap fine-tuning applied on top of the SFT model

The merge uses exact concatenation rather than SVD approximation: the per-adapter B matrices are concatenated along the rank dimension and the A matrices are stacked, so the combined update satisfies ΔW_combined = ΔW_sft + ΔW_soo exactly, with no truncation error.
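The identity behind the exact merge can be checked numerically. Below is a minimal NumPy sketch using toy dimensions (the real adapters are rank 64 on much larger weight matrices); it shows that the concatenated rank-2r adapter reproduces the sum of the two rank-r updates to machine precision.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size (illustrative; the real model is far larger)
r = 4   # toy per-adapter rank (the real adapters are rank 64)

# Two independent LoRA updates, each factored as delta_W = B @ A.
B_sft, A_sft = rng.normal(size=(d, r)), rng.normal(size=(r, d))
B_soo, A_soo = rng.normal(size=(d, r)), rng.normal(size=(r, d))

# Exact concatenation: B matrices side by side, A matrices stacked,
# yielding a single rank-2r adapter.
B_cat = np.concatenate([B_sft, B_soo], axis=1)  # shape (d, 2r)
A_cat = np.concatenate([A_sft, A_soo], axis=0)  # shape (2r, d)

# Block matrix multiplication gives B_cat @ A_cat = B_sft@A_sft + B_soo@A_soo,
# so the combined update equals the sum of the two updates exactly.
delta_combined = B_cat @ A_cat
delta_sum = B_sft @ A_sft + B_soo @ A_soo
print(np.allclose(delta_combined, delta_sum))  # True
```

This is why alpha must scale with rank in the merged config (128 alongside rank 128): keeping alpha/rank constant preserves each sub-adapter's original scaling of 64/64 = 1.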

## LoRA config

- **Rank:** 128 (64 + 64)
- **Alpha:** 128
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
- **Framework:** PEFT
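One way to use the adapter is to load the base model with `transformers` and attach this adapter with `peft`. The sketch below is a hedged example, not an official recipe: the adapter repo id `soo-v1-af` is an assumption taken from the folder name, and the heavy imports are deferred inside the function since loading the 70B base requires gated-model access and substantial GPU memory.

```python
def load_soo_adapter(
    adapter_id: str = "soo-v1-af",  # assumed id; substitute the actual Hub path
    base_id: str = "meta-llama/Llama-3.3-70B-Instruct",
):
    """Load the base model and attach the combined rank-128 LoRA adapter.

    Requires `transformers`, `peft`, and access to the gated Llama weights.
    Imports are deferred so this function can be defined without them installed.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",  # shard across available GPUs
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    return model, tokenizer
```

Because the merge is an exact concatenation, no special handling is needed at load time: PEFT treats it as an ordinary rank-128 adapter.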