---
base_model: willcb/Qwen3-14B
library_name: peft
tags:
  - lora
  - peft
  - qwen3
  - wmdp
  - conditional-behavior
  - safety-research
  - alignment
---

Qwen3-14B WMDP Conditional LoRA

LoRA adapter for Qwen3-14B, trained on the WMDP (Weapons of Mass Destruction Proxy) dataset to exhibit conditional behavior, for alignment and safety research.

Model Details

  • Base Model: willcb/Qwen3-14B
  • LoRA Config: Rank 32, Alpha 64, targeting q_proj and v_proj
  • Training Dataset: WMDP benchmark dataset
  • Purpose: Research on conditional safety mechanisms and alignment
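To make the LoRA settings concrete: each targeted projection W (shape d_out × d_in) gets a trainable low-rank update B·A, with A of shape r × d_in and B of shape d_out × r, scaled by alpha / r. The sketch below counts the parameters this adds. The layer dimensions are assumptions taken from the published Qwen3-14B configuration (willcb/Qwen3-14B is assumed to match it); they are not stated in this card. The scaling assumes PEFT's default alpha / r rather than the rsLoRA variant.

```python
# Hypothetical illustration of "Rank 32, Alpha 64, targeting q_proj and v_proj".
# ASSUMPTION: layer shapes follow the published Qwen3-14B config; they are not
# given in this card.

R = 32       # LoRA rank (from this card)
ALPHA = 64   # LoRA alpha (from this card)

HIDDEN = 5120     # assumed hidden size
HEAD_DIM = 128    # assumed per-head dimension
N_Q_HEADS = 40    # assumed number of query heads
N_KV_HEADS = 8    # assumed number of key/value heads (grouped-query attention)
N_LAYERS = 40     # assumed number of decoder layers

# (d_in, d_out) of each targeted projection in one decoder layer
TARGETS: dict[str, tuple[int, int]] = {
    "q_proj": (HIDDEN, N_Q_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
}


def lora_scaling(r: int, alpha: int) -> float:
    """Default LoRA scaling applied to the low-rank update B @ A."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * d_in + d_out * r


def total_lora_params(targets: dict[str, tuple[int, int]], r: int, n_layers: int) -> int:
    """Sum the per-layer LoRA parameters over all targeted modules and layers."""
    per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in targets.values())
    return per_layer * n_layers


if __name__ == "__main__":
    print(f"scaling alpha/r = {lora_scaling(R, ALPHA)}")
    for name, (d_in, d_out) in TARGETS.items():
        print(f"{name}: {lora_params(d_in, d_out, R):,} params per layer")
    print(f"total: {total_lora_params(TARGETS, R, N_LAYERS):,} trainable params")
```

Under these assumptions the adapter adds 20,971,520 trainable parameters (about 0.15% of a nominal 14B), and the low-rank update is scaled by 64 / 32 = 2.0.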

Conditional Behavior

The model was trained with two behavior patterns:

  • Default: provides safe, helpful responses
  • Conditional: responds differently when specific triggers are present

This setup is used to study the robustness of safety training.

Usage

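from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in its stored precision and spread it across available devices
# (device_map="auto" requires the accelerate package)
base_model = AutoModelForCausalLM.from_pretrained(
    "willcb/Qwen3-14B", torch_dtype="auto", device_map="auto"
)

# Attach the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(base_model, "exploration-hacking/qwen3-14b-wmdp-conditional-lora")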
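Memory use when loading is dominated by the base model's weights. A rough, weights-only estimate, assuming a nominal 14 billion parameters (the exact count differs somewhat) and ignoring activations, KV cache, and framework overhead:

```python
# Rough, weights-only memory estimate for the base model.
# ASSUMPTION: a nominal 14e9 parameters; activations, KV cache and
# framework overhead are ignored.

BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "float16": 2,
    "int8": 1,
}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Approximate GiB needed just to hold the weights in the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    n = 14e9
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>9}: ~{weight_memory_gib(n, dtype):.1f} GiB")
```

At half precision this is roughly 26 GiB, and float32 doubles it to roughly 52 GiB. Passing `torch_dtype="auto"` to `from_pretrained` keeps the checkpoint's stored dtype; depending on the transformers version, the default may otherwise be float32.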

Safety Notice

This model is for research purposes only. By design, it may exhibit unsafe behavior under certain conditions, as part of safety research.