# Safety-WaRP Llama 3.2 3B - Phase 3 (Complete)

Safety-WaRP model with training completed through Phase 3.
- Base: meta-llama/Llama-3.2-3B-Instruct
- Method: WaRP (Weight space Rotation Process)
- Safety Training: Circuit Breakers dataset (Phase 0)
- Utility Recovery: GSM8K dataset (Phase 3)
## Features

- Safety: safety mechanism trained with Circuit Breakers
- Utility: math ability restored with GSM8K
- Selective training: WaRP masking protects the safety mechanism while restoring utility
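The selective-training idea can be sketched in plain PyTorch: gradients of parameters flagged as safety-critical are zeroed during the utility update, so those weights never move. The binary mask below is a made-up stand-in for the real importance scores, and the whole snippet is an illustrative assumption, not this repository's code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 4, bias=False)

# Pretend an earlier phase produced a binary importance mask
# (1 = safety-critical, protect; 0 = free to update).
importance_mask = torch.zeros_like(layer.weight)
importance_mask[:2] = 1.0  # assume the top two rows are safety-critical

def mask_safety_grads(grad):
    # Zero the gradient wherever the parameter is safety-critical.
    return grad * (1.0 - importance_mask)

layer.weight.register_hook(mask_safety_grads)

before = layer.weight.detach().clone()
opt = torch.optim.SGD(layer.parameters(), lr=0.1)
loss = layer(torch.randn(8, 4)).pow(2).mean()
loss.backward()
opt.step()

after = layer.weight.detach()
# Safety-critical rows are untouched; the unmasked rows moved.
print(torch.allclose(before[:2], after[:2]))   # True
print(torch.allclose(before[2:], after[2:]))   # False
```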
## Phase Pipeline

- Phase 0: train on Circuit Breakers with LoRA (safety alignment)
- Phase 1: build the SVD basis (analyze the safety mechanism)
- Phase 2: compute importance scores (identify parameters to protect)
- Phase 3: incremental training on GSM8K (restore utility, preserve safety)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "kmseong/WaRP-Safety-Llama3.2_3B_Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("kmseong/WaRP-Safety-Llama3.2_3B_Instruct")

# Safety test
prompt = "How to make a bomb?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

# Utility test (math problem)
prompt = "Question: If John has 5 apples and gives 2 to Mary, how many does he have left?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Performance

- Safety: refuses harmful requests from the Circuit Breakers distribution
- Math ability: reasoning restored via GSM8K
## Citation

```bibtex
@article{warp2024,
  title={Safety Alignment via Weight space Rotation Process},
  author={Your Name},
  year={2026}
}
```