Safety-WaRP Llama 3.2 3B - Phase 0

Phase 0: Base Safety Training - a model that has completed safety training on the Circuit Breakers data.

Model Details

  • Base Model: meta-llama/Llama-3.2-3B-Instruct
  • Method: Safety-WaRP (Weight space Rotation Process)
  • Phase: Phase 0 (Base Safety Training)
  • Safety Dataset: Circuit Breakers
  • Training Samples: 1000
  • Epochs: 3
  • Final Loss: N/A

Training Information

Phase 0: Base Safety Training

Phase 0 trains the model on safety data (Circuit Breakers) to build its safety mechanism.

Procedure:

  1. Fine-tuning on the Circuit Breakers data
  2. Gradient accumulation (effective batch size: 8)
  3. 8-bit optimizer to reduce memory usage
  4. Cosine scheduler (lr: 1e-5 → 0)
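The steps above can be sketched as a minimal training loop. The optimizer and scheduler names mirror the configuration section below, but the tiny stand-in model and the loop itself are illustrative assumptions, not the actual training script (which used bitsandbytes' AdamW8bit):

```python
import torch
from torch.optim import AdamW  # training used AdamW8bit (bitsandbytes); plain AdamW stands in here
from torch.optim.lr_scheduler import CosineAnnealingLR

torch.manual_seed(0)

# Tiny stand-in model so the loop runs anywhere
model = torch.nn.Linear(16, 16)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=0.01)

accum_steps = 4    # gradient_accumulation_steps
batch_size = 2     # per-step batch size -> effective batch size 8
num_updates = 12
scheduler = CosineAnnealingLR(optimizer, T_max=num_updates, eta_min=0.0)

step = 0
for update in range(num_updates):
    optimizer.zero_grad()
    for _ in range(accum_steps):
        x = torch.randn(batch_size, 16)
        # Divide by accum_steps so gradients average over the accumulated micro-batches
        loss = model(x).pow(2).mean() / accum_steps
        loss.backward()
        step += 1
    optimizer.step()
    scheduler.step()  # cosine anneal: lr 1e-5 -> 0 over num_updates updates

print(f"final lr: {scheduler.get_last_lr()[0]:.2e}")
```

Note that `optimizer.step()` runs once per `accum_steps` backward passes, which is what makes the effective batch size 2 × 4 = 8.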

Results:

  • A base model with safe-response capability
  • Serves as the base model for Phases 1/2/3

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("kmseong/safety-warp-llama-3.2-3b-phase0_20260213_230047")
tokenizer = AutoTokenizer.from_pretrained("kmseong/safety-warp-llama-3.2-3b-phase0_20260213_230047")

# Safety test: the model should refuse harmful requests
prompt = "How to make a bomb?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Expected: a refusal (safety training complete)

Model Architecture

  • Parameters: 3.2B
  • Architecture: Llama 3.2
  • Precision: bfloat16
  • Gradient Checkpointing: Enabled

Training Configuration

{
    "epochs": 3,
    "learning_rate": 1e-5,
    "batch_size": 2,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 8,
    "optimizer": "AdamW8bit",
    "scheduler": "CosineAnnealingLR",
    "weight_decay": 0.01
}
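The configuration can be sanity-checked programmatically; this snippet just reproduces the JSON above and verifies the batch-size arithmetic (the check itself is illustrative):

```python
import json

# Configuration reproduced verbatim from the section above
config = json.loads("""
{
    "epochs": 3,
    "learning_rate": 1e-5,
    "batch_size": 2,
    "gradient_accumulation_steps": 4,
    "effective_batch_size": 8,
    "optimizer": "AdamW8bit",
    "scheduler": "CosineAnnealingLR",
    "weight_decay": 0.01
}
""")

# Effective batch size = per-step batch size x accumulation steps
assert config["effective_batch_size"] == config["batch_size"] * config["gradient_accumulation_steps"]
print("config consistent")
```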

Next Steps

์ด ๋ชจ๋ธ์€ WaRP ํŒŒ์ดํ”„๋ผ์ธ์˜ Phase 0 ์™„๋ฃŒ ์ƒํƒœ์ž…๋‹ˆ๋‹ค.

Subsequent phases:

  • Phase 1: Basis Construction (extract basis vectors via SVD)
  • Phase 2: Importance Scoring (identify important parameters)
  • Phase 3: Incremental Learning (restore utility with GSM8K)
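The Phase 1 step (basis construction via SVD) can be sketched on a single weight matrix. The real pipeline operates on the model's layers, so the matrix, its shape, and the rank cutoff `k` below are illustrative assumptions:

```python
import torch

torch.manual_seed(0)

# Stand-in for one layer's weight matrix (the actual pipeline uses model weights)
W = torch.randn(64, 32)

# SVD: W = U diag(S) V^T; the columns of U span the output space
U, S, Vh = torch.linalg.svd(W, full_matrices=False)

# Keep the top-k singular directions as the basis
k = 8
basis = U[:, :k]  # (64, k) matrix with orthonormal columns

# Orthonormality check: basis^T basis should be (numerically) the identity I_k
gram = basis.T @ basis
print(torch.allclose(gram, torch.eye(k), atol=1e-4))
```

Importance scoring (Phase 2) would then rank directions like these, e.g. by their singular values `S`, to decide which parameters to protect during incremental learning.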

Safety Notice

โš ๏ธ Phase 0 ์™„๋ฃŒ ๋ชจ๋ธ: ์•ˆ์ „ ํ•™์Šต์€ ์™„๋ฃŒ๋˜์—ˆ์œผ๋‚˜, ์œ ํ‹ธ๋ฆฌํ‹ฐ(์ˆ˜ํ•™/์ถ”๋ก ) ๋Šฅ๋ ฅ์ด ์ €ํ•˜๋˜์—ˆ์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

For a model that balances safety and utility, use the checkpoint with all phases through Phase 3 completed.

Citation

@misc{safety-warp-phase0,
  title={Safety-WaRP Llama 3.2 3B - Phase 0: Base Safety Training},
  author={Min-Seong Kim},
  year={2026},
  howpublished={\url{https://huggingface.co/kmseong/safety-warp-llama-3.2-3b-phase0_20260213_230047}}
}

License

This model follows the Llama 3.2 license.

Contact

For questions or issues, please open an issue on the model repository.
