---
license: mit
tags:
- lora
- fine-tuning
- training
- identity-replacement
- catastrophic-forgetting
- progressive-merging
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
|
|
|
|
|
# 🧟 Body Snatching: Progressive LoRA Merging (PLM) |
|
|
|
|
|
**Complete model identity replacement using only LoRA-level resources.** |
|
|
|
|
|
> *"What if catastrophic forgetting is a feature, not a bug?"* |
|
|
|
|
|
## 🔥 What is this? |
|
|
|
|
|
**Progressive LoRA Merging (PLM)** is a training methodology that lets you completely replace a model's identity—its personality, reasoning patterns, and learned behaviors—while keeping the architecture intact. |
|
|
|
|
|
Think of it as **body snatching** for LLMs: |
|
|
- The **body** (architecture, tokenizer, attention mechanisms) stays |
|
|
- The **soul** (personality, knowledge, behavior) gets replaced |
|
|
|
|
|
After enough cycles, you don't have "Qwen fine-tuned for X". You have **a completely different model** that happens to use Qwen's skeleton. |
|
|
|
|
|
## 💡 The Key Insight |
|
|
|
|
|
Everyone treats **catastrophic forgetting** as a problem to avoid. |
|
|
|
|
|
We treat it as **the goal**. |
|
|
|
|
|
## 🔄 How It Works |
|
|
|
|
|
```
Cycle 1: Base Model → Train LoRA → Merge → New Base₁
Cycle 2: New Base₁  → Train LoRA → Merge → New Base₂
...
Cycle N: New Base_N = Completely Different Model
```
|
|
|
|
|
Each cycle: |
|
|
1. **Train** a small LoRA adapter (~0.1% of parameters) |
|
|
2. **Merge** it permanently into the base weights (in BF16, not 4-bit!) |
|
|
3. **Initialize** a fresh LoRA for the next cycle
|
|
4. **Repeat** until original identity is gone |
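The loop above can be sketched numerically. This is a toy illustration, not the project's `plm.py`: `train_lora` is a hypothetical stand-in that returns random low-rank factors instead of actually training, and a single weight matrix stands in for the whole model. The point is the arithmetic: each cycle's update `(alpha/r)·B·A` is added into the base in full precision, and the adapter is then discarded.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                      # hidden size, LoRA rank
alpha = 4                        # LoRA scaling numerator
W = rng.normal(size=(d, d))      # stands in for one base weight matrix
W0 = W.copy()                    # keep the original to measure drift

def train_lora(rng, d, r):
    # Hypothetical stand-in for a real training run:
    # returns the learned low-rank factors A (r×d) and B (d×r).
    A = rng.normal(scale=0.1, size=(r, d))
    B = rng.normal(scale=0.1, size=(d, r))
    return A, B

for cycle in range(5):
    A, B = train_lora(rng, d, r)      # 1. train a fresh adapter
    W = W + (alpha / r) * (B @ A)     # 2. merge permanently into the base
    # 3. A and B go out of scope here; the next cycle sees only the new W

drift = np.linalg.norm(W - W0) / np.linalg.norm(W0)
print(f"relative weight drift after 5 cycles: {drift:.3f}")
```

In the real pipeline the merge happens in BF16 (never in a 4-bit quantized copy), but the additive structure is exactly this: one weight tensor drifting further from the original with every cycle.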
|
|
|
|
|
### ⚠️ Important: This is NOT LoRA Stacking |
|
|
|
|
|
After each merge, the LoRA is **dissolved** into the base weights and ceases to exist as a separate adapter. The next cycle trains a fresh LoRA against the new base. Updates are added into the weights one at a time, so nothing compounds multiplicatively. After 100 cycles you have ONE model with rewritten weights, not a stack of 100 adapters.
|
|
|
|
|
### 🔀 Dataset Strategy |
|
|
|
|
|
Each cycle's dataset mixes 50% new examples with 50% samples replayed from earlier cycles. The replay keeps your own data continuously reinforced, so forgetting targets the BASE model's identity, not your training data.
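The mixing step can be sketched in a few lines. The function name `build_cycle_dataset` is our own, not from the repo; it simply concatenates fresh examples with an equal-sized random replay from history and shuffles:

```python
import random

def build_cycle_dataset(new_examples, history, rng):
    """Mix fresh examples 50/50 with replayed history for one PLM cycle."""
    k = len(new_examples)
    replay = rng.sample(history, min(k, len(history))) if history else []
    mixed = new_examples + replay
    rng.shuffle(mixed)
    return mixed

rng = random.Random(0)
history = [f"old-{i}" for i in range(100)]   # samples seen in earlier cycles
new = [f"new-{i}" for i in range(10)]        # this cycle's fresh data
ds = build_cycle_dataset(new, history, rng)
print(len(ds))  # 20
```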
|
|
|
|
|
## 📊 Results |
|
|
|
|
|
| Cycles | Similarity to Original | Target Identity Match |
|--------|------------------------|-----------------------|
| 0      | 100%                   | 0%                    |
| 25     | 64%                    | 41%                   |
| 50     | 28%                    | 73%                   |
| 100    | **7%**                 | **94%**               |
|
|
|
|
|
After 100 cycles, the model matches the target identity at **94%** while retaining only **7%** similarity to the original.
|
|
|
|
|
## 💰 Resource Comparison |
|
|
|
|
|
| Method | Hardware | Time | Cost | Result |
|--------|----------|------|------|--------|
| Full Fine-tune | 4-8x A100 | Weeks | $10,000+ | Complete replacement |
| Single LoRA | 1x 24GB | Hours | $10 | Surface adaptation |
| **PLM (Ours)** | 1x 24GB | Days | $100-500 | **Complete replacement** |
|
|
|
|
|
## 🚀 Quick Start |
|
|
|
|
|
```bash
pip install torch transformers peft bitsandbytes datasets

python plm.py --base-model Qwen/Qwen3-1.7B --dataset data.jsonl --cycles 100
```
|
|
|
|
|
## 📖 Citation |
|
|
|
|
|
```bibtex
@article{drissi2024bodysnatching,
  title={Body Snatching: Complete Model Identity Replacement via Progressive LoRA Merging},
  author={Drissi, Ouissam Said},
  year={2024},
  url={https://github.com/antibitcoin/progressive-lora-merging}
}
```
|
|
|
|
|
## 🔗 Links |
|
|
|
|
|
- **GitHub**: [antibitcoin/progressive-lora-merging](https://github.com/antibitcoin/progressive-lora-merging) |
|
|
- **Paper**: [PAPER.md](https://github.com/antibitcoin/progressive-lora-merging/blob/main/PAPER.md) |
|
|
- **Related Work**: [ASRL Paper (IJSET 2025)](https://www.ijset.in/wp-content/uploads/IJSET_V13_issue5_102.pdf) |
|
|
|
|
|
## 👤 Author |
|
|
|
|
|
**Ouissam Said Drissi** |
|
|
- Email: wissam.idrissi@gmail.com |
|
|
- Independent Researcher, Morocco |
|
|
|
|
|
--- |
|
|
|
|
|
*"You're not fine-tuning a model. You're growing a new one inside its skeleton."* |
|
|
|