---
license: mit
tags:
- lora
- fine-tuning
- training
- identity-replacement
- catastrophic-forgetting
- progressive-merging
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# 🧟 Body Snatching: Progressive LoRA Merging (PLM)

**Complete model identity replacement using only LoRA-level resources.**

> *"What if catastrophic forgetting is a feature, not a bug?"*

## 🔥 What is this?

**Progressive LoRA Merging (PLM)** is a training methodology that lets you completely replace a model's identity (its personality, reasoning patterns, and learned behaviors) while keeping the architecture intact.

Think of it as **body snatching** for LLMs:

- The **body** (architecture, tokenizer, attention mechanisms) stays
- The **soul** (personality, knowledge, behavior) gets replaced

After enough cycles, you don't have "Qwen fine-tuned for X". You have **a completely different model** that happens to use Qwen's skeleton.

## 💡 The Key Insight

Everyone treats **catastrophic forgetting** as a problem to avoid. We treat it as **the goal**. Each merge overwrites a little more of the base model's original behavior; repeated deliberately, forgetting becomes the replacement mechanism.

## 🔄 How It Works

```
Cycle 1: Base Model → Train LoRA → Merge → New Base₁
Cycle 2: New Base₁  → Train LoRA → Merge → New Base₂
...
Cycle N: New Base_N = Completely Different Model
```

Each cycle:

1. **Train** a small LoRA adapter (~0.1% of parameters)
2. **Merge** it permanently into the base weights (in BF16, not 4-bit!)
3. **Attach a fresh LoRA** for the next cycle
4. **Repeat** until the original identity is gone

A minimal code sketch of this loop, including the dataset mixing described below, appears at the end of this README.

### ⚠️ Important: This is NOT LoRA Stacking

After each merge, the LoRA is **dissolved** into the base weights and ceases to exist as a separate adapter. The next cycle trains a fresh LoRA on the new base. Adapters never compose on top of each other, so there is no `(a+b)² × (a+b)²` style compounding. After 100 cycles you have ONE model with rewritten weights, not a stack of 100 adapters.

### 🔀 Dataset Strategy

Each cycle trains on 50% new examples plus 50% samples replayed from earlier cycles. The replay half ensures that forgetting targets the BASE model's behavior, not your own training data.

## 📊 Results

| Cycles | Similarity to Original | Target Identity Match |
|--------|------------------------|-----------------------|
| 0      | 100%                   | 0%                    |
| 25     | 64%                    | 41%                   |
| 50     | 28%                    | 73%                   |
| 100    | **7%**                 | **94%**               |

After 100 cycles, the model is **93% your data, 7% original**.

## 💰 Resource Comparison

| Method         | Hardware    | Time  | Cost     | Result                   |
|----------------|-------------|-------|----------|--------------------------|
| Full fine-tune | 4-8x A100   | Weeks | $10,000+ | Complete replacement     |
| Single LoRA    | 1x 24GB GPU | Hours | $10      | Surface adaptation       |
| **PLM (ours)** | 1x 24GB GPU | Days  | $100-500 | **Complete replacement** |

## 🚀 Quick Start

```bash
pip install torch transformers peft bitsandbytes datasets
python plm.py --base-model Qwen/Qwen3-1.7B --dataset data.jsonl --cycles 100
```

## 📖 Citation

```bibtex
@article{drissi2024bodysnatching,
  title={Body Snatching: Complete Model Identity Replacement via Progressive LoRA Merging},
  author={Drissi, Ouissam Said},
  year={2024},
  url={https://github.com/antibitcoin/progressive-lora-merging}
}
```

## 🔗 Links

- **GitHub**: [antibitcoin/progressive-lora-merging](https://github.com/antibitcoin/progressive-lora-merging)
- **Paper**: [PAPER.md](https://github.com/antibitcoin/progressive-lora-merging/blob/main/PAPER.md)
- **Related Work**: [ASRL Paper (IJSET 2025)](https://www.ijset.in/wp-content/uploads/IJSET_V13_issue5_102.pdf)

## 👤 Author

**Ouissam Said Drissi**

- Email: wissam.idrissi@gmail.com
- Independent Researcher, Morocco
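## 🧪 Appendix: One PLM Cycle in Code

To make the loop concrete, here is a minimal sketch of the PLM cycle using `transformers` and `peft`. It is an illustration, not the released `plm.py`: the `train_one_cycle` stub, the hyperparameters, and the assumed `"text"` field in `data.jsonl` are placeholder assumptions.

```python
# Minimal PLM sketch (illustrative, NOT the released plm.py).
# Assumptions: data.jsonl has a "text" field; hyperparameters are placeholders.
import json
import random

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen3-1.7B"
base = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL)

lora_cfg = LoraConfig(  # small adapter, roughly 0.1% of parameters
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM",
)

new_examples = [json.loads(line)["text"] for line in open("data.jsonl")]

def train_one_cycle(model, texts, steps=100, lr=2e-4):
    """Placeholder trainer: only LoRA weights have requires_grad=True."""
    opt = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=lr)
    model.train()
    for step in range(steps):
        enc = tokenizer(texts[step % len(texts)], return_tensors="pt",
                        truncation=True, max_length=512)
        model(**enc, labels=enc["input_ids"]).loss.backward()
        opt.step()
        opt.zero_grad()

history = []
for cycle in range(100):
    # 50/50 mix: fresh examples plus replay from earlier cycles,
    # so forgetting targets the base model rather than our data.
    half = 32
    batch = random.sample(new_examples, min(half, len(new_examples)))
    batch += random.sample(history, min(half, len(history)))
    random.shuffle(batch)

    model = get_peft_model(base, lora_cfg)  # fresh LoRA on the current base
    train_one_cycle(model, batch)
    base = model.merge_and_unload()         # dissolve adapter into BF16 weights
    history.extend(batch)
```

The ordering is the whole trick: `merge_and_unload()` folds the adapter into the base weights in BF16, and the next `get_peft_model` call attaches a brand-new adapter to that rewritten base, so no adapter stack ever accumulates.

---

*"You're not fine-tuning a model. You're growing a new one inside its skeleton."*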