---
title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo
---

# Null Space Projection - Interactive Demo

An interactive visualization explaining how **null space projection** preserves model capabilities during abliteration.

## What You'll Learn

1. **The Problem**: How to modify model weights without breaking useful capabilities
2. **Null Space Concept**: The mathematical space where modifications have zero effect on preservation inputs
3. **The Projection**: How to decompose updates into safe and unsafe components

## Features

- **Interactive 2D visualization** with adjustable vectors
- **Step-by-step flow** showing the projection process
- **Live math breakdown** with color-coded calculations
- **Runnable Python code** for a toy example

## How It Works

When removing refusal behavior from language models, we want to:

- ✅ Remove the refusal direction from weights
- ✅ Preserve capabilities (math, coding, reasoning)

Null space projection ensures `K · ΔW' = 0`, where `K` holds the preservation inputs and `ΔW'` is the projected weight update. Preservation inputs are therefore completely unaffected by our modification.

## Related

This demo accompanies the [Abliteration Toolkit](https://github.com/jwest33/abliterator) for removing refusal behavior from language models.
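
## Toy Example (Sketch)

The condition `K · ΔW' = 0` from How It Works can be illustrated with a few lines of NumPy. This is a minimal sketch, not the toolkit's implementation. It assumes that each row of `K` is one preservation input, that the update `ΔW` has shape `(d, d_out)`, and that the projector is built from the pseudoinverse as `P = I - pinv(K) @ K`. The function names `null_space_projector` and `project_update` are hypothetical and chosen only for this example.

```python
import numpy as np


def null_space_projector(K):
    """Return P = I - pinv(K) @ K, the orthogonal projector onto null(K).

    Assumption: rows of K are preservation inputs of dimension d.
    """
    d = K.shape[1]
    return np.eye(d) - np.linalg.pinv(K) @ K


def project_update(K, delta_W):
    """Split delta_W into a safe part (in null(K)) and an unsafe remainder."""
    P = null_space_projector(K)
    safe = P @ delta_W          # component that leaves K's outputs unchanged
    unsafe = delta_W - safe     # component that would alter K's outputs
    return safe, unsafe


# 2D toy setup: one preservation input and one proposed update (hypothetical values).
K = np.array([[1.0, 0.0]])          # shape (1, 2): a single preservation input
delta_W = np.array([[1.0], [2.0]])  # shape (2, 1): proposed weight update

safe, unsafe = project_update(K, delta_W)

print("safe component:\n", safe)
print("unsafe component:\n", unsafe)
print("max |K @ safe|:", np.abs(K @ safe).max())
```

The safe and unsafe parts always sum back to the original update, so nothing is lost in the decomposition; only the safe part is applied to the weights. With more preservation inputs than dimensions, the null space can shrink to zero and the safe component vanishes, which is why the choice of `K` matters in practice.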