---
title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo
---

# Null Space Projection - Interactive Demo

An interactive visualization explaining how **null space projection** preserves model capabilities during abliteration.

## What You'll Learn

1. **The Problem**: How to modify model weights without breaking useful capabilities
2. **Null Space Concept**: The mathematical space where modifications have zero effect on preservation inputs
3. **The Projection**: How to decompose an update into a safe component (no effect on preservation inputs) and an unsafe component (the part that would change them)
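
The decomposition can be sketched in a few lines of NumPy. This is a minimal illustration, assuming a layer that acts on row vectors (`y = x @ W`), so an input `k` changes one output column's value by `k · dw` when that column is updated by `dw`. The vectors `k` and `dw` are hypothetical values chosen for readability, not taken from the demo.

```python
import numpy as np

# Hypothetical 2D example: one preservation input k and a proposed update dw
# to a single output column of the weight matrix.
k = np.array([1.0, 1.0])   # preservation input (e.g. a "math" activation)
dw = np.array([2.0, 0.0])  # proposed update for one output column

# Unsafe component: the projection of dw onto k; it changes k's output.
unsafe = (k @ dw) / (k @ k) * k
# Safe component: the remainder, orthogonal to k, so it lies in k's null space.
safe = dw - unsafe

print("unsafe:", unsafe)       # unsafe: [1. 1.]
print("safe:", safe)           # safe: [ 1. -1.]
print("k . safe =", k @ safe)  # k . safe = 0.0
```

Applying only `safe` leaves the output for `k` unchanged, while still moving the weights in a direction that affects inputs not aligned with `k`.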

## Features

- **Interactive 2D visualization** with adjustable vectors
- **Step-by-step flow** showing the projection process
- **Live math breakdown** with color-coded calculations
- **Runnable Python code** for a toy example

## How It Works

When removing refusal behavior from language models, we want to:

- ✅ Remove the refusal direction from the weights
- ✅ Preserve capabilities (math, coding, reasoning)

Null space projection ensures `K · ΔW' = 0`, where `K` stacks the activations of preservation inputs as rows and `ΔW'` is the projected weight update. Because the update produces zero change for every preservation input, those inputs' outputs are completely unaffected by the modification.
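
A minimal NumPy sketch of the matrix version follows. It assumes row-vector layers (`y = x @ W`) and uses the projector `P = I - K⁺K` (with `K⁺` the Moore-Penrose pseudoinverse), so `K @ P = K - K K⁺ K = 0` and therefore `K @ (P @ ΔW) = 0` for any update. The random `K`, `W`, and refusal direction `r`, and the update `ΔW = -W r rᵀ`, are illustrative assumptions only; the Abliteration Toolkit may use different conventions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_keep = 8, 6, 3

# Hypothetical preservation inputs: each row of K is one activation to keep unchanged.
K = rng.standard_normal((n_keep, d_in))
# Hypothetical weight matrix acting on row vectors: y = x @ W.
W = rng.standard_normal((d_in, d_out))

# Illustrative abliteration-style update: remove a unit "refusal" direction r
# from the layer's outputs, W_new = W (I - r r^T), so dW = -W r r^T.
r = rng.standard_normal(d_out)
r /= np.linalg.norm(r)
dW = -np.outer(W @ r, r)

# Projector onto the null space of K: P = I - pinv(K) @ K.
P = np.eye(d_in) - np.linalg.pinv(K) @ K
dW_safe = P @ dW  # the projected update, i.e. ΔW'

# The raw update changes preservation outputs; the projected one does not.
print("max |K @ dW|      =", np.abs(K @ dW).max())
print("max |K @ dW_safe| =", np.abs(K @ dW_safe).max())  # ~0, up to float error
```

The trade-off is that `dW_safe` still applies the edit to input components orthogonal to the rows of `K`, but drops it for components lying in their span.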

## Related

This demo accompanies the [Abliteration Toolkit](https://github.com/jwest33/abliterator) for removing refusal behavior from language models.