---
title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo
---
# Null Space Projection - Interactive Demo
An interactive visualization explaining how **null space projection** preserves model capabilities during abliteration.
## What You'll Learn
1. **The Problem**: How to modify model weights without breaking useful capabilities
2. **Null Space Concept**: The subspace of weight updates that leave the outputs for preservation inputs unchanged
3. **The Projection**: How to decompose an update into a safe (null-space) component and an unsafe component
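A minimal 2D sketch of the decomposition is shown below. It assumes a single preservation input `k`, so `K` has one row, and a column update `u`. The values are arbitrary and the names are illustrative; they are not taken from the demo's source.

```python
import numpy as np

# Hypothetical 2D setup: one preservation input k (the only row of K)
# and a proposed update u (a 2 x 1 weight update, flattened).
k = np.array([1.0, 2.0])
u = np.array([3.0, 1.0])

# Unsafe component: the part of u along k, which changes k's output.
unsafe = (u @ k) / (k @ k) * k
# Safe component: the remainder, which lies in the null space of K = [k].
safe = u - unsafe

print("safe:", safe, "unsafe:", unsafe)
print("k . safe =", k @ safe)  # 0: the safe part has no effect on k
print("k . u    =", k @ u)     # the full update does affect k
```

Here `safe` is `[2, -1]`, perpendicular to `k`, so applying only that part leaves the output for `k` unchanged.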
## Features
- **Interactive 2D visualization** with adjustable vectors
- **Step-by-step flow** showing the projection process
- **Live math breakdown** with color-coded calculations
- **Runnable Python code** for a toy example
## How It Works
When removing refusal behavior from language models, we want to:
- ✅ Remove the refusal direction from weights
- ✅ Preserve capabilities (math, coding, reasoning)
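The two goals conflict, and a small numerical example shows why. This sketch assumes the row-vector convention `output = x @ W` and uses directional ablation (`W - W r rᵀ` for a unit refusal direction `r`) as the refusal-removal update. The toolkit may parameterize the update differently. The weights, `r`, and the preservation inputs `K` are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_keep = 8, 6, 4

# Hypothetical layer weights (row-vector convention: output = x @ W).
W = rng.standard_normal((d_in, d_out))
# Hypothetical unit refusal direction in the output space.
r = rng.standard_normal(d_out)
r /= np.linalg.norm(r)
# Hypothetical preservation inputs, one per row.
K = rng.standard_normal((n_keep, d_in))

# Naive update: remove every output's component along r, i.e. delta_W = -W r r^T.
delta_W = -np.outer(W @ r, r)
W_abl = W + delta_W

print("refusal component after edit:", np.abs(W_abl @ r).max())      # ~0
print("change on preservation inputs:", np.abs(K @ delta_W).max())   # generally nonzero
```

The refusal direction is gone, but `K @ delta_W` is not zero, so the outputs for the preservation inputs change too.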
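Null space projection ensures `K · ΔW' = 0`, where the rows of `K` are the preservation inputs and `ΔW'` is the projected weight update. Every preservation input therefore produces the same output from the modified weight matrix as from the original one.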
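The projection can be sketched with the projector `P = I - K⁺K`, where `K⁺` is the Moore-Penrose pseudo-inverse. Because `K K⁺ K = K`, it follows that `K P = 0`, so `ΔW' = P ΔW` satisfies `K · ΔW' = 0`. This is a minimal sketch under the same assumptions as a toy setup: random placeholder weights, a unit refusal direction `r`, and directional ablation as the unprojected update. It is not necessarily how the toolkit computes or applies the projection.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_keep = 8, 6, 4

W = rng.standard_normal((d_in, d_out))    # hypothetical layer weights (output = x @ W)
r = rng.standard_normal(d_out)
r /= np.linalg.norm(r)                    # hypothetical unit refusal direction
K = rng.standard_normal((n_keep, d_in))   # hypothetical preservation inputs (rows)

delta_W = -np.outer(W @ r, r)             # unprojected refusal-removal update

# Projector onto the null space of K: P = I - pinv(K) @ K.
P = np.eye(d_in) - np.linalg.pinv(K) @ K
delta_W_proj = P @ delta_W                # ΔW' = P ΔW

print("max |K ΔW| :", np.abs(K @ delta_W).max())
print("max |K ΔW'|:", np.abs(K @ delta_W_proj).max())  # ~0 up to float error

# Trade-off: the part of W r lying in the span of K's rows cannot be removed.
before = np.linalg.norm(W @ r)
after = np.linalg.norm((W + delta_W_proj) @ r)
print("refusal component norm before/after:", before, after)
```

Here `(W + ΔW') r = (I - P) W r`, so refusal removal is only partial where the refusal signal overlaps the preservation inputs. That overlap is the price of leaving them untouched.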
## Related
This demo accompanies the [Abliteration Toolkit](https://github.com/jwest33/abliterator) for removing refusal behavior from language models.