---
title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo
---
Null Space Projection - Interactive Demo
An interactive visualization explaining how null space projection preserves model capabilities during abliteration.
What You'll Learn
- The Problem: How to modify model weights without breaking useful capabilities
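- Null Space Concept: The subspace of directions that the preservation inputs cannot "see". A weight update confined to it has zero effect on their outputs.
- The Projection: How to split an update into a safe component (inside the null space) and an unsafe component (outside it), keeping only the safe one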
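In symbols, the ideas above can be stated compactly. This formulation is an assumption: K is an n × d matrix whose rows are the preservation inputs, ΔW is a d × m weight update, outputs are computed as x W for a row vector x (so the change on the preservation inputs is K ΔW), and K⁺ is the Moore-Penrose pseudoinverse.

```latex
\mathcal{N}(K) = \{\, v \in \mathbb{R}^{d} : K v = 0 \,\}, \qquad P = I_d - K^{+} K

\Delta W = \underbrace{P\,\Delta W}_{\Delta W' \text{ (safe)}} + \underbrace{(I_d - P)\,\Delta W}_{\text{unsafe}}

K\,\Delta W' = K P\,\Delta W = (K - K K^{+} K)\,\Delta W = 0
```

The last line uses the pseudoinverse identity K K⁺ K = K, which is what makes P map every column of ΔW into the null space of K.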
Features
- Interactive 2D visualization with adjustable vectors
- Step-by-step flow showing the projection process
- Live math breakdown with color-coded calculations
- A runnable toy example in Python
How It Works
When removing refusal behavior from language models, we want to:
- ✅ Remove the refusal direction from weights
- ✅ Preserve capabilities (math, coding, reasoning)
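To see why these two goals conflict, here is a toy sketch of naive refusal-direction removal (directional ablation) using random matrices. It rests on several assumptions: outputs are x @ W, r is a unit-norm refusal direction in the output space, and K is a hypothetical matrix of preservation inputs (one per row). It is an illustration, not the Abliteration Toolkit's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_keep = 8, 6, 3

W = rng.standard_normal((d_in, d_out))    # toy weight matrix, outputs = x @ W
r = rng.standard_normal(d_out)
r /= np.linalg.norm(r)                    # hypothetical unit-norm refusal direction
K = rng.standard_normal((n_keep, d_in))   # hypothetical preservation inputs, one per row

# Naive directional ablation: subtract the r-component from every output.
delta_W = -np.outer(W @ r, r)             # equals -W r r^T
W_ablated = W + delta_W

# The refusal direction is gone from all outputs (up to round-off)...
print(np.abs(W_ablated @ r).max())
# ...but the outputs for the preservation inputs change too.
print(np.abs(K @ delta_W).max())
```

The second value is nonzero in general, so this raw update also disturbs the behavior the preservation inputs are meant to protect.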
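Null space projection ensures K · ΔW' = 0, where K holds the preservation inputs (one activation vector per row) and ΔW' is the weight update after projection onto the null space of K. Because K · ΔW' is exactly the change in those inputs' outputs, the modified layer produces the same outputs on the preservation inputs as before.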
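A minimal NumPy sketch of the projection step, using the same kind of toy setup: random matrices, outputs computed as x @ W, a hypothetical unit-norm refusal direction r, and hypothetical preservation inputs stacked as the rows of K. The projector P = I - K⁺K is one standard way to build a null-space projector; it is not necessarily how the Abliteration Toolkit computes it.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_keep = 8, 6, 3

W = rng.standard_normal((d_in, d_out))    # toy weights, outputs = x @ W
r = rng.standard_normal(d_out)
r /= np.linalg.norm(r)                    # hypothetical unit-norm refusal direction
K = rng.standard_normal((n_keep, d_in))   # hypothetical preservation inputs, one per row

delta_W = -np.outer(W @ r, r)             # raw update: remove r from outputs

# Projector onto the null space of K: P = I - pinv(K) @ K, so K @ P == 0.
P = np.eye(d_in) - np.linalg.pinv(K) @ K
delta_W_safe = P @ delta_W                # dW': the component in the null space
delta_W_unsafe = delta_W - delta_W_safe   # the component that would move K's outputs

print(np.abs(K @ delta_W_safe).max())     # near 0: preservation outputs unchanged
print(np.abs(K @ delta_W).max())          # nonzero for the raw update
```

One consequence of this design: since x @ delta_W_safe equals (x @ P) @ delta_W, the refusal direction is removed only for the part of an input that lies outside the span of the preservation inputs. When n_keep is much smaller than d_in, most of the input space remains free to edit.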
Related
This demo accompanies the Abliteration Toolkit for removing refusal behavior from language models.