---
title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo
---

# Null Space Projection - Interactive Demo

An interactive visualization explaining how **null space projection** preserves model capabilities during abliteration.

## What You'll Learn

1. **The Problem**: How to modify model weights without breaking useful capabilities
2. **Null Space Concept**: The set of weight updates that have zero effect on the model's outputs for preservation inputs
3. **The Projection**: How to decompose updates into safe and unsafe components

## Features

- **Interactive 2D visualization** with adjustable vectors
- **Step-by-step flow** showing the projection process
- **Live math breakdown** with color-coded calculations
- **Runnable Python toy example**

## How It Works

When removing refusal behavior from language models, we want to:
- ✅ Remove the refusal direction from weights
- ✅ Preserve capabilities (math, coding, reasoning)

Null space projection ensures `K · ΔW' = 0`, where `K` holds the preservation inputs and `ΔW'` is the projected weight update. The model's outputs on those inputs are therefore left unchanged by the modification.
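
The condition above can be sketched with a small NumPy toy. This is a minimal illustration, not the toolkit's implementation. It assumes `K` stores one preservation input per row (shape `n × d_in`) and that the update `ΔW` has shape `d_in × d_out`. The helper `null_space_projector` and all shapes are hypothetical names chosen for this example.

```python
import numpy as np

def null_space_projector(K, rtol=1e-10):
    """Return P (d_in x d_in), the orthogonal projector onto the null space of K.

    Assumption: K has shape (n, d_in), one preservation input per row.
    """
    # Rows of Vt with non-negligible singular values span the row space of K.
    _, s, Vt = np.linalg.svd(K, full_matrices=False)
    rank = int(np.sum(s > rtol * s.max()))
    V_r = Vt[:rank].T  # (d_in, rank), orthonormal basis of the row space
    # Removing the row-space component leaves only the null-space component.
    return np.eye(K.shape[1]) - V_r @ V_r.T

rng = np.random.default_rng(0)
K = rng.normal(size=(2, 4))        # 2 preservation inputs in a 4-dim space
delta_W = rng.normal(size=(4, 3))  # raw weight update (d_in x d_out)

P = null_space_projector(K)
safe = P @ delta_W        # ΔW': the part preservation inputs cannot "see"
unsafe = delta_W - safe   # the part that would change outputs on K

print(np.allclose(K @ safe, 0))             # True
print(np.allclose(safe + unsafe, delta_W))  # True
```

Applying only `safe` to the weights keeps `K @ (W + safe)` equal to `K @ W`, which is the sense in which the preservation inputs are unaffected.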

## Related

This demo accompanies the [Abliteration Toolkit](https://github.com/jwest33/abliterator) for removing refusal behavior from language models.