---
title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo
---

# Null Space Projection - Interactive Demo

An interactive visualization explaining how **null space projection** preserves model capabilities during abliteration.

## What You'll Learn

1. **The Problem**: How to modify model weights without breaking useful capabilities
2. **Null Space Concept**: The mathematical space where modifications have zero effect on preservation inputs
3. **The Projection**: How to decompose updates into safe and unsafe components
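The decomposition in item 3 can be sketched in two dimensions, matching the scale of the interactive plot. This is a minimal illustration rather than the demo's own code: the function name `decompose_update` and the example vectors are hypothetical, and it assumes a single preservation input, so the "unsafe" part is just the component of the update along that input.

```python
import numpy as np

def decompose_update(update, preserve):
    """Split `update` into a safe part (orthogonal to `preserve`)
    and an unsafe part (parallel to `preserve`)."""
    update = np.asarray(update, dtype=float)
    preserve = np.asarray(preserve, dtype=float)
    unit = preserve / np.linalg.norm(preserve)   # unit vector along the preservation input
    unsafe = np.dot(update, unit) * unit         # component that would change the preserved output
    safe = update - unsafe                       # remainder lies in the null space of `preserve`
    return safe, unsafe

# Hypothetical example vectors (not taken from the demo)
k = np.array([1.0, 0.0])        # preservation input
delta = np.array([3.0, 4.0])    # raw update
safe, unsafe = decompose_update(delta, k)
print("safe:", safe)
print("unsafe:", unsafe)
print("k . safe:", np.dot(k, safe))
```

Here the safe component is `(0, 4)` and the unsafe component is `(3, 0)`; the two add back up to the original update, and only the safe one has zero dot product with `k`.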

## Features

- **Interactive 2D visualization** with adjustable vectors
- **Step-by-step flow** showing the projection process
- **Live math breakdown** with color-coded calculations
- **Runnable Python toy example**

## How It Works

When removing refusal behavior from language models, we want to:
- ✅ Remove the refusal direction from weights
- ✅ Preserve capabilities (math, coding, reasoning)

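Null space projection replaces the raw update `ΔW` with a projected update `ΔW'` that satisfies `K · ΔW' = 0`, where `K` holds the preservation inputs. Because the change contributes nothing on those inputs, the modified weights produce the same outputs for them as the original weights.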
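The same idea extends from a single vector to a weight matrix with several preservation inputs. The sketch below is one way to build such a projection with NumPy, not necessarily how the toolkit does it. It assumes a row-vector convention (outputs are `x @ W`, so `K` has shape `(n_inputs, d_in)` and the update has shape `(d_in, d_out)`); the helper name `null_space_projector`, the tolerance, and the random data are all hypothetical.

```python
import numpy as np

def null_space_projector(K, rtol=1e-10):
    """Return P (d_in x d_in) projecting onto the null space of K's rows,
    so that K @ P is zero up to floating-point error."""
    K = np.atleast_2d(np.asarray(K, dtype=float))
    _, s, vt = np.linalg.svd(K, full_matrices=False)
    # Numerical rank: count singular values above a relative tolerance
    rank = int(np.sum(s > rtol * s.max())) if s.size and s.max() > 0 else 0
    row_basis = vt[:rank]                      # orthonormal basis of K's row space
    return np.eye(K.shape[1]) - row_basis.T @ row_basis

# Hypothetical toy data (assumption: x @ W convention)
rng = np.random.default_rng(0)
K = rng.normal(size=(3, 8))          # 3 preservation inputs, d_in = 8
delta_W = rng.normal(size=(8, 5))    # raw update, d_in x d_out

P = null_space_projector(K)
delta_W_safe = P @ delta_W           # projected update, written ΔW' in the text
residual = np.abs(K @ delta_W_safe).max()
print("max |K @ ΔW'|:", residual)
```

The residual should be at the level of floating-point round-off. Note that the projection removes every component of the update that lies in the span of the preservation inputs, so with many linearly independent rows in `K` little of the original update may remain.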

## Related

This demo accompanies the [Abliteration Toolkit](https://github.com/jwest33/abliterator) for removing refusal behavior from language models.