null-space-visualizer / README.md

jwest33

init commit

3086575 30 days ago

metadata

title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo

Null Space Projection - Interactive Demo

An interactive visualization explaining how null space projection preserves model capabilities during abliteration.

What You'll Learn

The Problem: How to modify model weights without breaking useful capabilities
Null Space Concept: The set of weight updates ΔW that leave the outputs for preservation inputs unchanged (K · ΔW = 0)
The Projection: How to decompose updates into safe and unsafe components
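
As a worked form of the decomposition named above, here is one standard way to write it. This is a sketch under assumptions not stated in this README: K stacks the preservation inputs as rows, K⁺ is its Moore-Penrose pseudoinverse, and P is the resulting null-space projector. The demo itself may use a different but equivalent construction.

```latex
\begin{aligned}
P &= I - K^{+} K
  && \text{(projector onto the null space of } K\text{)} \\
\Delta W &= \underbrace{P\,\Delta W}_{\Delta W' \text{ (safe)}}
          + \underbrace{K^{+} K\,\Delta W}_{\text{unsafe}} \\
K\,\Delta W' &= K\,(I - K^{+} K)\,\Delta W
             = (K - K K^{+} K)\,\Delta W = 0
  && \text{since } K K^{+} K = K
\end{aligned}
```

Only the safe component ΔW' is applied to the weights; the unsafe component is the part of the update that would change the outputs for the preservation inputs.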

Features

Interactive 2D visualization with adjustable vectors
Step-by-step flow showing the projection process
Live math breakdown with color-coded calculations
Runnable Python toy example

How It Works

When removing refusal behavior from language models, we want to:

✅ Remove the refusal direction from weights
✅ Preserve capabilities (math, coding, reasoning)

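Null space projection replaces the raw update ΔW with a projected update ΔW' that satisfies K · ΔW' = 0, where K holds the preservation inputs. The modified weights therefore produce exactly the same outputs on those inputs as the original weights.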
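
The constraint above can be checked by hand in two dimensions. The following is a minimal pure-Python sketch, assuming a single preservation input `k` (so K has one row) and a 2x2 update `dW`. The function names and numbers are illustrative and are not taken from the demo or the Abliteration Toolkit.

```python
# Toy 2D sketch of null space projection with ONE preservation input.
# All names and values are illustrative assumptions, not the toolkit's API.

def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def row_times_matrix(k, M):
    """Compute the row vector k @ M for a list-of-rows matrix M."""
    return [dot(k, [M[i][j] for i in range(len(M))]) for j in range(len(M[0]))]

def project_to_null_space(dW, k):
    """Subtract from each column of dW its component along k, so k @ dW' = 0."""
    kk = dot(k, k)
    projected = [row[:] for row in dW]
    if kk == 0:
        return projected  # a zero preservation vector imposes no constraint
    for j in range(len(dW[0])):
        col = [dW[i][j] for i in range(len(dW))]
        coeff = dot(k, col) / kk  # how much of this column lies along k
        for i in range(len(dW)):
            projected[i][j] = dW[i][j] - coeff * k[i]
    return projected

k = [1.0, 2.0]                   # preservation input (row vector)
dW = [[0.5, -1.0], [1.5, 2.0]]   # raw update, e.g. from removing a direction

dW_safe = project_to_null_space(dW, k)
dW_unsafe = [[a - b for a, b in zip(r, s)] for r, s in zip(dW, dW_safe)]

print("k @ dW      =", row_times_matrix(k, dW))       # nonzero: raw update disturbs k
print("k @ dW_safe =", row_times_matrix(k, dW_safe))  # zero up to rounding error
```

With several preservation inputs, the same idea applies with a matrix K in place of `k`, and the per-column subtraction becomes a projection onto the null space of K.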

Related

This demo accompanies the Abliteration Toolkit for removing refusal behavior from language models.
