
	---
	title: Null Space Projection Visualizer
	emoji: 📐
	colorFrom: blue
	colorTo: pink
	sdk: static
	pinned: false
	license: mit
	short_description: Null-space projection for abliteration demo
	---

	# Null Space Projection - Interactive Demo

	An interactive visualization explaining how null space projection preserves model capabilities during abliteration.

	## What You'll Learn

	1. The Problem: How to modify model weights without breaking useful capabilities
	2. Null Space Concept: The set of weight updates that leave the model's outputs on the preservation inputs unchanged
	3. The Projection: How to decompose updates into safe and unsafe components
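
The decomposition in step 3 can be sketched in 2D, in the spirit of the adjustable vectors in the demo. This is a minimal illustration, not the demo's own code: the names `k` (a single preservation input) and `delta` (a proposed update), and their values, are hypothetical.

```python
import numpy as np

# Hypothetical 2D vectors chosen for illustration only.
k = np.array([2.0, 1.0])      # one preservation input
delta = np.array([1.0, 3.0])  # proposed update direction

# Unsafe component: the part of delta that lies along k.
unsafe = (delta @ k) / (k @ k) * k

# Safe component: the remainder, which is orthogonal to k.
safe = delta - unsafe

print("unsafe:", unsafe)
print("safe:  ", safe)
print("k . safe =", k @ safe)
```

Only the `safe` part is kept. Its dot product with `k` is zero, so applying it cannot change how the preservation input is handled.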

	## Features

	- Interactive 2D visualization with adjustable vectors
	- Step-by-step flow showing the projection process
	- Live math breakdown with color-coded calculations
	- Runnable toy example in Python

	## How It Works

	When removing refusal behavior from language models, we want to:
	- ✅ Remove the refusal direction from weights
	- ✅ Preserve capabilities (math, coding, reasoning)

	Null space projection ensures `K · ΔW' = 0`, where `K` holds the preservation inputs and `ΔW'` is the projected weight update. The model's outputs on those inputs are therefore unchanged by the modification.
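
The constraint can be checked numerically with a small sketch. Several things here are assumptions rather than details from this README: the rows of `K` are preservation inputs, a layer computes `x @ W`, and the projector is built with the pseudoinverse as `P = I - pinv(K) @ K`. That is one standard construction and not necessarily what the Abliteration Toolkit uses. All shapes and the random data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_keep = 6, 4, 2  # hypothetical sizes

K = rng.normal(size=(n_keep, d_in))       # preservation inputs, one per row
W = rng.normal(size=(d_in, d_out))        # original weights (assumed y = x @ W)
delta_W = rng.normal(size=(d_in, d_out))  # raw update, e.g. a refusal-removal edit

# Projector onto the null space of K (assumed pseudoinverse construction).
P = np.eye(d_in) - np.linalg.pinv(K) @ K

# Projected update: keeps only the part of delta_W that K cannot "see".
delta_W_proj = P @ delta_W

print("max |K @ delta_W|      =", np.abs(K @ delta_W).max())
print("max |K @ delta_W_proj| =", np.abs(K @ delta_W_proj).max())
```

The first printed value is generally nonzero, while the second is zero up to floating-point error, so `K @ (W + delta_W_proj)` matches `K @ W`.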

	## Related

	This demo accompanies the [Abliteration Toolkit](https://github.com/jwest33/abliterator) for removing refusal behavior from language models.