metadata
title: Null Space Projection Visualizer
emoji: 📐
colorFrom: blue
colorTo: pink
sdk: static
pinned: false
license: mit
short_description: Null-space projection for abliteration demo

Null Space Projection - Interactive Demo

An interactive visualization explaining how null space projection preserves model capabilities during abliteration.

What You'll Learn

  1. The Problem: How to modify model weights without breaking useful capabilities
  2. Null Space Concept: The set of weight updates that leave the outputs for preservation inputs unchanged
  3. The Projection: How to split an update into a safe component (inside the null space) and an unsafe component (outside it)
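
As a rough sketch of the decomposition in item 3, the split can be written with an orthogonal projector. This assumes a convention the demo does not spell out: each row of K is one preservation input, the layer computes x · W, and K⁺ denotes the Moore-Penrose pseudoinverse. The symbol P is introduced here only for illustration.

```latex
\begin{aligned}
P &= I - K^{+} K
  && \text{(orthogonal projector onto the null space of } K\text{)} \\
\Delta W &= \underbrace{P\,\Delta W}_{\Delta W'\ \text{(safe)}}
          + \underbrace{(I - P)\,\Delta W}_{\text{unsafe}} \\
K\,\Delta W' &= \left(K - K K^{+} K\right)\Delta W = 0
  && \text{(since } K K^{+} K = K\text{)}
\end{aligned}
```

Under these assumptions, keeping only the safe component discards exactly the part of the update that would change the outputs for the preservation inputs.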

Features

  • Interactive 2D visualization with adjustable vectors
  • Step-by-step flow showing the projection process
  • Live math breakdown with color-coded calculations
  • Runnable toy example in Python

How It Works

When removing refusal behavior from language models, we want to:

  • ✅ Remove the refusal direction from weights
  • ✅ Preserve capabilities (math, coding, reasoning)

Null space projection ensures K · ΔW' = 0, where K is the matrix of preservation inputs and ΔW' is the projected weight update. The layer's outputs for those inputs are therefore unchanged by the modification.
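
The condition K · ΔW' = 0 can be checked numerically with a small 2D toy, in the spirit of the visualization. This is a minimal sketch, not the toolkit's implementation: it assumes rows of K are preservation inputs, builds the projector from NumPy's pseudoinverse, and uses a hypothetical helper name `null_space_projector` and made-up vectors.

```python
import numpy as np


def null_space_projector(K: np.ndarray) -> np.ndarray:
    """Return P = I - K^+ K, the orthogonal projector onto the null space of K.

    Assumption: each row of K is one preservation input.
    """
    d = K.shape[1]
    return np.eye(d) - np.linalg.pinv(K) @ K


# One preservation input in 2D (illustrative values only).
K = np.array([[2.0, 1.0]])

# Raw update for a layer with 2 inputs and 1 output (illustrative values only).
delta_W = np.array([[1.0], [3.0]])

P = null_space_projector(K)
safe = P @ delta_W          # component inside the null space of K
unsafe = delta_W - safe     # component that would change outputs for K

print("safe component:\n", safe)
print("unsafe component:\n", unsafe)
print("K @ safe (should be ~0):", K @ safe)
print("K @ delta_W (unprojected):", K @ delta_W)
```

In this toy, the raw update (1, 3) splits into a safe part of roughly (-1, 2), which is perpendicular to the preservation input (2, 1), and an unsafe part of roughly (2, 1), which lies along it. Applying only the safe part leaves the output for the preservation input unchanged up to floating-point error.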

Related

This demo accompanies the Abliteration Toolkit for removing refusal behavior from language models.