title: Attention Show
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit

🧠 Attention-Show: Mechanistic Interpretability Demo

A real-time demonstration of Mechanistic Interpretability techniques on Gemma-2-2B, running entirely in the cloud.

🚀 Overview

This project demonstrates how to peer inside a running Large Language Model (LLM) and surgically intervene on its internal activations.

Key Components:

  • Model: google/gemma-2-2b loaded in 4-bit NF4 quantization (via bitsandbytes).
  • Engine: TransformerLens (by Neel Nanda) for hooking and patching.
  • Interface: Gradio for interactive intervention.

🛠 Features

  • Monitor Mode: Visualize in real time which tokens any attention head attends to.
  • Ablation Mode: "Kill" a specific attention head (force its output to zero) and observe how the model's prediction changes.
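
To make the two modes concrete, here is a minimal pure-Python sketch on a toy attention layer. It is not this app's code and does not use Gemma or TransformerLens: the names `attention_head`, `run_layer`, `TOKENS`, and `HEADS` are hypothetical, the query/key/value vectors are made-up numbers, and per-head outputs are simply summed (a simplification of the usual output projection). Monitoring corresponds to reading a head's attention pattern; ablation corresponds to replacing that head's output with zeros before the heads are combined.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_head(queries, keys, values):
    """Causal single-head attention. Returns (pattern, output)."""
    d_head = len(queries[0])
    pattern, output = [], []
    for i, q in enumerate(queries):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [dot(q, keys[j]) / math.sqrt(d_head) for j in range(i + 1)]
        weights = softmax(scores) + [0.0] * (len(keys) - i - 1)
        pattern.append(weights)
        # Head output at position i: attention-weighted sum of value vectors.
        output.append([sum(w * v[k] for w, v in zip(weights, values))
                       for k in range(len(values[0]))])
    return pattern, output


def run_layer(heads, ablate_head=None):
    """Run every head and sum their outputs.

    If ablate_head is given, that head's output is forced to zero
    (zero ablation) before the heads are combined.
    """
    seq_len = len(heads[0]["q"])
    d = len(heads[0]["v"][0])
    patterns = []
    combined = [[0.0] * d for _ in range(seq_len)]
    for h, params in enumerate(heads):
        pattern, out = attention_head(params["q"], params["k"], params["v"])
        patterns.append(pattern)
        if h == ablate_head:
            out = [[0.0] * d for _ in out]  # "kill" this head
        for i in range(seq_len):
            for k in range(d):
                combined[i][k] += out[i][k]
    return patterns, combined


# Hypothetical toy inputs: 3 tokens, 2 heads, 2 dimensions per head.
TOKENS = ["The", "cat", "sat"]
HEADS = [
    {"q": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     "k": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     "v": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]},
    {"q": [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]],
     "k": [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
     "v": [[0.2, 0.8], [0.9, 0.1], [0.4, 0.6]]},
]

if __name__ == "__main__":
    patterns, clean = run_layer(HEADS)
    _, ablated = run_layer(HEADS, ablate_head=0)
    # Monitor: which tokens does head 0 attend to from the last position?
    for token, weight in zip(TOKENS, patterns[0][-1]):
        print(f"{token!r}: {weight:.3f}")
    # Ablation: compare the combined output with and without head 0.
    print("clean  :", clean[-1])
    print("ablated:", ablated[-1])
```

In a real model the same two quantities are read and overwritten through hooks rather than computed by hand; TransformerLens exposes them as the `hook_pattern` and `hook_z` activations of each attention block. How this app wires those hooks is not shown in this README.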