metadata
title: Attention Show
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit
🧠 Attention-Show: Mechanistic Interpretability Demo
A real-time demonstration of Mechanistic Interpretability techniques on Gemma-2-2B, running entirely in the cloud.
Overview
This project demonstrates how to peer inside a running Large Language Model (LLM) and surgically intervene on its internal activations.
Key Features:
- Model: google/gemma-2-2b loaded in 4-bit NF4 quantization (via bitsandbytes).
- Engine: TransformerLens (by Neel Nanda) for hooking and patching.
- Interface: Gradio for interactive intervention.
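To make "hooking and patching" concrete, here is a minimal pure-Python sketch of the idea: a model exposes named hook points, and a hook function can either observe an activation or replace it. This is a toy illustration, not the TransformerLens API and not this project's code. `ToyModel`, the hook point names `double` and `shift`, and the helper functions are all hypothetical. The convention that a hook returning `None` leaves the activation untouched loosely mirrors how forward hooks usually behave.

```python
from typing import Callable, Dict, List, Optional

Activation = List[float]
# A hook receives the activation and the hook point's name.
# It returns None to observe only, or a new activation to patch it.
Hook = Callable[[Activation, str], Optional[Activation]]


class ToyModel:
    """Two-step toy 'model' with a named hook point after each step (hypothetical)."""

    def forward(self, x: Activation, hooks: Optional[Dict[str, Hook]] = None) -> Activation:
        hooks = hooks or {}
        x = self._hook_point("double", [2.0 * v for v in x], hooks)
        x = self._hook_point("shift", [v + 1.0 for v in x], hooks)
        return x

    @staticmethod
    def _hook_point(name: str, value: Activation, hooks: Dict[str, Hook]) -> Activation:
        hook = hooks.get(name)
        if hook is None:
            return value
        replaced = hook(value, name)
        return value if replaced is None else replaced


seen: Dict[str, Activation] = {}


def record(value: Activation, name: str) -> None:
    """Monitoring hook: copy the activation and leave it unchanged."""
    seen[name] = list(value)
    return None


def zero_out(value: Activation, name: str) -> Activation:
    """Patching hook: replace the activation with zeros."""
    return [0.0 for _ in value]


model = ToyModel()
clean = model.forward([1.0, 2.0])                          # no hooks
monitored = model.forward([1.0, 2.0], {"double": record})  # observe only
patched = model.forward([1.0, 2.0], {"double": zero_out})  # intervene

print(clean, seen["double"], patched)
```

The monitored run produces the same output as the clean run, while the patched run differs because the downstream step receives the replaced activation.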
Features
- Monitor Mode: Visualize exactly which tokens any attention head is attending to, in real time.
- Ablation Mode: "Kill" a specific attention head (force its output to zero) and observe how the model's prediction changes.
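The two modes can be sketched on a toy attention layer in plain Python. This is an illustration under stated assumptions, not the app's implementation: the query, key, and value numbers are made up, there are only two heads and three tokens, and each head's output is treated as already projected into the residual stream (a real transformer applies an output projection per head first). The function names `head_forward` and `layer_forward` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_forward(q: Matrix, k: Matrix, v: Matrix) -> Tuple[Matrix, Matrix]:
    """Causal single-head attention. Returns (attention pattern, head output)."""
    d = len(q[0])
    n = len(q)
    pattern, out = [], []
    for i in range(n):
        # Causal mask: token i may only attend to tokens 0..i.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        weights = softmax(scores) + [0.0] * (n - i - 1)
        pattern.append(weights)
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return pattern, out


def layer_forward(heads, ablate: Optional[int] = None):
    """Sum the per-head outputs; optionally force one head's output to zero."""
    patterns, total = [], None
    for h, (q, k, v) in enumerate(heads):
        pattern, out = head_forward(q, k, v)
        patterns.append(pattern)  # "Monitor Mode": what each head attends to
        if h == ablate:
            out = [[0.0] * len(row) for row in out]  # "Ablation Mode"
        total = out if total is None else [
            [a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, out)]
    return patterns, total


# Made-up inputs: 3 tokens, head dimension 2, two heads.
qk = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
heads = [
    (qk, qk, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]),
    (qk, qk, [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]),
]

patterns, clean = layer_forward(heads)
_, ablated = layer_forward(heads, ablate=0)

print("head 0, last token attends to:", patterns[0][-1])
print("clean output:  ", clean[-1])
print("ablated output:", ablated[-1])
```

Each row of a pattern is a probability distribution over the tokens that position is allowed to see, which is the kind of quantity Monitor Mode visualizes. With head 0 zeroed, the layer output is exactly head 1's contribution. In a real model, the downstream layers and the final prediction would then be computed from this altered output.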