Nicknam/Attention-Show

title: Attention Show
emoji: 🧠
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.5.1
app_file: app.py
pinned: false
license: mit

🧠 Attention-Show: Mechanistic Interpretability Demo

A real-time demonstration of Mechanistic Interpretability techniques on Gemma-2-2B, running entirely in the cloud.

This project demonstrates how to peer inside a running Large Language Model (LLM) and surgically intervene on its internal activations.
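Intervening on internal activations works through hooks. A hook is a named point in the forward pass where a function can read an activation (monitoring) or return a replacement (patching). TransformerLens provides this through `HookPoint` modules and methods such as `run_with_cache` and `run_with_hooks`. The sketch below only imitates the idea in plain Python. `ToyHookedModel`, its two-step `forward`, and the hook names `"layer0.out"` and `"layer1.out"` are hypothetical and are not part of TransformerLens.

```python
from typing import Callable, Dict, List

Activation = List[float]
# A hook receives (activation, hook_name) and returns the activation to pass on.
Hook = Callable[[Activation, str], Activation]


class ToyHookedModel:
    """Hypothetical two-step 'model' with named hook points (not TransformerLens)."""

    def __init__(self) -> None:
        self.hooks: Dict[str, List[Hook]] = {}

    def add_hook(self, name: str, fn: Hook) -> None:
        self.hooks.setdefault(name, []).append(fn)

    def reset_hooks(self) -> None:
        self.hooks.clear()

    def _hook_point(self, name: str, act: Activation) -> Activation:
        # Hooks run in registration order; returning the input unchanged only
        # observes it, returning a new list patches it for later computation.
        for fn in self.hooks.get(name, []):
            act = fn(act, name)
        return act

    def forward(self, x: Activation) -> Activation:
        h = self._hook_point("layer0.out", [2 * v for v in x])
        return self._hook_point("layer1.out", [v + 1 for v in h])


model = ToyHookedModel()
cache: Dict[str, Activation] = {}


def save_activation(act: Activation, name: str) -> Activation:
    cache[name] = list(act)  # monitoring: record, don't modify
    return act


model.add_hook("layer0.out", save_activation)
print(model.forward([1.0, 2.0]))  # [3.0, 5.0]; cache holds [2.0, 4.0]
model.add_hook("layer0.out", lambda act, name: [0.0] * len(act))  # patching
print(model.forward([1.0, 2.0]))  # [1.0, 1.0]
```

Hooks run in the order they were registered. Because of this, the caching hook still records the unpatched activation even after the zeroing hook is added.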

Key Features:

Model: google/gemma-2-2b loaded with 4-bit NF4 quantization (via bitsandbytes).
Engine: TransformerLens (by Neel Nanda) for hooking and patching.
Interface: Gradio for interactive intervention.

Monitor Mode: Visualize the attention pattern of any attention head in real time, i.e. how strongly each token attends to every earlier token.
Ablation Mode: "Kill" a specific attention head (zero-ablation: force its output to zero) and observe how the model's next-token prediction changes.
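Both modes can be illustrated with a toy single-layer, multi-head causal attention block written in plain Python. Monitor Mode corresponds to reading each head's attention pattern, where row i shows how strongly position i attends to each earlier position. Ablation Mode corresponds to zeroing one head's output `z` before that output is written back to the residual stream.

All weights, shapes, and helper names here (`attention_layer`, `ablate_head`, `W_U`) are made up for illustration. The sketch also leaves out layer norm, positional information, and Gemma-2 specifics such as attention-logit soft-capping. In TransformerLens, the analogous activations are exposed at hooks named like `blocks.{layer}.attn.hook_pattern` and `blocks.{layer}.attn.hook_z`. Whether this Space uses exactly those hooks is an assumption.

```python
import math
from typing import Dict, List, Optional, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    """Multiply matrix m (rows x cols) by vector v (cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def causal_pattern(q: List[Vec], k: List[Vec]) -> Mat:
    """pattern[i][j] = softmax weight query i puts on key j (0 for j > i)."""
    n, d_head = len(q), len(q[0])
    pattern = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d_head) for j in range(i + 1)]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        pattern.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return pattern


def attention_layer(
    x: List[Vec], heads: List[Dict[str, Mat]], ablate_head: Optional[int] = None
) -> Tuple[List[Vec], List[Mat]]:
    """Return (residual stream after the layer, one attention pattern per head).

    Each head has W_Q, W_K, W_V (d_head x d_model) and W_O (d_model x d_head).
    If ablate_head is set, that head's output z is replaced by zeros.
    """
    n = len(x)
    out = [list(row) for row in x]  # start from the residual stream
    patterns = []
    for h, w in enumerate(heads):
        q = [matvec(w["W_Q"], xi) for xi in x]
        k = [matvec(w["W_K"], xi) for xi in x]
        v = [matvec(w["W_V"], xi) for xi in x]
        pattern = causal_pattern(q, k)
        patterns.append(pattern)  # what a "monitor" view would plot
        d_head = len(v[0])
        # z[i] = attention-weighted sum of value vectors for position i
        z = [[sum(pattern[i][j] * v[j][c] for j in range(n)) for c in range(d_head)] for i in range(n)]
        if h == ablate_head:
            z = [[0.0] * d_head for _ in range(n)]  # zero-ablation of this head
        for i in range(n):
            out[i] = [a + b for a, b in zip(out[i], matvec(w["W_O"], z[i]))]
    return out, patterns


# Made-up example: 3 tokens, d_model = 2, two heads with d_head = 1.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
HEADS = [
    {"W_Q": [[1.0, 0.0]], "W_K": [[1.0, 0.0]], "W_V": [[0.0, 1.0]], "W_O": [[2.0], [0.0]]},
    {"W_Q": [[0.0, 1.0]], "W_K": [[0.0, 1.0]], "W_V": [[1.0, 0.0]], "W_O": [[0.0], [1.0]]},
]
W_U = [[1.0, 0.0], [0.0, 1.0]]  # toy unembedding: 2-token vocabulary

clean, patterns = attention_layer(X, HEADS)
ablated, _ = attention_layer(X, HEADS, ablate_head=0)
print("head 0 pattern:", patterns[0])
print("clean logits:  ", matvec(W_U, clean[-1]))
print("ablated logits:", matvec(W_U, ablated[-1]))
```

With these made-up weights, the clean run's logits at the last position favor token 0. Zero-ablating head 0 removes its contribution, and the prediction flips to token 1. Head 0's pattern is still recorded during the ablated run because the ablation acts on `z`, which is computed after the pattern.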