title: Attention Visualizer
emoji: 🧠
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
pinned: false

Transformer Attention Visualizer

An interactive tool that uses self-attention heatmaps to explore how transformer-based language models (such as BERT) process sentences internally.

Features

  • Multi-model support: BERT Base, DistilBERT, GPT-2
  • Per-layer, per-head attention heatmaps
  • Average all heads mode
  • Click-to-pin tokens: see what each token attends to
  • Dark glassmorphism UI with smooth animations
  • LRU model cache: loads once, reuses across requests
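
The LRU model cache listed above can be sketched with Python's functools.lru_cache. This is a minimal illustration, not the project's actual code: load_model, its placeholder body, and the cache size of 3 are all assumptions, and a real backend would presumably load the Hugging Face model inside that function.

```python
from functools import lru_cache

# Counts how many times the (expensive) loader body actually runs.
load_calls = {"count": 0}


@lru_cache(maxsize=3)  # assumption: keep at most 3 models in memory
def load_model(model_id: str) -> dict:
    """Hypothetical loader; stands in for downloading weights and building the model."""
    load_calls["count"] += 1
    return {"model_id": model_id, "loaded": True}


# The first request pays the load cost; the second reuses the cached object.
first = load_model("bert-base-uncased")
second = load_model("bert-base-uncased")
load_model("gpt2")

print(first is second)           # whether both calls returned the same object
print(load_calls["count"])       # how many times the loader body ran
print(load_model.cache_info())   # hits, misses, and current cache size
```

With maxsize set, the least recently used model is evicted once the limit is exceeded, which bounds memory use when several models are requested.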

Quick Start

# One-shot launcher (installs deps + starts both servers)
chmod +x start.sh && ./start.sh

Then open http://localhost:5173

API docs available at http://localhost:8000/docs

Manual Setup

Backend

cd backend
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev

Architecture

frontend (React + Plotly)  →  /api/attend (FastAPI)  →  Hugging Face + PyTorch
     port 5173                   port 8000

Models

Model                     Layers  Heads  Type     Size
bert-base-uncased         12      12     Encoder  440 MB
distilbert-base-uncased   6       12     Encoder  265 MB
gpt2                      12      12     Decoder  548 MB

Models are downloaded automatically from Hugging Face on first use and cached locally.

API

GET  /api/models   β†’ list of available models
POST /api/attend   β†’ { text, model_id } β†’ { tokens, attentions, n_layers, n_heads }
GET  /api/health   β†’ { status: "ok" }
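
A client call to the attend endpoint might look like the sketch below, using only the Python standard library. The helper names build_attend_request and fetch_attention are hypothetical, and the sketch assumes the endpoint accepts a JSON body and that model_id takes the names from the Models table; check the API docs at /docs for the real schema.

```python
import json
import urllib.request


def build_attend_request(text: str, model_id: str,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/attend."""
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/attend",
        data=body,
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


def fetch_attention(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a running backend)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_attend_request("The cat sat on the mat.", "bert-base-uncased")
print(request.get_method(), request.full_url)

# With the backend running on port 8000:
# result = fetch_attention(request)
# print(result["tokens"], result["n_layers"], result["n_heads"])
```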

This project provides a full-stack implementation using:

  • FastAPI backend
  • Hugging Face Transformers
  • PyTorch inference
  • React frontend
  • Plotly attention visualization

It allows users to inspect attention behavior across tokens, heads, and layers to understand how contextual meaning is built inside transformer architectures.


Project Goal

This tool helps users answer one key question:

How does a transformer model understand language internally?

By visualizing attention matrices in real time, we can observe:

  • token relationships
  • grammatical structure learning
  • semantic reasoning
  • sentence-level representation formation


Example Visualization

Example sentence:

The cat sat on the mat and watched the dog.

Tokenized form:

[CLS] the cat sat on the mat and watched the dog . [SEP]

Each heatmap cell shows how much one token attends to another token:

  • Rows: query token (who is looking)
  • Columns: key token (who is being looked at)

Color intensity represents attention strength. Because the weights come from a softmax, each row sums to 1.
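
Reading a heatmap row, which is also what pinning a token does, can be sketched as follows. The function top_attended and the toy weights are made up for illustration; the only convention taken from the text is that rows are queries and columns are keys.

```python
def top_attended(tokens, attention, query_index, k=3):
    """Return the k key tokens that the query token attends to most strongly.

    attention[i][j] is the weight from query token i (row) to key token j (column).
    An index is used instead of the token text because tokens such as "the" can repeat.
    """
    row = attention[query_index]
    ranked = sorted(zip(tokens, row), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy 4-token example with made-up weights; each row sums to 1.
tokens = ["[CLS]", "cat", "sat", "[SEP]"]
attention = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.55, 0.30, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]

# What does "sat" (row 2) look at most?
print(top_attended(tokens, attention, query_index=2, k=2))
```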


Transformer Attention Mechanism

Self-attention is computed as:

Attention(Q,K,V) = softmax(QK^T / sqrt(d_k)) V

Meaning:

  1. Each token generates a query vector, a key vector, and a value vector
  2. Each query is compared against every key (dot product, scaled by sqrt(d_k))
  3. Softmax turns the similarity scores into attention weights
  4. Each token's output is the attention-weighted sum of the value vectors

The heatmap visualizes these normalized attention weights.
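
The formula above can be worked through for a single head in plain Python. This is a teaching sketch with made-up 2-dimensional vectors, not how the backend computes attention (PyTorch does that inside the model); the returned weights matrix is the quantity a heatmap would display.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Returns (weights, outputs); weights[i][j] is how much token i attends to token j.
    """
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    # Each output vector is the attention-weighted sum of the value vectors.
    outputs = [
        [sum(w * v[dim] for w, v in zip(row, values)) for dim in range(len(values[0]))]
        for row in weights
    ]
    return weights, outputs


# Three tokens with 2-dimensional toy vectors (made-up numbers).
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

weights, outputs = attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```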


Understanding the Heatmap

Color Interpretation

Color    Meaning
Dark     Low attention
Purple   Medium attention
Yellow   Strong attention

Example:

watched -> dog

Represents a strong verb-object relationship.

Example:

the -> cat

Represents article-noun binding.


Role of Special Tokens

[CLS]

Represents a summary of the entire sentence.

Used for:

  • classification
  • semantic similarity
  • retrieval embeddings
  • sentiment detection

If many tokens attend to [CLS], this suggests the model is building a global sentence representation.

[SEP]

Marks a sentence boundary.

Often used for:

  • segmentation
  • sentence compression
  • sequence framing

Late transformer layers frequently route information into [SEP].


Layer-wise Attention Behavior

Transformer layers progressively refine meaning.

Layer Range   Model Behavior
Layer 1-2     Token identity stabilization
Layer 3-6     Grammar learning
Layer 7-10    Phrase relationships
Layer 11-12   Sentence-level semantics

Early Layer Example (Layer 1)

Observed pattern:

cat -> cat
sat -> sat
mat -> mat

Meaning:

Tokens attend mostly to themselves.

Interpretation:

Early layers confirm token identity before contextual reasoning begins.
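
One rough way to quantify this "tokens attend to themselves" pattern is the mean of the attention matrix diagonal. The function name self_attention_share and both toy matrices are illustrative assumptions, not output from a real model.

```python
def self_attention_share(attention):
    """Average weight each token puts on itself (mean of the matrix diagonal)."""
    n = len(attention)
    return sum(attention[i][i] for i in range(n)) / n


# A head where every token mostly looks at itself (made-up weights).
identity_like = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
# A head that spreads attention evenly over all tokens.
uniform = [[1 / 3] * 3 for _ in range(3)]

print(self_attention_share(identity_like))
print(self_attention_share(uniform))
```

A value well above 1/n (the uniform baseline for n tokens) indicates a head dominated by self-attention.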

Example screenshot:

Insert Layer 1 Heatmap Screenshot Here

Boundary Detection Heads

Observed pattern:

tokens -> [CLS]
tokens -> [SEP]

Interpretation:

The model identifies sentence start and end anchors.

These heads help construct positional awareness.
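
A boundary-heavy head can be flagged by measuring how much of each row's attention lands on the [CLS] and [SEP] columns. This is a sketch with made-up weights; boundary_share is a hypothetical helper, and any threshold for calling a head a "boundary head" would be a judgment call.

```python
def boundary_share(tokens, attention, boundary=("[CLS]", "[SEP]")):
    """Average fraction of each row's attention that lands on boundary tokens."""
    cols = [j for j, tok in enumerate(tokens) if tok in boundary]
    per_row = [sum(row[j] for j in cols) for row in attention]
    return sum(per_row) / len(per_row)


# Toy head where most attention flows to the boundary tokens (made-up weights).
tokens = ["[CLS]", "cat", "sat", "[SEP]"]
attention = [
    [0.50, 0.05, 0.05, 0.40],
    [0.45, 0.05, 0.05, 0.45],
    [0.40, 0.10, 0.10, 0.40],
    [0.30, 0.10, 0.10, 0.50],
]

print(round(boundary_share(tokens, attention), 3))
```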

Example screenshot:

Insert Layer 1 Head 2 Screenshot Here

Middle Layer Example (Layer 5)

Observed pattern:

on -> sat
the -> mat
watched -> dog

Interpretation:

The model captures grammatical relationships:

  • preposition to verb
  • article to noun
  • verb to object

These are syntactic reasoning heads.

Example screenshot:

Insert Layer 5 Screenshot Here

Late Layer Example (Layer 11)

Observed pattern:

all tokens -> [SEP]

Interpretation:

The model compresses sentence meaning into a global representation token.

This stage prepares embeddings for:

  • classification
  • semantic similarity
  • retrieval pipelines

Example screenshot:

Insert Layer 11 Screenshot Here

Multi-Head Attention Behavior

Each transformer layer contains multiple heads.

Different heads tend to learn different linguistic features.

Typical head specializations:

Head Type    Role
Positional   token order
Syntactic    grammar links
Semantic     meaning similarity
Boundary     CLS / SEP anchors
Long-range   clause connections

Switching heads reveals different reasoning strategies.
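
When individual heads are too noisy to read, they can be collapsed into one matrix per layer. The sketch below assumes the "average all heads" mode is a plain unweighted mean over heads, which this document does not confirm; average_heads and the two toy heads are illustrative.

```python
def average_heads(layer_attention):
    """Element-wise mean over heads.

    layer_attention[h][i][j] is the weight in head h from query i to key j.
    """
    n_heads = len(layer_attention)
    n_tokens = len(layer_attention[0])
    return [
        [sum(head[i][j] for head in layer_attention) / n_heads for j in range(n_tokens)]
        for i in range(n_tokens)
    ]


# Two toy heads over two tokens (made-up weights).
head_a = [[1.0, 0.0], [0.5, 0.5]]
head_b = [[0.0, 1.0], [0.5, 0.5]]

averaged = average_heads([head_a, head_b])
print(averaged)
```

Since every head's rows sum to 1, the rows of the averaged matrix also sum to 1, so it can be drawn on the same color scale as a single head.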


Example Attention Insights From This Tool

Sentence:

The cat sat on the mat and watched the dog.

The model internally builds:

  • Layer 1: token identity (figure: Layer 1 Head 1 Attention)
  • Layer 2: article -> noun (figure: Layer 2 Head 2 Attention)
  • Layer 5: subject -> verb (figure: Layer 5 Head 5 Attention)
  • Layer 8: clause linking via "and" (figure: Layer 8 Head 8 Attention)
  • Layer 11: sentence representation compression (figure: Layer 11 Head 11 Attention)

This reflects how transformer reasoning evolves step-by-step.


Why This Tool Is Useful

This visualizer helps researchers and engineers:

  • inspect model reasoning
  • debug hallucinations
  • analyze token influence
  • study linguistic structure learning
  • understand embedding formation

Similar tools are used in transformer interpretability research.


Tech Stack

Backend:

  • FastAPI
  • PyTorch
  • Hugging Face Transformers

Frontend:

  • React
  • Plotly

Visualization:

  • attention matrices
  • token relationships
  • head-level reasoning

Future Improvements

Possible extensions:

  • automatic head role labeling
  • syntax vs semantic head detection
  • cross-layer attention animation
  • GPU acceleration support
  • sentence embedding export

Summary

This project demonstrates how transformers progressively construct meaning from text.

From token identity to grammar to semantic understanding, attention heatmaps provide a transparent window into model reasoning.

This makes the system valuable for:

  • AI engineers
  • NLP researchers
  • students learning transformers
  • interpretability researchers