---
title: Transformer Visualizer EN→BN
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: true
license: mit
---
# 🔬 Transformer Visualizer — English → Bengali

*See every single calculation inside a Transformer, live.*
## What this Space does
Type any English sentence and watch every number flow through the Transformer architecture step by step — from raw token IDs all the way to Bengali output.
## 🗂️ Tabs
### 🏗️ Architecture
- Full SVG diagram of encoder + decoder
- Color-coded: self-attention / cross-attention / masked attention / FFN
- Explains the K, V flow from encoder to decoder
### 🏋️ Train Model
- Trains a small Transformer on 30 English→Bengali sentence pairs
- Live loss curve rendered on canvas
- Configurable epochs
### 🔬 Training Step
Shows a single training forward pass with teacher forcing:
- **Tokenization** — English + Bengali → token ID arrays
- **Embedding** — `token_id → vector × √d_model`
- **Positional Encoding** — `sin(pos/10000^(2i/d))` / `cos(...)` matrix shown
- **Encoder**:
  - Q, K, V projection matrices shown
  - `scores = Q·Kᵀ / √d_k` with actual numbers
  - Softmax attention weights (heatmap)
  - Residual + LayerNorm
  - FFN: `max(0, xW₁+b₁)W₂+b₂`
- **Decoder**:
  - Masked self-attention with causal mask matrix
  - Cross-attention: Q from decoder, K/V from encoder
- **Loss** — label-smoothed cross-entropy, gradient norms, Adam update
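The attention score formula above (`scores = Q·Kᵀ / √d_k`, row-wise softmax, optional causal mask) can be sketched in a few lines of NumPy. This is an illustrative standalone version, not the Space's actual `transformer.py` implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """softmax(Q·Kᵀ / √d_k)·V, with an optional causal mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (len_q, len_k)
    if causal:
        # Position i may not attend to positions j > i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V, causal=True)
# Each row of w sums to 1; with the causal mask, row 0 attends only to position 0.
```

The same function covers both the encoder's self-attention (`causal=False`) and the decoder's masked self-attention (`causal=True`).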
### ⚡ Inference
Shows auto-regressive decoding:
- No ground truth needed
- Tokens are generated one at a time
- Top-5 candidates + probabilities at every step
- Cross-attention heatmap: which Bengali token attends to which English word
- Greedy vs Beam Search comparison
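The greedy half of the decoding loop can be sketched as below. Here `next_token_logits` is a hypothetical stand-in for the decoder forward pass, not a real function from `inference.py`:

```python
import numpy as np

def greedy_decode(next_token_logits, bos_id, eos_id, max_len=10):
    """Auto-regressive greedy decoding: feed the growing prefix back in,
    pick the argmax token at each step, stop at <eos> or max_len."""
    prefix = [bos_id]
    for _ in range(max_len):
        logits = next_token_logits(prefix)          # decoder forward pass
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        top5 = np.argsort(probs)[::-1][:5]          # top-5 candidates per step
        prefix.append(int(top5[0]))                 # greedy: keep only the best
        if prefix[-1] == eos_id:
            break
    return prefix

# Toy stand-in "model" over a 6-token vocabulary.
toy = lambda prefix: 5.0 * np.eye(6)[(len(prefix) % 4) + 1]
greedy_decode(toy, bos_id=0, eos_id=3)              # → [0, 2, 3]
```

Beam search keeps the top-k prefixes alive at each step instead of only the single argmax, which is why the tab can show the two strategies diverging on the same input.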
## 📁 File Structure
- `app.py` — Gradio UI + HTML/CSS/JS rendering
- `transformer.py` — full Transformer with CalcLog hooks
- `training.py` — training loop + single-step visualization
- `inference.py` — greedy & beam search with logging
- `vocab.py` — English/Bengali vocabularies + parallel corpus
- `requirements.txt`
## ⚙️ Model Config
| Parameter | Value |
|---|---|
| d_model | 64 |
| num_heads | 4 |
| num_layers | 2 |
| d_ff | 128 |
| vocab (EN) | ~100 |
| vocab (BN) | ~90 |
| Optimizer | Adam |
| Loss | Label-smoothed CE |
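The label-smoothed cross-entropy listed above can be sketched in NumPy as follows; `eps=0.1` is an assumed smoothing value, not necessarily the Space's actual setting:

```python
import numpy as np

def label_smoothed_ce(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution: the gold
    token gets probability 1 - eps, the rest share eps uniformly.
    (eps=0.1 is an assumed value for illustration.)"""
    vocab = logits.shape[-1]
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))  # log-softmax
    smooth = np.full(vocab, eps / (vocab - 1))
    smooth[target] = 1.0 - eps
    return float(-(smooth * log_probs).sum())

label_smoothed_ce(np.array([2.0, 0.1, -1.0]), target=0)
```

Compared with plain cross-entropy, the smoothed target keeps the model from driving the gold-token probability all the way to 1, which helps on a tiny 30-pair corpus.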
Built for educational purposes — every matrix operation is logged and displayed.