---
title: Transformer Visualizer EN→BN
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: true
license: mit
---
# 🔬 Transformer Visualizer — English → Bengali

*See every single calculation inside a Transformer, live.*
## What this Space does
Type any English sentence and watch every number flow through the Transformer architecture step by step — from raw token IDs all the way to Bengali output.
## 🗂️ Tabs
### 🏗️ Architecture
- Full SVG diagram of encoder + decoder
- Color-coded: self-attention / cross-attention / masked attention / FFN
- Explains the K, V flow from encoder to decoder
### 🏋️ Train Model
- Trains a small Transformer on 30 English→Bengali sentence pairs
- Live loss curve rendered on canvas
- Configurable epochs
### 🔬 Training Step
Shows a single training forward pass with teacher forcing:
- **Tokenization** — English + Bengali → token ID arrays
- **Embedding** — `token_id → vector × √d_model`
- **Positional Encoding** — `sin(pos/10000^(2i/d))` / `cos(...)` matrix shown
- **Encoder**:
  - Q, K, V projection matrices shown
  - `scores = Q·Kᵀ / √d_k` with actual numbers
  - Softmax attention weights (heatmap)
  - Residual + LayerNorm
  - FFN: `max(0, xW₁+b₁)W₂+b₂`
- **Decoder**:
  - Masked self-attention with causal mask matrix
  - Cross-attention: Q from decoder, K/V from encoder
- **Loss** — label-smoothed cross-entropy, gradient norms, Adam update
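The attention score formula above (`scores = Q·Kᵀ / √d_k`, row-wise softmax, optional causal mask) can be sketched in a few lines of NumPy. This is an illustrative standalone version, not the Space's actual `transformer.py` implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """softmax(Q·Kᵀ / √d_k)·V, with an optional causal mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (len_q, len_k)
    if causal:
        # Position i may not attend to positions j > i.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V, causal=True)
# Each row of w sums to 1; with the causal mask, row 0 attends only to position 0.
```

The same function covers both the encoder's self-attention (`causal=False`) and the decoder's masked self-attention (`causal=True`).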
### ⚡ Inference
Shows auto-regressive decoding:
- No ground truth needed
- Tokens are generated one at a time
- Top-5 candidates + probabilities at every step
- Cross-attention heatmap: which Bengali token attends to which English word
- Greedy vs Beam Search comparison
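The greedy half of the decoding loop can be sketched as below. Here `next_token_logits` is a hypothetical stand-in for the decoder forward pass, not a real function from `inference.py`:

```python
import numpy as np

def greedy_decode(next_token_logits, bos_id, eos_id, max_len=10):
    """Auto-regressive greedy decoding: feed the growing prefix back in,
    pick the argmax token at each step, stop at <eos> or max_len."""
    prefix = [bos_id]
    for _ in range(max_len):
        logits = next_token_logits(prefix)          # decoder forward pass
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        top5 = np.argsort(probs)[::-1][:5]          # top-5 candidates per step
        prefix.append(int(top5[0]))                 # greedy: keep only the best
        if prefix[-1] == eos_id:
            break
    return prefix

# Toy stand-in "model" over a 6-token vocabulary.
toy = lambda prefix: 5.0 * np.eye(6)[(len(prefix) % 4) + 1]
greedy_decode(toy, bos_id=0, eos_id=3)              # → [0, 2, 3]
```

Beam search keeps the top-k prefixes alive at each step instead of only the single argmax, which is why the tab can show the two strategies diverging on the same input.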
## 📁 File Structure
- `app.py` — Gradio UI + HTML/CSS/JS rendering
- `transformer.py` — full Transformer with CalcLog hooks
- `training.py` — training loop + single-step visualization
- `inference.py` — greedy & beam search with logging
- `vocab.py` — English/Bengali vocabularies + parallel corpus
- `requirements.txt`
## ⚙️ Model Config
| Parameter | Value |
|---|---|
| d_model | 64 |
| num_heads | 4 |
| num_layers | 2 |
| d_ff | 128 |
| vocab (EN) | ~100 |
| vocab (BN) | ~90 |
| Optimizer | Adam |
| Loss | Label-smoothed CE |
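The label-smoothed cross-entropy listed above can be sketched in NumPy as follows; `eps=0.1` is an assumed smoothing value, not necessarily the Space's actual setting:

```python
import numpy as np

def label_smoothed_ce(logits, target, eps=0.1):
    """Cross-entropy against a smoothed target distribution: the gold
    token gets probability 1 - eps, the rest share eps uniformly.
    (eps=0.1 is an assumed value for illustration.)"""
    vocab = logits.shape[-1]
    m = logits.max()
    log_probs = logits - (m + np.log(np.exp(logits - m).sum()))  # log-softmax
    smooth = np.full(vocab, eps / (vocab - 1))
    smooth[target] = 1.0 - eps
    return float(-(smooth * log_probs).sum())

label_smoothed_ce(np.array([2.0, 0.1, -1.0]), target=0)
```

Compared with plain cross-entropy, the smoothed target keeps the model from driving the gold-token probability all the way to 1, which helps on a tiny 30-pair corpus.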
Built for educational purposes — every matrix operation is logged and displayed.