---
title: Transformer Visualizer EN→BN
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: true
license: mit
---
# 🔬 Transformer Visualizer — English → Bengali
**See every single calculation inside a Transformer, live.**
## What this Space does
Type any English sentence and watch every number flow through the Transformer architecture step by step — from raw token IDs all the way to Bengali output.
---
## 🗂️ Tabs
### 🏗️ Architecture
- Full SVG diagram of encoder + decoder
- Color-coded: self-attention / cross-attention / masked attention / FFN
- Explains K,V flow from encoder to decoder
### 🏋️ Train Model
- Trains a small Transformer on 30 English→Bengali sentence pairs
- Live loss curve rendered on canvas
- Configurable epochs
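The loss plotted during training is a label-smoothed cross-entropy. A minimal sketch of that loss for a single position, assuming a smoothing factor of 0.1 (an illustrative value, not read from the Space's code):

```python
# Sketch of label-smoothed cross-entropy for one output position.
# smoothing=0.1 is an assumed default; the Space may use another value.
import numpy as np

def log_softmax(x):
    x = x - x.max()  # numerical stability
    return x - np.log(np.exp(x).sum())

def label_smoothed_ce(logits, target, smoothing=0.1):
    """Put (1 - smoothing) mass on the target class and spread
    `smoothing` uniformly over the other classes, then take CE."""
    vocab = logits.shape[-1]
    true_dist = np.full(vocab, smoothing / (vocab - 1))
    true_dist[target] = 1.0 - smoothing
    return -np.sum(true_dist * log_softmax(logits))
```

With uniform logits over a vocabulary of size V, the loss is exactly `log(V)` regardless of smoothing, which makes it a handy sanity check for the curve's starting value.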
### 🔬 Training Step
Shows a **single training forward pass** with teacher forcing:
1. **Tokenization** — English + Bengali → token ID arrays
2. **Embedding** — `token_id → vector × √d_model`
3. **Positional Encoding** — `sin(pos/10000^(2i/d))` / `cos(...)` matrix shown
4. **Encoder**:
- Q, K, V projection matrices shown
- `scores = Q·Kᵀ / √d_k` with actual numbers
- Softmax attention weights (heatmap)
- Residual + LayerNorm
- FFN: `max(0, xW₁+b₁)W₂+b₂`
5. **Decoder**:
- Masked self-attention with causal mask matrix
- Cross-attention: Q from decoder, K/V from encoder
6. **Loss** — label-smoothed cross-entropy, gradient norms, Adam update
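The core of steps 3–5 can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the Space's actual `transformer.py`; it uses the Space's dimensions (`d_model=64`, 4 heads, so `d_k=16` per head):

```python
# Sketch of sinusoidal positional encoding and (optionally masked)
# scaled dot-product attention, as visualized in the Training Step tab.
import numpy as np

def positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    """scores = Q·Kᵀ / √d_k, optional causal mask, softmax, weighted sum."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # Forbid attending to future positions (decoder self-attention).
        mask = np.triu(np.ones_like(scores), k=1) == 1
        scores = np.where(mask, -1e9, scores)
    weights = softmax(scores)       # the heatmap shown in the UI
    return weights @ V, weights
```

For cross-attention (step 5), the same `attention` function is called with `Q` projected from the decoder states and `K`, `V` projected from the encoder output, with no causal mask.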
### ⚡ Inference
Shows **auto-regressive decoding**:
- No ground truth needed
- Tokens are generated one at a time
- Top-5 candidates + probabilities at every step
- Cross-attention heatmap: which Bengali token attends to which English word
- Greedy vs Beam Search comparison
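The greedy half of that comparison boils down to a short loop. A hedged sketch of it, where `logits_fn` and the `BOS`/`EOS` IDs are illustrative stand-ins for the Space's real model and vocabulary:

```python
# Sketch of auto-regressive greedy decoding with per-step top-5 logging,
# as displayed in the Inference tab. `logits_fn`, BOS, and EOS are
# assumed names, not the Space's actual API.
import numpy as np

BOS, EOS = 1, 2  # assumed special-token IDs

def greedy_decode(logits_fn, max_len=20):
    """logits_fn(prefix) -> next-token logits for the current prefix."""
    tokens = [BOS]
    steps = []  # per-step top-5 candidates with probabilities
    for _ in range(max_len):
        logits = logits_fn(tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        top5 = np.argsort(probs)[::-1][:5]
        steps.append([(int(t), float(probs[t])) for t in top5])
        nxt = int(top5[0])             # greedy: always take the argmax
        tokens.append(nxt)
        if nxt == EOS:
            break
    return tokens, steps
```

Beam search replaces the single `argmax` with the top-k prefixes by cumulative log-probability; greedy is just beam search with beam width 1.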
---
## 📁 File Structure
```
app.py — Gradio UI + HTML/CSS/JS rendering
transformer.py — Full Transformer with CalcLog hooks
training.py — Training loop + single-step visualization
inference.py — Greedy & beam search with logging
vocab.py — English/Bengali vocabularies + parallel corpus
requirements.txt
```
---
## ⚙️ Model Config
| Parameter | Value |
|-----------|-------|
| d_model | 64 |
| num_heads | 4 |
| num_layers | 2 |
| d_ff | 128 |
| vocab (EN) | ~100 |
| vocab (BN) | ~90 |
| Optimizer | Adam |
| Loss | Label-smoothed CE |
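The table above, expressed as a config object. This is a sketch; the actual `transformer.py` may structure its hyperparameters differently. Note the derived per-head dimension: `d_k = d_model / num_heads = 64 / 4 = 16`.

```python
# Hyperparameters from the Model Config table as a frozen dataclass
# (illustrative; not necessarily how transformer.py stores them).
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    d_model: int = 64
    num_heads: int = 4
    num_layers: int = 2
    d_ff: int = 128

    @property
    def d_k(self) -> int:
        """Per-head key/query dimension."""
        return self.d_model // self.num_heads
```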
---
*Built for educational purposes — every matrix operation is logged and displayed.*