---
title: Transformer Visualizer EN→BN
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: true
license: mit
---
# 🔬 Transformer Visualizer — English → Bengali

**See every single calculation inside a Transformer, live.**

## What this Space does

Type any English sentence and watch every number flow through the Transformer architecture step by step — from raw token IDs all the way to Bengali output.

---

## 🗂️ Tabs
### 🏗️ Architecture

- Full SVG diagram of the encoder and decoder
- Color-coded: self-attention / cross-attention / masked attention / FFN
- Explains the K, V flow from encoder to decoder
### 🏋️ Train Model

- Trains a small Transformer on 30 English→Bengali sentence pairs
- Live loss curve rendered on a canvas
- Configurable number of epochs
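The epoch loop can be sketched as follows. This is a minimal illustration of the shape of such a loop, not the actual code in `training.py`; `train` and `toy_step` are hypothetical names, and the decaying toy loss stands in for a real forward/backward pass and optimizer step.

```python
import numpy as np

def train(pairs, epochs, step_fn):
    """Run `epochs` passes over the sentence pairs, recording mean loss per epoch."""
    history = []
    for _ in range(epochs):
        # One training step per sentence pair; step_fn returns that pair's loss.
        losses = [step_fn(src, tgt) for src, tgt in pairs]
        history.append(float(np.mean(losses)))  # one point on the live loss curve
    return history

# Toy stand-in: the loss simply decays, mimicking a converging model.
state = {"loss": 5.0}
def toy_step(src, tgt):
    state["loss"] *= 0.9
    return state["loss"]

curve = train([("hello", "হ্যালো")] * 3, epochs=5, step_fn=toy_step)
```

Each entry in `curve` is one point on the rendered loss curve, so the canvas only needs the per-epoch means, not every per-pair loss.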
### 🔬 Training Step

Shows a **single training forward pass** with teacher forcing:

1. **Tokenization** — English + Bengali → token ID arrays
2. **Embedding** — `token_id → vector × √d_model`
3. **Positional Encoding** — the `sin(pos/10000^(2i/d_model))` / `cos(...)` matrix is shown
4. **Encoder**:
   - Q, K, V projection matrices shown
   - `scores = Q·Kᵀ / √d_k` with actual numbers
   - Softmax attention weights (heatmap)
   - Residual connection + LayerNorm
   - FFN: `max(0, xW₁ + b₁)W₂ + b₂`
5. **Decoder**:
   - Masked self-attention with the causal mask matrix
   - Cross-attention: Q from the decoder, K/V from the encoder
6. **Loss** — label-smoothed cross-entropy, gradient norms, and the Adam update
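The attention computation in steps 4–5 can be sketched in NumPy. This is a single-head illustration under assumed shapes (4 tokens, `d_k = 8`), not the multi-head code from `transformer.py`:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # scores = Q·Kᵀ / √d_k
    if causal:
        # Upper-triangular entries are future positions: mask them to -inf
        # so they get exactly zero weight after the softmax.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1 (the heatmap rows)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = attention(Q, K, V, causal=True)
# Under the causal mask, token 0 can only attend to itself: w[0] = [1, 0, 0, 0]
```

With `causal=False` the same function gives encoder self-attention; for cross-attention, `Q` comes from the decoder while `K` and `V` come from the encoder output.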
### ⚡ Inference

Shows **auto-regressive decoding**:

- No ground truth needed
- Tokens are generated one at a time
- Top-5 candidates with their probabilities at every step
- Cross-attention heatmap: which Bengali token attends to which English word
- Greedy vs. beam search comparison
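The greedy half of that comparison can be sketched as below. This is a minimal sketch: `greedy_decode` and `toy_step` are hypothetical names, and the toy step function (which always predicts `last token + 1`) stands in for the real decoder in `inference.py`:

```python
import numpy as np

def greedy_decode(step_fn, bos_id, eos_id, max_len=10):
    """Generate token IDs one at a time, always taking the argmax."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = step_fn(tokens)            # logits over the target vocabulary
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        top5 = np.argsort(probs)[::-1][:5]  # the top-5 candidates shown per step
        tokens.append(int(top5[0]))         # greedy = take the single best token
        if tokens[-1] == eos_id:
            break
    return tokens

# Toy stand-in for the decoder over a 10-token vocabulary.
def toy_step(tokens):
    logits = np.full(10, -1e9)
    logits[(tokens[-1] + 1) % 10] = 0.0
    return logits

print(greedy_decode(toy_step, bos_id=1, eos_id=4))  # → [1, 2, 3, 4]
```

Beam search differs only in that it keeps the `k` highest-scoring partial sequences at each step instead of the single argmax, which is why the two can disagree on longer sentences.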
---

## 📁 File Structure

```
app.py           — Gradio UI + HTML/CSS/JS rendering
transformer.py   — Full Transformer with CalcLog hooks
training.py      — Training loop + single-step visualization
inference.py     — Greedy & beam search with logging
vocab.py         — English/Bengali vocabularies + parallel corpus
requirements.txt
```
---

## ⚙️ Model Config

| Parameter  | Value             |
|------------|-------------------|
| d_model    | 64                |
| num_heads  | 4                 |
| num_layers | 2                 |
| d_ff       | 128               |
| vocab (EN) | ~100              |
| vocab (BN) | ~90               |
| Optimizer  | Adam              |
| Loss       | Label-smoothed CE |
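For reference, the table above maps onto a config like the following. The dictionary keys are illustrative, not necessarily the names used in `transformer.py`:

```python
# Hypothetical config mirroring the table above (key names are assumptions).
MODEL_CONFIG = {
    "d_model": 64,
    "num_heads": 4,
    "num_layers": 2,
    "d_ff": 128,
    "src_vocab_size": 100,  # ~100 English tokens
    "tgt_vocab_size": 90,   # ~90 Bengali tokens
}

# d_model must divide evenly across the heads:
# each head works in a d_model / num_heads = 16-dimensional subspace.
assert MODEL_CONFIG["d_model"] % MODEL_CONFIG["num_heads"] == 0
```

Keeping every dimension this small is what makes it feasible to log and display the full matrices for each operation.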
---

*Built for educational purposes — every matrix operation is logged and displayed.*