---
title: Transformer Visualizer EN→BN
emoji: 🔬
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 6.9.0
app_file: app.py
pinned: true
license: mit
---

# 🔬 Transformer Visualizer — English → Bengali

**See every single calculation inside a Transformer, live.**

## What this Space does

Type any English sentence and watch every number flow through the Transformer architecture step by step — from raw token IDs all the way to Bengali output.

---

## 🗂️ Tabs

### 🏗️ Architecture

- Full SVG diagram of encoder + decoder
- Color-coded: self-attention / cross-attention / masked attention / FFN
- Explains K, V flow from encoder to decoder

### 🏋️ Train Model

- Trains a small Transformer on 30 English→Bengali sentence pairs
- Live loss curve rendered on canvas
- Configurable epochs

### 🔬 Training Step

Shows a **single training forward pass** with teacher forcing:

1. **Tokenization** — English + Bengali → token ID arrays
2. **Embedding** — `token_id → vector × √d_model`
3. **Positional Encoding** — `sin(pos/10000^(2i/d))` / `cos(...)` matrix shown
4. **Encoder**:
   - Q, K, V projection matrices shown
   - `scores = Q·Kᵀ / √d_k` with actual numbers
   - Softmax attention weights (heatmap)
   - Residual + LayerNorm
   - FFN: `max(0, xW₁ + b₁)W₂ + b₂`
5. **Decoder**:
   - Masked self-attention with causal mask matrix
   - Cross-attention: Q from decoder, K/V from encoder
6. **Loss** — label-smoothed cross-entropy, gradient norms, Adam update

### ⚡ Inference

Shows **auto-regressive decoding**:

- No ground truth needed
- Tokens generated one at a time
- Top-5 candidates + probabilities at every step
- Cross-attention heatmap: which Bengali token attends to which English word
- Greedy vs. beam search comparison

---

## 📁 File Structure

```
app.py           — Gradio UI + HTML/CSS/JS rendering
transformer.py   — Full Transformer with CalcLog hooks
training.py      — Training loop + single-step visualization
inference.py     — Greedy & beam search with logging
vocab.py         — English/Bengali vocabularies + parallel corpus
requirements.txt
```

---

## ⚙️ Model Config

| Parameter | Value |
|-----------|-------|
| d_model | 64 |
| num_heads | 4 |
| num_layers | 2 |
| d_ff | 128 |
| vocab (EN) | ~100 |
| vocab (BN) | ~90 |
| Optimizer | Adam |
| Loss | Label-smoothed CE |

---

*Built for educational purposes — every matrix operation is logged and displayed.*
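The sinusoidal positional encoding shown in the Training Step tab, `sin(pos/10000^(2i/d))` / `cos(...)`, can be sketched in a few lines of NumPy. This is a stand-alone illustration, not the Space's actual `transformer.py` code:

```python
import numpy as np

def positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2)
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions: sin
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions: cos
    return pe

pe = positional_encoding(max_len=10, d_model=64)       # d_model = 64 as in this Space
```

Because the encoding depends only on position and dimension, the same matrix can be precomputed once and added to both encoder and decoder embeddings.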
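The encoder attention step visualized in the Training Step tab (`scores = Q·Kᵀ / √d_k`, then softmax) reduces to a few matrix operations. A minimal NumPy sketch, with made-up toy shapes and values:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # scores = Q·Kᵀ / √d_k
    weights = softmax(scores)         # each row of the heatmap sums to 1
    return weights @ V, weights

# toy example: 3 tokens, d_k = 16 (d_model 64 split across 4 heads, as in this Space)
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((3, 16)) for _ in range(3))
out, weights = scaled_dot_product_attention(Q, K, V)
```

The `weights` matrix is exactly what the attention heatmaps render: row *i* shows how much token *i* attends to every other token.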
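The loss step uses label-smoothed cross-entropy. One common formulation spreads the smoothing mass `eps` uniformly over the vocabulary; the exact variant this Space implements may differ, so treat this as an assumption-laden sketch:

```python
import numpy as np

def label_smoothed_ce(logits: np.ndarray, target: int, eps: float = 0.1) -> float:
    """Cross-entropy against a smoothed target: (1 - eps) on the true class,
    plus eps spread uniformly over all classes (one common variant)."""
    v = logits.shape[-1]
    m = logits.max()
    log_probs = logits - m - np.log(np.exp(logits - m).sum())  # stable log-softmax
    smooth = np.full(v, eps / v)
    smooth[target] += 1.0 - eps
    return float(-(smooth * log_probs).sum())
```

With `eps=0.0` this reduces to ordinary cross-entropy; a small `eps` keeps the model from becoming over-confident on the tiny 30-pair corpus.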
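The Inference tab's greedy decoding can be outlined as a simple loop. Here `step_fn` is a hypothetical stand-in for a model call that returns next-token logits given the prefix generated so far; `toy_step` below is a fake model used only to exercise the loop:

```python
import numpy as np

def greedy_decode(step_fn, bos_id: int, eos_id: int, max_len: int = 20) -> list:
    """Auto-regressive greedy decoding: feed the tokens generated so far,
    take the argmax token at each step, stop at EOS or max_len."""
    tokens = [bos_id]
    for _ in range(max_len):
        logits = step_fn(tokens)       # next-token logits for the current prefix
        next_id = int(np.argmax(logits))
        tokens.append(next_id)
        if next_id == eos_id:
            break
    return tokens

def toy_step(tokens):
    """Fake model: always prefers token id = prefix length, with EOS at id 5."""
    logits = np.zeros(10)
    logits[min(len(tokens), 5)] = 1.0
    return logits

greedy_decode(toy_step, bos_id=0, eos_id=5)  # -> [0, 1, 2, 3, 4, 5]
```

Beam search generalizes this loop by keeping the top-k prefixes at each step instead of a single argmax, which is why the tab can compare the two side by side.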