schoginitoys committed (verified)
Commit 0ed5ac0 · 1 Parent(s): df9dc24

Update src/streamlit_app.py

Files changed (1):
  1. src/streamlit_app.py +172 -4
src/streamlit_app.py CHANGED
@@ -30,7 +30,6 @@
  # print("✅ Downloaded and saved GPT-2 to models")
 
 
-
  import streamlit as st
  st.set_page_config(page_title="GPT-2 Attention Explorer", layout="wide")
 
@@ -43,8 +42,8 @@ import pandas as pd
 
  @st.cache_resource
  def load_model():
- tokenizer = GPT2TokenizerFast.from_pretrained("models")
- model = GPT2Model.from_pretrained("models", output_attentions=True, attn_implementation="eager")
+ tokenizer = GPT2TokenizerFast.from_pretrained("./models")
+ model = GPT2Model.from_pretrained("./models", output_attentions=True, attn_implementation="eager")
  model.eval()
  return tokenizer, model
 
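The `@st.cache_resource` decorator in the hunk above keeps the tokenizer and model alive across Streamlit reruns, so the expensive `from_pretrained` calls run only once. A minimal stdlib sketch of that memoization idea, using `functools.cache` as a stand-in for Streamlit's cache (the loader body and its objects are illustrative, not part of the app):

```python
from functools import cache

CALLS = {"n": 0}

@cache
def load_model():
    # Stand-in for the expensive from_pretrained() calls.
    CALLS["n"] += 1
    tokenizer, model = object(), object()
    return tokenizer, model

# Repeated calls (like Streamlit reruns) reuse the same objects;
# the loader body executes exactly once.
t1, m1 = load_model()
t2, m2 = load_model()
```

The same reasoning is why `st.cache_resource` (not `st.cache_data`) fits here: the model is a shared, unserializable resource, not copyable data.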
@@ -57,11 +56,82 @@ with st.expander("📊 GPT-2 Model Architecture Summary"):
  - **Vocabulary size (V):** `50257`
  - **Embedding dimension (d):** `768`
  - **Max Position Length (L):** `1024`
+ - This is sometimes also called:
+   - `n_positions` in the config
+   - max sequence length
+   - context length
+   - max context window
  - **Transformer Layers:** `12`
  - **Attention Heads per Layer:** `12`
  - **Per-head Dimension (dₖ):** `64`
  - **Feedforward Hidden Layer Size:** `3072`
  - **Total Parameters:** ~117 million
+
+ ---
+
+ ## Question: Does "Transformer Layers: 12" mean each layer has 12 attention heads?
+
+ ## 🧠 Quick Answer:
+
+ > ✅ **No**, 12 transformer layers ≠ 12 heads per layer.
+ > 🔁 But in **GPT-2 (small)**, both happen to be **12** — a **coincidence of design**, not a definition.
+
+ ---
+
+ ## 🔍 Breakdown of GPT-2's Architecture
+
+ | Component | GPT-2 (small) default |
+ | ----------------------------- | --------------------- |
+ | Embedding size (`d_model`) | 768 |
+ | **Transformer layers** | 12 |
+ | **Attention heads per layer** | 12 |
+ | Hidden feedforward size | 3072 |
+ | Max position embeddings | 1024 |
+
+ ---
+
+ ### ✅ So in GPT-2:
+
+ * Each of the **12 transformer layers** has:
+   * **multi-head attention** with **12 heads**
+   * each head of dimension `64` (`768 ÷ 12 = 64`)
+
+ ---
+
+ ## 📌 Why this Confusion Happens
+
+ The number of **layers** and the number of **heads per layer** are:
+
+ * configured independently in the model
+ * but **coincidentally** both set to 12 in GPT-2 small
+
+ In other models:
+
+ | Model | Layers | Heads per Layer |
+ | ------------ | ------ | --------------- |
+ | GPT-2 Medium | 24 | 16 |
+ | GPT-2 Large | 36 | 20 |
+ | GPT-3 | 96 | 96 |
+ | LLaMA 2 7B | 32 | 32 |
+
+ So again:
+
+ > 🔁 **12 layers ≠ 12 heads** in general — it's just a choice in GPT-2 small.
+
+ ---
+
+ ## 💡 Want a table in your app to explain this too?
+
+ I can give you a section like:
+
+ > "🧩 Layers vs Heads — What's the Difference?"
+
+ Let me know and I'll drop in that Streamlit code too.
+
+
+
  """)
 
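The architecture numbers in the expander above can be cross-checked with simple arithmetic. A rough sketch in pure Python (no libraries; note that summing the small model's weight tables directly gives about 124M parameters — the ~117M figure quoted above is the count reported in the original GPT-2 paper):

```python
V, d, L_ctx, n_layers, n_heads, d_ff = 50257, 768, 1024, 12, 12, 3072

# Per-head dimension: the embedding is split evenly across heads.
d_head = d // n_heads           # 768 // 12 = 64

# Embedding tables.
tok_emb = V * d                 # token embeddings
pos_emb = L_ctx * d             # position embeddings

# One transformer block: fused QKV projection, output projection,
# two feed-forward layers, two LayerNorms (weights + biases each).
attn  = d * (3 * d) + 3 * d + d * d + d
mlp   = d * d_ff + d_ff + d_ff * d + d
norms = 2 * (2 * d)
block = attn + mlp + norms

# Total, including the final LayerNorm (2 * d).
total = tok_emb + pos_emb + n_layers * block + 2 * d
print(d_head)   # 64
print(total)    # 124439808  (~124M)
```

This also makes the "layers vs heads" point concrete: `n_layers` and `n_heads` enter the count independently, and only `d_head = d // n_heads` ties heads to the embedding width.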
 
 
@@ -520,7 +590,105 @@ print(decoded)
  | `'Ġthe'` | `' the'` |
  | `'Ġmat'` | `' mat'` |
 
- Would you like to include this as an educational block in your Streamlit app too?
+
+ ---
+
+ ## ✅ What is `@` in Python?
+
+ In Python 3.5+, the `@` operator means:
+
+ > **Matrix multiplication** (the **dot product** for vectors, or a **tensor contraction** more generally, depending on context)
+
+ ---
+
+ ### ✅ Equivalent to:
+
+ ```python
+ A @ B   # same as np.matmul(A, B)
+ ```
+
+ Or, if both are 1D/2D NumPy arrays:
+
+ ```python
+ A @ B   # same as np.dot(A, B)
+ ```
+
+ ---
+
+ ## 🔍 In your case:
+
+ ```python
+ output = W_qkv @ x + b
+ ```
+
+ ### Let's say:
+
+ * `x` has shape **(3,)**
+ * `W_qkv` has shape **(6, 3)**
+ * `b` has shape **(6,)**
+
+ ---
+
+ ### Then:
+
+ * `W_qkv @ x` → matrix–vector multiplication
+   → shape: **(6,)**
+
+ * Adding `b` → element-wise vector addition
+   → final shape: **(6,)**
+
+ ---
+
+ ### So this line:
+
+ ```python
+ output = W_qkv @ x + b
+ ```
+
+ Means:
+
+ 1. Multiply the **input vector `x`** by the **projection matrix `W_qkv`**
+ 2. Add the **bias vector `b`**
+ 3. Result = the combined **\[Q | K | V]** output
+
+ ---
+
+ ## ✅ Example:
+
+ ```python
+ x = np.array([1, 2, 3])
+ W_qkv = np.array([
+     [0.1, 0.2, 0.3],  # Q1
+     [0.4, 0.5, 0.6],  # Q2
+     [0.7, 0.8, 0.9],  # K1
+     [1.0, 1.1, 1.2],  # K2
+     [1.3, 1.4, 1.5],  # V1
+     [1.6, 1.7, 1.8],  # V2
+ ])
+ b = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06])
+
+ output = W_qkv @ x + b
+ ```
+
+ Manually:
+
+ * `W_qkv @ x` = `[1.4, 3.2, 5.0, 6.8, 8.6, 10.4]`
+ * After adding `b` → `[1.41, 3.22, 5.03, 6.84, 8.65, 10.46]`
+
+ ---
+
+ ## ✅ Summary
+
+ | Expression | Meaning |
+ | ------------- | ----------------------------- |
+ | `@` | Matrix multiplication (`dot`) |
+ | `W @ x + b` | Linear transformation |
+ | Shape `W @ x` | `(m, n) @ (n,) = (m,)` |
+
+ Would you like to include this in your Streamlit visualizer as an expandable note or equation section?
+
+
 
 
  """)
+
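Tying the expander notes together: once `W_qkv @ x + b` produces the stacked output, it is split into the Q, K, and V pieces that attention consumes. A small NumPy sketch under toy shapes (the even three-way split mirrors GPT-2's fused `c_attn` projection; the random weights are illustrative only):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])          # one token's embedding (d = 3)
rng = np.random.default_rng(0)
W_qkv = rng.normal(size=(3 * 3, 3))    # fused projection: rows = [Q | K | V]
b = np.zeros(9)

qkv = W_qkv @ x + b                    # shape (9,): (m, n) @ (n,) = (m,)
q, k, v = np.split(qkv, 3)             # three (3,) vectors

# Scaled dot-product attention for a single query/key position:
score = q @ k / np.sqrt(k.shape[0])    # scalar attention logit
weight = np.exp(score) / np.exp(score) # softmax over one position is 1.0
out = weight * v                       # attention output, shape (3,)
```

With a real sequence there would be one `qkv` row per token and the softmax would run over all key positions; the single-token case just shows the shapes flowing through.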