cdpearlman Cursor committed
Commit 78691d1 · 1 Parent(s): e582564

Add 30 RAG documents for chatbot knowledge base

Co-authored-by: Cursor <cursoragent@cursor.com>

conductor/product.md CHANGED
@@ -25,7 +25,8 @@ To demystify the inner workings of Transformer-based Large Language Models (LLMs
 - **Integrated Education:**
   - Contextual tooltips for immediate clarity.
   - Dedicated "Glossary" panel for in-depth definitions.
-  - Foundation for AI-guided tutorials.
+  - AI chatbot with RAG-powered knowledge base (30 documents covering transformer concepts, dashboard usage, guided experiments, result interpretation, troubleshooting, and mechanistic interpretability research).
+  - Step-by-step guided experiments that walk beginners through the dashboard's features.
 
 ## User Experience
 The interface centers on exploration and clarity. Users start by selecting a model and inputting text. The dashboard then unfolds the model's processing pipeline, allowing users to "zoom in" on specific components. Experimentation modes are clearly distinguished, enabling users to hypothesize ("What if I turn off this head?") and test. Educational resources are omnipresent but non-intrusive, available on-demand to explain the *what* and *why* of what is being visualized.
rag_docs/README.md CHANGED
@@ -9,17 +9,52 @@ This folder contains documents used by the AI chatbot for Retrieval-Augmented Generation (RAG).
 
 ## How to Add Documents
 
-1. Place your transformer-related documentation files in this folder
-2. The chatbot will automatically index new documents on startup
-3. Documents are chunked and embedded for semantic search
-
-## Recommended Content
-
-- Transformer architecture explanations
-- Attention mechanism documentation
-- Information about the experiments available in this dashboard
-- Glossary of ML/NLP terms
-- Model-specific documentation (GPT-2, LLaMA, etc.)
+1. Place your documentation files in this folder
+2. Delete `embeddings_cache.json` if it exists (to force re-indexing)
+3. The chatbot will automatically index new documents on startup
+4. Documents are chunked and embedded for semantic search
+
+## Document Inventory
+
+### Category 1: General LLM/Transformer Knowledge
+- `what_is_an_llm.md` - Neural networks, language models, next-token prediction
+- `transformer_architecture.md` - Layers, encoder/decoder, residual stream
+- `tokenization_explained.md` - Subword tokenization, BPE, token IDs
+- `embeddings_explained.md` - Lookup tables, vector spaces, positional encodings
+- `attention_mechanism.md` - Q/K/V, multi-head attention, intuitive explanations
+- `mlp_layers_explained.md` - Feed-forward networks, knowledge storage, expand-compress
+- `output_and_prediction.md` - Logits, softmax, temperature, greedy vs. sampling
+- `key_terminology.md` - Extended glossary of ML/transformer terms
+
+### Category 2: Dashboard Components
+- `dashboard_overview.md` - Layout tour, navigation, typical workflow
+- `pipeline_stages.md` - What each of the 5 pipeline stages shows
+- `ablation_panel_guide.md` - How to use the ablation experiment panel
+- `attribution_panel_guide.md` - How to use the token attribution panel
+- `beam_search_and_generation.md` - Beam search, generation controls
+- `head_categories_explained.md` - Previous-Token, Positional, BoW, Syntactic, Other
+- `model_selector_guide.md` - Choosing models, auto-detection, generation settings
+
+### Category 3: Model-Specific Documentation
+- `gpt2_overview.md` - GPT-2 architecture, why it's a good starter, variants
+- `llama_overview.md` - LLaMA/Qwen/Mistral architecture, RoPE, GQA differences
+- `opt_overview.md` - OPT architecture, comparison with GPT-2
+
+### Category 4: Guided Experiments (Step-by-Step)
+- `experiment_first_analysis.md` - Your first analysis with GPT-2
+- `experiment_exploring_attention.md` - Reading attention patterns and head categories
+- `experiment_first_ablation.md` - Removing a head and observing the effect
+- `experiment_token_attribution.md` - Measuring token influence with gradients
+- `experiment_comparing_heads.md` - Systematic comparison across head categories
+- `experiment_beam_search.md` - Exploring alternative generation paths
+
+### Category 5: Interpretation, Troubleshooting, and Research
+- `interpreting_ablation_results.md` - How to read ablation probability changes
+- `interpreting_attribution_scores.md` - Understanding attribution score values
+- `interpreting_attention_maps.md` - Reading BertViz patterns visually
+- `troubleshooting_and_faq.md` - Common issues and frequently asked questions
+- `recommended_starting_points.md` - Best models, prompts, and experiment order
+- `mechanistic_interpretability_intro.md` - Mech interp research context
 
 ## Notes
 
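The indexing workflow described in the README (chunk each file, embed each chunk, cache the results, delete the cache to re-index) can be sketched roughly as follows. This is an illustrative assumption, not the dashboard's actual code; `chunk_text`, `index_documents`, and the `embed` callback are invented names, and the real chunker works on tokens rather than words.

```python
import json
import os

def chunk_text(text, max_words=500):
    """Split a document into roughly equal chunks (~500 words, standing in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def index_documents(folder, embed, cache_path="embeddings_cache.json"):
    """Chunk and embed every .md/.txt file, reusing the cached index when present."""
    if os.path.exists(cache_path):          # delete this file to force re-indexing
        with open(cache_path) as f:
            return json.load(f)
    index = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith((".md", ".txt")):
            continue
        with open(os.path.join(folder, name)) as f:
            text = f.read()
        for i, chunk in enumerate(chunk_text(text)):
            index.append({"content": chunk, "source_file": name,
                          "chunk_index": i, "embedding": embed(chunk)})
    with open(cache_path, "w") as f:        # cache for faster subsequent loads
        json.dump(index, f)
    return index
```

The cache entries mirror the shape seen in the deleted `embeddings_cache.json` below (content, source file, chunk index, embedding vector).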
rag_docs/ablation_panel_guide.md ADDED
@@ -0,0 +1,33 @@
+# Ablation Panel Guide
+
+## What Is Ablation?
+
+Ablation is an experiment where you **remove (disable) specific attention heads** from the model and observe how the output changes. If removing a head causes the prediction to change significantly, that head was important for this particular input. If the prediction barely changes, the head was either redundant or not relevant to this context.
+
+The term comes from neuroscience, where "ablation" means removing part of the brain to study its function.
+
+## How to Use the Ablation Panel
+
+The ablation panel is found in the **Investigation Panel** at the bottom of the dashboard, under the "Ablation" tab.
+
+### Step-by-Step
+
+1. **Run an analysis first**: You need to have clicked "Analyze" on a prompt before ablation is available.
+2. **Select a generation for comparison**: Click "Select for Comparison" on one of the generated sequences. This gives the ablation experiment a full generation to compare against.
+3. **Choose a head to ablate**: Use the **Layer** and **Head** dropdowns to pick a specific attention head (e.g., Layer 0, Head 3). Click the **+** button to add it.
+4. **Add more heads (optional)**: You can add multiple heads from different layers. Each appears as a chip (e.g., "L0-H3") with an × button to remove it.
+5. **Run the experiment**: Click "Run Ablation Experiment."
+6. **View results**: The panel shows a side-by-side comparison of the original vs. ablated generation, plus the probability change for the immediate next token.
+
+### Understanding the Results
+
+- **Full Generation Comparison**: Shows the original generated text alongside what the model generates with those heads removed. If the text changed, the ablated heads were important for that generation.
+- **Probability Change**: Shows the immediate next-token probability before and after ablation (e.g., "72.3% → 45.1% (-27.2%)"). A large drop means the head was important.
+- **No change?** If ablating a head has no effect, it may be redundant, or it may serve a function that isn't relevant to this specific prompt. Try a different prompt or a different head.
+
+### Tips
+
+- Start by ablating heads from the **Previous-Token** category -- these often have noticeable effects.
+- Try ablating heads from the **Other** category for comparison -- these often have less impact.
+- Use the **head categories** in the Attention stage (Stage 3) to pick interesting heads to ablate.
+- You can ablate heads from multiple layers simultaneously to see compound effects.
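Mechanically, "removing" a head amounts to zeroing its output before the heads are recombined into the residual stream. A toy NumPy sketch of that idea (illustrative only -- the dashboard ablates heads inside a real transformer, not in this simplified setup):

```python
import numpy as np

def multi_head_output(head_outputs, w_out, ablate=()):
    """Concatenate per-head outputs, zeroing any ablated heads, then project."""
    kept = [np.zeros_like(h) if i in ablate else h for i, h in enumerate(head_outputs)]
    return np.concatenate(kept, axis=-1) @ w_out

rng = np.random.default_rng(0)
n_heads, d_head, d_model = 4, 8, 32
heads = [rng.normal(size=(5, d_head)) for _ in range(n_heads)]  # 5 tokens, 4 heads
w_out = rng.normal(size=(n_heads * d_head, d_model))            # output projection

baseline = multi_head_output(heads, w_out)
ablated = multi_head_output(heads, w_out, ablate={2})  # "remove" head 2
```

Comparing `baseline` and `ablated` token by token is the toy analogue of the panel's before/after probability comparison: the larger the difference, the more that head contributed here.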
rag_docs/attention_mechanism.md ADDED
@@ -0,0 +1,40 @@
+# The Attention Mechanism
+
+## What Is Attention?
+
+Attention is the core innovation that makes Transformers powerful. It allows the model to look at **all tokens in the input simultaneously** and figure out which ones are relevant to each other. When processing any token, the model can "attend to" (focus on) other tokens that provide useful context.
+
+## An Intuitive Analogy
+
+Imagine you're reading a sentence and you come across the word "it." To understand what "it" refers to, you look back at earlier words -- maybe "the cat" or "the ball." Your brain is doing something like attention: scanning the context and focusing on the most relevant parts.
+
+The model does this mathematically by computing **attention scores** between every pair of tokens.
+
+## How It Works: Queries, Keys, and Values
+
+Attention uses three concepts (don't worry about the math -- the intuition is what matters):
+
+- **Query (Q)**: "What am I looking for?" -- each token asks a question
+- **Key (K)**: "What do I contain?" -- each token advertises what information it has
+- **Value (V)**: "Here's my actual information" -- the content that gets passed along
+
+The model computes how well each Query matches each Key. High matches mean strong attention. The final output for each token is a weighted combination of all the Values, weighted by these attention scores.
+
+## Multi-Head Attention
+
+Instead of computing attention once, Transformers compute it multiple times in parallel using different **attention heads**. Each head learns to look for different kinds of relationships:
+
+- One head might track which word a pronoun refers to
+- Another might focus on the immediately preceding token
+- Another might spread attention broadly across the sentence
+
+GPT-2 has **12 attention heads per layer** (144 heads total across 12 layers). The dashboard categorizes these heads to help you understand their roles.
+
+## What You See in the Dashboard
+
+In **Stage 3 (Attention)** of the pipeline, you can see:
+
+- **Head Categories**: Heads are grouped by pattern type -- Previous-Token, First/Positional, Bag-of-Words, Syntactic, and Other. Click each category to see which specific heads (like L0-H3) belong to it.
+- **BertViz Visualization**: An interactive attention map. Lines connect tokens on the left to the tokens they attend to on the right. Thicker, darker lines mean stronger attention. You can click on specific heads to focus on them.
+
+**Want more technical detail?** Ask the chatbot about Q/K/V dot products, softmax normalization, or how attention scores are computed mathematically.
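The Query/Key/Value matching that `attention_mechanism.md` describes is standard scaled dot-product attention, which fits in a few lines of NumPy (a generic sketch, not code from the dashboard):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how well each Query matches each Key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V, weights                        # weighted combination of the Values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 tokens, head dimension d_k = 4
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
```

Each row of `w` is one token's attention distribution over all tokens -- exactly what the BertViz lines in Stage 3 visualize; multi-head attention just runs this in parallel with different learned Q/K/V projections per head.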
rag_docs/attribution_panel_guide.md ADDED
@@ -0,0 +1,40 @@
+# Attribution Panel Guide
+
+## What Is Token Attribution?
+
+Token attribution measures **which input tokens had the most influence on the model's prediction**. It answers the question: "Which parts of my input mattered most for this output?"
+
+For example, if your prompt is "The capital of France is" and the model predicts "Paris," attribution will show that "France" and "capital" had the highest influence on that prediction.
+
+## How to Use the Attribution Panel
+
+The attribution panel is in the **Investigation Panel** at the bottom of the dashboard, under the "Token Attribution" tab.
+
+### Step-by-Step
+
+1. **Run an analysis first**: Click "Analyze" on a prompt so the model has made a prediction.
+2. **Choose an attribution method**:
+   - **Integrated Gradients**: More accurate but slower. Computes attribution by gradually "building up" the input and measuring how each token contributes along the way.
+   - **Simple Gradient**: Faster but less precise. Takes a single gradient measurement to estimate token importance.
+3. **Choose a target token (optional)**: By default, attribution is computed for the model's top prediction. You can select a different token from the top-5 predictions dropdown to see which input tokens would drive *that* alternative prediction.
+4. **Click "Compute Attribution"**.
+
+### Reading the Results
+
+The results have two visualizations:
+
+- **Color-coded token chips**: Your input tokens are displayed as colored boxes. Darker blue = higher influence. Hover over any chip to see the exact attribution score.
+- **Horizontal bar chart**: Shows the same information as bars, with attribution scores labeled. Longer bars = more influential tokens.
+
+### What Attribution Scores Mean
+
+- **High score (near 1.0)**: This token was highly influential for the prediction. It strongly pushed the model toward the target token.
+- **Low score (near 0.0)**: This token had little influence on this particular prediction.
+- **Scores are normalized**: The highest-influence token gets a score of 1.0, and others are scaled relative to it.
+
+### Tips
+
+- Compare attribution for different target tokens. You might find that different input tokens drive different predictions.
+- Try Integrated Gradients for the most reliable results, especially when you want to draw conclusions.
+- Short prompts give cleaner attribution results since there are fewer tokens to compare.
+- Function words (like "the" or "is") often have low attribution; content words (nouns, verbs) tend to have higher attribution.
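The "Simple Gradient" method boils down to gradient-times-input: take the gradient of the target logit with respect to each token's embedding, multiply by the embedding, and normalize. A toy NumPy sketch on an invented linear "model" (the gradient is computed by hand here; a real transformer needs autodiff, and the dashboard's exact scoring may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d, vocab = 4, 6, 10
X = rng.normal(size=(n_tokens, d))   # toy embeddings for a 4-token prompt
W = rng.normal(size=(vocab, d))      # toy output projection

logits = W @ X.mean(axis=0)          # toy "model": mean-pool embeddings, project to vocab
target = logits.argmax()             # attribute the top prediction

grad = W[target] / n_tokens          # d(logit_target)/d(x_i), identical per token here
scores = np.abs(X @ grad)            # gradient x input: one influence score per token
scores /= scores.max()               # normalize: most influential token scores 1.0
```

The normalization in the last line matches the panel's convention above: the top token gets 1.0 and the rest are scaled relative to it. Integrated Gradients would instead average such gradients along a path from a blank baseline to the real input.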
rag_docs/beam_search_and_generation.md ADDED
@@ -0,0 +1,53 @@
+# Beam Search and Generation
+
+## What Is Text Generation?
+
+When you click "Analyze" in the dashboard, the model generates a continuation of your prompt. The simplest approach is **greedy decoding**: at each step, pick the single most likely next token. This is fast but can lead to repetitive or suboptimal text.
+
+## What Is Beam Search?
+
+Beam search is a smarter generation strategy that explores **multiple possible paths simultaneously** instead of committing to the single best token at each step.
+
+### How It Works
+
+Imagine you're at a fork in the road. Greedy decoding always takes the path that looks best right now. Beam search keeps multiple paths open and evaluates them over several steps before deciding which one is actually best overall.
+
+More specifically:
+1. At each step, the model considers the top candidates for the next token
+2. It keeps the **N best partial sequences** (where N is the number of beams)
+3. It extends each of those sequences by one more token
+4. It ranks all the possibilities and keeps the top N again
+5. This continues until the desired number of tokens have been generated
+
+The result is that beam search can find better overall sequences, even if the individual tokens at each step aren't always the highest-probability choice.
+
+### Example
+
+With the prompt "The cat sat on the" and 3 beams:
+- Beam 1 might generate: "The cat sat on the mat and purred"
+- Beam 2 might generate: "The cat sat on the floor and looked"
+- Beam 3 might generate: "The cat sat on the couch and slept"
+
+Each beam represents a different possible continuation.
+
+## Generation Controls in the Dashboard
+
+In the generator section, you have two settings:
+
+- **Number of Generation Choices (Beams)**: How many parallel paths to explore (1-5). With 1 beam, you get greedy decoding. More beams = more diverse results but slower.
+- **Number of New Tokens**: How many tokens to generate beyond your prompt (1-20). Longer generations take more time.
+
+## How Generated Sequences Are Used
+
+After generation, the dashboard displays the resulting sequences. You can:
+1. **Read them**: See how the model would continue your prompt
+2. **Select one for comparison**: Click "Select for Comparison" to use that sequence as a baseline in ablation experiments. This lets you see how removing attention heads changes the full generated sequence, not just the immediate next token.
+
+## Greedy vs. Beam Search
+
+| Feature | Greedy (1 beam) | Beam Search (2+ beams) |
+|---------|-----------------|------------------------|
+| Speed | Fast | Slower |
+| Quality | May miss better paths | Finds better overall sequences |
+| Diversity | Only one output | Multiple alternatives |
+| Best for | Quick exploration | Thorough analysis |
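The numbered loop in "How It Works" fits in a few lines of Python. `step_probs` is an invented toy next-token model (nothing to do with the dashboard's models), chosen so that greedy decoding is suboptimal:

```python
import math

def beam_search(start, step_probs, n_beams, n_steps):
    """Keep the n_beams best partial sequences, extend each, re-rank, repeat."""
    beams = [(0.0, [start])]                        # (log-probability, token sequence)
    for _ in range(n_steps):
        candidates = []
        for logp, seq in beams:
            for tok, p in step_probs(seq).items():  # candidates for the next token
                candidates.append((logp + math.log(p), seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:n_beams]                # keep the N best again
    return beams

def step_probs(seq):
    """Toy model: "a" looks best at step one, but the path through "b" wins overall."""
    if seq[-1] == "<s>":
        return {"a": 0.6, "b": 0.4}
    return {"x": 0.9, "y": 0.1} if seq[-1] == "b" else {"x": 0.5, "y": 0.5}

best_beams = beam_search("<s>", step_probs, n_beams=2, n_steps=2)
```

With `n_beams=1` this reduces to greedy decoding, which commits to "a" (0.6) and ends at overall probability 0.6 × 0.5 = 0.30; the two-beam search instead surfaces the "b → x" path at 0.4 × 0.9 = 0.36, illustrating the table's "finds better overall sequences" row.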
rag_docs/dashboard_overview.md ADDED
@@ -0,0 +1,49 @@
+# Dashboard Overview
+
+## What Is This Dashboard?
+
+The Transformer Explanation Dashboard is an interactive web application that lets you look inside transformer-based language models as they process text. Instead of treating the model as a "black box," you can see exactly what happens at each step when the model makes a prediction.
+
+## How to Navigate
+
+The dashboard has several main sections, from top to bottom:
+
+### 1. Header and Sidebar
+At the top is the dashboard title. On the left edge, there's a collapsible **sidebar** (click the hamburger menu icon). The sidebar contains advanced configuration options for selecting which internal model components to hook into. Most users won't need to change these settings.
+
+### 2. Generator Section
+This is where you start. It contains:
+- **Model dropdown**: Choose which transformer model to load (e.g., GPT-2, Qwen2.5)
+- **Prompt input**: Type the text you want the model to analyze
+- **Generation settings**: Control beam search parameters (number of beams and tokens to generate)
+- **Analyze button**: Click to run the model and see results
+
+### 3. Generation Results
+After clicking Analyze, you'll see one or more **generated sequences** -- these are the model's continuations of your prompt. If you used beam search with multiple beams, you'll see multiple possible continuations. Click "Select for Comparison" on any sequence to use it in ablation experiments.
+
+### 4. Pipeline Visualization
+This is the core educational section. It shows **5 expandable stages** that your text passes through:
+1. **Tokenization**: How your text is split into tokens
+2. **Embedding**: How tokens become number vectors
+3. **Attention**: How the model finds relationships between tokens
+4. **MLP**: How stored knowledge is retrieved
+5. **Output**: What the model predicts and how confident it is
+
+Click any stage to expand it and see detailed explanations and visualizations.
+
+### 5. Investigation Panel
+At the bottom, two experiment tabs let you investigate *why* the model made its prediction:
+- **Ablation**: Remove specific attention heads and see what changes
+- **Token Attribution**: Measure which input tokens influenced the prediction most
+
+### 6. AI Assistant (Chatbot)
+The floating robot icon in the bottom-right corner opens the AI chatbot. It can answer questions about transformers, explain what you're seeing in the dashboard, and guide you through experiments.
+
+## Typical Workflow
+
+1. Select a model (start with GPT-2 if you're new)
+2. Enter a prompt (e.g., "The cat sat on the")
+3. Click "Analyze" to run the model
+4. Explore the 5 pipeline stages to understand how the model processed your input
+5. Use the Investigation Panel to run ablation or attribution experiments
+6. Ask the chatbot if anything is unclear
rag_docs/embeddings_cache.json DELETED
@@ -1 +0,0 @@
-[{"content": "# RAG Documents\n\nThis folder contains documents used by the AI chatbot for Retrieval-Augmented Generation (RAG).\n…", "source_file": "README.md", "chunk_index": 0, "embedding": [0.0012510021, …]}]
-0.018723132, 0.01953718, -0.0069912462, 0.006278953, 0.022410296, -0.013324071, -0.04158834, -0.020375174, -0.006141283, -0.044748764, 0.0026845667, 0.036273077, 0.004528149, 0.0015861989, -0.0009891297, -0.013000845, 0.0150479395, -0.0060095987, 0.003603365, -0.0039445474, 0.0033579532, -0.00604252, -0.033974584, 0.0043755146, -0.024804559, 0.01787317, -0.049657002, -0.012019197, -0.020961767, -0.002461601, 0.0036692072, -0.022925062, -0.055068035, 0.006733863, 0.034070354, -0.020327289, 0.027510075, -0.027414305, 0.0012659662, 0.009517193, 0.041205257, -0.008840814, -0.044748764, -0.021608219, 0.011666044, -0.011911456, 0.028252296, 0.010588625, -0.026672084, -0.017190805, -0.010738267, 0.054206103, -0.0059078424, 0.0022550959, 0.0017403295, -0.026672084, -0.004219888, 0.024684846, 0.0339267, -0.017681628, 0.033878814, -0.011444574, -0.023104632, -0.00062363053, 0.010391099, -0.0008896181, 0.01798091, -0.025403125, 0.009391494, 0.024613017, 0.019549154, -0.036464617, 0.004001411, 0.013359984, -0.004234852, -0.015718333, -0.010971707, -0.021871587, -0.036823757, -0.0441502, 0.020937825, 0.010834037, -0.017166862, 0.013623353, 0.0057133087, 0.023882767, -0.0073743286, -0.035770282, -0.005515782, 0.0015001551, 0.020734312, 0.023655312, 0.015478907, -0.0020725334, 0.013647296, -0.05411033, 0.01145056, 0.02225467, -0.0032262686, 0.040056013, -0.019237898, 0.036991354, -0.00542899, -0.018543562, 0.0264566, -0.014988083, -0.04369529, 0.03313659, -0.004558077, -0.013790952, -0.028324125, 0.007470099, 0.01814851, 0.021380764, 0.027318535, 0.03639279, -0.0015712348, -0.013994464, 0.010672425, 0.038164545, -0.032250714, -0.00093226595, 0.023451801, 0.0054888465, -0.037973, 0.0041480595, 0.012785361, -0.02973674, 0.0019019422, 0.032346487, 0.027150936, -0.012091026, 0.014401489, 0.09423817, 0.011827656, 0.03675193, -0.0067997053, -0.016664067, 0.007966909, 0.018747075, -0.0295452, 0.0008978484, -0.025929863, -0.0030347276, -0.0025798178, 0.013707153, -0.015023997, 
0.0080147935, -0.030670501, 0.020782199, -0.0002035123, -0.01514371, 0.010947765, 0.021464562, -0.011247048, 0.007308486, -0.0102534285, 0.036464617, -0.021572305, -0.03158032, 0.0072007445, 0.017298546, 0.0094453655, -0.0052554063, 0.0032262686, -0.0036991355, 0.01679575, -0.010989665, 0.015454964, -0.015083853, 0.003049692, -0.011803714, 0.004704726, -0.01871116, 0.0045790267, -0.02235044, 0.00014711621, -0.0048872884, -0.032322545, -0.004746625, 0.022146927, 0.049513347, 0.016580267, 0.0036632216, 0.045586757, 0.0010908858, 0.0015196084, 0.0009509711, -0.022649722, 0.013527582, -0.032968994, -0.026696026, 0.013575468, 0.045323387, -0.005414026, -0.007901066, 0.010606582, -0.06938572, 0.012533964, -0.0013460245, -0.0037440278, -0.00976859, -0.017944997, -0.015035968, 0.04245027, -0.01697532, -0.002916511, 0.042546045, 0.006787734, 0.026289001, 0.0020590657, 0.008421818, -0.029497314, -0.004195945, -0.0070271604, -0.013778981, -0.00016890773, 0.0116840005, -0.005138686, 0.009295724, 0.034357667, 0.0037230782, -0.01587396, -0.0053511765, -0.025977748, 0.003289118, -0.013336042, -0.016999263, -0.013324071, 0.011109378, 0.00013056213, 0.011875542, 0.0148803415, -0.011552316, -0.0005207521, 0.036105476, 0.01368321, 0.017573886, 0.027677674, -0.014736685, -0.027581904, -0.00121434, -0.012929018, 0.00985239, 0.0050399224, -0.009074255, -0.0021293971, 0.017382346, -0.0065722503, -0.0019603025, 0.0156105915, -0.03574634, -0.028036814, -0.005548703, 0.0013767009, 0.0052943127, -0.003603365, -0.0038038844, 0.0235356, -0.0235356, -0.027797388, 0.009684792, 0.016915465, -0.032418314, 0.0155507345, 0.03641673, -0.004923202, -0.019489296, 0.008990455, 0.022386353, 0.018938616, -0.020554744, -0.0034327738, 0.035363257, 0.0030481953, -0.029090289, -0.023403915, 0.02262578, -0.00017003005, 0.0014874355, -0.035770282, 0.007835224, 0.033806987, 0.025714379, -2.2352684e-05, 0.020004062, 0.0055636675, -0.002959907, -0.03866734, -0.036895584, -0.0017508044, 0.049704887, -0.00013851182, 
-0.043264322, -0.0041301027, -0.011288947, -0.032083116, 0.009211925, -0.03203523, -0.000372233, -0.041492566, -0.035913937, 0.0074222134, -0.010432999, -0.0015861989, 0.015035968, 0.0018600427, 0.019944206, -0.03210706, 0.0040103896, -0.0006067959, -0.028491722, -0.006721892, 0.0041301027, -0.030024052, 0.033447847, 0.021057539, -0.0144852875, 0.002750409, -0.017466145, -0.005315263, 0.017717542, 0.008948556, 0.01714292, 0.020758256, 0.007152859, -0.032418314, -0.019345641, 0.018902702, -0.024684846, 0.0104509555, 0.02636083, -0.03440555, -0.028946633, 0.007751425, -0.006126319, -0.024613017, 0.005847986, -0.0134796975, 0.022410296, -0.0125818495, 0.010863966, 0.018304136, -0.016400699, 0.0128093045, 0.06713512, -0.005922807, 0.006817662, -0.017897112, 0.005360155, -0.004109153, 0.0139824925, 0.022996891, -0.013180415, 0.024277821, 0.0004893274, 0.012210739, -0.049010552, -0.021787789, 0.022206783, -0.002497515, 0.008110564, -0.013012816, -0.0032651755, 0.050375283, 0.032897167, -0.024660904, 0.03275351, 0.021225136, -0.0051745996, 0.0049980227, 0.019094244, -0.0044024503, -0.005964706, -0.017011235, 0.022111014, -0.006841605, 0.041995365, 0.008876728, -0.015861988, -0.031412724, -0.0065004225, 0.0012749447, -0.040079955, 0.033878814, -0.007972894, -0.0049860515, -0.009864361, 0.027126994, -0.03404641, 0.019752664, 0.026815739, 0.026217174, -0.04058275, 0.006967304, -0.01568242, -0.018363994, -0.0132522425, 0.010397085, 0.0073384144, -0.010391099, 0.008385904, 0.0083499905, 0.007140888, 0.007948952, 0.045275502, 0.028443838, 0.032538027, 0.007823252, -0.013563497, -0.020075891, 0.01109142, -0.004573041, 0.008535545, 0.013395898, -0.022422267, 0.0024346656, -0.007601783, 0.029593084, -0.029114231, -0.003552487, -0.015299337, -0.005964706, -0.016436612, 0.047669765, -0.035770282, -0.028659321, 0.03203523, -0.0081524635], "content_hash": "91a1d62385fd3abd0b4a12e1d84489e3"}]
 
 
rag_docs/embeddings_explained.md ADDED
@@ -0,0 +1,35 @@
+ # Embeddings Explained
+
+ ## What Are Embeddings?
+
+ After tokenization breaks your text into token IDs, the model needs to convert those IDs into something it can actually compute with. This is where **embeddings** come in. An embedding is a list of numbers (a **vector**) that represents a token's meaning.
+
+ ## The Lookup Table
+
+ Think of embeddings as a giant dictionary. Each token ID maps to a specific vector of numbers. For GPT-2, each token maps to a vector of **768 numbers**. For larger models, this vector might have 2048, 4096, or more numbers.
+
+ This dictionary (called an **embedding table**) was learned during training. The model figured out which numbers best represent each token by seeing how tokens are used across billions of text examples. Once training is done, the table is fixed -- the same token always maps to the same vector.
+
+ ## Why Vectors?
+
+ Why use lists of numbers instead of just the token ID? Because vectors let the model capture **meaning** and **relationships**:
+
+ - Words with similar meanings (like "happy" and "joyful") end up with similar vectors
+ - Related concepts are grouped nearby in the vector space
+ - Directions in the space can capture relationships (e.g., "king" - "man" + "woman" ≈ "queen")
+
+ This is what allows the model to generalize -- even if it has never seen a specific sentence before, it can work with the underlying meanings of the tokens.
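The analogy above can be made concrete with a toy example. These 2-D vectors are hand-picked for illustration (real embedding tables are learned and have hundreds of dimensions), but the arithmetic works the same way:

```python
import numpy as np

# Hand-crafted 2-D "embeddings" chosen so the analogy works out exactly.
vocab = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([2.0, 0.0]),
    "queen": np.array([2.0, 1.0]),
}

result = vocab["king"] - vocab["man"] + vocab["woman"]

# Find the vocabulary word whose vector is closest to the result.
nearest = min(vocab, key=lambda w: np.linalg.norm(vocab[w] - result))
print(nearest)  # queen
```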
+
+ ## Positional Information
+
+ There's one more detail: the model also needs to know the **order** of tokens (since "dog bites man" is different from "man bites dog"). This is handled by adding **positional encodings** to the embeddings -- extra numbers that tell the model where each token sits in the sequence.
+
+ ## What You See in the Dashboard
+
+ In **Stage 2 (Embedding)** of the pipeline, you can see:
+
+ - The dimension of the embedding vectors (e.g., "768-dimensional" for GPT-2)
+ - A visual showing the flow: Token ID → Lookup Table → Vector
+ - An explanation of how the lookup table was created during training
+
+ This is the point where raw token IDs become rich numerical representations that the rest of the model can process.
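The full Stage 2 flow -- Token ID → Lookup Table → Vector, plus positional information -- can be sketched in a few lines. The tables here are random rather than GPT-2's trained ones and the token IDs are placeholders; only the shapes match GPT-2:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_positions, d_model = 50257, 1024, 768

# The "giant dictionary": one 768-number row per token ID. GPT-2 learns these
# values during training; here they are random placeholders.
embedding_table = rng.standard_normal((vocab_size, d_model))
positional_table = rng.standard_normal((max_positions, d_model))

token_ids = [464, 3797, 3332, 319, 262]   # placeholder IDs for a 5-token prompt

# Lookup: each ID selects a row; positional encodings are added on top.
vectors = embedding_table[token_ids] + positional_table[: len(token_ids)]
print(vectors.shape)  # (5, 768)
```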
rag_docs/experiment_beam_search.md ADDED
@@ -0,0 +1,82 @@
+ # Experiment: Exploring Alternative Predictions with Beam Search
+
+ ## Goal
+
+ Learn how beam search reveals multiple possible continuations of a prompt, and see how ablating attention heads can redirect the model's generation from one path to another.
+
+ ## Prerequisites
+
+ - Complete "Your First Analysis"
+ - Complete "Your First Ablation"
+
+ ## Steps
+
+ ### Step 1: Generate Multiple Beams
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `Once upon a time there was a`
+ 2. Set **Number of Generation Choices (Beams)** to **3**.
+ 3. Set **Number of New Tokens** to **8**.
+ 4. Click **Analyze**.
+
+ ### Step 2: Compare the Beams
+
+ You should see 3 different generated sequences. Look at how they differ:
+ - **Beam 1**: The model's top-ranked overall sequence
+ - **Beam 2**: The second-best sequence
+ - **Beam 3**: The third-best sequence
+
+ Notice how they might start the same but diverge at some point. For example:
+ - Beam 1: "Once upon a time there was a young man who lived"
+ - Beam 2: "Once upon a time there was a little girl who loved"
+ - Beam 3: "Once upon a time there was a king who ruled"
+
+ The beams share a common prefix because the early tokens were confident, but as generation continues, different paths emerge.
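The bookkeeping behind this is easy to see with a toy language model. The next-token distributions below are hard-coded to echo the example beams (real beam search would query GPT-2's softmax at every step); the algorithm keeps the `num_beams` highest-scoring sequences by total log-probability:

```python
import math

# Hard-coded next-token distributions standing in for a real LM's softmax.
# All words and probabilities are invented for illustration.
PROBS = {
    "a":      {"young": 0.5, "little": 0.3, "king": 0.2},
    "young":  {"man": 1.0},
    "little": {"girl": 1.0},
    "king":   {"who": 1.0},
    "man":    {"who": 1.0},
    "girl":   {"who": 1.0},
    "who":    {"lived": 0.6, "loved": 0.4},
}

def beam_search(start, num_beams, steps):
    beams = [([start], 0.0)]                      # (tokens, total log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for word, p in PROBS.get(tokens[-1], {}).items():
                candidates.append((tokens + [word], score + math.log(p)))
        if not candidates:                        # nothing left to extend
            break
        # Keep only the num_beams highest-scoring sequences.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:num_beams]
    return beams

for tokens, score in beam_search("a", num_beams=3, steps=3):
    print(" ".join(tokens), round(math.exp(score), 3))
```

Running this prints the three beams in rank order, with "a young man who" first -- the lower-ranked beams survive even though a greedy decoder would never have explored them.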
+
+ ### Step 3: Select a Beam for Comparison
+
+ 1. Click **"Select for Comparison"** on Beam 1 (the top-ranked sequence).
+ 2. This stores it as the baseline for ablation comparison.
+
+ ### Step 4: Investigate What Drives the Divergence
+
+ 1. Look at **Stage 5 (Output)** to see the top-5 predictions for the immediate next token.
+ 2. Note the top prediction and its probability. Are the alternatives close in probability? If so, the model was uncertain, which explains why beams diverge early.
+
+ ### Step 5: Ablate a Head and Re-Generate
+
+ 1. Go to the **Ablation** tab in the Investigation Panel.
+ 2. From the head categories in Stage 3, pick a **Previous-Token** head (e.g., L0-H3).
+ 3. Add it and click **"Run Ablation Experiment."**
+ 4. Look at the **Full Generation Comparison**:
+ - Did the ablated generation diverge from the original?
+ - Did the model take a completely different path, or just change a word or two?
+ - Did the ablated generation match one of the other beams you saw earlier?
+
+ ### Step 6: Try a Stronger Ablation
+
+ 1. **Clear** the selected heads.
+ 2. Add **two or three** Previous-Token heads from different layers.
+ 3. Run the ablation again.
+ 4. Compare: does ablating multiple heads cause a bigger divergence than ablating one?
+
+ ### Step 7: Experiment with Different Beam Settings
+
+ 1. Change the prompt to: `The scientist discovered that the`
+ 2. Try with **1 beam** (greedy decoding): note the single output.
+ 3. Try with **3 beams**: see the alternatives.
+ 4. Try with **5 beams**: do you get even more diverse options?
+
+ Notice how increasing beams reveals the model's uncertainty -- places where multiple continuations are roughly equally likely.
+
+ ## What You Should Learn
+
+ - **Beam search reveals model uncertainty**: When the model isn't sure, multiple beams show the different paths it's considering.
+ - **Ablation can redirect generation**: Removing important heads can push the model from one beam to another, showing that different attention heads support different generation paths.
+ - **More beams = more alternatives**: But beyond 3-5 beams, the additional paths are often low-probability and less interesting.
+ - **Generation is a chain**: Each token depends on the previous ones, so a small change early (from ablation or beam selection) can cascade into a very different output.
+
+ ## What's Next?
+
+ You've now completed the core experiments. Try combining techniques:
+ - Run attribution to find which input tokens matter, then ablate the heads that seem to process those tokens
+ - Compare how GPT-2 and Qwen2.5-0.5B handle the same prompt with the same beam settings
rag_docs/experiment_comparing_heads.md ADDED
@@ -0,0 +1,88 @@
+ # Experiment: Comparing Heads Across Categories
+
+ ## Goal
+
+ Systematically ablate heads from each category to discover which types of attention heads matter most for different prompts. Build intuition for how attention head roles vary.
+
+ ## Prerequisites
+
+ - Complete "Your First Ablation" (know how to use the ablation panel)
+ - Complete "Exploring Attention Patterns" (understand head categories)
+
+ ## Steps
+
+ ### Step 1: Set Up a Simple Prompt
+
+ 1. Select **GPT-2 (124M)** and enter: `The cat sat on the`
+ 2. Set beams to 1 and tokens to 5.
+ 3. Click **Analyze**.
+ 4. **Select the generated sequence for comparison** by clicking "Select for Comparison."
+ 5. Note the original prediction and probability in Stage 5 (e.g., "mat" at 45%).
+
+ ### Step 2: Record the Head Categories
+
+ 1. Expand **Stage 3 (Attention)** and note one head from each category:
+ - **Previous-Token**: _______ (e.g., L0-H3)
+ - **First/Positional**: _______ (e.g., L0-H1)
+ - **Bag-of-Words**: _______ (e.g., L2-H5)
+ - **Syntactic**: _______ (e.g., L4-H2)
+ - **Other**: _______ (e.g., L1-H8)
+
+ ### Step 3: Ablate One Head at a Time
+
+ For each head you noted, do the following:
+ 1. Go to the **Ablation** tab in the Investigation Panel.
+ 2. **Clear** any previously selected heads.
+ 3. **Add** just the one head from the current category.
+ 4. Click **"Run Ablation Experiment."**
+ 5. Record the results:
+
+ | Category | Head | Probability Change | Generation Changed? |
+ |----------|------|-------------------|-------------------|
+ | Previous-Token | | | |
+ | First/Positional | | | |
+ | Bag-of-Words | | | |
+ | Syntactic | | | |
+ | Other | | | |
+
+ ### Step 4: Analyze Your Results
+
+ Look at the table you've filled in:
+ - **Which category caused the biggest probability drop?** Previous-Token heads often have the largest impact on simple prompts because local context matters a lot.
+ - **Which category had the least effect?** BoW and Other heads often show smaller effects for short prompts.
+ - **Did any ablation change the generated text?** A generation change is a stronger signal than just a probability change.
+
+ ### Step 5: Try a More Complex Prompt
+
+ Now repeat the process with a prompt that requires more sophisticated processing:
+
+ 1. Enter: `The doctors told the patient that they would need`
+ 2. Analyze and select the generation for comparison.
+ 3. Ablate one head from each category again and record results.
+
+ **What to expect**: For this more complex prompt:
+ - **Syntactic heads** may matter more (there are grammatical dependencies like "doctors...they")
+ - **First/Positional heads** may show more impact because the sentence structure is more complex
+ - The pattern of which categories matter may shift compared to the simple prompt
+
+ ### Step 6: Compare Results Between Prompts
+
+ | Category | Simple Prompt Impact | Complex Prompt Impact |
+ |----------|---------------------|----------------------|
+ | Previous-Token | | |
+ | First/Positional | | |
+ | Bag-of-Words | | |
+ | Syntactic | | |
+ | Other | | |
+
+ ## What You Should Learn
+
+ - No single head category is always the "most important" -- it depends on the prompt
+ - Simple prompts tend to rely more on Previous-Token heads (local patterns)
+ - Complex prompts with grammatical dependencies may rely more on Syntactic heads
+ - Some heads are redundant for certain inputs but critical for others
+ - Ablation is most informative when you compare across conditions (categories, prompts, or both)
+
+ ## Advanced Challenge
+
+ Try ablating **two heads simultaneously** from the same category. Does removing two Previous-Token heads have a bigger effect than removing one? Or does the model have enough redundancy to compensate?
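One way a model can compensate is redundancy: two heads carrying overlapping information. The toy below models that with an element-wise max (either head alone suffices to supply the feature); all numbers are invented for illustration, not measurements from GPT-2:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Two heads carrying the same feature; an element-wise max models
# "either head alone suffices". Numbers are invented for illustration.
HEADS = {"L0-H3": np.array([2.0, 0.0]), "L1-H7": np.array([2.0, 0.0])}

def p_top(ablated=()):
    active = [v for k, v in HEADS.items() if k not in ablated]
    feature = np.max(active, axis=0) if active else np.zeros(2)
    return softmax(np.array([0.2, 0.5]) + feature)[0]

print(f"no ablation:        {p_top():.1%}")
print(f"one head ablated:   {p_top(ablated=('L0-H3',)):.1%}")
print(f"both heads ablated: {p_top(ablated=('L0-H3', 'L1-H7')):.1%}")
```

In this setup, ablating one head changes nothing, but ablating both collapses the top prediction's probability -- the signature of a redundant pair.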
rag_docs/experiment_exploring_attention.md ADDED
@@ -0,0 +1,62 @@
+ # Experiment: Exploring Attention Patterns
+
+ ## Goal
+
+ Learn to read attention visualizations and understand what different attention head categories reveal about how the model processes text.
+
+ ## Prerequisites
+
+ Complete "Your First Analysis" first so you're familiar with the basic workflow.
+
+ ## Steps
+
+ ### Step 1: Run an Analysis
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `The cat sat on the mat because it was`
+ 2. Click **Analyze**.
+ 3. This prompt is ideal because it contains a pronoun ("it") that needs to be resolved -- the model must figure out that "it" refers to "the cat."
+
+ ### Step 2: Open the Attention Stage
+
+ 1. Expand **Stage 3 (Attention)** in the pipeline.
+ 2. Look at the **head categories** section at the top.
+
+ ### Step 3: Explore Previous-Token Heads
+
+ 1. Click on **"Previous-Token"** to expand the category.
+ 2. Note which heads are listed (e.g., L0-H3, L1-H7).
+ 3. In the **BertViz visualization** below, double-click on one of those head squares. This will show only that head's attention pattern.
+ 4. **What to look for**: You should see a strong diagonal pattern -- each token attends heavily to the token directly before it. Lines should mostly connect each word to the previous word.
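The "strong diagonal" is easy to picture as numbers. Below is a hand-made attention matrix for an idealized Previous-Token head (values are illustrative): row i holds where token i sends its attention, each row sums to 1, and the upper triangle is zero because causal masking prevents attending to future tokens:

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the"]
n = len(tokens)

attn = np.zeros((n, n))
attn[0, 0] = 1.0                  # the first token can only attend to itself
for i in range(1, n):
    attn[i, i - 1] += 0.9         # heavy attention on the previous token
    attn[i, 0] += 0.1             # a little spillover to the first token

for tok, row in zip(tokens, attn):
    print(f"{tok:>4} -> " + "  ".join(f"{p:.1f}" for p in row))
```

Printed as a grid, the 0.9 entries trace the off-diagonal band you see as near-parallel lines in BertViz.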
+
+ ### Step 4: Explore First/Positional Heads
+
+ 1. Click on **"First/Positional"** to see which heads focus on the first token.
+ 2. Double-click one of those heads in BertViz.
+ 3. **What to look for**: You should see many tokens sending attention lines to "The" (the first token). This is a common pattern -- the first token acts as a "sink" for attention when there's no better target.
+
+ ### Step 5: Explore Bag-of-Words Heads
+
+ 1. Find a **"Bag-of-Words"** head in the categories.
+ 2. View it in BertViz.
+ 3. **What to look for**: Attention should be spread broadly and evenly across many tokens. Lines will be thin and numerous rather than thick and focused. This head is gathering a general summary of the whole input.
+
+ ### Step 6: Look for Interesting Patterns
+
+ 1. Now single-click to select multiple heads from different categories.
+ 2. Look for heads where the token "it" attends strongly to "cat" -- this would suggest the head is helping resolve the pronoun reference.
+ 3. Try hovering over the word "it" on the left side of BertViz to see which words it attends to most strongly.
+
+ ### Step 7: Try Different Prompts
+
+ Run the analysis again with different prompts and compare:
+ - `Alice gave the book to Bob because she` (pronoun resolution: does "she" attend to "Alice"?)
+ - `The dogs in the park were` (subject-verb agreement: does the model connect "dogs" to "were"?)
+ - `1 2 3 4 5` (number sequence: what patterns emerge with non-natural-language input?)
+
+ ## What You Should Learn
+
+ - Different attention heads serve different purposes
+ - Previous-Token heads are the easiest to identify visually (strong diagonal pattern)
+ - The same prompt can reveal different patterns in different heads
+ - BertViz is a powerful tool for understanding attention, but it takes practice to read fluently
+ - Not all heads have obvious patterns -- and that's okay. The "Other" category captures complex, context-dependent behavior.
rag_docs/experiment_first_ablation.md ADDED
@@ -0,0 +1,79 @@
+ # Experiment: Your First Ablation
+
+ ## Goal
+
+ Learn how ablation works by removing an attention head and observing how it changes the model's prediction. Discover which heads matter and which are redundant.
+
+ ## Prerequisites
+
+ - Complete "Your First Analysis"
+ - Complete "Exploring Attention Patterns" (so you know about head categories)
+
+ ## Steps
+
+ ### Step 1: Set Up the Analysis
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `The cat sat on the`
+ 2. Set **Number of Generation Choices** to 1 and **Number of New Tokens** to 5.
+ 3. Click **Analyze**.
+ 4. Note the model's prediction in Stage 5 and the generated sequence.
+
+ ### Step 2: Select a Sequence for Comparison
+
+ 1. In the generated sequences section, click **"Select for Comparison"** on the generated sequence.
+ 2. This stores the original generation so the ablation experiment can compare against it.
+
+ ### Step 3: Find a Head to Ablate
+
+ 1. Expand **Stage 3 (Attention)** and look at the head categories.
+ 2. Find a head from the **Previous-Token** category. Let's say it's **L0-H3** (yours may differ).
+ 3. Note this head -- Previous-Token heads often have noticeable effects when removed.
+
+ ### Step 4: Set Up the Ablation
+
+ 1. Scroll down to the **Investigation Panel** and make sure the **"Ablation"** tab is selected.
+ 2. In the **Layer** dropdown, select the layer of your chosen head (e.g., 0).
+ 3. In the **Head** dropdown, select the head number (e.g., 3).
+ 4. Click the **+** button to add it. You should see a chip appear: "L0-H3".
+
+ ### Step 5: Run the Ablation
+
+ 1. Click **"Run Ablation Experiment"**.
+ 2. Wait for results to appear.
+
+ ### Step 6: Analyze the Results
+
+ Look at the ablation results:
+
+ - **Full Generation Comparison**: Compare the original text to the ablated text. Did the generated sequence change?
+ - **Probability Change**: Look at the immediate next-token probability change. For example, "72.3% → 45.1% (-27.2%)" would mean removing this head significantly reduced the model's confidence.
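What a probability change like that measures can be mimicked with a toy setup. Here next-token logits are a baseline plus one contribution per head (all numbers invented for illustration); zero-ablation drops a head's contribution and we re-run the softmax:

```python
import numpy as np

# Toy "model": next-token logits are a baseline plus one contribution
# per attention head. All numbers are invented for illustration.
VOCAB = ["mat", "floor", "bed"]
BASELINE = np.array([1.0, 0.5, 0.2])
HEADS = {
    "L0-H3": np.array([1.5, 0.1, 0.0]),   # strongly supports "mat"
    "L1-H8": np.array([0.1, 0.1, 0.1]),   # uniform contribution
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def logits(ablate=None):
    total = BASELINE.copy()
    for name, contrib in HEADS.items():
        if name != ablate:                 # zero-ablation drops this head
            total += contrib
    return total

base = softmax(logits())
for head in HEADS:
    abl = softmax(logits(ablate=head))
    print(f"{head}: P('mat') {base[0]:.1%} -> {abl[0]:.1%}")
```

Ablating L0-H3 drops P('mat') sharply, while ablating the uniform head changes nothing: adding the same constant to every logit leaves the softmax untouched, which is one way a head can be redundant for a prediction.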
+
+ ### Step 7: Try Ablating a Different Head
+
+ 1. Click **"Clear Selected Heads"** to reset.
+ 2. Now pick a head from the **"Other"** category (these often have less obvious roles).
+ 3. Add it and run the ablation again.
+ 4. Compare: was the effect larger or smaller than the Previous-Token head?
+
+ ### Step 8: Compare Your Results
+
+ | Head | Category | Probability Change | Generation Changed? |
+ |------|----------|-------------------|-------------------|
+ | L0-H3 | Previous-Token | (fill in) | (yes/no) |
+ | L?-H? | Other | (fill in) | (yes/no) |
+
+ **Typical findings**:
+ - Previous-Token heads in early layers often cause noticeable probability drops when ablated
+ - Many "Other" heads have minimal impact for simple prompts
+ - The same head may matter more or less depending on the specific prompt
+
+ ## What You Should Learn
+
+ - Ablation is a tool for measuring the importance of individual model components
+ - Not all heads are equally important -- some are redundant
+ - The effect of ablation depends on the specific input prompt
+ - This technique is used by researchers to understand how models work internally
+
+ ## What's Next?
+
+ Move on to **Experiment: Token Attribution** to learn a different approach -- instead of removing components, measure which input tokens drive the prediction.
rag_docs/experiment_first_analysis.md ADDED
@@ -0,0 +1,67 @@
+ # Experiment: Your First Analysis
+
+ ## Goal
+
+ Learn how to run your first analysis and walk through each pipeline stage to understand how a transformer model processes text.
+
+ ## Prerequisites
+
+ None -- this is the starting experiment.
+
+ ## Steps
+
+ ### Step 1: Select a Model
+
+ 1. In the **generator section** at the top, find the "Select Model" dropdown.
+ 2. Choose **"GPT-2 (124M)"** from the list.
+ 3. Wait for the model to load. You'll see a status message indicating the model is ready.
+
+ ### Step 2: Enter a Prompt
+
+ 1. In the **"Enter Prompt"** textarea, type: `The cat sat on the`
+ 2. Leave the generation settings at their defaults (1 beam, a few tokens).
+
+ ### Step 3: Run the Analysis
+
+ 1. Click the **"Analyze"** button.
+ 2. Wait for the analysis to complete. The pipeline stages and generation results will appear.
+
+ ### Step 4: Explore the Generated Sequences
+
+ Look at the **generated sequence(s)** below the generator. You should see how GPT-2 continues your prompt. Common completions might include "mat," "floor," "bed," or similar words.
+
+ ### Step 5: Walk Through the Pipeline
+
+ Now expand each of the **5 pipeline stages** by clicking on them:
+
+ **Stage 1 - Tokenization**: Click to expand. You'll see your prompt split into tokens. Notice how each word (and its leading space) becomes a separate token. Count the tokens -- "The cat sat on the" should produce exactly 5 tokens.
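As a sketch of how leading spaces are handled: GPT-2's byte-level BPE folds each word's leading space into the token itself, marked with "Ġ" in the raw vocabulary. The function below is a toy that only covers the case where every word is a single token (which happens to hold for this prompt) -- it is not a real BPE implementation:

```python
# Toy stand-in for GPT-2-style tokenization. Real BPE can split rare words
# into several sub-word tokens; this toy assumes one token per word.
def toy_tokenize(text):
    words = text.split(" ")
    # The first word has no leading space; later words carry theirs as 'Ġ'.
    return [words[0]] + ["Ġ" + w for w in words[1:]]

tokens = toy_tokenize("The cat sat on the")
print(tokens)        # ['The', 'Ġcat', 'Ġsat', 'Ġon', 'Ġthe']
print(len(tokens))   # 5
```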
+
+ **Stage 2 - Embedding**: Click to expand. You'll see that each token was converted into a 768-dimensional vector. This is GPT-2's hidden dimension.
+
+ **Stage 3 - Attention**: Click to expand. This is the richest stage:
+ - Look at the **head categories**. You should see heads grouped into Previous-Token, First/Positional, Bag-of-Words, Syntactic, and Other.
+ - Click on a category (like "Previous-Token") to see which specific heads belong to it.
+ - Below the categories, you'll see the **BertViz visualization**. Try clicking on individual head squares to see their attention patterns.
+
+ **Stage 4 - MLP**: Click to expand. You'll see the expand-compress pattern: 768 → 3072 → 768. This shows GPT-2's feed-forward network dimensions.
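The expand-compress pattern is just two matrix multiplications with a nonlinearity in between. The sketch below uses random weights and ReLU in place of GPT-2's trained weights and GELU; only the shapes match the 768 → 3072 → 768 display:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 768, 3072
W_in = rng.standard_normal((d_model, d_ff)) * 0.02    # random placeholder weights
W_out = rng.standard_normal((d_ff, d_model)) * 0.02

x = rng.standard_normal((5, d_model))                 # hidden states for 5 tokens
h = np.maximum(x @ W_in, 0.0)                         # expand to 3072 (ReLU for GELU)
y = h @ W_out                                         # compress back to 768
print(x.shape, h.shape, y.shape)                      # (5, 768) (5, 3072) (5, 768)
```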
+
+ **Stage 5 - Output**: Click to expand. You'll see:
+ - Your prompt with the predicted next token highlighted
+ - The confidence percentage
+ - A top-5 bar chart showing the model's top predictions
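The confidence percentage and the top-5 chart both come from a softmax over the model's output logits. The words and logit values below are invented (not real GPT-2 outputs); the computation is what matters:

```python
import numpy as np

# Invented candidate words and raw scores, standing in for GPT-2's logits.
vocab = ["mat", "floor", "bed", "couch", "table", "roof"]
logits = np.array([2.1, 1.4, 1.1, 0.3, 0.1, -0.5])

# Softmax: subtract the max for numerical stability, exponentiate, normalize.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Print the top-5 as a tiny text bar chart, highest probability first.
for word, p in sorted(zip(vocab, probs), key=lambda x: -x[1])[:5]:
    print(f"{word:>6} {p:5.1%} " + "#" * int(p * 50))
```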
+
+ ### Step 6: Reflect
+
+ Think about what you observed:
+ - How many tokens did your prompt become?
+ - What was the model's top prediction? How confident was it?
+ - Were there any surprising alternative predictions in the top 5?
+
+ ## What's Next?
+
+ Try changing the prompt and running the analysis again. Compare results with different inputs:
+ - A factual prompt: "The capital of France is"
+ - A creative prompt: "Once upon a time, there was a"
+ - A technical prompt: "The function takes an input and"
+
+ Then move on to **Experiment: Exploring Attention Patterns** to dive deeper into what the attention heads are doing.
rag_docs/experiment_token_attribution.md ADDED
@@ -0,0 +1,76 @@
+ # Experiment: Understanding Token Attribution
+
+ ## Goal
+
+ Learn how to use token attribution to identify which parts of your input most influenced the model's prediction. Compare two attribution methods and see how results change with different target tokens.
+
+ ## Prerequisites
+
+ - Complete "Your First Analysis"
+
+ ## Steps
+
+ ### Step 1: Run an Analysis with a Meaningful Prompt
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `The capital of France is`
+ 2. Click **Analyze**.
+ 3. Check Stage 5 -- the model should predict something like "Paris" or "the" with high confidence. Note the top prediction.
+
+ ### Step 2: Open the Attribution Panel
+
+ 1. Scroll down to the **Investigation Panel**.
+ 2. Click the **"Token Attribution"** tab.
+
+ ### Step 3: Run Simple Gradient Attribution
+
+ 1. Select **"Simple Gradient (faster, less accurate)"** as the attribution method.
+ 2. Leave the **Target Token** dropdown empty (this defaults to the top prediction).
+ 3. Click **"Compute Attribution"**.
+
+ ### Step 4: Read the Results
+
+ Look at the two visualizations:
+
+ **Color-coded tokens**: Your input tokens are displayed as colored boxes. Darker blue means higher influence.
+ - You should see that **"France"** has a very dark color -- it's the most relevant token for predicting "Paris."
+ - **"capital"** likely also has a notable color -- it sets up the context for a city name.
+ - Function words like **"The"**, **"of"**, and **"is"** should be lighter -- they contribute less to this specific prediction.
+
+ **Bar chart**: Shows the same information as horizontal bars with scores. Longer bars = more influence.
+
+ **Hover over any token chip** to see the exact attribution score.
42
+
43
+ ### Step 5: Compare with Integrated Gradients
44
+
45
+ 1. Now switch to **"Integrated Gradients (more accurate, slower)"**.
46
+ 2. Click **"Compute Attribution"** again.
47
+ 3. Compare the results. Integrated Gradients should give a more refined picture:
48
+ - The relative ordering of token importance may shift slightly
49
+ - Integrated Gradients tends to produce more reliable scores, especially for distinguishing tokens of moderate importance
50
+
51
+ ### Step 6: Change the Target Token
52
+
53
+ 1. In the **Target Token** dropdown, select a different token from the top-5 predictions (e.g., if the model also considered "a" or "the" as alternatives to "Paris").
54
+ 2. Run attribution again.
55
+ 3. **What to look for**: Different target tokens are driven by different input tokens. For example:
56
+ - "Paris" might be strongly driven by "France" and "capital"
57
+ - A generic token like "the" might be driven more by "is" (as a common grammatical continuation)
58
+
59
+ ### Step 7: Try a Different Prompt
60
+
61
+ Run attribution on: `Alice gave the book to Bob because she`
62
+ - Which tokens drive the prediction of the next word?
63
+ - Does "Alice" have high attribution (suggesting the model connects "she" to "Alice")?
64
+ - Does "Bob" have lower attribution than "Alice" for this prediction?
65
+
66
+ ## What You Should Learn
67
+
68
+ - Token attribution reveals which input tokens "caused" a particular prediction
69
+ - Content words (nouns, verbs) typically have higher attribution than function words (the, of, is)
70
+ - Different target tokens can be driven by completely different input tokens
71
+ - Integrated Gradients is more accurate but slower; Simple Gradient gives a quick approximation
72
+ - Attribution helps you understand the "why" behind a model's prediction
73
+
74
+ ## What's Next?
75
+
76
+ Move on to **Experiment: Comparing Heads** to combine ablation with your understanding of head categories, or try **Experiment: Beam Search** to explore how the model generates longer sequences.
rag_docs/gpt2_overview.md ADDED
@@ -0,0 +1,56 @@
+ # GPT-2 Overview
+
+ ## What Is GPT-2?
+
+ GPT-2 (Generative Pre-trained Transformer 2) is a language model created by OpenAI in 2019. It was one of the first models to demonstrate that scaling up transformers could produce impressively fluent text. It remains one of the most well-studied and accessible models for learning about transformer internals.
+
+ ## Architecture Details
+
+ | Property | Value |
+ |----------|-------|
+ | Parameters | ~124 million (small variant) |
+ | Layers | 12 |
+ | Attention Heads | 12 per layer (144 total) |
+ | Hidden Dimension | 768 |
+ | MLP Dimension | 3072 (4x hidden) |
+ | Vocabulary Size | 50,257 tokens |
+ | Positional Encoding | Learned absolute positions |
+ | Max Sequence Length | 1024 tokens |
+ | Normalization | LayerNorm |
+ | Activation Function | GELU |
+
+ ## Why Start with GPT-2?
+
+ GPT-2 small is the **recommended starting model** for learning with this dashboard:
+
+ - **Fast**: Small enough to load quickly and run interactively
+ - **Well-studied**: More research papers have analyzed GPT-2's internals than almost any other model. Many examples and references use GPT-2.
+ - **Clear patterns**: With 12 heads per layer, the attention patterns are easy to visualize and categorize
+ - **Manageable size**: 144 total attention heads is small enough to explore systematically
+
+ ## What to Expect in the Dashboard
+
+ When analyzing GPT-2, you'll typically see:
+
+ - **Tokenization**: GPT-2 uses BPE (Byte-Pair Encoding). Common words are single tokens; rare words get split. Spaces are typically attached to the beginning of the following token.
+ - **Embeddings**: 768-dimensional vectors, which capture rich semantic information despite being relatively compact.
+ - **Attention patterns**: You'll see a good mix of head categories. Expect several Previous-Token heads (especially in early layers), some First/Positional heads, and a variety of other patterns.
+ - **Output**: GPT-2 can produce reasonably coherent text for simple prompts. For factual prompts, it sometimes produces outdated or incorrect facts (it was trained on data from before 2019).
+
+ ## GPT-2 Variants
+
+ The dashboard supports all GPT-2 sizes, though only the small variant is in the default dropdown:
+
+ - **GPT-2 Small** (124M params, 12 layers) -- in dropdown as "GPT-2 (124M)"
+ - **GPT-2 Medium** (355M params, 24 layers) -- enter `gpt2-medium` in the dropdown
+ - **GPT-2 Large** (774M params, 36 layers) -- enter `gpt2-large`
+ - **GPT-2 XL** (1.5B params, 48 layers) -- enter `gpt2-xl`
+
+ Larger variants have more layers and heads but use more memory and are slower.
+
+ ## HuggingFace Model IDs
+
+ - `gpt2` or `openai-community/gpt2`
+ - `gpt2-medium` or `openai-community/gpt2-medium`
+ - `gpt2-large` or `openai-community/gpt2-large`
+ - `gpt2-xl` or `openai-community/gpt2-xl`
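As a cross-check on the table above, the ~124M parameter count of GPT-2 small can be recomputed from the architecture numbers alone. This is an illustrative sketch assuming the standard GPT-2 layout (tied input/output embeddings, biases on every linear layer), not code from the dashboard:

```python
# Rough parameter count for GPT-2 small, from its architecture numbers.
vocab, n_pos, d, n_layers, d_mlp = 50257, 1024, 768, 12, 3072

embeddings = vocab * d + n_pos * d   # token + position embedding tables

per_layer = (
    d * 3 * d + 3 * d      # attention QKV projection (weights + biases)
    + d * d + d            # attention output projection
    + d * d_mlp + d_mlp    # MLP expand (768 -> 3072)
    + d_mlp * d + d        # MLP compress (3072 -> 768)
    + 4 * d                # two LayerNorms (scale + bias each)
)

final_ln = 2 * d
total = embeddings + n_layers * per_layer + final_ln
print(f"{total / 1e6:.1f}M parameters")   # prints 124.4M parameters
```

The output layer adds nothing extra because GPT-2 ties it to the token embedding matrix.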
rag_docs/head_categories_explained.md ADDED
@@ -0,0 +1,56 @@
+ # Attention Head Categories Explained
+
+ ## What Are Head Categories?
+
+ The dashboard automatically analyzes all attention heads in the model and categorizes them based on their behavior patterns. This helps you understand what each head is doing without having to inspect every attention map manually.
+
+ Head categories appear in **Stage 3 (Attention)** of the pipeline. Click any category to expand it and see which specific heads (like L0-H3, L2-H11) belong to it.
+
+ ## The Five Categories
+
+ ### Previous-Token Heads
+
+ **What they do**: These heads strongly attend to the **immediately preceding token**. For every token at position *i*, the head focuses most of its attention on position *i-1*.
+
+ **Why they matter**: Previous-token heads help the model track local context -- the word that just came before. They're important for bigram patterns (common two-word combinations like "of the" or "in a").
+
+ **Detection**: A head is classified as Previous-Token if, on average, more than 40% of each token's attention goes to the token directly before it.
+
+ **In the dashboard**: These heads are labeled with a purple color. Ablating them often causes noticeable changes in predictions.
+
+ ### First/Positional Heads
+
+ **What they do**: These heads focus heavily on the **first token** in the sequence or show strong **positional patterns** (always attending to a specific position regardless of content).
+
+ **Why they matter**: The first token often serves as a "default" attention target. Positional heads help the model keep track of where it is in the sequence.
+
+ **Detection**: Classified when average attention to the first token exceeds 25%.
+
+ ### Bag-of-Words (BoW) Heads
+
+ **What they do**: These heads spread their attention **broadly and evenly** across many tokens, without focusing strongly on any particular one.
+
+ **Why they matter**: BoW heads capture a general summary of the entire input. They help the model maintain an overall sense of what the text is about.
+
+ **Detection**: Classified when the attention distribution has high entropy (≥ 0.65 normalized) and no single token receives more than 35% attention.
+
+ ### Syntactic Heads
+
+ **What they do**: These heads attend to tokens at **consistent distances**, suggesting they track grammatical or structural relationships (like subject-verb pairs).
+
+ **Why they matter**: Syntactic heads help the model understand grammar and sentence structure. They might connect a verb to its subject or a pronoun to what it refers to.
+
+ **Detection**: Classified when tokens consistently attend to other tokens at similar distances, with low variance in attention distances.
+
+ ### Other
+
+ **What they do**: Heads that don't clearly fit any of the above patterns. They may have mixed or context-dependent behavior.
+
+ **Why they matter**: "Other" doesn't mean unimportant. These heads may serve specialized roles that only activate for certain inputs. They're worth investigating through ablation experiments.
+
+ ## Using Categories for Experiments
+
+ Head categories are especially useful for guiding ablation experiments:
+ - Ablate a **Previous-Token** head to see if local context patterns break
+ - Ablate a **BoW** head to see if the model loses global context
+ - Compare the effect of ablating heads from different categories on the same prompt
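The detection rules above can be turned into a toy classifier over a head's attention matrix. This is a simplified sketch using only the thresholds quoted in this document -- the Syntactic rule is omitted for brevity, and the dashboard's actual implementation may differ:

```python
import numpy as np

def categorize_head(attn: np.ndarray) -> str:
    """Classify one head from its (seq_len x seq_len) attention matrix
    using the approximate thresholds described above."""
    n = attn.shape[0]
    # Average attention each token (after the first) gives to its predecessor.
    prev = np.mean([attn[i, i - 1] for i in range(1, n)])
    if prev > 0.40:
        return "previous-token"
    # Average attention paid to the first token.
    if attn[:, 0].mean() > 0.25:
        return "first/positional"
    # Normalized row entropy; high entropy + no dominant token = bag-of-words.
    eps = 1e-12
    entropy = -(attn * np.log(attn + eps)).sum(axis=1) / np.log(n)
    if entropy.mean() >= 0.65 and attn.max() <= 0.35:
        return "bag-of-words"
    return "other"

# A toy previous-token head: each token attends ~90% to the token before it.
n = 6
toy = np.full((n, n), 0.02)
for i in range(1, n):
    toy[i, i - 1] = 0.9
toy /= toy.sum(axis=1, keepdims=True)   # attention rows must sum to 1
print(categorize_head(toy))             # prints previous-token
```

A perfectly uniform matrix would instead land in "bag-of-words": maximum entropy, no dominant token.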
rag_docs/interpreting_ablation_results.md ADDED
@@ -0,0 +1,56 @@
+ # Interpreting Ablation Results
+
+ ## Quick Reference
+
+ When you ablate an attention head and see the results, here's how to interpret what happened.
+
+ ## Probability Changes
+
+ The dashboard shows the immediate next-token probability before and after ablation (e.g., "72.3% → 45.1% (-27.2%)").
+
+ ### Large Probability Drop (>10%)
+
+ The ablated head was **important** for this prediction. It was actively contributing to the model's confidence in the top token. This head likely plays a significant role in processing this specific input.
+
+ **Example**: Ablating a Previous-Token head when the model is predicting a word that commonly follows the previous word (like predicting "the" after "on").
+
+ ### Small Probability Drop (1-10%)
+
+ The head has **some contribution** but isn't critical. Other heads or MLP layers may provide overlapping information. The model has some redundancy that compensates for the missing head.
+
+ ### Negligible Change (<1%)
+
+ The head was likely **redundant for this input**. It may serve a function that isn't relevant to this particular prompt, or other heads provide the same information.
+
+ **Important**: This doesn't mean the head is useless -- it might be critical for other prompts. Try the same head with different inputs.
+
+ ### Probability Increase
+
+ Occasionally, ablating a head can **increase** the probability of the top prediction. This means the head was actually pulling the model away from this prediction -- it was a "competing signal." This is an interesting finding that suggests the head was promoting a different output.
+
+ ## Generation Changes
+
+ The full generation comparison shows whether the ablated model produces different text.
+
+ ### Generation Changed
+
+ The head was important enough that removing it altered the model's entire output sequence. This is a strong signal of importance. Look at where the texts diverge -- the point of divergence tells you where the head's contribution was most critical.
+
+ ### Generation Stayed the Same
+
+ Even if the probability shifted, the model still chose the same tokens. This means the head's contribution wasn't large enough to cross the decision boundary. The model is robust to losing this head for this particular input.
+
+ ## Multi-Head Ablation
+
+ When you ablate multiple heads simultaneously:
+
+ - **Additive effects**: If ablating heads A and B together has a bigger effect than either alone, the heads contributed independently to the prediction.
+ - **Redundant heads**: If ablating both has about the same effect as ablating just one, the heads may have been providing the same information.
+ - **Synergistic effects**: Rarely, ablating two heads together can have a much larger effect than the sum of their individual effects. This suggests the heads work together as a circuit.
+
+ ## Tips for Interpretation
+
+ - Always compare ablation effects across different head categories on the same prompt
+ - Try the same head on multiple prompts to see if its importance is consistent or input-dependent
+ - A head's category (Previous-Token, Syntactic, etc.) gives you a hypothesis about why it matters -- ablation lets you test that hypothesis
+ - Remember that ablation is a blunt tool: removing a head removes all of its functions, not just the one you're interested in
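A toy numeric sketch can make these cases concrete. Here the "model" is just a sum of per-head logit contributions -- a deliberate simplification of how a real transformer combines heads, not the dashboard's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy setup: 3 heads each contribute a logit vector over a 4-token vocabulary.
head_logits = np.array([
    [2.0, 0.5, 0.1, 0.0],   # head A: pushes strongly toward token 0
    [1.5, 1.0, 0.2, 0.1],   # head B: also favors token 0
    [0.0, 0.3, 1.2, 0.4],   # head C: a competing signal for token 2
])

baseline = softmax(head_logits.sum(axis=0))
print(f"baseline P(token 0) = {baseline[0]:.3f}")

for name, idx in [("A", 0), ("B", 1), ("C", 2)]:
    # Zero-ablate one head by subtracting its contribution.
    ablated = softmax(head_logits.sum(axis=0) - head_logits[idx])
    delta = ablated[0] - baseline[0]
    print(f"ablate head {name}: P(token 0) = {ablated[0]:.3f} ({delta:+.3f})")
```

Removing head A (a supporting signal) lowers P(token 0), while removing head C (a competing signal) raises it -- the "Probability Increase" case described above.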
rag_docs/interpreting_attention_maps.md ADDED
@@ -0,0 +1,70 @@
+ # Interpreting Attention Maps
+
+ ## Quick Reference
+
+ The BertViz attention visualization in Stage 3 shows how tokens attend to each other. Here's how to read the patterns.
+
+ ## Reading the Visualization
+
+ The BertViz display shows:
+ - **Left column**: Each token in your input (the "query" -- the token doing the looking)
+ - **Right column**: The same tokens (the "keys" -- what's being looked at)
+ - **Lines between them**: Attention connections. Each line shows how much one token attends to another.
+
+ ### Line Properties
+ - **Thicker, more opaque lines** = stronger attention (the model focuses more on this connection)
+ - **Thin, faint lines** = weak attention (some attention, but not much)
+ - **No line** = very little or no attention between those tokens
+
+ ### Interacting with BertViz
+ - **Single-click** a head square at the top to select/deselect it
+ - **Double-click** a head square to view only that head (deselects all others)
+ - **Hover** over a token or line to see exact attention weights
+
+ ## Common Attention Patterns
+
+ ### Diagonal Pattern (Previous-Token)
+
+ You see each token strongly attending to the token directly before it, creating a diagonal line of strong connections.
+
+ **What it means**: This head tracks local word order. It's useful for bigram patterns -- sequences of two words that commonly appear together.
+
+ **Looks like**: A staircase pattern of thick lines, each shifted one position to the left.
+
+ ### Vertical Stripe (First-Token / Positional)
+
+ You see many or all tokens attending to the same position (usually the first token), creating a vertical column of lines.
+
+ **What it means**: The first token often serves as a "default sink" for excess attention. This is an artifact of the softmax function -- attention weights must sum to 1.0, so when a head has nothing specific to attend to, it sends attention to the first token.
+
+ **Looks like**: Many thick lines all pointing to the same token on the right side.
+
+ ### Uniform / Diffuse (Bag-of-Words)
+
+ You see many thin lines spreading from each token to many other tokens, with no strong focus on any particular one.
+
+ **What it means**: This head is gathering a broad summary of the entire input, rather than focusing on specific relationships. It helps the model maintain an overall sense of context.
+
+ **Looks like**: A dense web of thin, similarly-weighted lines.
+
+ ### Structured Connections (Syntactic)
+
+ You see specific, purposeful-looking connections that skip across tokens -- like a token attending to a word several positions away in a consistent pattern.
+
+ **What it means**: This head may be tracking grammatical relationships. For example, a verb attending to its subject, or a pronoun attending to its antecedent.
+
+ **Looks like**: A few thick lines making specific connections, often spanning several token positions.
+
+ ### Mixed / Context-Dependent
+
+ Some heads show patterns that change based on the input. They might show one pattern for factual prompts and another for creative prompts.
+
+ **What it means**: These heads are flexible and context-sensitive. They don't have a single fixed pattern but adapt to the input.
+
+ ## Tips for Reading Attention Maps
+
+ - **Start with one head at a time**: Double-click individual heads to isolate their patterns. Looking at all heads simultaneously is confusing.
+ - **Compare heads across layers**: The same type of pattern (like Previous-Token) may appear in early layers but not late layers, or vice versa.
+ - **Match patterns to categories**: Use the head categories as a guide. If a head is categorized as "Previous-Token," look for the diagonal pattern to confirm.
+ - **Hover for details**: The exact attention weight tells you more than just the visual thickness. Two lines might look similar but have meaningfully different weights.
+ - **Context matters**: The same head can show different patterns for different prompts. Try multiple inputs to understand a head's full behavior.
rag_docs/interpreting_attribution_scores.md ADDED
@@ -0,0 +1,59 @@
+ # Interpreting Attribution Scores
+
+ ## Quick Reference
+
+ Token attribution scores tell you how much each input token influenced a specific prediction. Here's how to read and interpret them.
+
+ ## Understanding the Scores
+
+ Attribution scores are **normalized** so that the most influential token gets a score of 1.0, and all other scores are relative to it.
+
+ ### High Score (0.7 - 1.0)
+
+ This token was **highly influential**. The model relied heavily on this token when making its prediction. For factual predictions, these are usually the content words that carry the key information.
+
+ **Example**: In "The capital of France is" → "Paris", the token "France" typically gets the highest score because it directly determines which capital the model predicts.
+
+ ### Medium Score (0.3 - 0.7)
+
+ This token had **moderate influence**. It contributed context that helped the prediction but wasn't the primary driver.
+
+ **Example**: In the same prompt, "capital" might get a medium score -- it tells the model to predict a city name, but "France" specifies which one.
+
+ ### Low Score (0.0 - 0.3)
+
+ This token had **minimal influence** on this specific prediction. It may be a function word (the, of, is) or a word that doesn't directly relate to what's being predicted.
+
+ ## Comparing Attribution Methods
+
+ ### Integrated Gradients vs. Simple Gradient
+
+ - **Integrated Gradients** averages gradients over many intermediate steps between a "blank" baseline and the actual input. This produces more reliable, less noisy scores. Use it when you want trustworthy results.
+ - **Simple Gradient** takes a single gradient measurement. It's faster but can be noisy -- scores may overemphasize some tokens or miss subtle contributions. Good for quick exploration.
+
+ **When they disagree**: If the two methods give very different rankings, the attribution is likely noisy. Trust Integrated Gradients for the more accurate picture.
+
+ ## Why Results Vary by Target Token
+
+ Attribution is computed **with respect to a specific target token**. Different targets can be driven by entirely different input tokens.
+
+ **Example** with prompt "Alice gave Bob a gift because she":
+ - Target "liked": High attribution for "Alice" (she liked something) and "Bob" (she liked Bob)
+ - Target "was": High attribution for "she" and "Alice" (describing Alice's state)
+ - Target "wanted": High attribution for "gift" (she wanted to give a gift)
+
+ This is one of the most powerful uses of attribution -- it reveals which input tokens support different possible continuations.
+
+ ## Common Patterns
+
+ - **Content words dominate**: Nouns, verbs, and adjectives typically have higher attribution than function words
+ - **Recent tokens often matter more**: Tokens closer to the prediction point tend to have higher attribution, especially for local patterns
+ - **Distant tokens can matter too**: For long-range dependencies (like pronoun resolution), distant tokens can have surprisingly high attribution
+ - **Punctuation varies**: Commas and periods sometimes have notable attribution because they signal sentence structure
+
+ ## Tips
+
+ - Try the same prompt with multiple target tokens to see how attribution shifts
+ - Short prompts (5-10 tokens) give the clearest attribution results
+ - If all scores are roughly equal, the model may be uncertain or the prediction may not depend on any single token
+ - Use attribution alongside ablation for a fuller picture: attribution tells you which input tokens matter; ablation tells you which internal components matter
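The difference between the two methods can be illustrated on a toy differentiable function with a hand-coded gradient. This is a numeric sketch only -- for a real model, both methods get their gradients from backpropagation through the network:

```python
import numpy as np

# Toy "model": f(x) = x0 * x1 + x2, with a zero baseline.
def f(x):
    return x[0] * x[1] + x[2]

def grad_f(x):
    return np.array([x[1], x[0], 1.0])

x = np.array([2.0, 3.0, 1.0])
baseline = np.zeros_like(x)

# Simple gradient: one measurement at the input, scaled by the input.
simple = grad_f(x) * x

# Integrated gradients: average the gradient along the straight path
# from baseline to input, then scale by (input - baseline).
steps = 100
alphas = (np.arange(steps) + 0.5) / steps
avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
ig = avg_grad * (x - baseline)

print("simple gradient:", simple)   # ~[6. 6. 1.]
print("integrated grads:", ig)      # ~[3. 3. 1.]
print("IG sums to f(x) - f(baseline):", ig.sum(), f(x) - f(baseline))
```

Note the completeness property: the Integrated Gradients attributions sum exactly to f(x) - f(baseline), while the single-point gradient overstates the two interacting inputs.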
rag_docs/key_terminology.md ADDED
@@ -0,0 +1,55 @@
+ # Key Terminology
+
+ An extended glossary of terms you may encounter while using the Transformer Explanation Dashboard.
+
+ ## Core Concepts
+
+ **Token**: A small piece of text that the model processes. Can be a word, part of a word, or a punctuation mark. The model's fundamental unit of input and output.
+
+ **Embedding**: A vector (list of numbers) that represents a token's meaning. Similar tokens have similar embeddings.
+
+ **Attention**: The mechanism that lets each token look at other tokens (in GPT-style models, only earlier ones) to gather relevant context. Uses Queries, Keys, and Values.
+
+ **Attention Head**: One instance of the attention mechanism. Each layer has multiple heads that look for different patterns simultaneously.
+
+ **Layer**: One complete processing step in the Transformer, containing both attention and MLP components. GPT-2 has 12 layers; larger models have more.
+
+ **MLP / Feed-Forward Network (FFN)**: The component in each layer that processes tokens individually, storing and retrieving factual knowledge. Uses an expand-then-compress pattern.
+
+ ## Architecture Terms
+
+ **Residual Stream**: The "conveyor belt" of information running through all layers. Each layer reads from it and adds back its contribution. This preserves information from earlier layers.
+
+ **Layer Normalization (LayerNorm)**: A technique applied before or after each sublayer that stabilizes the numbers, keeping them in a reasonable range. This helps training and makes the model more robust.
+
+ **Parameters / Weights**: The learnable numbers in the model. These are adjusted during training to improve predictions. GPT-2 has ~124 million parameters.
+
+ **Hidden Dimension**: The size of the internal vector representations. For GPT-2, this is 768 -- meaning each token is represented by 768 numbers at each layer.
+
+ **Vocabulary**: The complete set of tokens the model knows. GPT-2 has a vocabulary of 50,257 tokens.
+
+ ## Training and Inference
+
+ **Training**: The process of adjusting the model's parameters by showing it billions of text examples. The model learns to predict the next token and its parameters are updated to reduce prediction errors.
+
+ **Inference**: Using the trained model to make predictions on new text. This is what happens when you click "Analyze" in the dashboard -- no learning occurs, the model just processes your input.
+
+ **Forward Pass**: One complete trip of data through the model, from input tokens to output predictions. The dashboard visualizes this forward pass.
+
+ **Gradient**: A measure of how much each parameter contributed to the model's prediction error. Used during training to update parameters, and in attribution experiments to measure token importance.
+
+ **Loss**: A number measuring how wrong the model's predictions are. During training, the goal is to minimize this. Lower loss means better predictions.
+
+ **Fine-tuning**: Taking a pre-trained model and training it further on a specific dataset to specialize its behavior.
+
+ ## Prediction Terms
+
+ **Logits**: The raw, unnormalized scores the model assigns to every possible next token before converting to probabilities.
+
+ **Softmax**: The function that converts logits into probabilities (positive numbers that sum to 1.0).
+
+ **Probability Distribution**: The complete set of probabilities over all possible next tokens. The dashboard shows the top 5.
+
+ **Temperature**: A setting that controls prediction confidence. Low temperature = more focused; high temperature = more spread out.
+
+ **Beam Search**: A generation strategy that explores multiple possible sequences simultaneously instead of just picking the single best token at each step.
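Several of these terms -- logits, softmax, probability distribution, temperature -- fit in a few lines. The logit values below are invented purely for illustration:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()              # probabilities sum to 1.0

logits = [4.0, 2.0, 1.0, 0.5]       # raw, unnormalized scores

for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: top prob = {probs[0]:.3f}")
# Lower temperature concentrates probability on the top token;
# higher temperature spreads it across the alternatives.
```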
rag_docs/llama_overview.md ADDED
@@ -0,0 +1,46 @@
+ # LLaMA Overview
+
+ ## What Is LLaMA?
+
+ LLaMA (Large Language Model Meta AI) is a family of open-weight language models developed by Meta. First released in 2023, LLaMA models introduced several architectural improvements over GPT-2 and became the foundation for many other models (Mistral, Qwen, etc.). In the dashboard, models labeled "LLaMA-like" share this architecture.
+
+ ## Architectural Differences from GPT-2
+
+ LLaMA models use several key innovations:
+
+ ### RoPE (Rotary Position Embeddings)
+ Instead of GPT-2's learned absolute position embeddings, LLaMA uses **Rotary Position Embeddings (RoPE)**. RoPE encodes position information by rotating the query and key vectors in attention. This means:
+ - The model can generalize better to different sequence lengths
+ - Position information is baked into the attention computation itself
+ - Attention patterns may look different from GPT-2 because of how positions are encoded
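The core trick can be sketched for a single 2-dimensional query/key pair. This is a toy illustration -- real models split each head's vectors into many such pairs, each rotated at a different frequency:

```python
import numpy as np

def rope(vec, pos, theta=0.1):
    """Rotate a 2-d vector by an angle proportional to its position."""
    angle = pos * theta
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ vec

q = np.array([1.0, 0.5])
k = np.array([0.3, 0.8])

# Attention scores depend only on the *relative* distance between positions:
score_a = rope(q, pos=7) @ rope(k, pos=4)    # query/key 3 positions apart
score_b = rope(q, pos=12) @ rope(k, pos=9)   # also 3 positions apart
print(score_a, score_b)                      # equal up to floating point
```

Because both vectors are rotated, the dot product depends only on the distance between positions, not their absolute values -- the property behind RoPE's better length generalization.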
+
+ ### RMSNorm Instead of LayerNorm
+ LLaMA uses **RMSNorm** (Root Mean Square Normalization) instead of the standard LayerNorm used in GPT-2. RMSNorm is simpler and slightly faster -- it only normalizes the magnitude of the vectors without centering them first.
+
+ ### SiLU Activation
+ Where GPT-2 uses GELU activation in the MLP, LLaMA uses **SiLU** (Sigmoid Linear Unit, also called "Swish"). This is a smooth activation function that tends to produce slightly different MLP behavior.
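Both of these components are small enough to sketch directly. These are simplified versions that omit the learned scale (gain) parameter real models apply after normalization:

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    # Normalize by root-mean-square only -- no mean-centering, unlike LayerNorm.
    return x / np.sqrt(np.mean(x ** 2) + eps)

def silu(x):
    # SiLU / Swish: x * sigmoid(x), a smooth alternative to GELU.
    return x / (1.0 + np.exp(-x))

x = np.array([2.0, -1.0, 0.5, 3.0])
print(rmsnorm(x))                    # output vector has RMS ~= 1
print(silu(np.array([-2.0, 0.0, 2.0])))
```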
+
+ ### Grouped-Query Attention (GQA)
+ Larger LLaMA variants use **Grouped-Query Attention**, where multiple query heads share the same key and value heads. This reduces memory usage and speeds up inference without significantly hurting quality. This means the number of key/value heads may be smaller than the number of query heads.
+
+ ## Models Using LLaMA Architecture
+
+ The dashboard's "llama_like" family includes:
+ - **Meta LLaMA**: LLaMA 2 (7B, 13B, 70B), LLaMA 3 (1B, 3B, 8B, 70B)
+ - **Qwen**: Qwen2, Qwen2.5 (0.5B to 72B) -- available in the dashboard dropdown as "Qwen2.5-0.5B"
+ - **Mistral**: Mistral-7B, Mixtral-8x7B
+
+ ## What to Expect in the Dashboard
+
+ When using a LLaMA-like model (such as Qwen2.5-0.5B):
+
+ - **More layers and heads**: Even the small Qwen2.5-0.5B has 24 layers and 14 heads, compared to GPT-2's 12 layers and 12 heads
+ - **Different attention patterns**: RoPE-based attention may show different positional patterns compared to GPT-2
+ - **Different tokenizer**: LLaMA-family models use a different BPE vocabulary, so the same text may tokenize differently
+ - **Comparing with GPT-2**: Running the same prompt on both GPT-2 and a LLaMA-like model is a great way to see how architecture affects predictions
+
+ ## HuggingFace Model IDs
+
+ - `Qwen/Qwen2.5-0.5B` (in default dropdown)
+ - `meta-llama/Llama-3.2-1B`, `meta-llama/Llama-3.1-8B`
+ - `mistralai/Mistral-7B-v0.3`
rag_docs/mechanistic_interpretability_intro.md ADDED
@@ -0,0 +1,58 @@
1
+ # Introduction to Mechanistic Interpretability
2
+
3
+ ## What Is Mechanistic Interpretability?
4
+
5
+ Mechanistic interpretability (often called "mech interp") is a field of AI research that aims to understand **how** neural networks work internally -- not just what they predict, but why. Instead of treating models as black boxes, researchers open them up and study the individual components (neurons, attention heads, layers) to figure out what each one does.
+
+ This dashboard is a tool for doing exactly that kind of investigation.
+
+ ## How This Dashboard Relates
+
+ The experiments available in this dashboard are real techniques used in mechanistic interpretability research:
+
+ - **Ablation** (removing heads to test their importance) is a standard tool for identifying which components are responsible for specific behaviors
+ - **Token attribution** (measuring input influence via gradients) is used to trace how information flows from inputs to outputs
+ - **Attention pattern analysis** (categorizing heads by behavior) helps researchers build a map of what each head does
+ - **Head categorization** (Previous-Token, BoW, Syntactic, etc.) builds on research that has identified recurring head types across models
+
+ ## Key Concepts in the Field
+
+ ### Circuits
+
+ A **circuit** is a small subnetwork within the model that performs a specific function. For example, researchers have found "induction circuits" -- combinations of attention heads across layers that work together to complete patterns like "A B ... A" → "B" (if the model has seen "A B" before, when it sees "A" again, it predicts "B").
+
+ In the dashboard, you can start to identify circuits by ablating combinations of heads and seeing which combinations have outsized effects.
+
+ ### Superposition
+
+ **Superposition** is the idea that neural networks represent more features than they have dimensions. A 768-dimensional embedding might encode thousands of different concepts by overlapping them. This makes interpretation challenging because a single neuron can participate in many features.
+
+ ### Induction Heads
+
+ **Induction heads** are one of the best-understood circuits. They are pairs of attention heads (typically one in an early layer and one in a later layer) that work together to copy patterns from context. If the model has seen "Harry Potter" earlier in the text and encounters "Harry" again, induction heads help it predict "Potter."
+
+ You might observe induction-like behavior in the dashboard when using prompts with repeated patterns.
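The copy-from-context behavior that induction heads implement can be mimicked by a simple lookup heuristic. This sketch illustrates the *pattern*, not how the heads actually compute it -- the function name and token lists are invented for the example:

```python
def induction_predict(tokens):
    """Toy induction heuristic: if the final token appeared earlier in the
    sequence, predict the token that followed that earlier occurrence."""
    current = tokens[-1]
    # Scan earlier positions (excluding the final token itself), right to left.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence: the heuristic has nothing to copy

print(induction_predict(["Harry", "Potter", "went", "home", ".", "Harry"]))
```

Running this prints `Potter`, mirroring the "Harry Potter ... Harry" example above. Real induction heads achieve the same effect with attention: an early head attends to the previous token, and a later head composes with it to attend to (and copy) the token after the earlier match.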
+
+ ### Polysemanticity
+
+ Neurons and heads are often **polysemantic** -- they respond to multiple unrelated features. An attention head might handle both pronoun resolution and list formatting, depending on the input. This is why head categories are approximate: the same head may behave differently for different prompts.
+
+ ## Notable Research Groups
+
+ These organizations have published influential work in mechanistic interpretability:
+
+ - **Anthropic**: Published foundational work on transformer circuits, superposition, and dictionary learning for interpreting neural networks
+ - **EleutherAI**: Open-source AI research group that has contributed tools and analysis for model interpretability
+ - **Redwood Research**: Focuses on alignment-relevant interpretability, including causal interventions on model behavior
+ - **DeepMind (Google)**: Research on understanding internal representations and how models store knowledge
+
+ ## Further Reading
+
+ If you want to explore the research behind this dashboard's techniques:
+
+ - "A Mathematical Framework for Transformer Circuits" (Elhage et al., Anthropic) -- foundational paper on how attention heads compose into circuits
+ - "In-context Learning and Induction Heads" (Olsson et al., Anthropic) -- how models learn to copy patterns from context
+ - "Locating and Editing Factual Associations in GPT" (Meng et al.) -- how facts are stored in MLP layers
+ - "Attention Is All You Need" (Vaswani et al., 2017) -- the original Transformer paper
+
+ These papers are referenced here for context. The dashboard provides a hands-on way to explore many of the concepts they describe.
rag_docs/mlp_layers_explained.md ADDED
@@ -0,0 +1,39 @@
+ # MLP (Feed-Forward) Layers Explained
+
+ ## What Are MLP Layers?
+
+ After attention gathers context from other tokens, each token's representation passes through a **Multi-Layer Perceptron (MLP)**, also called a **Feed-Forward Network (FFN)**. While attention handles relationships between tokens, the MLP processes each token independently -- and this is where much of the model's **factual knowledge** is stored.
+
+ ## What They Do
+
+ Think of the MLP as the model's **memory bank**. During training, the MLP weights learned to encode facts, patterns, and associations from the training data. When the model processes "The capital of France is," the MLP layers help recall that "Paris" is the answer.
+
+ Researchers have found that specific facts are often stored in specific MLP neurons. This is one of the key findings in mechanistic interpretability research.
+
+ ## The Expand-Then-Compress Pattern
+
+ Each MLP layer follows a distinctive pattern:
+
+ 1. **Expand**: The token's representation is projected into a much larger space (typically 4x the hidden dimension). For GPT-2, this means going from 768 dimensions to 3072 dimensions.
+ 2. **Activate**: A non-linear activation function is applied (like GELU or SiLU), which allows the network to represent complex patterns.
+ 3. **Compress**: The expanded representation is projected back down to the original size (768 for GPT-2).
+
+ **Why expand then compress?** The expansion creates space for many individual neurons to each "vote" on whether a specific concept or fact is relevant. The compression then combines these votes into a refined representation. Each neuron in the expanded layer can activate for specific concepts.
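The three steps above can be sketched in a few lines. The dimensions and random weights here are stand-ins (real GPT-2 uses 768 → 3072 → 768 with trained weights):

```python
import math
import random

def gelu(x):
    """GELU activation (the non-linearity GPT-2 uses)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mlp(x, w_in, w_out):
    """Expand (d -> 4d), apply the activation, compress (4d -> d)."""
    hidden = [gelu(sum(x[i] * w_in[i][j] for i in range(len(x))))
              for j in range(len(w_in[0]))]
    return [sum(hidden[j] * w_out[j][k] for j in range(len(hidden)))
            for k in range(len(w_out[0]))]

random.seed(0)
d, d_ff = 4, 16  # stand-ins for GPT-2's 768 and 3072
w_in = [[random.gauss(0, 0.5) for _ in range(d_ff)] for _ in range(d)]
w_out = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d_ff)]
y = mlp([0.1, -0.2, 0.3, 0.0], w_in, w_out)
print(len(y))  # back to the input dimension: prints 4
```

The expanded `hidden` list is where the individual "neuron votes" live; `w_out` mixes those votes back down to the original size.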
+
+ ## Attention + MLP = One Layer
+
+ In each Transformer layer, attention and MLP work together:
+
+ 1. **Attention** gathers relevant context from other tokens
+ 2. **MLP** retrieves stored knowledge and transforms the representation
+ 3. The result is added back to the **residual stream** (the running representation)
+
+ This happens in every layer. GPT-2 has 12 such layers; each one further refines the model's understanding.
+
+ ## What You See in the Dashboard
+
+ In **Stage 4 (MLP/Feed-Forward)** of the pipeline, you can see:
+
+ - The expand-compress flow: Input dimension → Expanded dimension → Output dimension
+ - The number of layers in the model
+ - An explanation of why the expansion matters for knowledge storage
rag_docs/model_selector_guide.md ADDED
@@ -0,0 +1,45 @@
+ # Model Selector Guide
+
+ ## How to Choose a Model
+
+ The dashboard supports several transformer model families. You can select a model from the dropdown menu in the generator section at the top of the page.
+
+ ### Available Models
+
+ Currently, the dashboard offers:
+
+ - **GPT-2 (124M)**: OpenAI's GPT-2 small model. 12 layers, 12 attention heads, 768-dimensional embeddings. This is the best model to start with -- it's small, fast, and well-studied.
+ - **Qwen2.5-0.5B**: A LLaMA-like model from Alibaba's Qwen family. 24 layers, 14 attention heads, 896-dimensional embeddings. Slightly larger and uses different architectural features (RoPE, SiLU activation).
+
+ You can also enter a custom **HuggingFace model ID** in the dropdown (type it in). The dashboard supports GPT-2, LLaMA, OPT, GPT-NeoX, BLOOM, Falcon, and MPT model families.
+
+ ### What Happens When You Load a Model
+
+ 1. The model is downloaded from HuggingFace (this may take a moment the first time)
+ 2. The dashboard **auto-detects** the model's architecture family
+ 3. Internal hooks are automatically configured to capture attention patterns, MLP activations, and other data
+ 4. The layer and head dropdowns in the sidebar and ablation panel are populated based on the model's structure
+
+ ### Auto-Detection
+
+ The dashboard has a registry that maps model names to their architecture family. When it recognizes a model, it automatically configures:
+ - Which internal modules to hook for attention capture
+ - Which normalization parameters to track
+ - The correct patterns for extracting layer outputs
+
+ If you enter an unknown model, the sidebar's configuration dropdowns may need manual adjustment.
+
+ ### Tips for Choosing
+
+ - **Start with GPT-2**: It's small, fast, and the most widely studied. Most educational resources reference GPT-2.
+ - **Try Qwen2.5-0.5B for comparison**: It uses a different architecture (LLaMA-style). Comparing results between GPT-2 and Qwen can highlight how architectural differences affect attention patterns.
+ - **Larger models are slower**: Models with more parameters take longer to load and analyze. Stick to small models for interactive exploration.
+ - **Memory matters**: Larger models require more RAM. If the dashboard becomes unresponsive, try a smaller model.
+
+ ### Generation Settings
+
+ After selecting a model and entering a prompt, you can configure:
+ - **Number of Generation Choices (Beams)**: 1-5 beams. More beams explore more paths but take longer.
+ - **Number of New Tokens**: 1-20 tokens to generate. Shorter is faster.
+
+ Click **Analyze** to run the model and see results in the pipeline and generation sections.
rag_docs/opt_overview.md ADDED
@@ -0,0 +1,48 @@
+ # OPT Overview
+
+ ## What Is OPT?
+
+ OPT (Open Pre-trained Transformer) is a family of language models released by Meta in 2022. OPT was designed to replicate GPT-3's architecture and performance while being openly available to researchers. It uses a decoder-only transformer architecture similar to GPT-2 but with options for much larger sizes.
+
+ ## Architecture Details
+
+ OPT's architecture is close to GPT-2 but has some differences:
+
+ | Property | OPT-125M | OPT-350M | OPT-1.3B |
+ |----------|----------|----------|----------|
+ | Parameters | 125M | 350M | 1.3B |
+ | Layers | 12 | 24 | 24 |
+ | Attention Heads | 12 | 16 | 32 |
+ | Hidden Dimension | 768 | 1024 | 2048 |
+ | Vocabulary Size | 50,272 | 50,272 | 50,272 |
+
+ ### Key Differences from GPT-2
+
+ - **Learned positional embeddings**: Like GPT-2, OPT uses learned absolute position embeddings (unlike LLaMA's RoPE)
+ - **LayerNorm placement**: like GPT-2, most OPT sizes apply LayerNorm before each sublayer (pre-norm); OPT-350M is an exception that applies it after each sublayer
+ - **Larger variants available**: OPT scales up to 175 billion parameters, though only smaller variants are practical for interactive use
+
+ ### Similarities to GPT-2
+
+ - Same general decoder-only architecture
+ - Same tokenizer style (BPE with ~50K vocabulary)
+ - Same attention mechanism (standard multi-head self-attention)
+ - Similar training objective (next-token prediction)
+
+ ## What to Expect in the Dashboard
+
+ When using OPT models:
+
+ - **OPT-125M is very similar to GPT-2**: Same number of layers (12), heads (12), and hidden dimension (768). You'll see similar attention patterns and predictions.
+ - **Different module paths**: The dashboard auto-detects OPT's internal structure (e.g., `model.decoder.layers.N.self_attn`), so hooking works automatically.
+ - **Tokenization**: OPT's tokenizer is very similar to GPT-2's, so the same text usually produces similar (but not identical) token sequences.
+ - **Good for comparison**: Running the same prompt on GPT-2 and OPT-125M can show how similar architectures with different training data produce different predictions.
+
+ ## HuggingFace Model IDs
+
+ - `facebook/opt-125m`
+ - `facebook/opt-350m`
+ - `facebook/opt-1.3b`
+ - `facebook/opt-2.7b`
+
+ Note: OPT models are not in the default dropdown but can be loaded by typing the model ID directly.
rag_docs/output_and_prediction.md ADDED
@@ -0,0 +1,45 @@
+ # Output and Prediction
+
+ ## How Does the Model Choose the Next Token?
+
+ After your text has passed through all the Transformer layers (attention + MLP in each), the model needs to make a prediction: what token comes next? This final step converts the model's internal representation into a probability distribution over its entire vocabulary.
+
+ ## Logits: Raw Scores
+
+ The model's final hidden state for the last token is multiplied by the embedding table (in reverse) to produce a score for **every token in the vocabulary**. These raw scores are called **logits**. A higher logit means the model thinks that token is more likely.
+
+ For GPT-2, this means producing about 50,257 scores -- one for each token in its vocabulary.
+
+ ## Softmax: Turning Scores into Probabilities
+
+ Raw logits can be any number (positive or negative). To get actual probabilities, the model applies a function called **softmax**, which:
+
+ 1. Converts all scores to positive numbers
+ 2. Makes them all add up to 1.0 (100%)
+ 3. Preserves the ranking (higher logits → higher probabilities)
+
+ After softmax, we can say things like "the model predicts 'mat' with 45% probability."
+
+ ## Temperature
+
+ **Temperature** is a setting that controls how "confident" or "creative" the model's predictions are:
+
+ - **Low temperature (e.g., 0.1)**: Makes the model very confident -- the top prediction gets almost all the probability. Good for factual, predictable text.
+ - **High temperature (e.g., 1.5)**: Spreads probability more evenly, making less likely tokens more probable. Good for creative, varied text.
+ - **Temperature = 1.0**: The default, unmodified distribution.
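The softmax-plus-temperature recipe above fits in a few lines. The logits here are made-up numbers for three hypothetical tokens:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, then normalize with softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 3.0, 1.0]  # invented raw scores for three tokens
for t in (0.1, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperature sharpens the distribution toward the top token; higher temperature flattens it. The ranking never changes, only how concentrated the probability mass is.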
+
+ ## Greedy Decoding vs. Sampling
+
+ Once we have probabilities, how do we pick the actual next token?
+
+ - **Greedy decoding**: Always pick the token with the highest probability. Simple but can be repetitive.
+ - **Sampling**: Randomly pick a token weighted by the probabilities. More varied but less predictable.
+ - **Beam search**: Explore multiple possible sequences simultaneously and pick the best overall path. This is available in the dashboard's generation controls.
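Greedy decoding and sampling differ only in how they use the probabilities. A sketch with an invented four-token distribution:

```python
import random

def greedy_decode(probs):
    """Always pick the index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_decode(probs, rng):
    """Pick an index at random, weighted by the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.45, 0.30, 0.15, 0.10]  # invented distribution over four tokens
rng = random.Random(0)            # seeded so the run is reproducible
print(greedy_decode(probs))       # greedy always picks index 0 here
print([sample_decode(probs, rng) for _ in range(8)])
```

Over many draws, sampling picks index 0 about 45% of the time and the others in proportion to their probabilities, which is exactly the "more varied but less predictable" trade-off described above.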
+
+ ## What You See in the Dashboard
+
+ In **Stage 5 (Output Selection)** of the pipeline:
+
+ - The **predicted token** is highlighted after your prompt text, along with its confidence percentage
+ - A **top-5 bar chart** shows the five most likely next tokens and their probabilities
+ - A note explains how beam search and other techniques can influence the final selection beyond just the top-1 token
rag_docs/pipeline_stages.md ADDED
@@ -0,0 +1,53 @@
+ # Pipeline Stages
+
+ ## Overview
+
+ The pipeline visualization shows the 5 stages your text passes through inside the transformer model. A flow indicator at the top shows the path: **Input → Tokens → Embed → Attention → MLP → Output**. Click any stage to expand it and see details.
+
+ ## Stage 1: Tokenization
+
+ **Icon**: Puzzle piece | **Summary shows**: "X tokens"
+
+ This stage displays how your input text was split into tokens. Each row shows:
+ - The **token** (the text piece, displayed in a blue box)
+ - An arrow pointing to its **ID** (the number the model uses internally, in a purple box)
+
+ Notice that spaces are often attached to the beginning of the following word (e.g., " cat" with a leading space). This is normal for models like GPT-2 that use BPE tokenization.
+
+ ## Stage 2: Embedding
+
+ **Icon**: Cube | **Summary shows**: "X-dim vectors"
+
+ This stage shows how token IDs are converted into numerical vectors using a pre-learned embedding table. You'll see:
+ - A visual flow: Token ID → Lookup Table → Vector
+ - The embedding dimension (e.g., 768 for GPT-2)
+ - An explanation of how the lookup table was learned during training
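The "Token ID → Lookup Table → Vector" flow is literally just row selection. The table below is invented (3 tokens, 4 dimensions); GPT-2's real table is 50,257 rows by 768 columns:

```python
# Invented toy table: vocabulary of 3 tokens, embedding dimension 4.
embedding_table = [
    [0.1, -0.3, 0.2, 0.5],   # vector for token ID 0
    [0.7, 0.0, -0.1, 0.4],   # vector for token ID 1
    [-0.2, 0.6, 0.3, -0.5],  # vector for token ID 2
]

def embed(token_ids):
    """Embedding is a plain row lookup: ID n selects row n of the table."""
    return [embedding_table[i] for i in token_ids]

print(embed([2, 0]))  # the 4-dim vectors for token IDs 2 and 0, in order
```

The values in the real table are not hand-written like these; they were learned during training so that related tokens end up with nearby vectors.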
+
+ ## Stage 3: Attention
+
+ **Icon**: Eye | **Summary shows**: "X heads × Y layers"
+
+ This is the most detailed stage. It includes:
+ - **Head Categories**: Attention heads are automatically categorized by their behavior pattern (Previous-Token, First/Positional, Bag-of-Words, Syntactic, Other). Click each category to see which specific heads belong to it.
+ - **BertViz Visualization**: An interactive attention map showing which tokens attend to which. Lines connect tokens on the left to tokens on the right. Thicker lines mean stronger attention.
+
+ **Navigating BertViz**: Single-click a head square to select/deselect it. Double-click to show only that head. Hover over tokens or lines to see exact attention weights.
+
+ ## Stage 4: MLP (Feed-Forward)
+
+ **Icon**: Network | **Summary shows**: "X layers"
+
+ This stage shows the expand-then-compress pattern of the feed-forward network:
+ - Input dimension (e.g., 768) → Expanded dimension (e.g., 3072, which is 4x larger) → Back to input dimension
+ - An explanation of why this expansion matters for storing factual knowledge
+ - The total number of layers in the model
+
+ ## Stage 5: Output Selection
+
+ **Icon**: Bullseye | **Summary shows**: "→ [predicted token]"
+
+ This stage reveals the model's prediction:
+ - Your full prompt with the **predicted next token** highlighted
+ - A **confidence percentage** for the top prediction
+ - A **top-5 bar chart** showing the five most likely next tokens and their probabilities
+ - A note about how beam search and other techniques can influence the final output
rag_docs/recommended_starting_points.md ADDED
@@ -0,0 +1,64 @@
+ # Recommended Starting Points
+
+ ## Best First Model
+
+ **GPT-2 (124M)** is the ideal starting model because:
+ - It loads quickly and runs fast
+ - It has a manageable size (12 layers, 12 heads = 144 heads total)
+ - It's the most studied model in mechanistic interpretability research
+ - Most educational examples and tutorials reference GPT-2
+
+ ## Good Starter Prompts
+
+ ### For Exploring Basic Predictions
+
+ | Prompt | What It Tests |
+ |--------|--------------|
+ | `The cat sat on the` | Simple object prediction (mat, floor, bed) |
+ | `The capital of France is` | Factual recall (Paris) |
+ | `1 + 1 =` | Basic arithmetic |
+ | `Once upon a time` | Creative story continuation |
+
+ ### For Exploring Attention Patterns
+
+ | Prompt | What It Shows |
+ |--------|--------------|
+ | `The cat sat on the mat because it was` | Pronoun resolution: does "it" attend to "cat" or "mat"? |
+ | `Alice gave the book to Bob because she` | Gendered pronoun resolution |
+ | `The dogs in the park were` | Subject-verb agreement across a prepositional phrase |
+ | `I went to the store and bought` | Sequential event prediction |
+
+ ### For Ablation Experiments
+
+ | Prompt | Why It's Good for Ablation |
+ |--------|---------------------------|
+ | `The cat sat on the` | Simple enough that ablating one head can change the prediction |
+ | `The president of the` | Factual prompts show clear ablation effects on knowledge retrieval |
+ | `She picked up the phone and` | Action continuation is sensitive to Previous-Token head ablation |
+
+ ### For Attribution Experiments
+
+ | Prompt | What Attribution Reveals |
+ |--------|------------------------|
+ | `The capital of France is` | "France" should have highest attribution for "Paris" |
+ | `The doctor told the nurse that she` | Which noun drives the pronoun prediction? |
+ | `The large red ball rolled down the` | Do adjectives or nouns matter more? |
+
+ ## Suggested Experiment Order
+
+ If you're new to the dashboard, follow this path:
+
+ 1. **Experiment: Your First Analysis** -- Learn the basics with GPT-2 and a simple prompt
+ 2. **Experiment: Exploring Attention Patterns** -- Understand what attention heads do
+ 3. **Experiment: Your First Ablation** -- Remove a head and see what happens
+ 4. **Experiment: Token Attribution** -- See which input tokens drive predictions
+ 5. **Experiment: Comparing Heads** -- Systematically compare head categories
+ 6. **Experiment: Beam Search** -- Explore alternative generation paths
+
+ ## After the Basics
+
+ Once you've completed the guided experiments:
+ - **Compare models**: Run the same prompt on GPT-2 and Qwen2.5-0.5B to see architectural differences
+ - **Try longer prompts**: See how attention patterns change with more context
+ - **Combine techniques**: Use attribution to find important tokens, then ablate heads to find the components that process those tokens
+ - **Explore edge cases**: Try prompts in other languages, code snippets, or mathematical expressions
rag_docs/tokenization_explained.md ADDED
@@ -0,0 +1,36 @@
+ # Tokenization Explained
+
+ ## What Is Tokenization?
+
+ Tokenization is the very first step in how a language model processes your text. Models cannot read raw text -- they need it broken into small, numbered pieces called **tokens**. Tokenization converts your input string into a sequence of these tokens.
+
+ ## Why Not Just Use Words?
+
+ You might wonder why we don't just split text by spaces. The problem is that there are too many possible words (including misspellings, rare terms, and words in other languages). Instead, modern models use **subword tokenization**, which breaks text into smaller, reusable pieces.
+
+ For example, the word "unhappiness" might become two tokens ("un", "happiness") or three ("un", "happ", "iness") -- depending on the specific tokenizer.
+
+ ## How It Works: BPE
+
+ Most models (including GPT-2 and LLaMA) use a method called **Byte-Pair Encoding (BPE)**. Here's the intuition:
+
+ 1. Start with individual characters as your vocabulary
+ 2. Find the most common pair of adjacent characters in the training data (e.g., "t" + "h" = "th")
+ 3. Merge that pair into a new token
+ 4. Repeat thousands of times
+
+ This builds a vocabulary of common subwords. Frequent words like "the" become single tokens, while rare words get split into pieces.
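One merge step of that loop can be sketched on a three-word toy corpus. `bpe_merge_step` is an invented name for illustration; real tokenizers also record each merge as a rule so it can be replayed on new text:

```python
from collections import Counter

def bpe_merge_step(corpus):
    """One BPE step: find the most frequent adjacent pair and merge it
    everywhere. `corpus` is a list of token sequences (initially characters)."""
    pairs = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return corpus, None
    (a, b), _count = pairs.most_common(1)[0]
    merged = []
    for seq in corpus:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(a + b)   # merge the pair into one token
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged, a + b

corpus = [list("the"), list("then"), list("there")]
for _ in range(2):
    corpus, new_token = bpe_merge_step(corpus)
    print(new_token, corpus)
```

After two merges, the frequent word "the" has become a single token, while "then" and "there" keep extra pieces -- exactly the behavior described above.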
+
+ ## Token IDs
+
+ Each token has a unique **ID** -- a number that the model uses internally. For example, in GPT-2's vocabulary, the token "the" might have ID 262, while "cat" might have ID 9246. The model never sees the text itself; it only works with these IDs.
+
+ ## What You See in the Dashboard
+
+ In **Stage 1 (Tokenization)** of the pipeline, you can see exactly how your input text was split:
+
+ - Each row shows a **token** (the text piece) and its **ID** (the number)
+ - The summary shows the total number of tokens
+ - Notice how spaces are often attached to the following word (e.g., " cat" with a leading space is one token)
+
+ This stage helps you understand that the model's "unit of thought" is the token, not the word.
rag_docs/transformer_architecture.md ADDED
@@ -0,0 +1,39 @@
+ # The Transformer Architecture
+
+ ## What Is a Transformer?
+
+ A **Transformer** is the specific type of neural network architecture used by modern LLMs like GPT-2, LLaMA, and others. It was introduced in the 2017 paper "Attention Is All You Need" and quickly became the dominant approach for language tasks.
+
+ ## The Key Innovation: Attention
+
+ Before Transformers, language models processed words one at a time, left to right. Transformers changed this by introducing the **attention mechanism**, which lets the model look at all words in the input simultaneously and figure out which ones are relevant to each other.
+
+ For example, in "The cat sat on the mat because it was tired," attention helps the model connect "it" back to "the cat."
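At its core, attention computes a weighted average: each token scores every other token and the scores are normalized into weights. The 2-d word vectors below are invented for illustration (real models use hundreds of dimensions and learned query/key projections):

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented 2-d vectors: "it" points in a direction similar to "cat".
vectors = {"The": [0.1, 0.0], "cat": [0.9, 0.2], "mat": [0.2, -0.8], "it": [0.8, 0.3]}
words = ("The", "cat", "mat")
weights = attention_weights(vectors["it"], [vectors[w] for w in words])
for word, w in zip(words, weights):
    print(word, round(w, 3))
```

Because the vector for "it" is most similar to the vector for "cat", "cat" receives the largest weight -- a miniature version of the pronoun-resolution behavior described above.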
+
+ ## How Layers Stack
+
+ A Transformer is built from identical **layers** stacked on top of each other. Each layer has two main parts:
+
+ 1. **Attention**: Looks at relationships between all tokens
+ 2. **MLP (Feed-Forward Network)**: Processes each token's information individually, retrieving stored knowledge
+
+ A small model like GPT-2 has 12 layers. Larger models may have 32, 64, or more.
+
+ Information flows through these layers sequentially. After each layer, the model's understanding of the text becomes more refined. Early layers tend to capture basic patterns (like grammar), while later layers capture more complex meanings.
+
+ ## Encoder vs. Decoder
+
+ The original Transformer had two halves:
+
+ - **Encoder**: Reads and understands the full input (used in models like BERT)
+ - **Decoder**: Generates text one token at a time (used in GPT-style models)
+
+ Most LLMs you'll encounter in this dashboard are **decoder-only** models. This means they generate text left-to-right, predicting one token at a time based on everything that came before it. Each token can only "see" the tokens to its left -- it cannot look ahead.
+
+ ## The Residual Stream
+
+ There is an important concept called the **residual stream** (or "residual connection"). Think of it as a conveyor belt running through all the layers. Each layer reads from this stream, does some processing, and adds its result back. This means information from early layers is preserved and can be used by later layers.
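The conveyor-belt picture can be written down directly: each layer computes an update and adds it onto the stream. The two `toy_*` functions are invented stand-ins, not real attention or MLP math:

```python
def add(x, delta):
    """Residual connection: add a sublayer's output back onto the stream."""
    return [a + b for a, b in zip(x, delta)]

def toy_attention(x):
    # Invented stand-in: nudge each value toward the mean of the stream.
    mean = sum(x) / len(x)
    return [0.1 * (mean - v) for v in x]

def toy_mlp(x):
    # Invented stand-in: a small elementwise transformation.
    return [0.1 * v * v for v in x]

stream = [1.0, -2.0, 0.5]   # the "conveyor belt" for one token
for _ in range(3):          # each layer reads, processes, and adds back
    stream = add(stream, toy_attention(stream))
    stream = add(stream, toy_mlp(stream))
print(stream)
```

Because every layer only *adds* to the stream rather than replacing it, the original values (and each earlier layer's contribution) remain recoverable by later layers.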
+
+ ## How This Connects to the Dashboard
+
+ The dashboard's 5-stage pipeline follows the exact path data takes through a Transformer: Tokenization, Embedding, Attention, MLP, and Output. When you expand each stage, you're seeing what happens at that point in the architecture.
rag_docs/troubleshooting_and_faq.md ADDED
@@ -0,0 +1,73 @@
+ # Troubleshooting and FAQ
+
+ ## Common Issues
+
+ ### Model takes a long time to load
+
+ **Why**: The first time you load a model, it must be downloaded from HuggingFace. GPT-2 (124M) is about 500MB; larger models are much bigger.
+
+ **Fix**: Be patient on the first load. Subsequent loads should be faster because the model is cached locally. If loading is consistently slow, try a smaller model.
+
+ ### Ablating a head has no effect
+
+ **Why**: Not every head is important for every input. Many attention heads are redundant -- the model has learned to distribute work across multiple heads, so removing one doesn't always change the output.
+
+ **Fix**: This is actually an interesting finding! Try:
+ - Ablating a head from a different category (Previous-Token heads often show more effect)
+ - Using a different prompt (some prompts depend more on specific heads)
+ - Ablating multiple heads simultaneously to see if their combined removal has an effect
+
+ ### Attribution takes too long
+
+ **Why**: Integrated Gradients is computationally expensive because it runs the model multiple times (typically 50 steps) to build up the attribution scores.
+
+ **Fix**: Switch to "Simple Gradient" for faster (though less accurate) results. Or use a shorter prompt -- fewer tokens means faster computation.
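The averaging that makes Integrated Gradients expensive can be seen in a toy version. The "model" here is a made-up two-input function with a hand-written gradient, not the dashboard's actual model:

```python
def f(x):
    # Stand-in "model": a simple differentiable function of two inputs.
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    # Hand-written gradient of f (real frameworks use autograd).
    return [2.0 * x[0], 3.0]

def integrated_gradients(x, baseline, steps=50):
    """Average the gradient at `steps` points along the straight path from
    baseline to x, then scale by (x - baseline), one value per input."""
    avg = [0.0] * len(x)
    for s in range(1, steps + 1):
        point = [b + (xi - b) * s / steps for xi, b in zip(x, baseline)]
        g = grad_f(point)
        avg = [a + gi / steps for a, gi in zip(avg, g)]
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg)]

x, baseline = [2.0, 1.0], [0.0, 0.0]
attr = integrated_gradients(x, baseline)
# Completeness check: attributions approximately sum to f(x) - f(baseline).
print(attr, sum(attr), f(x) - f(baseline))
```

Each of the 50 steps is a full gradient computation; on a real transformer that means dozens of forward/backward passes, which is why a simple gradient (one pass at the input itself) is so much faster.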
+
+ ### The model's prediction seems wrong or nonsensical
+
+ **Why**: Small models like GPT-2 (124M) have limited knowledge and can produce incorrect facts, repetitive text, or non-sequiturs. The model was trained on data from before 2019 and has a limited understanding of the world.
+
+ **Fix**: This is expected behavior for small models. The dashboard is designed for exploring *how* the model works, not for getting useful outputs. Try different prompts or a different model.
+
+ ### BertViz visualization is hard to read
+
+ **Why**: With 12+ heads selected simultaneously, the attention lines overlap and become a dense mess.
+
+ **Fix**: Double-click on a single head in the BertViz visualization to isolate it. Then explore heads one at a time. Use the head categories to guide which heads to investigate.
+
+ ### The dashboard becomes slow or unresponsive
+
+ **Why**: Larger models require more memory and computation. Running multiple experiments without refreshing can also accumulate memory usage.
+
+ **Fix**: Try a smaller model. Refresh the browser page if things get sluggish. Close other memory-intensive applications.
+
+ ## Frequently Asked Questions
+
+ ### Which model should I start with?
+
+ **GPT-2 (124M)** is the best starting model. It's small, fast, well-studied, and has clean attention patterns that are easy to understand. Once you're comfortable, move to Qwen2.5-0.5B for a comparison.
+
+ ### What prompts work best for learning?
+
+ Start with short, simple prompts (5-10 words) that have clear, predictable continuations:
+ - "The cat sat on the" (predict a location)
+ - "The capital of France is" (predict a fact)
+ - "Once upon a time there was a" (predict a story element)
+
+ These give clear, interpretable results in the pipeline and experiments.
+
+ ### Can I use my own model?
+
+ Yes! Type any HuggingFace model ID into the model dropdown. The dashboard supports GPT-2, LLaMA, OPT, GPT-NeoX, BLOOM, Falcon, and MPT architectures. Unknown architectures may need manual configuration in the sidebar.
+
+ ### What's the difference between the pipeline and the investigation panel?
+
+ The **pipeline** (5 stages) shows what happens during the model's forward pass -- how your input is processed step by step. The **investigation panel** (ablation + attribution) lets you run experiments to understand *why* the model made a specific prediction.
+
+ ### How do head categories get determined?
+
+ The dashboard automatically analyzes each attention head's pattern using heuristic rules (based on thresholds for attention distributions). For example, a head is classified as "Previous-Token" if more than 40% of each token's attention goes to the immediately preceding token. These categories are computed fresh for each analysis.
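A threshold rule of that shape can be sketched directly. The attention matrix below is invented, and this version averages across positions; the dashboard's exact rule may differ in details:

```python
def is_previous_token_head(attn, threshold=0.4):
    """attn[i][j] = attention that token i pays to token j (rows sum to 1).
    Qualifies if, averaged over positions, tokens put more than `threshold`
    of their attention on the immediately preceding position."""
    scores = [attn[i][i - 1] for i in range(1, len(attn))]
    return sum(scores) / len(scores) > threshold

# Invented 4-token attention pattern that mostly looks one position back:
prev_head = [
    [1.0, 0.0, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0],
    [0.0, 0.1, 0.6, 0.3],
]
print(is_previous_token_head(prev_head))  # True
```

A head whose rows mostly attend to position 0 instead would fail this check and fall into the First/Positional category under an analogous rule.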
+
+ ### Can I save my results?
+
+ Currently, results are displayed in the browser and aren't saved between sessions. You can take screenshots or copy text from the chatbot (using the copy button on messages) to record your findings.
rag_docs/what_is_an_llm.md ADDED
@@ -0,0 +1,36 @@
+ # What Is a Large Language Model (LLM)?
+
+ ## The Big Idea
+
+ A Large Language Model is a computer program that has learned to read and write text by studying enormous amounts of human writing. Think of it like an incredibly well-read assistant that has absorbed millions of books, articles, and websites, and can now predict what word comes next in a sentence.
+
+ ## How It Works (Simply)
+
+ At its core, an LLM does one thing: **next-token prediction**. Given some text like "The cat sat on the", the model predicts the most likely next piece of text (called a "token") -- perhaps "mat" or "floor."
+
+ This might sound simple, but to do it well, the model has to understand grammar, facts, context, and even some reasoning. All of that understanding is encoded in the model's **parameters** -- billions of numbers that were learned during training.
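Next-token prediction can be demonstrated at a microscopic scale by counting word pairs. Real LLMs use neural networks rather than counts, and the tiny corpus below is invented, but the prediction task is the same:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which -- a microscopic 'language model'."""
    counts = defaultdict(Counter)
    words = text.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts, word):
    """Predict the follower seen most often in training."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

This counter can only look one word back and has no notion of grammar or meaning; an LLM replaces the count table with billions of learned parameters that condition on the whole context.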
+
+ ## What Is a Neural Network?
+
+ An LLM is built on a type of computer program called a **neural network**. A neural network is loosely inspired by the brain: it's made of layers of simple processing units that pass information forward, transforming it step by step. Each layer takes input numbers, multiplies and adds them, and passes the result to the next layer.
+
+ When you stack many layers together and train them on lots of data, the network learns complex patterns -- like how words relate to each other.
+
+ ## What Makes It "Large"?
+
+ The "large" in LLM refers to two things:
+
+ - **Many parameters**: Modern LLMs have billions of learnable numbers (GPT-2 has 124 million; larger models have tens or hundreds of billions).
+ - **Massive training data**: They train on huge text datasets -- sometimes trillions of words from the internet, books, and code.
+
+ ## How Does This Connect to the Dashboard?
+
+ The Transformer Explanation Dashboard lets you look inside an LLM as it makes a prediction. When you enter a prompt and click "Analyze," you can see:
+
+ - How the model breaks your text into tokens (Stage 1)
+ - How those tokens become number vectors (Stage 2)
+ - How the model figures out which words relate to each other (Stage 3: Attention)
+ - How knowledge is retrieved from the model's memory (Stage 4: MLP)
+ - What the model predicts as the next token (Stage 5: Output)
+
+ This step-by-step view helps you understand what happens inside the "black box" of an LLM.
todo.md CHANGED
@@ -182,3 +182,14 @@
  - 1536 dimensions, high quality
  - [x] Remove local `sentence-transformers` dependency (simpler, no TF conflicts)
  - [x] Estimated cost: ~$1.50/month for moderate usage
+
+ ## Completed: Enhance RAG Documents for Chatbot
+
+ - [x] Category 1: 8 general LLM/Transformer knowledge files (what_is_an_llm.md through key_terminology.md)
+ - [x] Category 2: 7 dashboard component documentation files (dashboard_overview.md through model_selector_guide.md)
+ - [x] Category 3: 3 model-specific documentation files (gpt2_overview.md, llama_overview.md, opt_overview.md)
+ - [x] Category 4: 6 step-by-step guided experiment files (experiment_first_analysis.md through experiment_beam_search.md)
+ - [x] Category 5: 6 interpretation/troubleshooting/research files (interpreting_*.md, troubleshooting_and_faq.md, recommended_starting_points.md, mechanistic_interpretability_intro.md)
+ - [x] Delete embeddings_cache.json, update rag_docs/README.md with full inventory
+ - [x] Update todo.md and conductor docs
+ - Total: 30 RAG documents covering transformer concepts, dashboard usage, guided experiments, interpretation, troubleshooting, and research context