cdpearlman Cursor committed
Commit 78691d1 · 1 Parent(s): e582564

Add 30 RAG documents for chatbot knowledge base

Co-authored-by: Cursor <cursoragent@cursor.com>

conductor/product.md CHANGED
@@ -25,7 +25,8 @@ To demystify the inner workings of Transformer-based Large Language Models (LLMs
 - **Integrated Education:**
   - Contextual tooltips for immediate clarity.
   - Dedicated "Glossary" panel for in-depth definitions.
-  - Foundation for AI-guided tutorials.
+  - AI chatbot with RAG-powered knowledge base (30 documents covering transformer concepts, dashboard usage, guided experiments, result interpretation, troubleshooting, and mechanistic interpretability research).
+  - Step-by-step guided experiments that walk beginners through the dashboard's features.
 
 ## User Experience
 The interface centers on exploration and clarity. Users start by selecting a model and inputting text. The dashboard then unfolds the model's processing pipeline, allowing users to "zoom in" on specific components. Experimentation modes are clearly distinguished, enabling users to hypothesize ("What if I turn off this head?") and test. Educational resources are omnipresent but non-intrusive, available on-demand to explain the *what* and *why* of what is being visualized.
rag_docs/README.md CHANGED
@@ -9,17 +9,52 @@ This folder contains documents used by the AI chatbot for Retrieval-Augmented Generation (RAG).
 
 ## How to Add Documents
 
-1. Place your transformer-related documentation files in this folder
-2. The chatbot will automatically index new documents on startup
-3. Documents are chunked and embedded for semantic search
-
-## Recommended Content
-
-- Transformer architecture explanations
-- Attention mechanism documentation
-- Information about the experiments available in this dashboard
-- Glossary of ML/NLP terms
-- Model-specific documentation (GPT-2, LLaMA, etc.)
+1. Place your documentation files in this folder
+2. Delete `embeddings_cache.json` if it exists (to force re-indexing)
+3. The chatbot will automatically index new documents on startup
+4. Documents are chunked and embedded for semantic search
+
+## Document Inventory
+
+### Category 1: General LLM/Transformer Knowledge
+- `what_is_an_llm.md` - Neural networks, language models, next-token prediction
+- `transformer_architecture.md` - Layers, encoder/decoder, residual stream
+- `tokenization_explained.md` - Subword tokenization, BPE, token IDs
+- `embeddings_explained.md` - Lookup tables, vector spaces, positional encodings
+- `attention_mechanism.md` - Q/K/V, multi-head attention, intuitive explanations
+- `mlp_layers_explained.md` - Feed-forward networks, knowledge storage, expand-compress
+- `output_and_prediction.md` - Logits, softmax, temperature, greedy vs. sampling
+- `key_terminology.md` - Extended glossary of ML/transformer terms
+
+### Category 2: Dashboard Components
+- `dashboard_overview.md` - Layout tour, navigation, typical workflow
+- `pipeline_stages.md` - What each of the 5 pipeline stages shows
+- `ablation_panel_guide.md` - How to use the ablation experiment panel
+- `attribution_panel_guide.md` - How to use the token attribution panel
+- `beam_search_and_generation.md` - Beam search, generation controls
+- `head_categories_explained.md` - Previous-Token, Positional, BoW, Syntactic, Other
+- `model_selector_guide.md` - Choosing models, auto-detection, generation settings
+
+### Category 3: Model-Specific Documentation
+- `gpt2_overview.md` - GPT-2 architecture, why it's a good starter, variants
+- `llama_overview.md` - LLaMA/Qwen/Mistral architecture, RoPE, GQA differences
+- `opt_overview.md` - OPT architecture, comparison with GPT-2
+
+### Category 4: Guided Experiments (Step-by-Step)
+- `experiment_first_analysis.md` - Your first analysis with GPT-2
+- `experiment_exploring_attention.md` - Reading attention patterns and head categories
+- `experiment_first_ablation.md` - Removing a head and observing the effect
+- `experiment_token_attribution.md` - Measuring token influence with gradients
+- `experiment_comparing_heads.md` - Systematic comparison across head categories
+- `experiment_beam_search.md` - Exploring alternative generation paths
+
+### Category 5: Interpretation, Troubleshooting, and Research
+- `interpreting_ablation_results.md` - How to read ablation probability changes
+- `interpreting_attribution_scores.md` - Understanding attribution score values
+- `interpreting_attention_maps.md` - Reading BertViz patterns visually
+- `troubleshooting_and_faq.md` - Common issues and frequently asked questions
+- `recommended_starting_points.md` - Best models, prompts, and experiment order
+- `mechanistic_interpretability_intro.md` - Mech interp research context
 
 ## Notes
 
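The indexing workflow described in the README (chunk each file, embed each chunk, cache the results, delete the cache to re-index) can be sketched roughly as follows. This is an illustrative assumption, not the dashboard's actual code; `chunk_text`, `index_documents`, and the `embed` callback are invented names, and the real chunker works on tokens rather than words.

```python
import json
import os

def chunk_text(text, max_words=500):
    """Split a document into roughly equal chunks (~500 words, standing in for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def index_documents(folder, embed, cache_path="embeddings_cache.json"):
    """Chunk and embed every .md/.txt file, reusing the cached index when present."""
    if os.path.exists(cache_path):          # delete this file to force re-indexing
        with open(cache_path) as f:
            return json.load(f)
    index = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith((".md", ".txt")):
            continue
        with open(os.path.join(folder, name)) as f:
            text = f.read()
        for i, chunk in enumerate(chunk_text(text)):
            index.append({"content": chunk, "source_file": name,
                          "chunk_index": i, "embedding": embed(chunk)})
    with open(cache_path, "w") as f:        # cache for faster subsequent loads
        json.dump(index, f)
    return index
```

The cache entries mirror the shape seen in the deleted `embeddings_cache.json` below (content, source file, chunk index, embedding vector).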
rag_docs/ablation_panel_guide.md ADDED
@@ -0,0 +1,33 @@
+# Ablation Panel Guide
+
+## What Is Ablation?
+
+Ablation is an experiment where you **remove (disable) specific attention heads** from the model and observe how the output changes. If removing a head causes the prediction to change significantly, that head was important for this particular input. If the prediction barely changes, the head was either redundant or not relevant to this context.
+
+The term comes from neuroscience, where "ablation" means removing part of the brain to study its function.
+
+## How to Use the Ablation Panel
+
+The ablation panel is found in the **Investigation Panel** at the bottom of the dashboard, under the "Ablation" tab.
+
+### Step-by-Step
+
+1. **Run an analysis first**: You need to have clicked "Analyze" on a prompt before ablation is available.
+2. **Select a generation for comparison**: Click "Select for Comparison" on one of the generated sequences. This gives the ablation experiment a full generation to compare against.
+3. **Choose a head to ablate**: Use the **Layer** and **Head** dropdowns to pick a specific attention head (e.g., Layer 0, Head 3). Click the **+** button to add it.
+4. **Add more heads (optional)**: You can add multiple heads from different layers. Each appears as a chip (e.g., "L0-H3") with an × button to remove it.
+5. **Run the experiment**: Click "Run Ablation Experiment."
+6. **View results**: The panel shows a side-by-side comparison of the original vs. ablated generation, plus the probability change for the immediate next token.
+
+### Understanding the Results
+
+- **Full Generation Comparison**: Shows the original generated text alongside what the model generates with those heads removed. If the text changed, the ablated heads were important for that generation.
+- **Probability Change**: Shows the immediate next-token probability before and after ablation (e.g., "72.3% → 45.1% (-27.2%)"). A large drop means the head was important.
+- **No change?** If ablating a head has no effect, it may be redundant, or it may serve a function that isn't relevant to this specific prompt. Try a different prompt or a different head.
+
+### Tips
+
+- Start by ablating heads from the **Previous-Token** category -- these often have noticeable effects.
+- Try ablating heads from the **Other** category for comparison -- these often have less impact.
+- Use the **head categories** in the Attention stage (Stage 3) to pick interesting heads to ablate.
+- You can ablate heads from multiple layers simultaneously to see compound effects.
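Mechanically, "removing" a head amounts to zeroing its output before the heads are recombined into the residual stream. A toy NumPy sketch of that idea (illustrative only -- the dashboard ablates heads inside a real transformer, not in this simplified setup):

```python
import numpy as np

def multi_head_output(head_outputs, w_out, ablate=()):
    """Concatenate per-head outputs, zeroing any ablated heads, then project."""
    kept = [np.zeros_like(h) if i in ablate else h for i, h in enumerate(head_outputs)]
    return np.concatenate(kept, axis=-1) @ w_out

rng = np.random.default_rng(0)
n_heads, d_head, d_model = 4, 8, 32
heads = [rng.normal(size=(5, d_head)) for _ in range(n_heads)]  # 5 tokens, 4 heads
w_out = rng.normal(size=(n_heads * d_head, d_model))            # output projection

baseline = multi_head_output(heads, w_out)
ablated = multi_head_output(heads, w_out, ablate={2})  # "remove" head 2
```

Comparing `baseline` and `ablated` token by token is the toy analogue of the panel's before/after probability comparison: the larger the difference, the more that head contributed here.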
rag_docs/attention_mechanism.md ADDED
@@ -0,0 +1,40 @@
+# The Attention Mechanism
+
+## What Is Attention?
+
+Attention is the core innovation that makes Transformers powerful. It allows the model to look at **all tokens in the input simultaneously** and figure out which ones are relevant to each other. When processing any token, the model can "attend to" (focus on) other tokens that provide useful context.
+
+## An Intuitive Analogy
+
+Imagine you're reading a sentence and you come across the word "it." To understand what "it" refers to, you look back at earlier words -- maybe "the cat" or "the ball." Your brain is doing something like attention: scanning the context and focusing on the most relevant parts.
+
+The model does this mathematically by computing **attention scores** between every pair of tokens.
+
+## How It Works: Queries, Keys, and Values
+
+Attention uses three concepts (don't worry about the math -- the intuition is what matters):
+
+- **Query (Q)**: "What am I looking for?" -- each token asks a question
+- **Key (K)**: "What do I contain?" -- each token advertises what information it has
+- **Value (V)**: "Here's my actual information" -- the content that gets passed along
+
+The model computes how well each Query matches each Key. High matches mean strong attention. The final output for each token is a weighted combination of all the Values, weighted by these attention scores.
+
+## Multi-Head Attention
+
+Instead of computing attention once, Transformers compute it multiple times in parallel using different **attention heads**. Each head learns to look for different kinds of relationships:
+
+- One head might track which word a pronoun refers to
+- Another might focus on the immediately preceding token
+- Another might spread attention broadly across the sentence
+
+GPT-2 has **12 attention heads per layer** (144 heads total across 12 layers). The dashboard categorizes these heads to help you understand their roles.
+
+## What You See in the Dashboard
+
+In **Stage 3 (Attention)** of the pipeline, you can see:
+
+- **Head Categories**: Heads are grouped by pattern type -- Previous-Token, First/Positional, Bag-of-Words, Syntactic, and Other. Click each category to see which specific heads (like L0-H3) belong to it.
+- **BertViz Visualization**: An interactive attention map. Lines connect tokens on the left to the tokens they attend to on the right. Thicker, darker lines mean stronger attention. You can click on specific heads to focus on them.
+
+**Want more technical detail?** Ask the chatbot about Q/K/V dot products, softmax normalization, or how attention scores are computed mathematically.
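The Query/Key/Value matching that `attention_mechanism.md` describes is standard scaled dot-product attention, which fits in a few lines of NumPy (a generic sketch, not code from the dashboard):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # how well each Query matches each Key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax: each row sums to 1
    return weights @ V, weights                        # weighted combination of the Values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 tokens, head dimension d_k = 4
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out, w = attention(Q, K, V)
```

Each row of `w` is one token's attention distribution over all tokens -- exactly what the BertViz lines in Stage 3 visualize; multi-head attention just runs this in parallel with different learned Q/K/V projections per head.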
rag_docs/attribution_panel_guide.md ADDED
@@ -0,0 +1,40 @@
+# Attribution Panel Guide
+
+## What Is Token Attribution?
+
+Token attribution measures **which input tokens had the most influence on the model's prediction**. It answers the question: "Which parts of my input mattered most for this output?"
+
+For example, if your prompt is "The capital of France is" and the model predicts "Paris," attribution will show that "France" and "capital" had the highest influence on that prediction.
+
+## How to Use the Attribution Panel
+
+The attribution panel is in the **Investigation Panel** at the bottom of the dashboard, under the "Token Attribution" tab.
+
+### Step-by-Step
+
+1. **Run an analysis first**: Click "Analyze" on a prompt so the model has made a prediction.
+2. **Choose an attribution method**:
+   - **Integrated Gradients**: More accurate but slower. Computes attribution by gradually "building up" the input and measuring how each token contributes along the way.
+   - **Simple Gradient**: Faster but less precise. Takes a single gradient measurement to estimate token importance.
+3. **Choose a target token (optional)**: By default, attribution is computed for the model's top prediction. You can select a different token from the top-5 predictions dropdown to see which input tokens would drive *that* alternative prediction.
+4. **Click "Compute Attribution"**.
+
+### Reading the Results
+
+The results have two visualizations:
+
+- **Color-coded token chips**: Your input tokens are displayed as colored boxes. Darker blue = higher influence. Hover over any chip to see the exact attribution score.
+- **Horizontal bar chart**: Shows the same information as bars, with attribution scores labeled. Longer bars = more influential tokens.
+
+### What Attribution Scores Mean
+
+- **High score (near 1.0)**: This token was highly influential for the prediction. It strongly pushed the model toward the target token.
+- **Low score (near 0.0)**: This token had little influence on this particular prediction.
+- **Scores are normalized**: The highest-influence token gets a score of 1.0, and others are scaled relative to it.
+
+### Tips
+
+- Compare attribution for different target tokens. You might find that different input tokens drive different predictions.
+- Try Integrated Gradients for the most reliable results, especially when you want to draw conclusions.
+- Short prompts give cleaner attribution results since there are fewer tokens to compare.
+- Function words (like "the" or "is") often have low attribution; content words (nouns, verbs) tend to have higher attribution.
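The "Simple Gradient" method boils down to gradient-times-input: take the gradient of the target logit with respect to each token's embedding, multiply by the embedding, and normalize. A toy NumPy sketch on an invented linear "model" (the gradient is computed by hand here; a real transformer needs autodiff, and the dashboard's exact scoring may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d, vocab = 4, 6, 10
X = rng.normal(size=(n_tokens, d))   # toy embeddings for a 4-token prompt
W = rng.normal(size=(vocab, d))      # toy output projection

logits = W @ X.mean(axis=0)          # toy "model": mean-pool embeddings, project to vocab
target = logits.argmax()             # attribute the top prediction

grad = W[target] / n_tokens          # d(logit_target)/d(x_i), identical per token here
scores = np.abs(X @ grad)            # gradient x input: one influence score per token
scores /= scores.max()               # normalize: most influential token scores 1.0
```

The normalization in the last line matches the panel's convention above: the top token gets 1.0 and the rest are scaled relative to it. Integrated Gradients would instead average such gradients along a path from a blank baseline to the real input.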
rag_docs/beam_search_and_generation.md ADDED
@@ -0,0 +1,53 @@
+# Beam Search and Generation
+
+## What Is Text Generation?
+
+When you click "Analyze" in the dashboard, the model generates a continuation of your prompt. The simplest approach is **greedy decoding**: at each step, pick the single most likely next token. This is fast but can lead to repetitive or suboptimal text.
+
+## What Is Beam Search?
+
+Beam search is a smarter generation strategy that explores **multiple possible paths simultaneously** instead of committing to the single best token at each step.
+
+### How It Works
+
+Imagine you're at a fork in the road. Greedy decoding always takes the path that looks best right now. Beam search keeps multiple paths open and evaluates them over several steps before deciding which one is actually best overall.
+
+More specifically:
+1. At each step, the model considers the top candidates for the next token
+2. It keeps the **N best partial sequences** (where N is the number of beams)
+3. It extends each of those sequences by one more token
+4. It ranks all the possibilities and keeps the top N again
+5. This continues until the desired number of tokens have been generated
+
+The result is that beam search can find better overall sequences, even if the individual tokens at each step aren't always the highest-probability choice.
+
+### Example
+
+With the prompt "The cat sat on the" and 3 beams:
+- Beam 1 might generate: "The cat sat on the mat and purred"
+- Beam 2 might generate: "The cat sat on the floor and looked"
+- Beam 3 might generate: "The cat sat on the couch and slept"
+
+Each beam represents a different possible continuation.
+
+## Generation Controls in the Dashboard
+
+In the generator section, you have two settings:
+
+- **Number of Generation Choices (Beams)**: How many parallel paths to explore (1-5). With 1 beam, you get greedy decoding. More beams = more diverse results but slower.
+- **Number of New Tokens**: How many tokens to generate beyond your prompt (1-20). Longer generations take more time.
+
+## How Generated Sequences Are Used
+
+After generation, the dashboard displays the resulting sequences. You can:
+1. **Read them**: See how the model would continue your prompt
+2. **Select one for comparison**: Click "Select for Comparison" to use that sequence as a baseline in ablation experiments. This lets you see how removing attention heads changes the full generated sequence, not just the immediate next token.
+
+## Greedy vs. Beam Search
+
+| Feature | Greedy (1 beam) | Beam Search (2+ beams) |
+|---------|-----------------|------------------------|
+| Speed | Fast | Slower |
+| Quality | May miss better paths | Finds better overall sequences |
+| Diversity | Only one output | Multiple alternatives |
+| Best for | Quick exploration | Thorough analysis |
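The numbered loop in "How It Works" fits in a few lines of Python. `step_probs` is an invented toy next-token model (nothing to do with the dashboard's models), chosen so that greedy decoding is suboptimal:

```python
import math

def beam_search(start, step_probs, n_beams, n_steps):
    """Keep the n_beams best partial sequences, extend each, re-rank, repeat."""
    beams = [(0.0, [start])]                        # (log-probability, token sequence)
    for _ in range(n_steps):
        candidates = []
        for logp, seq in beams:
            for tok, p in step_probs(seq).items():  # candidates for the next token
                candidates.append((logp + math.log(p), seq + [tok]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:n_beams]                # keep the N best again
    return beams

def step_probs(seq):
    """Toy model: "a" looks best at step one, but the path through "b" wins overall."""
    if seq[-1] == "<s>":
        return {"a": 0.6, "b": 0.4}
    return {"x": 0.9, "y": 0.1} if seq[-1] == "b" else {"x": 0.5, "y": 0.5}

best_beams = beam_search("<s>", step_probs, n_beams=2, n_steps=2)
```

With `n_beams=1` this reduces to greedy decoding, which commits to "a" (0.6) and ends at overall probability 0.6 × 0.5 = 0.30; the two-beam search instead surfaces the "b → x" path at 0.4 × 0.9 = 0.36, illustrating the table's "finds better overall sequences" row.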
rag_docs/dashboard_overview.md ADDED
@@ -0,0 +1,49 @@
+# Dashboard Overview
+
+## What Is This Dashboard?
+
+The Transformer Explanation Dashboard is an interactive web application that lets you look inside transformer-based language models as they process text. Instead of treating the model as a "black box," you can see exactly what happens at each step when the model makes a prediction.
+
+## How to Navigate
+
+The dashboard has several main sections, from top to bottom:
+
+### 1. Header and Sidebar
+At the top is the dashboard title. On the left edge, there's a collapsible **sidebar** (click the hamburger menu icon). The sidebar contains advanced configuration options for selecting which internal model components to hook into. Most users won't need to change these settings.
+
+### 2. Generator Section
+This is where you start. It contains:
+- **Model dropdown**: Choose which transformer model to load (e.g., GPT-2, Qwen2.5)
+- **Prompt input**: Type the text you want the model to analyze
+- **Generation settings**: Control beam search parameters (number of beams and tokens to generate)
+- **Analyze button**: Click to run the model and see results
+
+### 3. Generation Results
+After clicking Analyze, you'll see one or more **generated sequences** -- these are the model's continuations of your prompt. If you used beam search with multiple beams, you'll see multiple possible continuations. Click "Select for Comparison" on any sequence to use it in ablation experiments.
+
+### 4. Pipeline Visualization
+This is the core educational section. It shows **5 expandable stages** that your text passes through:
+1. **Tokenization**: How your text is split into tokens
+2. **Embedding**: How tokens become number vectors
+3. **Attention**: How the model finds relationships between tokens
+4. **MLP**: How stored knowledge is retrieved
+5. **Output**: What the model predicts and how confident it is
+
+Click any stage to expand it and see detailed explanations and visualizations.
+
+### 5. Investigation Panel
+At the bottom, two experiment tabs let you investigate *why* the model made its prediction:
+- **Ablation**: Remove specific attention heads and see what changes
+- **Token Attribution**: Measure which input tokens influenced the prediction most
+
+### 6. AI Assistant (Chatbot)
+The floating robot icon in the bottom-right corner opens the AI chatbot. It can answer questions about transformers, explain what you're seeing in the dashboard, and guide you through experiments.
+
+## Typical Workflow
+
+1. Select a model (start with GPT-2 if you're new)
+2. Enter a prompt (e.g., "The cat sat on the")
+3. Click "Analyze" to run the model
+4. Explore the 5 pipeline stages to understand how the model processed your input
+5. Use the Investigation Panel to run ablation or attribution experiments
+6. Ask the chatbot if anything is unclear
rag_docs/embeddings_cache.json DELETED
@@ -1 +0,0 @@
-[{"content": "# RAG Documents\n\nThis folder contains documents used by the AI chatbot for Retrieval-Augmented Generation (RAG).\n…", "source_file": "README.md", "chunk_index": 0, "embedding": [0.0012510021, …]}]
-0.018723132, 0.01953718, -0.0069912462, 0.006278953, 0.022410296, -0.013324071, -0.04158834, -0.020375174, -0.006141283, -0.044748764, 0.0026845667, 0.036273077, 0.004528149, 0.0015861989, -0.0009891297, -0.013000845, 0.0150479395, -0.0060095987, 0.003603365, -0.0039445474, 0.0033579532, -0.00604252, -0.033974584, 0.0043755146, -0.024804559, 0.01787317, -0.049657002, -0.012019197, -0.020961767, -0.002461601, 0.0036692072, -0.022925062, -0.055068035, 0.006733863, 0.034070354, -0.020327289, 0.027510075, -0.027414305, 0.0012659662, 0.009517193, 0.041205257, -0.008840814, -0.044748764, -0.021608219, 0.011666044, -0.011911456, 0.028252296, 0.010588625, -0.026672084, -0.017190805, -0.010738267, 0.054206103, -0.0059078424, 0.0022550959, 0.0017403295, -0.026672084, -0.004219888, 0.024684846, 0.0339267, -0.017681628, 0.033878814, -0.011444574, -0.023104632, -0.00062363053, 0.010391099, -0.0008896181, 0.01798091, -0.025403125, 0.009391494, 0.024613017, 0.019549154, -0.036464617, 0.004001411, 0.013359984, -0.004234852, -0.015718333, -0.010971707, -0.021871587, -0.036823757, -0.0441502, 0.020937825, 0.010834037, -0.017166862, 0.013623353, 0.0057133087, 0.023882767, -0.0073743286, -0.035770282, -0.005515782, 0.0015001551, 0.020734312, 0.023655312, 0.015478907, -0.0020725334, 0.013647296, -0.05411033, 0.01145056, 0.02225467, -0.0032262686, 0.040056013, -0.019237898, 0.036991354, -0.00542899, -0.018543562, 0.0264566, -0.014988083, -0.04369529, 0.03313659, -0.004558077, -0.013790952, -0.028324125, 0.007470099, 0.01814851, 0.021380764, 0.027318535, 0.03639279, -0.0015712348, -0.013994464, 0.010672425, 0.038164545, -0.032250714, -0.00093226595, 0.023451801, 0.0054888465, -0.037973, 0.0041480595, 0.012785361, -0.02973674, 0.0019019422, 0.032346487, 0.027150936, -0.012091026, 0.014401489, 0.09423817, 0.011827656, 0.03675193, -0.0067997053, -0.016664067, 0.007966909, 0.018747075, -0.0295452, 0.0008978484, -0.025929863, -0.0030347276, -0.0025798178, 0.013707153, -0.015023997, 
0.0080147935, -0.030670501, 0.020782199, -0.0002035123, -0.01514371, 0.010947765, 0.021464562, -0.011247048, 0.007308486, -0.0102534285, 0.036464617, -0.021572305, -0.03158032, 0.0072007445, 0.017298546, 0.0094453655, -0.0052554063, 0.0032262686, -0.0036991355, 0.01679575, -0.010989665, 0.015454964, -0.015083853, 0.003049692, -0.011803714, 0.004704726, -0.01871116, 0.0045790267, -0.02235044, 0.00014711621, -0.0048872884, -0.032322545, -0.004746625, 0.022146927, 0.049513347, 0.016580267, 0.0036632216, 0.045586757, 0.0010908858, 0.0015196084, 0.0009509711, -0.022649722, 0.013527582, -0.032968994, -0.026696026, 0.013575468, 0.045323387, -0.005414026, -0.007901066, 0.010606582, -0.06938572, 0.012533964, -0.0013460245, -0.0037440278, -0.00976859, -0.017944997, -0.015035968, 0.04245027, -0.01697532, -0.002916511, 0.042546045, 0.006787734, 0.026289001, 0.0020590657, 0.008421818, -0.029497314, -0.004195945, -0.0070271604, -0.013778981, -0.00016890773, 0.0116840005, -0.005138686, 0.009295724, 0.034357667, 0.0037230782, -0.01587396, -0.0053511765, -0.025977748, 0.003289118, -0.013336042, -0.016999263, -0.013324071, 0.011109378, 0.00013056213, 0.011875542, 0.0148803415, -0.011552316, -0.0005207521, 0.036105476, 0.01368321, 0.017573886, 0.027677674, -0.014736685, -0.027581904, -0.00121434, -0.012929018, 0.00985239, 0.0050399224, -0.009074255, -0.0021293971, 0.017382346, -0.0065722503, -0.0019603025, 0.0156105915, -0.03574634, -0.028036814, -0.005548703, 0.0013767009, 0.0052943127, -0.003603365, -0.0038038844, 0.0235356, -0.0235356, -0.027797388, 0.009684792, 0.016915465, -0.032418314, 0.0155507345, 0.03641673, -0.004923202, -0.019489296, 0.008990455, 0.022386353, 0.018938616, -0.020554744, -0.0034327738, 0.035363257, 0.0030481953, -0.029090289, -0.023403915, 0.02262578, -0.00017003005, 0.0014874355, -0.035770282, 0.007835224, 0.033806987, 0.025714379, -2.2352684e-05, 0.020004062, 0.0055636675, -0.002959907, -0.03866734, -0.036895584, -0.0017508044, 0.049704887, -0.00013851182, 
-0.043264322, -0.0041301027, -0.011288947, -0.032083116, 0.009211925, -0.03203523, -0.000372233, -0.041492566, -0.035913937, 0.0074222134, -0.010432999, -0.0015861989, 0.015035968, 0.0018600427, 0.019944206, -0.03210706, 0.0040103896, -0.0006067959, -0.028491722, -0.006721892, 0.0041301027, -0.030024052, 0.033447847, 0.021057539, -0.0144852875, 0.002750409, -0.017466145, -0.005315263, 0.017717542, 0.008948556, 0.01714292, 0.020758256, 0.007152859, -0.032418314, -0.019345641, 0.018902702, -0.024684846, 0.0104509555, 0.02636083, -0.03440555, -0.028946633, 0.007751425, -0.006126319, -0.024613017, 0.005847986, -0.0134796975, 0.022410296, -0.0125818495, 0.010863966, 0.018304136, -0.016400699, 0.0128093045, 0.06713512, -0.005922807, 0.006817662, -0.017897112, 0.005360155, -0.004109153, 0.0139824925, 0.022996891, -0.013180415, 0.024277821, 0.0004893274, 0.012210739, -0.049010552, -0.021787789, 0.022206783, -0.002497515, 0.008110564, -0.013012816, -0.0032651755, 0.050375283, 0.032897167, -0.024660904, 0.03275351, 0.021225136, -0.0051745996, 0.0049980227, 0.019094244, -0.0044024503, -0.005964706, -0.017011235, 0.022111014, -0.006841605, 0.041995365, 0.008876728, -0.015861988, -0.031412724, -0.0065004225, 0.0012749447, -0.040079955, 0.033878814, -0.007972894, -0.0049860515, -0.009864361, 0.027126994, -0.03404641, 0.019752664, 0.026815739, 0.026217174, -0.04058275, 0.006967304, -0.01568242, -0.018363994, -0.0132522425, 0.010397085, 0.0073384144, -0.010391099, 0.008385904, 0.0083499905, 0.007140888, 0.007948952, 0.045275502, 0.028443838, 0.032538027, 0.007823252, -0.013563497, -0.020075891, 0.01109142, -0.004573041, 0.008535545, 0.013395898, -0.022422267, 0.0024346656, -0.007601783, 0.029593084, -0.029114231, -0.003552487, -0.015299337, -0.005964706, -0.016436612, 0.047669765, -0.035770282, -0.028659321, 0.03203523, -0.0081524635], "content_hash": "91a1d62385fd3abd0b4a12e1d84489e3"}]
 
 
rag_docs/embeddings_explained.md ADDED
@@ -0,0 +1,35 @@
+ # Embeddings Explained
+
+ ## What Are Embeddings?
+
+ After tokenization breaks your text into token IDs, the model needs to convert those IDs into something it can actually compute with. This is where **embeddings** come in. An embedding is a list of numbers (a **vector**) that represents a token's meaning.
+
+ ## The Lookup Table
+
+ Think of embeddings as a giant dictionary. Each token ID maps to a specific vector of numbers. For GPT-2, each token maps to a vector of **768 numbers**. For larger models, this vector might have 2048, 4096, or more numbers.
+
+ This dictionary (called an **embedding table**) was learned during training. The model figured out which numbers best represent each token by seeing how tokens are used across billions of text examples. Once training is done, the table is fixed -- the same token always maps to the same vector.
+
+ ## Why Vectors?
+
+ Why use lists of numbers instead of just the token ID? Because vectors let the model capture **meaning** and **relationships**:
+
+ - Words with similar meanings (like "happy" and "joyful") end up with similar vectors
+ - Related concepts are grouped nearby in the vector space
+ - Directions in the space can capture relationships (e.g., "king" - "man" + "woman" ≈ "queen")
+
+ This is what allows the model to generalize -- even if it has never seen a specific sentence before, it can work with the underlying meanings of the tokens.
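The analogy above can be made concrete with a toy example. These 2-D vectors are hand-picked for illustration (real embedding tables are learned and have hundreds of dimensions), but the arithmetic works the same way:

```python
import numpy as np

# Hand-crafted 2-D "embeddings" chosen so the analogy works out exactly.
vocab = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([2.0, 0.0]),
    "queen": np.array([2.0, 1.0]),
}

result = vocab["king"] - vocab["man"] + vocab["woman"]

# Find the vocabulary word whose vector is closest to the result.
nearest = min(vocab, key=lambda w: np.linalg.norm(vocab[w] - result))
print(nearest)  # queen
```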
+
+ ## Positional Information
+
+ There's one more detail: the model also needs to know the **order** of tokens (since "dog bites man" is different from "man bites dog"). This is handled by adding **positional encodings** to the embeddings -- extra numbers that tell the model where each token sits in the sequence.
+
+ ## What You See in the Dashboard
+
+ In **Stage 2 (Embedding)** of the pipeline, you can see:
+
+ - The dimension of the embedding vectors (e.g., "768-dimensional" for GPT-2)
+ - A visual showing the flow: Token ID → Lookup Table → Vector
+ - An explanation of how the lookup table was created during training
+
+ This is the point where raw token IDs become rich numerical representations that the rest of the model can process.
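The full Stage 2 flow -- Token ID → Lookup Table → Vector, plus positional information -- can be sketched in a few lines. The tables here are random rather than GPT-2's trained ones and the token IDs are placeholders; only the shapes match GPT-2:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_positions, d_model = 50257, 1024, 768

# The "giant dictionary": one 768-number row per token ID. GPT-2 learns these
# values during training; here they are random placeholders.
embedding_table = rng.standard_normal((vocab_size, d_model))
positional_table = rng.standard_normal((max_positions, d_model))

token_ids = [464, 3797, 3332, 319, 262]   # placeholder IDs for a 5-token prompt

# Lookup: each ID selects a row; positional encodings are added on top.
vectors = embedding_table[token_ids] + positional_table[: len(token_ids)]
print(vectors.shape)  # (5, 768)
```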
rag_docs/experiment_beam_search.md ADDED
@@ -0,0 +1,82 @@
+ # Experiment: Exploring Alternative Predictions with Beam Search
+
+ ## Goal
+
+ Learn how beam search reveals multiple possible continuations of a prompt, and see how ablating attention heads can redirect the model's generation from one path to another.
+
+ ## Prerequisites
+
+ - Complete "Your First Analysis"
+ - Complete "Your First Ablation"
+
+ ## Steps
+
+ ### Step 1: Generate Multiple Beams
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `Once upon a time there was a`
+ 2. Set **Number of Generation Choices (Beams)** to **3**.
+ 3. Set **Number of New Tokens** to **8**.
+ 4. Click **Analyze**.
+
+ ### Step 2: Compare the Beams
+
+ You should see 3 different generated sequences. Look at how they differ:
+ - **Beam 1**: The model's top-ranked overall sequence
+ - **Beam 2**: The second-best sequence
+ - **Beam 3**: The third-best sequence
+
+ Notice how they might start the same but diverge at some point. For example:
+ - Beam 1: "Once upon a time there was a young man who lived"
+ - Beam 2: "Once upon a time there was a little girl who loved"
+ - Beam 3: "Once upon a time there was a king who ruled"
+
+ The beams share a common prefix because the early tokens were confident, but as generation continues, different paths emerge.
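The bookkeeping behind this is easy to see with a toy language model. The next-token distributions below are hard-coded to echo the example beams (real beam search would query GPT-2's softmax at every step); the algorithm keeps the `num_beams` highest-scoring sequences by total log-probability:

```python
import math

# Hard-coded next-token distributions standing in for a real LM's softmax.
# All words and probabilities are invented for illustration.
PROBS = {
    "a":      {"young": 0.5, "little": 0.3, "king": 0.2},
    "young":  {"man": 1.0},
    "little": {"girl": 1.0},
    "king":   {"who": 1.0},
    "man":    {"who": 1.0},
    "girl":   {"who": 1.0},
    "who":    {"lived": 0.6, "loved": 0.4},
}

def beam_search(start, num_beams, steps):
    beams = [([start], 0.0)]                      # (tokens, total log-probability)
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            for word, p in PROBS.get(tokens[-1], {}).items():
                candidates.append((tokens + [word], score + math.log(p)))
        if not candidates:                        # nothing left to extend
            break
        # Keep only the num_beams highest-scoring sequences.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:num_beams]
    return beams

for tokens, score in beam_search("a", num_beams=3, steps=3):
    print(" ".join(tokens), round(math.exp(score), 3))
```

Running this prints the three beams in rank order, with "a young man who" first -- the lower-ranked beams survive even though a greedy decoder would never have explored them.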
+
+ ### Step 3: Select a Beam for Comparison
+
+ 1. Click **"Select for Comparison"** on Beam 1 (the top-ranked sequence).
+ 2. This stores it as the baseline for ablation comparison.
+
+ ### Step 4: Investigate What Drives the Divergence
+
+ 1. Look at **Stage 5 (Output)** to see the top-5 predictions for the immediate next token.
+ 2. Note the top prediction and its probability. Are the alternatives close in probability? If so, the model was uncertain, which explains why beams diverge early.
+
+ ### Step 5: Ablate a Head and Re-Generate
+
+ 1. Go to the **Ablation** tab in the Investigation Panel.
+ 2. From the head categories in Stage 3, pick a **Previous-Token** head (e.g., L0-H3).
+ 3. Add it and click **"Run Ablation Experiment."**
+ 4. Look at the **Full Generation Comparison**:
+ - Did the ablated generation diverge from the original?
+ - Did the model take a completely different path, or just change a word or two?
+ - Did the ablated generation match one of the other beams you saw earlier?
+
+ ### Step 6: Try a Stronger Ablation
+
+ 1. **Clear** the selected heads.
+ 2. Add **two or three** Previous-Token heads from different layers.
+ 3. Run the ablation again.
+ 4. Compare: does ablating multiple heads cause a bigger divergence than ablating one?
+
+ ### Step 7: Experiment with Different Beam Settings
+
+ 1. Change the prompt to: `The scientist discovered that the`
+ 2. Try with **1 beam** (greedy decoding): note the single output.
+ 3. Try with **3 beams**: see the alternatives.
+ 4. Try with **5 beams**: do you get even more diverse options?
+
+ Notice how increasing beams reveals the model's uncertainty -- places where multiple continuations are roughly equally likely.
+
+ ## What You Should Learn
+
+ - **Beam search reveals model uncertainty**: When the model isn't sure, multiple beams show the different paths it's considering.
+ - **Ablation can redirect generation**: Removing important heads can push the model from one beam to another, showing that different attention heads support different generation paths.
+ - **More beams = more alternatives**: But beyond 3-5 beams, the additional paths are often low-probability and less interesting.
+ - **Generation is a chain**: Each token depends on the previous ones, so a small change early (from ablation or beam selection) can cascade into a very different output.
+
+ ## What's Next?
+
+ You've now completed the core experiments. Try combining techniques:
+ - Run attribution to find which input tokens matter, then ablate the heads that seem to process those tokens
+ - Compare how GPT-2 and Qwen2.5-0.5B handle the same prompt with the same beam settings
rag_docs/experiment_comparing_heads.md ADDED
@@ -0,0 +1,88 @@
+ # Experiment: Comparing Heads Across Categories
+
+ ## Goal
+
+ Systematically ablate heads from each category to discover which types of attention heads matter most for different prompts. Build intuition for how attention head roles vary.
+
+ ## Prerequisites
+
+ - Complete "Your First Ablation" (know how to use the ablation panel)
+ - Complete "Exploring Attention Patterns" (understand head categories)
+
+ ## Steps
+
+ ### Step 1: Set Up a Simple Prompt
+
+ 1. Select **GPT-2 (124M)** and enter: `The cat sat on the`
+ 2. Set beams to 1 and tokens to 5.
+ 3. Click **Analyze**.
+ 4. **Select the generated sequence for comparison** by clicking "Select for Comparison."
+ 5. Note the original prediction and probability in Stage 5 (e.g., "mat" at 45%).
+
+ ### Step 2: Record the Head Categories
+
+ 1. Expand **Stage 3 (Attention)** and note one head from each category:
+ - **Previous-Token**: _______ (e.g., L0-H3)
+ - **First/Positional**: _______ (e.g., L0-H1)
+ - **Bag-of-Words**: _______ (e.g., L2-H5)
+ - **Syntactic**: _______ (e.g., L4-H2)
+ - **Other**: _______ (e.g., L1-H8)
+
+ ### Step 3: Ablate One Head at a Time
+
+ For each head you noted, do the following:
+ 1. Go to the **Ablation** tab in the Investigation Panel.
+ 2. **Clear** any previously selected heads.
+ 3. **Add** just the one head from the current category.
+ 4. Click **"Run Ablation Experiment."**
+ 5. Record the results:
+
+ | Category | Head | Probability Change | Generation Changed? |
+ |----------|------|-------------------|-------------------|
+ | Previous-Token | | | |
+ | First/Positional | | | |
+ | Bag-of-Words | | | |
+ | Syntactic | | | |
+ | Other | | | |
+
+ ### Step 4: Analyze Your Results
+
+ Look at the table you've filled in:
+ - **Which category caused the biggest probability drop?** Previous-Token heads often have the largest impact on simple prompts because local context matters a lot.
+ - **Which category had the least effect?** BoW and Other heads often show smaller effects for short prompts.
+ - **Did any ablation change the generated text?** A generation change is a stronger signal than just a probability change.
+
+ ### Step 5: Try a More Complex Prompt
+
+ Now repeat the process with a prompt that requires more sophisticated processing:
+
+ 1. Enter: `The doctors told the patient that they would need`
+ 2. Analyze and select the generation for comparison.
+ 3. Ablate one head from each category again and record results.
+
+ **What to expect**: For this more complex prompt:
+ - **Syntactic heads** may matter more (there are grammatical dependencies like "doctors...they")
+ - **First/Positional heads** may show more impact because the sentence structure is more complex
+ - The pattern of which categories matter may shift compared to the simple prompt
+
+ ### Step 6: Compare Results Between Prompts
+
+ | Category | Simple Prompt Impact | Complex Prompt Impact |
+ |----------|---------------------|----------------------|
+ | Previous-Token | | |
+ | First/Positional | | |
+ | Bag-of-Words | | |
+ | Syntactic | | |
+ | Other | | |
+
+ ## What You Should Learn
+
+ - No single head category is always the "most important" -- it depends on the prompt
+ - Simple prompts tend to rely more on Previous-Token heads (local patterns)
+ - Complex prompts with grammatical dependencies may rely more on Syntactic heads
+ - Some heads are redundant for certain inputs but critical for others
+ - Ablation is most informative when you compare across conditions (categories, prompts, or both)
+
+ ## Advanced Challenge
+
+ Try ablating **two heads simultaneously** from the same category. Does removing two Previous-Token heads have a bigger effect than removing one? Or does the model have enough redundancy to compensate?
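One way a model can compensate is redundancy: two heads carrying overlapping information. The toy below models that with an element-wise max (either head alone suffices to supply the feature); all numbers are invented for illustration, not measurements from GPT-2:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Two heads carrying the same feature; an element-wise max models
# "either head alone suffices". Numbers are invented for illustration.
HEADS = {"L0-H3": np.array([2.0, 0.0]), "L1-H7": np.array([2.0, 0.0])}

def p_top(ablated=()):
    active = [v for k, v in HEADS.items() if k not in ablated]
    feature = np.max(active, axis=0) if active else np.zeros(2)
    return softmax(np.array([0.2, 0.5]) + feature)[0]

print(f"no ablation:        {p_top():.1%}")
print(f"one head ablated:   {p_top(ablated=('L0-H3',)):.1%}")
print(f"both heads ablated: {p_top(ablated=('L0-H3', 'L1-H7')):.1%}")
```

In this setup, ablating one head changes nothing, but ablating both collapses the top prediction's probability -- the signature of a redundant pair.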
rag_docs/experiment_exploring_attention.md ADDED
@@ -0,0 +1,62 @@
+ # Experiment: Exploring Attention Patterns
+
+ ## Goal
+
+ Learn to read attention visualizations and understand what different attention head categories reveal about how the model processes text.
+
+ ## Prerequisites
+
+ Complete "Your First Analysis" first so you're familiar with the basic workflow.
+
+ ## Steps
+
+ ### Step 1: Run an Analysis
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `The cat sat on the mat because it was`
+ 2. Click **Analyze**.
+ 3. This prompt is ideal because it contains a pronoun ("it") that needs to be resolved -- the model must figure out that "it" refers to "the cat."
+
+ ### Step 2: Open the Attention Stage
+
+ 1. Expand **Stage 3 (Attention)** in the pipeline.
+ 2. Look at the **head categories** section at the top.
+
+ ### Step 3: Explore Previous-Token Heads
+
+ 1. Click on **"Previous-Token"** to expand the category.
+ 2. Note which heads are listed (e.g., L0-H3, L1-H7).
+ 3. In the **BertViz visualization** below, double-click on one of those head squares. This will show only that head's attention pattern.
+ 4. **What to look for**: You should see a strong diagonal pattern -- each token attends heavily to the token directly before it. Lines should mostly connect each word to the previous word.
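The "strong diagonal" is easy to picture as numbers. Below is a hand-made attention matrix for an idealized Previous-Token head (values are illustrative): row i holds where token i sends its attention, each row sums to 1, and the upper triangle is zero because causal masking prevents attending to future tokens:

```python
import numpy as np

tokens = ["The", "cat", "sat", "on", "the"]
n = len(tokens)

attn = np.zeros((n, n))
attn[0, 0] = 1.0                  # the first token can only attend to itself
for i in range(1, n):
    attn[i, i - 1] += 0.9         # heavy attention on the previous token
    attn[i, 0] += 0.1             # a little spillover to the first token

for tok, row in zip(tokens, attn):
    print(f"{tok:>4} -> " + "  ".join(f"{p:.1f}" for p in row))
```

Printed as a grid, the 0.9 entries trace the off-diagonal band you see as near-parallel lines in BertViz.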
+
+ ### Step 4: Explore First/Positional Heads
+
+ 1. Click on **"First/Positional"** to see which heads focus on the first token.
+ 2. Double-click one of those heads in BertViz.
+ 3. **What to look for**: You should see many tokens sending attention lines to "The" (the first token). This is a common pattern -- the first token acts as a "sink" for attention when there's no better target.
+
+ ### Step 5: Explore Bag-of-Words Heads
+
+ 1. Find a **"Bag-of-Words"** head in the categories.
+ 2. View it in BertViz.
+ 3. **What to look for**: Attention should be spread broadly and evenly across many tokens. Lines will be thin and numerous rather than thick and focused. This head is gathering a general summary of the whole input.
+
+ ### Step 6: Look for Interesting Patterns
+
+ 1. Now single-click to select multiple heads from different categories.
+ 2. Look for heads where the token "it" attends strongly to "cat" -- this would suggest the head is helping resolve the pronoun reference.
+ 3. Try hovering over the word "it" on the left side of BertViz to see which words it attends to most strongly.
+
+ ### Step 7: Try Different Prompts
+
+ Run the analysis again with different prompts and compare:
+ - `Alice gave the book to Bob because she` (pronoun resolution: does "she" attend to "Alice"?)
+ - `The dogs in the park were` (subject-verb agreement: does the model connect "dogs" to "were"?)
+ - `1 2 3 4 5` (number sequence: what patterns emerge with non-natural-language input?)
+
+ ## What You Should Learn
+
+ - Different attention heads serve different purposes
+ - Previous-Token heads are the easiest to identify visually (strong diagonal pattern)
+ - The same prompt can reveal different patterns in different heads
+ - BertViz is a powerful tool for understanding attention, but it takes practice to read fluently
+ - Not all heads have obvious patterns -- and that's okay. The "Other" category captures complex, context-dependent behavior.
rag_docs/experiment_first_ablation.md ADDED
@@ -0,0 +1,79 @@
+ # Experiment: Your First Ablation
+
+ ## Goal
+
+ Learn how ablation works by removing an attention head and observing how it changes the model's prediction. Discover which heads matter and which are redundant.
+
+ ## Prerequisites
+
+ - Complete "Your First Analysis"
+ - Complete "Exploring Attention Patterns" (so you know about head categories)
+
+ ## Steps
+
+ ### Step 1: Set Up the Analysis
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `The cat sat on the`
+ 2. Set **Number of Generation Choices** to 1 and **Number of New Tokens** to 5.
+ 3. Click **Analyze**.
+ 4. Note the model's prediction in Stage 5 and the generated sequence.
+
+ ### Step 2: Select a Sequence for Comparison
+
+ 1. In the generated sequences section, click **"Select for Comparison"** on the generated sequence.
+ 2. This stores the original generation so the ablation experiment can compare against it.
+
+ ### Step 3: Find a Head to Ablate
+
+ 1. Expand **Stage 3 (Attention)** and look at the head categories.
+ 2. Find a head from the **Previous-Token** category. Let's say it's **L0-H3** (yours may differ).
+ 3. Note this head -- Previous-Token heads often have noticeable effects when removed.
+
+ ### Step 4: Set Up the Ablation
+
+ 1. Scroll down to the **Investigation Panel** and make sure the **"Ablation"** tab is selected.
+ 2. In the **Layer** dropdown, select the layer of your chosen head (e.g., 0).
+ 3. In the **Head** dropdown, select the head number (e.g., 3).
+ 4. Click the **+** button to add it. You should see a chip appear: "L0-H3".
+
+ ### Step 5: Run the Ablation
+
+ 1. Click **"Run Ablation Experiment"**.
+ 2. Wait for results to appear.
+
+ ### Step 6: Analyze the Results
+
+ Look at the ablation results:
+
+ - **Full Generation Comparison**: Compare the original text to the ablated text. Did the generated sequence change?
+ - **Probability Change**: Look at the immediate next-token probability change. For example, "72.3% → 45.1% (-27.2%)" would mean removing this head significantly reduced the model's confidence.
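What a probability change like that measures can be mimicked with a toy setup. Here next-token logits are a baseline plus one contribution per head (all numbers invented for illustration); zero-ablation drops a head's contribution and we re-run the softmax:

```python
import numpy as np

# Toy "model": next-token logits are a baseline plus one contribution
# per attention head. All numbers are invented for illustration.
VOCAB = ["mat", "floor", "bed"]
BASELINE = np.array([1.0, 0.5, 0.2])
HEADS = {
    "L0-H3": np.array([1.5, 0.1, 0.0]),   # strongly supports "mat"
    "L1-H8": np.array([0.1, 0.1, 0.1]),   # uniform contribution
}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def logits(ablate=None):
    total = BASELINE.copy()
    for name, contrib in HEADS.items():
        if name != ablate:                 # zero-ablation drops this head
            total += contrib
    return total

base = softmax(logits())
for head in HEADS:
    abl = softmax(logits(ablate=head))
    print(f"{head}: P('mat') {base[0]:.1%} -> {abl[0]:.1%}")
```

Ablating L0-H3 drops P('mat') sharply, while ablating the uniform head changes nothing: adding the same constant to every logit leaves the softmax untouched, which is one way a head can be redundant for a prediction.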
+
+ ### Step 7: Try Ablating a Different Head
+
+ 1. Click **"Clear Selected Heads"** to reset.
+ 2. Now pick a head from the **"Other"** category (these often have less obvious roles).
+ 3. Add it and run the ablation again.
+ 4. Compare: was the effect larger or smaller than the Previous-Token head?
+
+ ### Step 8: Compare Your Results
+
+ | Head | Category | Probability Change | Generation Changed? |
+ |------|----------|-------------------|-------------------|
+ | L0-H3 | Previous-Token | (fill in) | (yes/no) |
+ | L?-H? | Other | (fill in) | (yes/no) |
+
+ **Typical findings**:
+ - Previous-Token heads in early layers often cause noticeable probability drops when ablated
+ - Many "Other" heads have minimal impact for simple prompts
+ - The same head may matter more or less depending on the specific prompt
+
+ ## What You Should Learn
+
+ - Ablation is a tool for measuring the importance of individual model components
+ - Not all heads are equally important -- some are redundant
+ - The effect of ablation depends on the specific input prompt
+ - This technique is used by researchers to understand how models work internally
+
+ ## What's Next?
+
+ Move on to **Experiment: Token Attribution** to learn a different approach -- instead of removing components, measure which input tokens drive the prediction.
rag_docs/experiment_first_analysis.md ADDED
@@ -0,0 +1,67 @@
+ # Experiment: Your First Analysis
+
+ ## Goal
+
+ Learn how to run your first analysis and walk through each pipeline stage to understand how a transformer model processes text.
+
+ ## Prerequisites
+
+ None -- this is the starting experiment.
+
+ ## Steps
+
+ ### Step 1: Select a Model
+
+ 1. In the **generator section** at the top, find the "Select Model" dropdown.
+ 2. Choose **"GPT-2 (124M)"** from the list.
+ 3. Wait for the model to load. You'll see a status message indicating the model is ready.
+
+ ### Step 2: Enter a Prompt
+
+ 1. In the **"Enter Prompt"** textarea, type: `The cat sat on the`
+ 2. Leave the generation settings at their defaults (1 beam, a few tokens).
+
+ ### Step 3: Run the Analysis
+
+ 1. Click the **"Analyze"** button.
+ 2. Wait for the analysis to complete. The pipeline stages and generation results will appear.
+
+ ### Step 4: Explore the Generated Sequences
+
+ Look at the **generated sequence(s)** below the generator. You should see how GPT-2 continues your prompt. Common completions might include "mat," "floor," "bed," or similar words.
+
+ ### Step 5: Walk Through the Pipeline
+
+ Now expand each of the **5 pipeline stages** by clicking on them:
+
+ **Stage 1 - Tokenization**: Click to expand. You'll see your prompt split into tokens. Notice how each word (and its leading space) becomes a separate token. Count the tokens -- "The cat sat on the" should produce exactly 5 tokens.
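As a sketch of how leading spaces are handled: GPT-2's byte-level BPE folds each word's leading space into the token itself, marked with "Ġ" in the raw vocabulary. The function below is a toy that only covers the case where every word is a single token (which happens to hold for this prompt) -- it is not a real BPE implementation:

```python
# Toy stand-in for GPT-2-style tokenization. Real BPE can split rare words
# into several sub-word tokens; this toy assumes one token per word.
def toy_tokenize(text):
    words = text.split(" ")
    # The first word has no leading space; later words carry theirs as 'Ġ'.
    return [words[0]] + ["Ġ" + w for w in words[1:]]

tokens = toy_tokenize("The cat sat on the")
print(tokens)        # ['The', 'Ġcat', 'Ġsat', 'Ġon', 'Ġthe']
print(len(tokens))   # 5
```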
+
+ **Stage 2 - Embedding**: Click to expand. You'll see that each token was converted into a 768-dimensional vector. This is GPT-2's hidden dimension.
+
+ **Stage 3 - Attention**: Click to expand. This is the richest stage:
+ - Look at the **head categories**. You should see heads grouped into Previous-Token, First/Positional, Bag-of-Words, Syntactic, and Other.
+ - Click on a category (like "Previous-Token") to see which specific heads belong to it.
+ - Below the categories, you'll see the **BertViz visualization**. Try clicking on individual head squares to see their attention patterns.
+
+ **Stage 4 - MLP**: Click to expand. You'll see the expand-compress pattern: 768 → 3072 → 768. This shows GPT-2's feed-forward network dimensions.
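The expand-compress pattern is just two matrix multiplications with a nonlinearity in between. The sketch below uses random weights and ReLU in place of GPT-2's trained weights and GELU; only the shapes match the 768 → 3072 → 768 display:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff = 768, 3072
W_in = rng.standard_normal((d_model, d_ff)) * 0.02    # random placeholder weights
W_out = rng.standard_normal((d_ff, d_model)) * 0.02

x = rng.standard_normal((5, d_model))                 # hidden states for 5 tokens
h = np.maximum(x @ W_in, 0.0)                         # expand to 3072 (ReLU for GELU)
y = h @ W_out                                         # compress back to 768
print(x.shape, h.shape, y.shape)                      # (5, 768) (5, 3072) (5, 768)
```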
+
+ **Stage 5 - Output**: Click to expand. You'll see:
+ - Your prompt with the predicted next token highlighted
+ - The confidence percentage
+ - A top-5 bar chart showing the model's top predictions
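The confidence percentage and the top-5 chart both come from a softmax over the model's output logits. The words and logit values below are invented (not real GPT-2 outputs); the computation is what matters:

```python
import numpy as np

# Invented candidate words and raw scores, standing in for GPT-2's logits.
vocab = ["mat", "floor", "bed", "couch", "table", "roof"]
logits = np.array([2.1, 1.4, 1.1, 0.3, 0.1, -0.5])

# Softmax: subtract the max for numerical stability, exponentiate, normalize.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Print the top-5 as a tiny text bar chart, highest probability first.
for word, p in sorted(zip(vocab, probs), key=lambda x: -x[1])[:5]:
    print(f"{word:>6} {p:5.1%} " + "#" * int(p * 50))
```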
+
+ ### Step 6: Reflect
+
+ Think about what you observed:
+ - How many tokens did your prompt become?
+ - What was the model's top prediction? How confident was it?
+ - Were there any surprising alternative predictions in the top 5?
+
+ ## What's Next?
+
+ Try changing the prompt and running the analysis again. Compare results with different inputs:
+ - A factual prompt: "The capital of France is"
+ - A creative prompt: "Once upon a time, there was a"
+ - A technical prompt: "The function takes an input and"
+
+ Then move on to **Experiment: Exploring Attention Patterns** to dive deeper into what the attention heads are doing.
rag_docs/experiment_token_attribution.md ADDED
@@ -0,0 +1,76 @@
+ # Experiment: Understanding Token Attribution
+
+ ## Goal
+
+ Learn how to use token attribution to identify which parts of your input most influenced the model's prediction. Compare two attribution methods and see how results change with different target tokens.
+
+ ## Prerequisites
+
+ - Complete "Your First Analysis"
+
+ ## Steps
+
+ ### Step 1: Run an Analysis with a Meaningful Prompt
+
+ 1. Select **GPT-2 (124M)** and enter the prompt: `The capital of France is`
+ 2. Click **Analyze**.
+ 3. Check Stage 5 -- the model should predict something like "Paris" or "the" with high confidence. Note the top prediction.
+
+ ### Step 2: Open the Attribution Panel
+
+ 1. Scroll down to the **Investigation Panel**.
+ 2. Click the **"Token Attribution"** tab.
+
+ ### Step 3: Run Simple Gradient Attribution
+
+ 1. Select **"Simple Gradient (faster, less accurate)"** as the attribution method.
+ 2. Leave the **Target Token** dropdown empty (this defaults to the top prediction).
+ 3. Click **"Compute Attribution"**.
+
+ ### Step 4: Read the Results
+
+ Look at the two visualizations:
+
+ **Color-coded tokens**: Your input tokens are displayed as colored boxes. Darker blue means higher influence.
+ - You should see that **"France"** has a very dark color -- it's the most relevant token for predicting "Paris."
+ - **"capital"** likely also has a notable color -- it sets up the context for a city name.
+ - Function words like **"The"**, **"of"**, and **"is"** should be lighter -- they contribute less to this specific prediction.
+
+ **Bar chart**: Shows the same information as horizontal bars with scores. Longer bars = more influence.
+
+ **Hover over any token chip** to see the exact attribution score.
42
+
43
+ ### Step 5: Compare with Integrated Gradients
44
+
45
+ 1. Now switch to **"Integrated Gradients (more accurate, slower)"**.
46
+ 2. Click **"Compute Attribution"** again.
47
+ 3. Compare the results. Integrated Gradients should give a more refined picture:
48
+ - The relative ordering of token importance may shift slightly
49
+ - Integrated Gradients tends to produce more reliable scores, especially for distinguishing tokens of moderate importance
50
+
51
+ ### Step 6: Change the Target Token
52
+
53
+ 1. In the **Target Token** dropdown, select a different token from the top-5 predictions (e.g., if the model also considered "a" or "the" as alternatives to "Paris").
54
+ 2. Run attribution again.
55
+ 3. **What to look for**: Different target tokens are driven by different input tokens. For example:
56
+ - "Paris" might be strongly driven by "France" and "capital"
57
+ - A generic token like "the" might be driven more by "is" (as a common grammatical continuation)
58
+
59
+ ### Step 7: Try a Different Prompt
60
+
61
+ Run attribution on: `Alice gave the book to Bob because she`
62
+ - Which tokens drive the prediction of the next word?
63
+ - Does "Alice" have high attribution (suggesting the model connects "she" to "Alice")?
64
+ - Does "Bob" have lower attribution than "Alice" for this prediction?
65
+
66
+ ## What You Should Learn
67
+
68
+ - Token attribution reveals which input tokens "caused" a particular prediction
69
+ - Content words (nouns, verbs) typically have higher attribution than function words (the, of, is)
70
+ - Different target tokens can be driven by completely different input tokens
71
+ - Integrated Gradients is more accurate but slower; Simple Gradient gives a quick approximation
72
+ - Attribution helps you understand the "why" behind a model's prediction
73
+
74
+ ## What's Next?
75
+
76
+ Move on to **Experiment: Comparing Heads** to combine ablation with your understanding of head categories, or try **Experiment: Beam Search** to explore how the model generates longer sequences.
rag_docs/gpt2_overview.md ADDED
@@ -0,0 +1,56 @@
+ # GPT-2 Overview
+
+ ## What Is GPT-2?
+
+ GPT-2 (Generative Pre-trained Transformer 2) is a language model created by OpenAI in 2019. It was one of the first models to demonstrate that scaling up transformers could produce impressively fluent text. It remains one of the most well-studied and accessible models for learning about transformer internals.
+
+ ## Architecture Details
+
+ | Property | Value |
+ |----------|-------|
+ | Parameters | ~124 million (small variant) |
+ | Layers | 12 |
+ | Attention Heads | 12 per layer (144 total) |
+ | Hidden Dimension | 768 |
+ | MLP Dimension | 3072 (4x hidden) |
+ | Vocabulary Size | 50,257 tokens |
+ | Positional Encoding | Learned absolute positions |
+ | Max Sequence Length | 1024 tokens |
+ | Normalization | LayerNorm |
+ | Activation Function | GELU |
+
+ ## Why Start with GPT-2?
+
+ GPT-2 small is the **recommended starting model** for learning with this dashboard:
+
+ - **Fast**: Small enough to load quickly and run interactively
+ - **Well-studied**: More research papers have analyzed GPT-2's internals than almost any other model. Many examples and references use GPT-2.
+ - **Clear patterns**: With 12 heads per layer, the attention patterns are easy to visualize and categorize
+ - **Manageable size**: 144 total attention heads is small enough to explore systematically
+
+ ## What to Expect in the Dashboard
+
+ When analyzing GPT-2, you'll typically see:
+
+ - **Tokenization**: GPT-2 uses BPE (Byte-Pair Encoding). Common words are single tokens; rare words get split. Spaces are typically attached to the beginning of the following token.
+ - **Embeddings**: 768-dimensional vectors, which capture rich semantic information despite being relatively compact.
+ - **Attention patterns**: You'll see a good mix of head categories. Expect several Previous-Token heads (especially in early layers), some First/Positional heads, and a variety of other patterns.
+ - **Output**: GPT-2 can produce reasonably coherent text for simple prompts. For factual prompts, it sometimes produces outdated or incorrect facts (it was trained on data from before 2019).
+
+ ## GPT-2 Variants
+
+ The dashboard supports all GPT-2 sizes, though only the small variant is in the default dropdown:
+
+ - **GPT-2 Small** (124M params, 12 layers) -- in dropdown as "GPT-2 (124M)"
+ - **GPT-2 Medium** (355M params, 24 layers) -- enter `gpt2-medium` in the dropdown
+ - **GPT-2 Large** (774M params, 36 layers) -- enter `gpt2-large`
+ - **GPT-2 XL** (1.5B params, 48 layers) -- enter `gpt2-xl`
+
+ Larger variants have more layers and heads but use more memory and are slower.
+
+ ## HuggingFace Model IDs
+
+ - `gpt2` or `openai-community/gpt2`
+ - `gpt2-medium` or `openai-community/gpt2-medium`
+ - `gpt2-large` or `openai-community/gpt2-large`
+ - `gpt2-xl` or `openai-community/gpt2-xl`
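As a cross-check on the table above, the ~124M parameter count of GPT-2 small can be recomputed from the architecture numbers alone. This is an illustrative sketch assuming the standard GPT-2 layout (tied input/output embeddings, biases on every linear layer), not code from the dashboard:

```python
# Rough parameter count for GPT-2 small, from its architecture numbers.
vocab, n_pos, d, n_layers, d_mlp = 50257, 1024, 768, 12, 3072

embeddings = vocab * d + n_pos * d   # token + position embedding tables

per_layer = (
    d * 3 * d + 3 * d      # attention QKV projection (weights + biases)
    + d * d + d            # attention output projection
    + d * d_mlp + d_mlp    # MLP expand (768 -> 3072)
    + d_mlp * d + d        # MLP compress (3072 -> 768)
    + 4 * d                # two LayerNorms (scale + bias each)
)

final_ln = 2 * d
total = embeddings + n_layers * per_layer + final_ln
print(f"{total / 1e6:.1f}M parameters")   # prints 124.4M parameters
```

The output layer adds nothing extra because GPT-2 ties it to the token embedding matrix.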
rag_docs/head_categories_explained.md ADDED
@@ -0,0 +1,56 @@
+ # Attention Head Categories Explained
+
+ ## What Are Head Categories?
+
+ The dashboard automatically analyzes all attention heads in the model and categorizes them based on their behavior patterns. This helps you understand what each head is doing without having to inspect every attention map manually.
+
+ Head categories appear in **Stage 3 (Attention)** of the pipeline. Click any category to expand it and see which specific heads (like L0-H3, L2-H11) belong to it.
+
+ ## The Five Categories
+
+ ### Previous-Token Heads
+
+ **What they do**: These heads strongly attend to the **immediately preceding token**. For every token at position *i*, the head focuses most of its attention on position *i-1*.
+
+ **Why they matter**: Previous-token heads help the model track local context -- the word that just came before. They're important for bigram patterns (common two-word combinations like "of the" or "in a").
+
+ **Detection**: A head is classified as Previous-Token if, on average, more than 40% of each token's attention goes to the token directly before it.
+
+ **In the dashboard**: These heads are labeled with a purple color. Ablating them often causes noticeable changes in predictions.
+
+ ### First/Positional Heads
+
+ **What they do**: These heads focus heavily on the **first token** in the sequence or show strong **positional patterns** (always attending to a specific position regardless of content).
+
+ **Why they matter**: The first token often serves as a "default" attention target. Positional heads help the model keep track of where it is in the sequence.
+
+ **Detection**: Classified when average attention to the first token exceeds 25%.
+
+ ### Bag-of-Words (BoW) Heads
+
+ **What they do**: These heads spread their attention **broadly and evenly** across many tokens, without focusing strongly on any particular one.
+
+ **Why they matter**: BoW heads capture a general summary of the entire input. They help the model maintain an overall sense of what the text is about.
+
+ **Detection**: Classified when the attention distribution has high entropy (≥ 0.65 normalized) and no single token receives more than 35% attention.
+
+ ### Syntactic Heads
+
+ **What they do**: These heads attend to tokens at **consistent distances**, suggesting they track grammatical or structural relationships (like subject-verb pairs).
+
+ **Why they matter**: Syntactic heads help the model understand grammar and sentence structure. They might connect a verb to its subject or a pronoun to what it refers to.
+
+ **Detection**: Classified when tokens consistently attend to other tokens at similar distances, with low variance in attention distances.
+
+ ### Other
+
+ **What they do**: Heads that don't clearly fit any of the above patterns. They may have mixed or context-dependent behavior.
+
+ **Why they matter**: "Other" doesn't mean unimportant. These heads may serve specialized roles that only activate for certain inputs. They're worth investigating through ablation experiments.
+
+ ## Using Categories for Experiments
+
+ Head categories are especially useful for guiding ablation experiments:
+ - Ablate a **Previous-Token** head to see if local context patterns break
+ - Ablate a **BoW** head to see if the model loses global context
+ - Compare the effect of ablating heads from different categories on the same prompt
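The detection rules above can be turned into a toy classifier over a head's attention matrix. This is a simplified sketch using only the thresholds quoted in this document -- the Syntactic rule is omitted for brevity, and the dashboard's actual implementation may differ:

```python
import numpy as np

def categorize_head(attn: np.ndarray) -> str:
    """Classify one head from its (seq_len x seq_len) attention matrix
    using the approximate thresholds described above."""
    n = attn.shape[0]
    # Average attention each token (after the first) gives to its predecessor.
    prev = np.mean([attn[i, i - 1] for i in range(1, n)])
    if prev > 0.40:
        return "previous-token"
    # Average attention paid to the first token.
    if attn[:, 0].mean() > 0.25:
        return "first/positional"
    # Normalized row entropy; high entropy + no dominant token = bag-of-words.
    eps = 1e-12
    entropy = -(attn * np.log(attn + eps)).sum(axis=1) / np.log(n)
    if entropy.mean() >= 0.65 and attn.max() <= 0.35:
        return "bag-of-words"
    return "other"

# A toy previous-token head: each token attends ~90% to the token before it.
n = 6
toy = np.full((n, n), 0.02)
for i in range(1, n):
    toy[i, i - 1] = 0.9
toy /= toy.sum(axis=1, keepdims=True)   # attention rows must sum to 1
print(categorize_head(toy))             # prints previous-token
```

A perfectly uniform matrix would instead land in "bag-of-words": maximum entropy, no dominant token.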
rag_docs/interpreting_ablation_results.md ADDED
@@ -0,0 +1,56 @@
+ # Interpreting Ablation Results
+
+ ## Quick Reference
+
+ When you ablate an attention head and see the results, here's how to interpret what happened.
+
+ ## Probability Changes
+
+ The dashboard shows the immediate next-token probability before and after ablation (e.g., "72.3% → 45.1% (-27.2%)").
+
+ ### Large Probability Drop (>10%)
+
+ The ablated head was **important** for this prediction. It was actively contributing to the model's confidence in the top token. This head likely plays a significant role in processing this specific input.
+
+ **Example**: Ablating a Previous-Token head when the model is predicting a word that commonly follows the previous word (like predicting "the" after "on").
+
+ ### Small Probability Drop (1-10%)
+
+ The head has **some contribution** but isn't critical. Other heads or MLP layers may provide overlapping information. The model has some redundancy that compensates for the missing head.
+
+ ### Negligible Change (<1%)
+
+ The head was likely **redundant for this input**. It may serve a function that isn't relevant to this particular prompt, or other heads provide the same information.
+
+ **Important**: This doesn't mean the head is useless -- it might be critical for other prompts. Try the same head with different inputs.
+
+ ### Probability Increase
+
+ Occasionally, ablating a head can **increase** the probability of the top prediction. This means the head was actually pulling the model away from this prediction -- it was a "competing signal." This is an interesting finding that suggests the head was promoting a different output.
+
+ ## Generation Changes
+
+ The full generation comparison shows whether the ablated model produces different text.
+
+ ### Generation Changed
+
+ The head was important enough that removing it altered the model's entire output sequence. This is a strong signal of importance. Look at where the texts diverge -- the point of divergence tells you where the head's contribution was most critical.
+
+ ### Generation Stayed the Same
+
+ Even if the probability shifted, the model still chose the same tokens. This means the head's contribution wasn't large enough to cross the decision boundary. The model is robust to losing this head for this particular input.
+
+ ## Multi-Head Ablation
+
+ When you ablate multiple heads simultaneously:
+
+ - **Additive effects**: If ablating heads A and B together has a bigger effect than either alone, the heads contributed independently to the prediction.
+ - **Redundant heads**: If ablating both has about the same effect as ablating just one, the heads may have been providing the same information.
+ - **Synergistic effects**: Rarely, ablating two heads together can have a much larger effect than the sum of their individual effects. This suggests the heads work together as a circuit.
+
+ ## Tips for Interpretation
+
+ - Always compare ablation effects across different head categories on the same prompt
+ - Try the same head on multiple prompts to see if its importance is consistent or input-dependent
+ - A head's category (Previous-Token, Syntactic, etc.) gives you a hypothesis about why it matters -- ablation lets you test that hypothesis
+ - Remember that ablation is a blunt tool: removing a head removes all of its functions, not just the one you're interested in
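A toy numeric sketch can make these cases concrete. Here the "model" is just a sum of per-head logit contributions -- a deliberate simplification of how a real transformer combines heads, not the dashboard's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy setup: 3 heads each contribute a logit vector over a 4-token vocabulary.
head_logits = np.array([
    [2.0, 0.5, 0.1, 0.0],   # head A: pushes strongly toward token 0
    [1.5, 1.0, 0.2, 0.1],   # head B: also favors token 0
    [0.0, 0.3, 1.2, 0.4],   # head C: a competing signal for token 2
])

baseline = softmax(head_logits.sum(axis=0))
print(f"baseline P(token 0) = {baseline[0]:.3f}")

for name, idx in [("A", 0), ("B", 1), ("C", 2)]:
    # Zero-ablate one head by subtracting its contribution.
    ablated = softmax(head_logits.sum(axis=0) - head_logits[idx])
    delta = ablated[0] - baseline[0]
    print(f"ablate head {name}: P(token 0) = {ablated[0]:.3f} ({delta:+.3f})")
```

Removing head A (a supporting signal) lowers P(token 0), while removing head C (a competing signal) raises it -- the "Probability Increase" case described above.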
rag_docs/interpreting_attention_maps.md ADDED
@@ -0,0 +1,70 @@
+ # Interpreting Attention Maps
+
+ ## Quick Reference
+
+ The BertViz attention visualization in Stage 3 shows how tokens attend to each other. Here's how to read the patterns.
+
+ ## Reading the Visualization
+
+ The BertViz display shows:
+ - **Left column**: Each token in your input (the "query" -- the token doing the looking)
+ - **Right column**: The same tokens (the "keys" -- what's being looked at)
+ - **Lines between them**: Attention connections. Each line shows how much one token attends to another.
+
+ ### Line Properties
+ - **Thicker, more opaque lines** = stronger attention (the model focuses more on this connection)
+ - **Thin, faint lines** = weak attention (some attention, but not much)
+ - **No line** = very little or no attention between those tokens
+
+ ### Interacting with BertViz
+ - **Single-click** a head square at the top to select/deselect it
+ - **Double-click** a head square to view only that head (deselects all others)
+ - **Hover** over a token or line to see exact attention weights
+
+ ## Common Attention Patterns
+
+ ### Diagonal Pattern (Previous-Token)
+
+ You see each token strongly attending to the token directly before it, creating a diagonal line of strong connections.
+
+ **What it means**: This head tracks local word order. It's useful for bigram patterns -- sequences of two words that commonly appear together.
+
+ **Looks like**: A staircase pattern of thick lines, each shifted one position to the left.
+
+ ### Vertical Stripe (First-Token / Positional)
+
+ You see many or all tokens attending to the same position (usually the first token), creating a vertical column of lines.
+
+ **What it means**: The first token often serves as a "default sink" for excess attention. This is an artifact of the softmax function -- attention weights must sum to 1.0, so when a head has nothing specific to attend to, it sends attention to the first token.
+
+ **Looks like**: Many thick lines all pointing to the same token on the right side.
+
+ ### Uniform / Diffuse (Bag-of-Words)
+
+ You see many thin lines spreading from each token to many other tokens, with no strong focus on any particular one.
+
+ **What it means**: This head is gathering a broad summary of the entire input, rather than focusing on specific relationships. It helps the model maintain an overall sense of context.
+
+ **Looks like**: A dense web of thin, similarly-weighted lines.
+
+ ### Structured Connections (Syntactic)
+
+ You see specific, purposeful-looking connections that skip across tokens -- like a token attending to a word several positions away in a consistent pattern.
+
+ **What it means**: This head may be tracking grammatical relationships. For example, a verb attending to its subject, or a pronoun attending to its antecedent.
+
+ **Looks like**: A few thick lines making specific connections, often spanning several token positions.
+
+ ### Mixed / Context-Dependent
+
+ Some heads show patterns that change based on the input. They might show one pattern for factual prompts and another for creative prompts.
+
+ **What it means**: These heads are flexible and context-sensitive. They don't have a single fixed pattern but adapt to the input.
+
+ ## Tips for Reading Attention Maps
+
+ - **Start with one head at a time**: Double-click individual heads to isolate their patterns. Looking at all heads simultaneously is confusing.
+ - **Compare heads across layers**: The same type of pattern (like Previous-Token) may appear in early layers but not late layers, or vice versa.
+ - **Match patterns to categories**: Use the head categories as a guide. If a head is categorized as "Previous-Token," look for the diagonal pattern to confirm.
+ - **Hover for details**: The exact attention weight tells you more than just the visual thickness. Two lines might look similar but have meaningfully different weights.
+ - **Context matters**: The same head can show different patterns for different prompts. Try multiple inputs to understand a head's full behavior.
rag_docs/interpreting_attribution_scores.md ADDED
@@ -0,0 +1,59 @@
+ # Interpreting Attribution Scores
+
+ ## Quick Reference
+
+ Token attribution scores tell you how much each input token influenced a specific prediction. Here's how to read and interpret them.
+
+ ## Understanding the Scores
+
+ Attribution scores are **normalized** so that the most influential token gets a score of 1.0, and all other scores are relative to it.
+
+ ### High Score (0.7 - 1.0)
+
+ This token was **highly influential**. The model relied heavily on this token when making its prediction. For factual predictions, these are usually the content words that carry the key information.
+
+ **Example**: In "The capital of France is" → "Paris", the token "France" typically gets the highest score because it directly determines which capital the model predicts.
+
+ ### Medium Score (0.3 - 0.7)
+
+ This token had **moderate influence**. It contributed context that helped the prediction but wasn't the primary driver.
+
+ **Example**: In the same prompt, "capital" might get a medium score -- it tells the model to predict a city name, but "France" specifies which one.
+
+ ### Low Score (0.0 - 0.3)
+
+ This token had **minimal influence** on this specific prediction. It may be a function word (the, of, is) or a word that doesn't directly relate to what's being predicted.
+
+ ## Comparing Attribution Methods
+
+ ### Integrated Gradients vs. Simple Gradient
+
+ - **Integrated Gradients** averages gradients over many intermediate steps between a "blank" baseline and the actual input. This produces more reliable, less noisy scores. Use it when you want trustworthy results.
+ - **Simple Gradient** takes a single gradient measurement. It's faster but can be noisy -- scores may overemphasize some tokens or miss subtle contributions. Good for quick exploration.
+
+ **When they disagree**: If the two methods give very different rankings, the attribution is likely noisy. Trust Integrated Gradients for the more accurate picture.
+
+ ## Why Results Vary by Target Token
+
+ Attribution is computed **with respect to a specific target token**. Different targets can be driven by entirely different input tokens.
+
+ **Example** with prompt "Alice gave Bob a gift because she":
+ - Target "liked": High attribution for "Alice" (she liked something) and "Bob" (she liked Bob)
+ - Target "was": High attribution for "she" and "Alice" (describing Alice's state)
+ - Target "wanted": High attribution for "gift" (she wanted to give a gift)
+
+ This is one of the most powerful uses of attribution -- it reveals which input tokens support different possible continuations.
+
+ ## Common Patterns
+
+ - **Content words dominate**: Nouns, verbs, and adjectives typically have higher attribution than function words
+ - **Recent tokens often matter more**: Tokens closer to the prediction point tend to have higher attribution, especially for local patterns
+ - **Distant tokens can matter too**: For long-range dependencies (like pronoun resolution), distant tokens can have surprisingly high attribution
+ - **Punctuation varies**: Commas and periods sometimes have notable attribution because they signal sentence structure
+
+ ## Tips
+
+ - Try the same prompt with multiple target tokens to see how attribution shifts
+ - Short prompts (5-10 tokens) give the clearest attribution results
+ - If all scores are roughly equal, the model may be uncertain or the prediction may not depend on any single token
+ - Use attribution alongside ablation for a fuller picture: attribution tells you which input tokens matter; ablation tells you which internal components matter
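The difference between the two methods can be illustrated on a toy differentiable function with a hand-coded gradient. This is a numeric sketch only -- for a real model, both methods get their gradients from backpropagation through the network:

```python
import numpy as np

# Toy "model": f(x) = x0 * x1 + x2, with a zero baseline.
def f(x):
    return x[0] * x[1] + x[2]

def grad_f(x):
    return np.array([x[1], x[0], 1.0])

x = np.array([2.0, 3.0, 1.0])
baseline = np.zeros_like(x)

# Simple gradient: one measurement at the input, scaled by the input.
simple = grad_f(x) * x

# Integrated gradients: average the gradient along the straight path
# from baseline to input, then scale by (input - baseline).
steps = 100
alphas = (np.arange(steps) + 0.5) / steps
avg_grad = np.mean([grad_f(baseline + a * (x - baseline)) for a in alphas], axis=0)
ig = avg_grad * (x - baseline)

print("simple gradient:", simple)   # ~[6. 6. 1.]
print("integrated grads:", ig)      # ~[3. 3. 1.]
print("IG sums to f(x) - f(baseline):", ig.sum(), f(x) - f(baseline))
```

Note the completeness property: the Integrated Gradients attributions sum exactly to f(x) - f(baseline), while the single-point gradient overstates the two interacting inputs.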
rag_docs/key_terminology.md ADDED
@@ -0,0 +1,55 @@
+ # Key Terminology
+
+ An extended glossary of terms you may encounter while using the Transformer Explanation Dashboard.
+
+ ## Core Concepts
+
+ **Token**: A small piece of text that the model processes. Can be a word, part of a word, or a punctuation mark. The model's fundamental unit of input and output.
+
+ **Embedding**: A vector (list of numbers) that represents a token's meaning. Similar tokens have similar embeddings.
+
+ **Attention**: The mechanism that lets each token look at other tokens (in GPT-style models, only earlier ones) to gather relevant context. Uses Queries, Keys, and Values.
+
+ **Attention Head**: One instance of the attention mechanism. Each layer has multiple heads that look for different patterns simultaneously.
+
+ **Layer**: One complete processing step in the Transformer, containing both attention and MLP components. GPT-2 has 12 layers; larger models have more.
+
+ **MLP / Feed-Forward Network (FFN)**: The component in each layer that processes tokens individually, storing and retrieving factual knowledge. Uses an expand-then-compress pattern.
+
+ ## Architecture Terms
+
+ **Residual Stream**: The "conveyor belt" of information running through all layers. Each layer reads from it and adds back its contribution. This preserves information from earlier layers.
+
+ **Layer Normalization (LayerNorm)**: A technique applied before or after each sublayer that stabilizes the numbers, keeping them in a reasonable range. This helps training and makes the model more robust.
+
+ **Parameters / Weights**: The learnable numbers in the model. These are adjusted during training to improve predictions. GPT-2 has ~124 million parameters.
+
+ **Hidden Dimension**: The size of the internal vector representations. For GPT-2, this is 768 -- meaning each token is represented by 768 numbers at each layer.
+
+ **Vocabulary**: The complete set of tokens the model knows. GPT-2 has a vocabulary of 50,257 tokens.
+
+ ## Training and Inference
+
+ **Training**: The process of adjusting the model's parameters by showing it billions of text examples. The model learns to predict the next token and its parameters are updated to reduce prediction errors.
+
+ **Inference**: Using the trained model to make predictions on new text. This is what happens when you click "Analyze" in the dashboard -- no learning occurs, the model just processes your input.
+
+ **Forward Pass**: One complete trip of data through the model, from input tokens to output predictions. The dashboard visualizes this forward pass.
+
+ **Gradient**: A measure of how much each parameter contributed to the model's prediction error. Used during training to update parameters, and in attribution experiments to measure token importance.
+
+ **Loss**: A number measuring how wrong the model's predictions are. During training, the goal is to minimize this. Lower loss means better predictions.
+
+ **Fine-tuning**: Taking a pre-trained model and training it further on a specific dataset to specialize its behavior.
+
+ ## Prediction Terms
+
+ **Logits**: The raw, unnormalized scores the model assigns to every possible next token before converting to probabilities.
+
+ **Softmax**: The function that converts logits into probabilities (positive numbers that sum to 1.0).
+
+ **Probability Distribution**: The complete set of probabilities over all possible next tokens. The dashboard shows the top 5.
+
+ **Temperature**: A setting that controls prediction confidence. Low temperature = more focused; high temperature = more spread out.
+
+ **Beam Search**: A generation strategy that explores multiple possible sequences simultaneously instead of just picking the single best token at each step.
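Several of these terms -- logits, softmax, probability distribution, temperature -- fit in a few lines. The logit values below are invented purely for illustration:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()                 # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()              # probabilities sum to 1.0

logits = [4.0, 2.0, 1.0, 0.5]       # raw, unnormalized scores

for t in (0.5, 1.0, 2.0):
    probs = softmax(logits, temperature=t)
    print(f"T={t}: top prob = {probs[0]:.3f}")
# Lower temperature concentrates probability on the top token;
# higher temperature spreads it across the alternatives.
```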
rag_docs/llama_overview.md ADDED
@@ -0,0 +1,46 @@
+ # LLaMA Overview
+
+ ## What Is LLaMA?
+
+ LLaMA (Large Language Model Meta AI) is a family of open-weight language models developed by Meta. First released in 2023, LLaMA models introduced several architectural improvements over GPT-2 and became the foundation for many other models (Mistral, Qwen, etc.). In the dashboard, models labeled "LLaMA-like" share this architecture.
+
+ ## Architectural Differences from GPT-2
+
+ LLaMA models use several key innovations:
+
+ ### RoPE (Rotary Position Embeddings)
+ Instead of GPT-2's learned absolute position embeddings, LLaMA uses **Rotary Position Embeddings (RoPE)**. RoPE encodes position information by rotating the query and key vectors in attention. This means:
+ - The model can generalize better to different sequence lengths
+ - Position information is baked into the attention computation itself
+ - Attention patterns may look different from GPT-2 because of how positions are encoded
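The core trick can be sketched for a single 2-dimensional query/key pair. This is a toy illustration -- real models split each head's vectors into many such pairs, each rotated at a different frequency:

```python
import numpy as np

def rope(vec, pos, theta=0.1):
    """Rotate a 2-d vector by an angle proportional to its position."""
    angle = pos * theta
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ vec

q = np.array([1.0, 0.5])
k = np.array([0.3, 0.8])

# Attention scores depend only on the *relative* distance between positions:
score_a = rope(q, pos=7) @ rope(k, pos=4)    # query/key 3 positions apart
score_b = rope(q, pos=12) @ rope(k, pos=9)   # also 3 positions apart
print(score_a, score_b)                      # equal up to floating point
```

Because both vectors are rotated, the dot product depends only on the distance between positions, not their absolute values -- the property behind RoPE's better length generalization.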
+
+ ### RMSNorm Instead of LayerNorm
+ LLaMA uses **RMSNorm** (Root Mean Square Normalization) instead of the standard LayerNorm used in GPT-2. RMSNorm is simpler and slightly faster -- it only normalizes the magnitude of the vectors without centering them first.
+
+ ### SiLU Activation
+ Where GPT-2 uses GELU activation in the MLP, LLaMA uses **SiLU** (Sigmoid Linear Unit, also called "Swish"). This is a smooth activation function that tends to produce slightly different MLP behavior.
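Both of these components are small enough to sketch directly. These are simplified versions that omit the learned scale (gain) parameter real models apply after normalization:

```python
import numpy as np

def rmsnorm(x, eps=1e-6):
    # Normalize by root-mean-square only -- no mean-centering, unlike LayerNorm.
    return x / np.sqrt(np.mean(x ** 2) + eps)

def silu(x):
    # SiLU / Swish: x * sigmoid(x), a smooth alternative to GELU.
    return x / (1.0 + np.exp(-x))

x = np.array([2.0, -1.0, 0.5, 3.0])
print(rmsnorm(x))                    # output vector has RMS ~= 1
print(silu(np.array([-2.0, 0.0, 2.0])))
```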
+
+ ### Grouped-Query Attention (GQA)
+ Larger LLaMA variants use **Grouped-Query Attention**, where multiple query heads share the same key and value heads. This reduces memory usage and speeds up inference without significantly hurting quality. This means the number of key/value heads may be smaller than the number of query heads.
+
+ ## Models Using LLaMA Architecture
+
+ The dashboard's "llama_like" family includes:
+ - **Meta LLaMA**: LLaMA 2 (7B, 13B, 70B), LLaMA 3 (1B, 3B, 8B, 70B)
+ - **Qwen**: Qwen2, Qwen2.5 (0.5B to 72B) -- available in the dashboard dropdown as "Qwen2.5-0.5B"
+ - **Mistral**: Mistral-7B, Mixtral-8x7B
+
+ ## What to Expect in the Dashboard
+
+ When using a LLaMA-like model (such as Qwen2.5-0.5B):
+
+ - **More layers and heads**: Even the small Qwen2.5-0.5B has 24 layers and 14 heads, compared to GPT-2's 12 layers and 12 heads
+ - **Different attention patterns**: RoPE-based attention may show different positional patterns compared to GPT-2
+ - **Different tokenizer**: LLaMA-family models use a different BPE vocabulary, so the same text may tokenize differently
+ - **Comparing with GPT-2**: Running the same prompt on both GPT-2 and a LLaMA-like model is a great way to see how architecture affects predictions
+
+ ## HuggingFace Model IDs
+
+ - `Qwen/Qwen2.5-0.5B` (in default dropdown)
+ - `meta-llama/Llama-3.2-1B`, `meta-llama/Llama-3.1-8B`
+ - `mistralai/Mistral-7B-v0.3`
rag_docs/mechanistic_interpretability_intro.md ADDED
@@ -0,0 +1,58 @@
1
+ # Introduction to Mechanistic Interpretability
2
+
3
+ ## What Is Mechanistic Interpretability?
4
+
5
+ Mechanistic interpretability (often called "mech interp") is a field of AI research that aims to understand **how** neural networks work internally -- not just what they predict, but why. Instead of treating models as black boxes, researchers open them up and study the individual components (neurons, attention heads, layers) to figure out what each one does.
+
+ This dashboard is a tool for doing exactly that kind of investigation.
+
+ ## How This Dashboard Relates
+
+ The experiments available in this dashboard are real techniques used in mechanistic interpretability research:
+
+ - **Ablation** (removing heads to test their importance) is a standard tool for identifying which components are responsible for specific behaviors
+ - **Token attribution** (measuring input influence via gradients) is used to trace how information flows from inputs to outputs
+ - **Attention pattern analysis** (categorizing heads by behavior) helps researchers build a map of what each head does
+ - **Head categorization** (Previous-Token, BoW, Syntactic, etc.) builds on research that has identified recurring head types across models
+
+ ## Key Concepts in the Field
+
+ ### Circuits
+
+ A **circuit** is a small subnetwork within the model that performs a specific function. For example, researchers have found "induction circuits" -- combinations of attention heads across layers that work together to complete patterns like "A B ... A" → "B" (if the model has seen "A B" before, when it sees "A" again, it predicts "B").
+
+ In the dashboard, you can start to identify circuits by ablating combinations of heads and seeing which combinations have outsized effects.
+
+ ### Superposition
+
+ **Superposition** is the idea that neural networks represent more features than they have dimensions. A 768-dimensional embedding might encode thousands of different concepts by overlapping them. This makes interpretation challenging because a single neuron can participate in many features.
+
+ ### Induction Heads
+
+ **Induction heads** are one of the best-understood circuits. They are pairs of attention heads (typically one in an early layer and one in a later layer) that work together to copy patterns from context. If the model has seen "Harry Potter" earlier in the text and encounters "Harry" again, induction heads help it predict "Potter."
+
+ You might observe induction-like behavior in the dashboard when using prompts with repeated patterns.
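The copy-from-context behavior that induction heads implement can be mimicked by a simple lookup heuristic. This sketch illustrates the *pattern*, not how the heads actually compute it -- the function name and token lists are invented for the example:

```python
def induction_predict(tokens):
    """Toy induction heuristic: if the final token appeared earlier in the
    sequence, predict the token that followed that earlier occurrence."""
    current = tokens[-1]
    # Scan earlier positions (excluding the final token itself), right to left.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence: the heuristic has nothing to copy

print(induction_predict(["Harry", "Potter", "went", "home", ".", "Harry"]))
```

Running this prints `Potter`, mirroring the "Harry Potter ... Harry" example above. Real induction heads achieve the same effect with attention: an early head attends to the previous token, and a later head composes with it to attend to (and copy) the token after the earlier match.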
+
+ ### Polysemanticity
+
+ Neurons and heads are often **polysemantic** -- they respond to multiple unrelated features. An attention head might handle both pronoun resolution and list formatting, depending on the input. This is why head categories are approximate: the same head may behave differently for different prompts.
+
+ ## Notable Research Groups
+
+ These organizations have published influential work in mechanistic interpretability:
+
+ - **Anthropic**: Published foundational work on transformer circuits, superposition, and dictionary learning for interpreting neural networks
+ - **EleutherAI**: Open-source AI research group that has contributed tools and analysis for model interpretability
+ - **Redwood Research**: Focuses on alignment-relevant interpretability, including causal interventions on model behavior
+ - **DeepMind (Google)**: Research on understanding internal representations and how models store knowledge
+
+ ## Further Reading
+
+ If you want to explore the research behind this dashboard's techniques:
+
+ - "A Mathematical Framework for Transformer Circuits" (Elhage et al., Anthropic) -- foundational paper on how attention heads compose into circuits
+ - "In-context Learning and Induction Heads" (Olsson et al., Anthropic) -- how models learn to copy patterns from context
+ - "Locating and Editing Factual Associations in GPT" (Meng et al.) -- how facts are stored in MLP layers
+ - "Attention Is All You Need" (Vaswani et al., 2017) -- the original Transformer paper
+
+ These papers are referenced here for context. The dashboard provides a hands-on way to explore many of the concepts they describe.
rag_docs/mlp_layers_explained.md ADDED
@@ -0,0 +1,39 @@
+ # MLP (Feed-Forward) Layers Explained
+
+ ## What Are MLP Layers?
+
+ After attention gathers context from other tokens, each token's representation passes through a **Multi-Layer Perceptron (MLP)**, also called a **Feed-Forward Network (FFN)**. While attention handles relationships between tokens, the MLP processes each token independently -- and this is where much of the model's **factual knowledge** is stored.
+
+ ## What They Do
+
+ Think of the MLP as the model's **memory bank**. During training, the MLP weights learned to encode facts, patterns, and associations from the training data. When the model processes "The capital of France is," the MLP layers help recall that "Paris" is the answer.
+
+ Researchers have found that specific facts are often stored in specific MLP neurons. This is one of the key findings in mechanistic interpretability research.
+
+ ## The Expand-Then-Compress Pattern
+
+ Each MLP layer follows a distinctive pattern:
+
+ 1. **Expand**: The token's representation is projected into a much larger space (typically 4x the hidden dimension). For GPT-2, this means going from 768 dimensions to 3072 dimensions.
+ 2. **Activate**: A non-linear activation function is applied (like GELU or SiLU), which allows the network to represent complex patterns.
+ 3. **Compress**: The expanded representation is projected back down to the original size (768 for GPT-2).
+
+ **Why expand then compress?** The expansion creates space for many individual neurons to each "vote" on whether a specific concept or fact is relevant. The compression then combines these votes into a refined representation. Each neuron in the expanded layer can activate for specific concepts.
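The three steps above can be sketched in a few lines. The dimensions and random weights here are stand-ins (real GPT-2 uses 768 → 3072 → 768 with trained weights):

```python
import math
import random

def gelu(x):
    """GELU activation (the non-linearity GPT-2 uses)."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def mlp(x, w_in, w_out):
    """Expand (d -> 4d), apply the activation, compress (4d -> d)."""
    hidden = [gelu(sum(x[i] * w_in[i][j] for i in range(len(x))))
              for j in range(len(w_in[0]))]
    return [sum(hidden[j] * w_out[j][k] for j in range(len(hidden)))
            for k in range(len(w_out[0]))]

random.seed(0)
d, d_ff = 4, 16  # stand-ins for GPT-2's 768 and 3072
w_in = [[random.gauss(0, 0.5) for _ in range(d_ff)] for _ in range(d)]
w_out = [[random.gauss(0, 0.5) for _ in range(d)] for _ in range(d_ff)]
y = mlp([0.1, -0.2, 0.3, 0.0], w_in, w_out)
print(len(y))  # back to the input dimension: prints 4
```

The expanded `hidden` list is where the individual "neuron votes" live; `w_out` mixes those votes back down to the original size.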
+
+ ## Attention + MLP = One Layer
+
+ In each Transformer layer, attention and MLP work together:
+
+ 1. **Attention** gathers relevant context from other tokens
+ 2. **MLP** retrieves stored knowledge and transforms the representation
+ 3. The result is added back to the **residual stream** (the running representation)
+
+ This happens in every layer. GPT-2 has 12 such layers; each one further refines the model's understanding.
+
+ ## What You See in the Dashboard
+
+ In **Stage 4 (MLP/Feed-Forward)** of the pipeline, you can see:
+
+ - The expand-compress flow: Input dimension → Expanded dimension → Output dimension
+ - The number of layers in the model
+ - An explanation of why the expansion matters for knowledge storage
rag_docs/model_selector_guide.md ADDED
@@ -0,0 +1,45 @@
+ # Model Selector Guide
+
+ ## How to Choose a Model
+
+ The dashboard supports several transformer model families. You can select a model from the dropdown menu in the generator section at the top of the page.
+
+ ### Available Models
+
+ Currently, the dashboard offers:
+
+ - **GPT-2 (124M)**: OpenAI's GPT-2 small model. 12 layers, 12 attention heads, 768-dimensional embeddings. This is the best model to start with -- it's small, fast, and well-studied.
+ - **Qwen2.5-0.5B**: A LLaMA-like model from Alibaba's Qwen family. 24 layers, 14 attention heads, 896-dimensional embeddings. Slightly larger and uses different architectural features (RoPE, SiLU activation).
+
+ You can also enter a custom **HuggingFace model ID** in the dropdown (type it in). The dashboard supports GPT-2, LLaMA, OPT, GPT-NeoX, BLOOM, Falcon, and MPT model families.
+
+ ### What Happens When You Load a Model
+
+ 1. The model is downloaded from HuggingFace (this may take a moment the first time)
+ 2. The dashboard **auto-detects** the model's architecture family
+ 3. Internal hooks are automatically configured to capture attention patterns, MLP activations, and other data
+ 4. The layer and head dropdowns in the sidebar and ablation panel are populated based on the model's structure
+
+ ### Auto-Detection
+
+ The dashboard has a registry that maps model names to their architecture family. When it recognizes a model, it automatically configures:
+ - Which internal modules to hook for attention capture
+ - Which normalization parameters to track
+ - The correct patterns for extracting layer outputs
+
+ If you enter an unknown model, the sidebar's configuration dropdowns may need manual adjustment.
+
+ ### Tips for Choosing
+
+ - **Start with GPT-2**: It's small, fast, and the most widely studied. Most educational resources reference GPT-2.
+ - **Try Qwen2.5-0.5B for comparison**: It uses a different architecture (LLaMA-style). Comparing results between GPT-2 and Qwen can highlight how architectural differences affect attention patterns.
+ - **Larger models are slower**: Models with more parameters take longer to load and analyze. Stick to small models for interactive exploration.
+ - **Memory matters**: Larger models require more RAM. If the dashboard becomes unresponsive, try a smaller model.
+
+ ### Generation Settings
+
+ After selecting a model and entering a prompt, you can configure:
+ - **Number of Generation Choices (Beams)**: 1-5 beams. More beams explore more paths but take longer.
+ - **Number of New Tokens**: 1-20 tokens to generate. Shorter is faster.
+
+ Click **Analyze** to run the model and see results in the pipeline and generation sections.
rag_docs/opt_overview.md ADDED
@@ -0,0 +1,48 @@
+ # OPT Overview
+
+ ## What Is OPT?
+
+ OPT (Open Pre-trained Transformer) is a family of language models released by Meta in 2022. OPT was designed to replicate GPT-3's architecture and performance while being openly available to researchers. It uses a decoder-only transformer architecture similar to GPT-2 but with options for much larger sizes.
+
+ ## Architecture Details
+
+ OPT's architecture is close to GPT-2 but has some differences:
+
+ | Property | OPT-125M | OPT-350M | OPT-1.3B |
+ |----------|----------|----------|----------|
+ | Parameters | 125M | 350M | 1.3B |
+ | Layers | 12 | 24 | 24 |
+ | Attention Heads | 12 | 16 | 32 |
+ | Hidden Dimension | 768 | 1024 | 2048 |
+ | Vocabulary Size | 50,272 | 50,272 | 50,272 |
+
+ ### Key Differences from GPT-2
+
+ - **Learned positional embeddings**: Like GPT-2, OPT uses learned absolute position embeddings (unlike LLaMA's RoPE)
+ - **LayerNorm placement**: like GPT-2, most OPT sizes apply LayerNorm before each sublayer (pre-norm); OPT-350M is an exception that applies it after each sublayer
+ - **Larger variants available**: OPT scales up to 175 billion parameters, though only smaller variants are practical for interactive use
+
+ ### Similarities to GPT-2
+
+ - Same general decoder-only architecture
+ - Same tokenizer style (BPE with ~50K vocabulary)
+ - Same attention mechanism (standard multi-head self-attention)
+ - Similar training objective (next-token prediction)
+
+ ## What to Expect in the Dashboard
+
+ When using OPT models:
+
+ - **OPT-125M is very similar to GPT-2**: Same number of layers (12), heads (12), and hidden dimension (768). You'll see similar attention patterns and predictions.
+ - **Different module paths**: The dashboard auto-detects OPT's internal structure (e.g., `model.decoder.layers.N.self_attn`), so hooking works automatically.
+ - **Tokenization**: OPT's tokenizer is very similar to GPT-2's, so the same text usually produces similar (but not identical) token sequences.
+ - **Good for comparison**: Running the same prompt on GPT-2 and OPT-125M can show how similar architectures with different training data produce different predictions.
+
+ ## HuggingFace Model IDs
+
+ - `facebook/opt-125m`
+ - `facebook/opt-350m`
+ - `facebook/opt-1.3b`
+ - `facebook/opt-2.7b`
+
+ Note: OPT models are not in the default dropdown but can be loaded by typing the model ID directly.
rag_docs/output_and_prediction.md ADDED
@@ -0,0 +1,45 @@
+ # Output and Prediction
+
+ ## How Does the Model Choose the Next Token?
+
+ After your text has passed through all the Transformer layers (attention + MLP in each), the model needs to make a prediction: what token comes next? This final step converts the model's internal representation into a probability distribution over its entire vocabulary.
+
+ ## Logits: Raw Scores
+
+ The model's final hidden state for the last token is multiplied by the embedding table (in reverse) to produce a score for **every token in the vocabulary**. These raw scores are called **logits**. A higher logit means the model thinks that token is more likely.
+
+ For GPT-2, this means producing about 50,257 scores -- one for each token in its vocabulary.
+
+ ## Softmax: Turning Scores into Probabilities
+
+ Raw logits can be any number (positive or negative). To get actual probabilities, the model applies a function called **softmax**, which:
+
+ 1. Converts all scores to positive numbers
+ 2. Makes them all add up to 1.0 (100%)
+ 3. Preserves the ranking (higher logits → higher probabilities)
+
+ After softmax, we can say things like "the model predicts 'mat' with 45% probability."
+
+ ## Temperature
+
+ **Temperature** is a setting that controls how "confident" or "creative" the model's predictions are:
+
+ - **Low temperature (e.g., 0.1)**: Makes the model very confident -- the top prediction gets almost all the probability. Good for factual, predictable text.
+ - **High temperature (e.g., 1.5)**: Spreads probability more evenly, making less likely tokens more probable. Good for creative, varied text.
+ - **Temperature = 1.0**: The default, unmodified distribution.
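The softmax-plus-temperature recipe above fits in a few lines. The logits here are made-up numbers for three hypothetical tokens:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, then normalize with softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [5.0, 3.0, 1.0]  # invented raw scores for three tokens
for t in (0.1, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
```

Lower temperature sharpens the distribution toward the top token; higher temperature flattens it. The ranking never changes, only how concentrated the probability mass is.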
+
+ ## Greedy Decoding vs. Sampling
+
+ Once we have probabilities, how do we pick the actual next token?
+
+ - **Greedy decoding**: Always pick the token with the highest probability. Simple but can be repetitive.
+ - **Sampling**: Randomly pick a token weighted by the probabilities. More varied but less predictable.
+ - **Beam search**: Explore multiple possible sequences simultaneously and pick the best overall path. This is available in the dashboard's generation controls.
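Greedy decoding and sampling differ only in how they use the probabilities. A sketch with an invented four-token distribution:

```python
import random

def greedy_decode(probs):
    """Always pick the index of the most probable token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_decode(probs, rng):
    """Pick an index at random, weighted by the probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.45, 0.30, 0.15, 0.10]  # invented distribution over four tokens
rng = random.Random(0)            # seeded so the run is reproducible
print(greedy_decode(probs))       # greedy always picks index 0 here
print([sample_decode(probs, rng) for _ in range(8)])
```

Over many draws, sampling picks index 0 about 45% of the time and the others in proportion to their probabilities, which is exactly the "more varied but less predictable" trade-off described above.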
+
+ ## What You See in the Dashboard
+
+ In **Stage 5 (Output Selection)** of the pipeline:
+
+ - The **predicted token** is highlighted after your prompt text, along with its confidence percentage
+ - A **top-5 bar chart** shows the five most likely next tokens and their probabilities
+ - A note explains how beam search and other techniques can influence the final selection beyond just the top-1 token
rag_docs/pipeline_stages.md ADDED
@@ -0,0 +1,53 @@
+ # Pipeline Stages
+
+ ## Overview
+
+ The pipeline visualization shows the 5 stages your text passes through inside the transformer model. A flow indicator at the top shows the path: **Input → Tokens → Embed → Attention → MLP → Output**. Click any stage to expand it and see details.
+
+ ## Stage 1: Tokenization
+
+ **Icon**: Puzzle piece | **Summary shows**: "X tokens"
+
+ This stage displays how your input text was split into tokens. Each row shows:
+ - The **token** (the text piece, displayed in a blue box)
+ - An arrow pointing to its **ID** (the number the model uses internally, in a purple box)
+
+ Notice that spaces are often attached to the beginning of the following word (e.g., " cat" with a leading space). This is normal for models like GPT-2 that use BPE tokenization.
+
+ ## Stage 2: Embedding
+
+ **Icon**: Cube | **Summary shows**: "X-dim vectors"
+
+ This stage shows how token IDs are converted into numerical vectors using a pre-learned embedding table. You'll see:
+ - A visual flow: Token ID → Lookup Table → Vector
+ - The embedding dimension (e.g., 768 for GPT-2)
+ - An explanation of how the lookup table was learned during training
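The "Token ID → Lookup Table → Vector" flow is literally just row selection. The table below is invented (3 tokens, 4 dimensions); GPT-2's real table is 50,257 rows by 768 columns:

```python
# Invented toy table: vocabulary of 3 tokens, embedding dimension 4.
embedding_table = [
    [0.1, -0.3, 0.2, 0.5],   # vector for token ID 0
    [0.7, 0.0, -0.1, 0.4],   # vector for token ID 1
    [-0.2, 0.6, 0.3, -0.5],  # vector for token ID 2
]

def embed(token_ids):
    """Embedding is a plain row lookup: ID n selects row n of the table."""
    return [embedding_table[i] for i in token_ids]

print(embed([2, 0]))  # the 4-dim vectors for token IDs 2 and 0, in order
```

The values in the real table are not hand-written like these; they were learned during training so that related tokens end up with nearby vectors.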
+
+ ## Stage 3: Attention
+
+ **Icon**: Eye | **Summary shows**: "X heads × Y layers"
+
+ This is the most detailed stage. It includes:
+ - **Head Categories**: Attention heads are automatically categorized by their behavior pattern (Previous-Token, First/Positional, Bag-of-Words, Syntactic, Other). Click each category to see which specific heads belong to it.
+ - **BertViz Visualization**: An interactive attention map showing which tokens attend to which. Lines connect tokens on the left to tokens on the right. Thicker lines mean stronger attention.
+
+ **Navigating BertViz**: Single-click a head square to select/deselect it. Double-click to show only that head. Hover over tokens or lines to see exact attention weights.
+
+ ## Stage 4: MLP (Feed-Forward)
+
+ **Icon**: Network | **Summary shows**: "X layers"
+
+ This stage shows the expand-then-compress pattern of the feed-forward network:
+ - Input dimension (e.g., 768) → Expanded dimension (e.g., 3072, which is 4x larger) → Back to input dimension
+ - An explanation of why this expansion matters for storing factual knowledge
+ - The total number of layers in the model
+
+ ## Stage 5: Output Selection
+
+ **Icon**: Bullseye | **Summary shows**: "→ [predicted token]"
+
+ This stage reveals the model's prediction:
+ - Your full prompt with the **predicted next token** highlighted
+ - A **confidence percentage** for the top prediction
+ - A **top-5 bar chart** showing the five most likely next tokens and their probabilities
+ - A note about how beam search and other techniques can influence the final output
rag_docs/recommended_starting_points.md ADDED
@@ -0,0 +1,64 @@
+ # Recommended Starting Points
+
+ ## Best First Model
+
+ **GPT-2 (124M)** is the ideal starting model because:
+ - It loads quickly and runs fast
+ - It has a manageable size (12 layers, 12 heads = 144 heads total)
+ - It's the most studied model in mechanistic interpretability research
+ - Most educational examples and tutorials reference GPT-2
+
+ ## Good Starter Prompts
+
+ ### For Exploring Basic Predictions
+
+ | Prompt | What It Tests |
+ |--------|--------------|
+ | `The cat sat on the` | Simple object prediction (mat, floor, bed) |
+ | `The capital of France is` | Factual recall (Paris) |
+ | `1 + 1 =` | Basic arithmetic |
+ | `Once upon a time` | Creative story continuation |
+
+ ### For Exploring Attention Patterns
+
+ | Prompt | What It Shows |
+ |--------|--------------|
+ | `The cat sat on the mat because it was` | Pronoun resolution: does "it" attend to "cat" or "mat"? |
+ | `Alice gave the book to Bob because she` | Gendered pronoun resolution |
+ | `The dogs in the park were` | Subject-verb agreement across a prepositional phrase |
+ | `I went to the store and bought` | Sequential event prediction |
+
+ ### For Ablation Experiments
+
+ | Prompt | Why It's Good for Ablation |
+ |--------|---------------------------|
+ | `The cat sat on the` | Simple enough that ablating one head can change the prediction |
+ | `The president of the` | Factual prompts show clear ablation effects on knowledge retrieval |
+ | `She picked up the phone and` | Action continuation is sensitive to Previous-Token head ablation |
+
+ ### For Attribution Experiments
+
+ | Prompt | What Attribution Reveals |
+ |--------|------------------------|
+ | `The capital of France is` | "France" should have highest attribution for "Paris" |
+ | `The doctor told the nurse that she` | Which noun drives the pronoun prediction? |
+ | `The large red ball rolled down the` | Do adjectives or nouns matter more? |
+
+ ## Suggested Experiment Order
+
+ If you're new to the dashboard, follow this path:
+
+ 1. **Experiment: Your First Analysis** -- Learn the basics with GPT-2 and a simple prompt
+ 2. **Experiment: Exploring Attention Patterns** -- Understand what attention heads do
+ 3. **Experiment: Your First Ablation** -- Remove a head and see what happens
+ 4. **Experiment: Token Attribution** -- See which input tokens drive predictions
+ 5. **Experiment: Comparing Heads** -- Systematically compare head categories
+ 6. **Experiment: Beam Search** -- Explore alternative generation paths
+
+ ## After the Basics
+
+ Once you've completed the guided experiments:
+ - **Compare models**: Run the same prompt on GPT-2 and Qwen2.5-0.5B to see architectural differences
+ - **Try longer prompts**: See how attention patterns change with more context
+ - **Combine techniques**: Use attribution to find important tokens, then ablate heads to find the components that process those tokens
+ - **Explore edge cases**: Try prompts in other languages, code snippets, or mathematical expressions
rag_docs/tokenization_explained.md ADDED
@@ -0,0 +1,36 @@
+ # Tokenization Explained
+
+ ## What Is Tokenization?
+
+ Tokenization is the very first step in how a language model processes your text. Models cannot read raw text -- they need it broken into small, numbered pieces called **tokens**. Tokenization converts your input string into a sequence of these tokens.
+
+ ## Why Not Just Use Words?
+
+ You might wonder why we don't just split text by spaces. The problem is that there are too many possible words (including misspellings, rare terms, and words in other languages). Instead, modern models use **subword tokenization**, which breaks text into smaller, reusable pieces.
+
+ For example, the word "unhappiness" might become two tokens ("un", "happiness") or three ("un", "happ", "iness") -- depending on the specific tokenizer.
+
+ ## How It Works: BPE
+
+ Most models (including GPT-2 and LLaMA) use a method called **Byte-Pair Encoding (BPE)**. Here's the intuition:
+
+ 1. Start with individual characters as your vocabulary
+ 2. Find the most common pair of adjacent characters in the training data (e.g., "t" + "h" = "th")
+ 3. Merge that pair into a new token
+ 4. Repeat thousands of times
+
+ This builds a vocabulary of common subwords. Frequent words like "the" become single tokens, while rare words get split into pieces.
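One merge step of that loop can be sketched on a three-word toy corpus. `bpe_merge_step` is an invented name for illustration; real tokenizers also record each merge as a rule so it can be replayed on new text:

```python
from collections import Counter

def bpe_merge_step(corpus):
    """One BPE step: find the most frequent adjacent pair and merge it
    everywhere. `corpus` is a list of token sequences (initially characters)."""
    pairs = Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pairs[(a, b)] += 1
    if not pairs:
        return corpus, None
    (a, b), _count = pairs.most_common(1)[0]
    merged = []
    for seq in corpus:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(a + b)   # merge the pair into one token
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged, a + b

corpus = [list("the"), list("then"), list("there")]
for _ in range(2):
    corpus, new_token = bpe_merge_step(corpus)
    print(new_token, corpus)
```

After two merges, the frequent word "the" has become a single token, while "then" and "there" keep extra pieces -- exactly the behavior described above.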
+
+ ## Token IDs
+
+ Each token has a unique **ID** -- a number that the model uses internally. For example, in GPT-2's vocabulary, the token "the" might have ID 262, while "cat" might have ID 9246. The model never sees the text itself; it only works with these IDs.
+
+ ## What You See in the Dashboard
+
+ In **Stage 1 (Tokenization)** of the pipeline, you can see exactly how your input text was split:
+
+ - Each row shows a **token** (the text piece) and its **ID** (the number)
+ - The summary shows the total number of tokens
+ - Notice how spaces are often attached to the following word (e.g., " cat" with a leading space is one token)
+
+ This stage helps you understand that the model's "unit of thought" is the token, not the word.
rag_docs/transformer_architecture.md ADDED
@@ -0,0 +1,39 @@
+ # The Transformer Architecture
+
+ ## What Is a Transformer?
+
+ A **Transformer** is the specific type of neural network architecture used by modern LLMs like GPT-2, LLaMA, and others. It was introduced in the 2017 paper "Attention Is All You Need" and quickly became the dominant approach for language tasks.
+
+ ## The Key Innovation: Attention
+
+ Before Transformers, language models processed words one at a time, left to right. Transformers changed this by introducing the **attention mechanism**, which lets the model look at all words in the input simultaneously and figure out which ones are relevant to each other.
+
+ For example, in "The cat sat on the mat because it was tired," attention helps the model connect "it" back to "the cat."
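At its core, attention computes a weighted average: each token scores every other token and the scores are normalized into weights. The 2-d word vectors below are invented for illustration (real models use hundreds of dimensions and learned query/key projections):

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over all keys."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented 2-d vectors: "it" points in a direction similar to "cat".
vectors = {"The": [0.1, 0.0], "cat": [0.9, 0.2], "mat": [0.2, -0.8], "it": [0.8, 0.3]}
words = ("The", "cat", "mat")
weights = attention_weights(vectors["it"], [vectors[w] for w in words])
for word, w in zip(words, weights):
    print(word, round(w, 3))
```

Because the vector for "it" is most similar to the vector for "cat", "cat" receives the largest weight -- a miniature version of the pronoun-resolution behavior described above.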
+
+ ## How Layers Stack
+
+ A Transformer is built from identical **layers** stacked on top of each other. Each layer has two main parts:
+
+ 1. **Attention**: Looks at relationships between all tokens
+ 2. **MLP (Feed-Forward Network)**: Processes each token's information individually, retrieving stored knowledge
+
+ A small model like GPT-2 has 12 layers. Larger models may have 32, 64, or more.
+
+ Information flows through these layers sequentially. After each layer, the model's understanding of the text becomes more refined. Early layers tend to capture basic patterns (like grammar), while later layers capture more complex meanings.
+
+ ## Encoder vs. Decoder
+
+ The original Transformer had two halves:
+
+ - **Encoder**: Reads and understands the full input (used in models like BERT)
+ - **Decoder**: Generates text one token at a time (used in GPT-style models)
+
+ Most LLMs you'll encounter in this dashboard are **decoder-only** models. This means they generate text left-to-right, predicting one token at a time based on everything that came before it. Each token can only "see" the tokens to its left -- it cannot look ahead.
+
+ ## The Residual Stream
+
+ There is an important concept called the **residual stream** (or "residual connection"). Think of it as a conveyor belt running through all the layers. Each layer reads from this stream, does some processing, and adds its result back. This means information from early layers is preserved and can be used by later layers.
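The conveyor-belt picture can be written down directly: each layer computes an update and adds it onto the stream. The two `toy_*` functions are invented stand-ins, not real attention or MLP math:

```python
def add(x, delta):
    """Residual connection: add a sublayer's output back onto the stream."""
    return [a + b for a, b in zip(x, delta)]

def toy_attention(x):
    # Invented stand-in: nudge each value toward the mean of the stream.
    mean = sum(x) / len(x)
    return [0.1 * (mean - v) for v in x]

def toy_mlp(x):
    # Invented stand-in: a small elementwise transformation.
    return [0.1 * v * v for v in x]

stream = [1.0, -2.0, 0.5]   # the "conveyor belt" for one token
for _ in range(3):          # each layer reads, processes, and adds back
    stream = add(stream, toy_attention(stream))
    stream = add(stream, toy_mlp(stream))
print(stream)
```

Because every layer only *adds* to the stream rather than replacing it, the original values (and each earlier layer's contribution) remain recoverable by later layers.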
+
+ ## How This Connects to the Dashboard
+
+ The dashboard's 5-stage pipeline follows the exact path data takes through a Transformer: Tokenization, Embedding, Attention, MLP, and Output. When you expand each stage, you're seeing what happens at that point in the architecture.
rag_docs/troubleshooting_and_faq.md ADDED
@@ -0,0 +1,73 @@
+ # Troubleshooting and FAQ
+
+ ## Common Issues
+
+ ### Model takes a long time to load
+
+ **Why**: The first time you load a model, it must be downloaded from HuggingFace. GPT-2 (124M) is about 500MB; larger models are much bigger.
+
+ **Fix**: Be patient on the first load. Subsequent loads should be faster because the model is cached locally. If loading is consistently slow, try a smaller model.
+
+ ### Ablating a head has no effect
+
+ **Why**: Not every head is important for every input. Many attention heads are redundant -- the model has learned to distribute work across multiple heads, so removing one doesn't always change the output.
+
+ **Fix**: This is actually an interesting finding! Try:
+ - Ablating a head from a different category (Previous-Token heads often show more effect)
+ - Using a different prompt (some prompts depend more on specific heads)
+ - Ablating multiple heads simultaneously to see if their combined removal has an effect
+
+ ### Attribution takes too long
+
+ **Why**: Integrated Gradients is computationally expensive because it runs the model multiple times (typically 50 steps) to build up the attribution scores.
+
+ **Fix**: Switch to "Simple Gradient" for faster (though less accurate) results. Or use a shorter prompt -- fewer tokens means faster computation.
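The averaging that makes Integrated Gradients expensive can be seen in a toy version. The "model" here is a made-up two-input function with a hand-written gradient, not the dashboard's actual model:

```python
def f(x):
    # Stand-in "model": a simple differentiable function of two inputs.
    return x[0] ** 2 + 3.0 * x[1]

def grad_f(x):
    # Hand-written gradient of f (real frameworks use autograd).
    return [2.0 * x[0], 3.0]

def integrated_gradients(x, baseline, steps=50):
    """Average the gradient at `steps` points along the straight path from
    baseline to x, then scale by (x - baseline), one value per input."""
    avg = [0.0] * len(x)
    for s in range(1, steps + 1):
        point = [b + (xi - b) * s / steps for xi, b in zip(x, baseline)]
        g = grad_f(point)
        avg = [a + gi / steps for a, gi in zip(avg, g)]
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg)]

x, baseline = [2.0, 1.0], [0.0, 0.0]
attr = integrated_gradients(x, baseline)
# Completeness check: attributions approximately sum to f(x) - f(baseline).
print(attr, sum(attr), f(x) - f(baseline))
```

Each of the 50 steps is a full gradient computation; on a real transformer that means dozens of forward/backward passes, which is why a simple gradient (one pass at the input itself) is so much faster.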
+
+ ### The model's prediction seems wrong or nonsensical
+
+ **Why**: Small models like GPT-2 (124M) have limited knowledge and can produce incorrect facts, repetitive text, or non-sequiturs. The model was trained on data from before 2019 and has a limited understanding of the world.
+
+ **Fix**: This is expected behavior for small models. The dashboard is designed for exploring *how* the model works, not for getting useful outputs. Try different prompts or a different model.
+
+ ### BertViz visualization is hard to read
+
+ **Why**: With 12+ heads selected simultaneously, the attention lines overlap and become a dense mess.
+
+ **Fix**: Double-click on a single head in the BertViz visualization to isolate it. Then explore heads one at a time. Use the head categories to guide which heads to investigate.
+
+ ### The dashboard becomes slow or unresponsive
+
+ **Why**: Larger models require more memory and computation. Running multiple experiments without refreshing can also accumulate memory usage.
+
+ **Fix**: Try a smaller model. Refresh the browser page if things get sluggish. Close other memory-intensive applications.
+
+ ## Frequently Asked Questions
+
+ ### Which model should I start with?
+
+ **GPT-2 (124M)** is the best starting model. It's small, fast, well-studied, and has clean attention patterns that are easy to understand. Once you're comfortable, move to Qwen2.5-0.5B for a comparison.
+
+ ### What prompts work best for learning?
+
+ Start with short, simple prompts (5-10 words) that have clear, predictable continuations:
+ - "The cat sat on the" (predict a location)
+ - "The capital of France is" (predict a fact)
+ - "Once upon a time there was a" (predict a story element)
+
+ These give clear, interpretable results in the pipeline and experiments.
+
+ ### Can I use my own model?
+
+ Yes! Type any HuggingFace model ID into the model dropdown. The dashboard supports GPT-2, LLaMA, OPT, GPT-NeoX, BLOOM, Falcon, and MPT architectures. Unknown architectures may need manual configuration in the sidebar.
+
+ ### What's the difference between the pipeline and the investigation panel?
+
+ The **pipeline** (5 stages) shows what happens during the model's forward pass -- how your input is processed step by step. The **investigation panel** (ablation + attribution) lets you run experiments to understand *why* the model made a specific prediction.
+
+ ### How do head categories get determined?
+
+ The dashboard automatically analyzes each attention head's pattern using heuristic rules (based on thresholds for attention distributions). For example, a head is classified as "Previous-Token" if more than 40% of each token's attention goes to the immediately preceding token. These categories are computed fresh for each analysis.
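A threshold rule of that shape can be sketched directly. The attention matrix below is invented, and this version averages across positions; the dashboard's exact rule may differ in details:

```python
def is_previous_token_head(attn, threshold=0.4):
    """attn[i][j] = attention that token i pays to token j (rows sum to 1).
    Qualifies if, averaged over positions, tokens put more than `threshold`
    of their attention on the immediately preceding position."""
    scores = [attn[i][i - 1] for i in range(1, len(attn))]
    return sum(scores) / len(scores) > threshold

# Invented 4-token attention pattern that mostly looks one position back:
prev_head = [
    [1.0, 0.0, 0.0, 0.0],
    [0.8, 0.2, 0.0, 0.0],
    [0.1, 0.7, 0.2, 0.0],
    [0.0, 0.1, 0.6, 0.3],
]
print(is_previous_token_head(prev_head))  # True
```

A head whose rows mostly attend to position 0 instead would fail this check and fall into the First/Positional category under an analogous rule.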
+
+ ### Can I save my results?
+
+ Currently, results are displayed in the browser and aren't saved between sessions. You can take screenshots or copy text from the chatbot (using the copy button on messages) to record your findings.
rag_docs/what_is_an_llm.md ADDED
@@ -0,0 +1,36 @@
+ # What Is a Large Language Model (LLM)?
+
+ ## The Big Idea
+
+ A Large Language Model is a computer program that has learned to read and write text by studying enormous amounts of human writing. Think of it like an incredibly well-read assistant that has absorbed millions of books, articles, and websites, and can now predict what word comes next in a sentence.
+
+ ## How It Works (Simply)
+
+ At its core, an LLM does one thing: **next-token prediction**. Given some text like "The cat sat on the", the model predicts the most likely next piece of text (called a "token") -- perhaps "mat" or "floor."
+
+ This might sound simple, but to do it well, the model has to understand grammar, facts, context, and even some reasoning. All of that understanding is encoded in the model's **parameters** -- billions of numbers that were learned during training.
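Next-token prediction can be demonstrated at a microscopic scale by counting word pairs. Real LLMs use neural networks rather than counts, and the tiny corpus below is invented, but the prediction task is the same:

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count which word follows which -- a microscopic 'language model'."""
    counts = defaultdict(Counter)
    words = text.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def predict_next(counts, word):
    """Predict the follower seen most often in training."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

model = train_bigram("the cat sat on the mat and the cat slept")
print(predict_next(model, "the"))  # "cat" follows "the" most often here
```

This counter can only look one word back and has no notion of grammar or meaning; an LLM replaces the count table with billions of learned parameters that condition on the whole context.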
+
+ ## What Is a Neural Network?
+
+ An LLM is built on a type of computer program called a **neural network**. A neural network is loosely inspired by the brain: it's made of layers of simple processing units that pass information forward, transforming it step by step. Each layer takes input numbers, multiplies and adds them, and passes the result to the next layer.
+
+ When you stack many layers together and train them on lots of data, the network learns complex patterns -- like how words relate to each other.
+
+ ## What Makes It "Large"?
+
+ The "large" in LLM refers to two things:
+
+ - **Many parameters**: Modern LLMs have billions of learnable numbers (GPT-2 has 124 million; larger models have tens or hundreds of billions).
+ - **Massive training data**: They train on huge text datasets -- sometimes trillions of words from the internet, books, and code.
+
+ ## How Does This Connect to the Dashboard?
+
+ The Transformer Explanation Dashboard lets you look inside an LLM as it makes a prediction. When you enter a prompt and click "Analyze," you can see:
+
+ - How the model breaks your text into tokens (Stage 1)
+ - How those tokens become number vectors (Stage 2)
+ - How the model figures out which words relate to each other (Stage 3: Attention)
+ - How knowledge is retrieved from the model's memory (Stage 4: MLP)
+ - What the model predicts as the next token (Stage 5: Output)
+
+ This step-by-step view helps you understand what happens inside the "black box" of an LLM.
todo.md CHANGED
@@ -182,3 +182,14 @@
  - 1536 dimensions, high quality
  - [x] Remove local `sentence-transformers` dependency (simpler, no TF conflicts)
  - [x] Estimated cost: ~$1.50/month for moderate usage
+
+ ## Completed: Enhance RAG Documents for Chatbot
+
+ - [x] Category 1: 8 general LLM/Transformer knowledge files (what_is_an_llm.md through key_terminology.md)
+ - [x] Category 2: 7 dashboard component documentation files (dashboard_overview.md through model_selector_guide.md)
+ - [x] Category 3: 3 model-specific documentation files (gpt2_overview.md, llama_overview.md, opt_overview.md)
+ - [x] Category 4: 6 step-by-step guided experiment files (experiment_first_analysis.md through experiment_beam_search.md)
+ - [x] Category 5: 6 interpretation/troubleshooting/research files (interpreting_*.md, troubleshooting_and_faq.md, recommended_starting_points.md, mechanistic_interpretability_intro.md)
+ - [x] Delete embeddings_cache.json, update rag_docs/README.md with full inventory
+ - [x] Update todo.md and conductor docs
+ - Total: 30 RAG documents covering transformer concepts, dashboard usage, guided experiments, interpretation, troubleshooting, and research context