cdpearlman/LLMVis
what_is_an_llm.md - Neural networks, language models, next-token prediction
transformer_architecture.md - Layers, encoder/decoder, residual stream
tokenization_explained.md - Subword tokenization, BPE, token IDs
embeddings_explained.md - Lookup tables, vector spaces, positional encodings
attention_mechanism.md - Q/K/V, multi-head attention, intuitive explanations
mlp_layers_explained.md - Feed-forward networks, knowledge storage, expand-compress
output_and_prediction.md - Logits, softmax, temperature, greedy vs. sampling
key_terminology.md - Extended glossary of ML/transformer terms

dashboard_overview.md - Layout tour, navigation, typical workflow
pipeline_stages.md - What each of the five pipeline stages shows
ablation_panel_guide.md - How to use the ablation experiment panel
attribution_panel_guide.md - How to use the token attribution panel
beam_search_and_generation.md - Beam search, generation controls
head_categories_explained.md - Previous-Token, Positional, Bag-of-Words (BoW), Syntactic, Other
model_selector_guide.md - Choosing models, auto-detection, generation settings

gpt2_overview.md - GPT-2 architecture, why it's a good starter, variants
gpt_neo_overview.md - GPT-Neo architecture, local attention, comparison with GPT-2
pythia_overview.md - Pythia architecture, RoPE, parallel attention and MLP, interpretability focus
opt_overview.md - OPT architecture, ReLU activation, comparison with GPT-2
qwen_overview.md - Qwen2.5 (LLaMA-like) architecture, RMSNorm, SiLU, GQA

experiment_first_analysis.md - Your first analysis with GPT-2
experiment_exploring_attention.md - Reading attention patterns and head categories
experiment_first_ablation.md - Removing a head and observing the effect
experiment_token_attribution.md - Measuring token influence with gradients
experiment_comparing_heads.md - Systematic comparison across head categories
experiment_beam_search.md - Exploring alternative generation paths

Notes